Tuesday, September 2, 2014

Brute-force: introduction to hacking

In late August of 2014, a large cache of private celebrity data was stolen, with the most "newsworthy" material being nude or explicit photos. Per this article, the security hole was in Apple's iCloud (specifically, the Find My iPhone feature), which allowed attackers to use "brute-force" attacks to gain entry to user accounts.

So, what is brute-force? Stated simply, if you are trying to open a combination lock with 4 digits (0-9, making 10 possibilities per digit) and you don't know the code, you can keep trying combinations until it opens: 1111, 2918, 3345, etc.
The number of possible codes is 10*10*10*10 = 10^4 = 10,000, since each of the 4 digits can independently take any of 10 values.
Meaning that given enough time and finger strength, you WILL break the code in 10,000 tries or fewer (about 5,000 on average).
Brute-force hacking is the simplest form of hacking there is, and usually takes the longest. Other methodologies may or may not be detailed in the future.
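For the curious, the lock example can be sketched in a few lines of Python. This is only an illustration: the function name brute_force_lock, the opens callback, and the secret code 2918 are all made up for the example.

```python
from itertools import product

def brute_force_lock(opens, digits="0123456789", length=4):
    """Try every code in order; return (code, tries) once one opens the lock."""
    for tries, combo in enumerate(product(digits, repeat=length), start=1):
        code = "".join(combo)
        if opens(code):
            return code, tries
    # Every combination was tried and none worked
    return None, len(digits) ** length

secret = "2918"  # hypothetical code, known only to the "lock"
code, tries = brute_force_lock(lambda guess: guess == secret)
print(code, tries)  # 2918 2919 (codes are tried from 0000 upward)
```

Because the codes are tried in order from 0000, a code of 9999 would take the full 10,000 tries.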

10,000 tries is quite a lot - which is why bike thieves usually use a hammer instead


If this code were a digital password, one could use a computer program or script to automatically submit the 10,000 different combinations and gain access to the protected content. A computer, being much faster than the average typing human, could knock this task out in a few hours at most (10,000 seconds, or about 2 hours and 47 minutes) if we assume 1 second per try. However, per Wikipedia, good "cracking" programs can test candidate passwords at a rate of 100+ million per second.

Consider that most websites require a password of at least 8 characters, drawn from lowercase letters (26), uppercase letters (26), digits (10) and special characters such as % ^ & @ * etc. (let's say 15 - it can vary per website). Note that this assumes the English/Latin alphabet. The number of possible passwords of exactly 8 characters is thus:

(26+26+10+15)^8 = 77^8 = 1.2 x 10^15 combinations. Dividing by 100 million (1 x 10^8) guesses per second gives about 1.2 x 10^7 seconds to try every combination, or roughly 143 days. This number increases further if passwords of 9, 10, 11 etc. characters are allowed. By contrast, if you limit yourself to 8 lowercase letters with no digits or special symbols, there are only 26^8 = 2.1 x 10^11 combinations, and your password will take about 35 minutes to crack, provided the program tries lowercase-only passwords first. This underscores the need for a "strong password".
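The arithmetic above is easy to reproduce. Here is a small sketch, assuming the 100 million guesses per second quoted earlier and the rough figure of 15 special characters; the helper name seconds_to_exhaust is just for illustration.

```python
GUESSES_PER_SECOND = 100_000_000  # the 100 million/second rate quoted above

def seconds_to_exhaust(alphabet_size, length, rate=GUESSES_PER_SECOND):
    """Worst-case time to try every password of exactly `length` characters."""
    return alphabet_size ** length / rate

# 26 lowercase + 26 uppercase + 10 digits + an assumed 15 special characters
full = seconds_to_exhaust(26 + 26 + 10 + 15, 8)
# lowercase letters only
lower = seconds_to_exhaust(26, 8)

print(f"full character set: {full / 86_400:.0f} days")
print(f"lowercase only: {lower / 60:.0f} minutes")
```

Changing the length argument from 8 to 9 or 10 shows how quickly each extra character multiplies the work.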

So does this mean every password can be hacked given enough time? Well, yes. But, as with your phone's screen lock, entering too many wrong passwords normally locks the user out from trying again - an important security feature. Unfortunately, this feature was neglected in just ONE Apple application which required a sign-in. So, given a celebrity's Apple ID (usernames and email addresses are not exactly private most of the time), the hackers went to work.
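To see why a lockout stops brute force cold, here is a toy sketch of the idea. It is not how Apple (or anyone) actually implements it: the class name LoginGuard and the limit of 5 failures are assumptions for the example, and a real service would store hashed passwords rather than plain text.

```python
class LoginGuard:
    """Refuses further attempts after `max_failures` consecutive wrong guesses."""

    def __init__(self, password, max_failures=5):
        self._password = password
        self.max_failures = max_failures
        self.failures = 0

    @property
    def locked(self):
        return self.failures >= self.max_failures

    def try_login(self, guess):
        if self.locked:
            return False  # locked out: even the correct password is rejected
        if guess == self._password:
            self.failures = 0  # a successful login resets the counter
            return True
        self.failures += 1
        return False

guard = LoginGuard("2918")
for n in range(10_000):
    if guard.try_login(f"{n:04d}"):
        break
print(guard.locked, guard.failures)
```

With the guard in place, the attacker gets 5 guesses out of 10,000 instead of all of them.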

So, what have we learned here?
1. Buy Android
2. Use strong passwords
3. Read my blog

(See - Apple's rebuttal)

Sunday, August 10, 2014

Blood, sweat, tears, and cryptography

AES (Advanced Encryption Standard) is quite a bit more powerful than the methods described in my previous blog post. It is widely used among private companies and governments to encrypt text, passwords, and file contents. There are several other encryption standards, which I won't delve into. This website provides a free tool to code and decode messages using AES, and so do many others.

On a personal note, those close to me will know that the past two years have brought some of the biggest challenges in my life (don't mean to be dramatic, I swear!), and also some fun, memorable moments. To all who have been there in some way or another - I would like to thank you in the form of an encrypted message. All you have to do is use the link with your initials to find my individualized salute to you. Oh, and you will each have to ask me for your individualized link and password.

JH
FP
PN
TS
JV
LH
NT
JG
ISO
VT



Cryptography, part one

In an alternate universe, after the British used careful analysis of social networks (great short article, do read) to narrow in on Paul Revere, the colonial hero did not have much time to deliver his famous message. Knowing he might be caught at any moment, he decided to encrypt his message. That is, to turn words into code.

Paul Revere - silversmith, patriot, and amateur code boy


 His first attempt was rather weak - turning all letters to corresponding numbers.

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z *
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Figure 1
His message now looked like this:

T H E * B R I T I S H * A R E * C O M I N G
20 8 5 27 2 18 9 20 9 19 8 27 1 18 5 27 3 15 13 9 14 7
Figure 2
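A quick sketch of this first scheme in Python, using the table from Figure 1 with * standing in for a space. The function names to_numbers and to_text are made up for the example.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"  # '*' stands for a space, as in Figure 1

def to_numbers(message):
    """Replace each letter with its position in the alphabet (A=1 ... Z=26, space=27)."""
    return [ALPHABET.index(ch) + 1 for ch in message.upper().replace(" ", "*")]

def to_text(numbers):
    """Reverse the substitution."""
    return "".join(ALPHABET[n - 1] for n in numbers).replace("*", " ")

numbers = to_numbers("THE BRITISH ARE COMING")
print(numbers)
print(to_text(numbers))
```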

This "code" could easily be deciphered by anyone who intercepted the numbers. So he tried again, with a slightly more complex replacement - he split the alphabet into 3 groups, and gave each letter a number and a symbol, counting backwards within each group:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z *
9* 8* 7* 6* 5* 4* 3* 2* 1* 9^ 8^ 7^ 6^ 5^ 4^ 3^ 2^ 1^ 9$ 8$ 7$ 6$ 5$ 4$ 3$ 2$ 1$
Figure 3
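The Figure 3 table can be generated rather than typed out. A minimal sketch, assuming the three groups of nine shown above (the names TABLE, encode and decode are illustrative):

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ*"  # '*' stands for a space
SYMBOLS = "*^$"                           # one symbol per group of nine letters

# Within each group of nine, the digit counts down from 9 to 1
TABLE = {ch: f"{9 - i % 9}{SYMBOLS[i // 9]}" for i, ch in enumerate(ALPHABET)}
REVERSE = {code: ch for ch, code in TABLE.items()}

def encode(message):
    return [TABLE[ch] for ch in message.upper().replace(" ", "*")]

def decode(codes):
    return "".join(REVERSE[code] for code in codes).replace("*", " ")

print(" ".join(encode("THE BRITISH ARE COMING")))
```

Note that every letter still maps to exactly one code, which is the weakness discussed next.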

The message could now be written and passed on more discreetly, but it was still not secure. If the British caught Revere and intercepted his letter, it would only be a matter of time until they figured out the one-to-one correspondence between letter and code. How could Revere make the coding look more random? Since he did not have access to modern computing, the answer was matrices.

Matrices can be used as a mathematical basis for cryptography. Starting from a numerical message, such as the one in Figure 2, we can use matrix multiplication to "jumble up" the numbers so that each output value depends on several letters at once, making coded messages more difficult to decipher. Besides the message itself, the process requires an encoding matrix, which must be square in shape and invertible (if you are rusty on matrix algebra, don't worry about this part). Generally, the larger the square encoding matrix, the harder the code is to break. We will use the 3x3 matrix below, with the message matrix split into columns of 3 for multiplication purposes.

7 2 1
0 3 -1
-3 4 -2
Multiply the encoding matrix above by the message matrix below. The message matrix is the original numerical message, [20, 8, 5, 27, 2, 18, ...], written down the columns in groups of 3, with the last column padded with spaces (27):
20  27   9  19   1  27  13   7
 8   2  20   8  18   3   9  27
 5  18   9  27   5  15  14  27

The following matrix results:
161   211  112  176   48  210  123  130
 19   -12   51   -3   49   -6   13   54
-38  -109   35  -79   59  -99  -31   33

Paul Revere can now write these numbers down on paper, and the code won't be as obvious. For one thing, we are not using 1-27 anymore, and there is no one-to-one correspondence between letters and numbers. To decode the message, Revere's compatriots, who hold the key (the original 3x3 encoding matrix), multiply its inverse by the matrix Revere sent. I won't bother showing these steps - the result, as we've said before, will alert them just in time:

T H E * B R I T I S H * A R E * C O M I N G
20 8 5 27 2 18 9 20 9 19 8 27 1 18 5 27 3 15 13 9 14 7
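For anyone who does want to see the steps, here is a sketch of both directions in plain Python. The encoding matrix is the one given above; the inverse was worked out by hand for this example (it has whole-number entries because the determinant of the key is 1), and the helper names matmul, to_columns and from_columns are made up. The approach is similar in spirit to the classic Hill cipher, though without the modular arithmetic.

```python
KEY = [[7, 2, 1],
       [0, 3, -1],
       [-3, 4, -2]]
# Inverse of KEY, computed by hand for this sketch
KEY_INVERSE = [[-2, 8, -5],
               [3, -11, 7],
               [9, -34, 21]]
SPACE = 27  # padding value, the code for '*'

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def to_columns(numbers, height=3, pad=SPACE):
    """Write the message down the columns of a matrix with `height` rows."""
    padded = numbers + [pad] * (-len(numbers) % height)
    columns = [padded[i:i + height] for i in range(0, len(padded), height)]
    return [list(row) for row in zip(*columns)]

def from_columns(matrix):
    """Read a matrix back out column by column."""
    return [value for column in zip(*matrix) for value in column]

message = [20, 8, 5, 27, 2, 18, 9, 20, 9, 19, 8, 27, 1, 18, 5, 27, 3, 15, 13, 9, 14, 7]
encoded = matmul(KEY, to_columns(message))               # what Revere writes down
decoded = from_columns(matmul(KEY_INVERSE, encoded))     # what his compatriots recover

print([row[0] for row in encoded])     # first column: [161, 19, -38]
print(decoded[:len(message)] == message)
```

The decoded list carries two trailing 27s, which are just the padding spaces added to fill the last column.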

Monday, July 14, 2014

Find love and success with the help of SQL

This blog post took a while to get off the ground. When it comes to data analysis, the first step is finding valid data, and the next is putting it into a usable format. Afterwards, it's all a walk in the park. I chose to play around with SQL (Structured Query Language, an efficient programming language that is used with databases - organized storage places of large amounts of data) using the data set from the 2010 United States Census.

Why? Mostly because I'm sick and tired of seeing articles such as "Best 20 Cities to Live in Your 20s". So, I decided to do my own version with some basic database wizardry.

Step 1: Finding the data
This was fairly simple using a Google search, as the data is readily available from a government website. However, I could not find it in SQL Server, MySQL, or even PostgreSQL format, and had to settle for an Access database (see here). Unfortunately, I also had to download each state's files separately and load them into the database. This was manual work, but I had help from a blog post I found with some great instructions.

Step 2: Getting the data into the tool you want
I chose SQL Server 2012. The import wizard made this pretty easy.

Step 3: Manipulate the data and extract useful information. See rest of blog post, but it's pretty much summarized in this picture


Step 4: Profit (one hopes)

Alright, let's get to it. Assume you are a bright-eyed, 22-year-old male college graduate looking to relocate for your first real job. You have the following requirements for where you want to live:

- High concentration of Hispanic population, because you love nothing more than a good Cuban sandwich
- A high female-to-male ratio in your age group, because dating is important
- You absolutely MUST live in Texas, because everything is bigger there. Preferably, you wish to live in a city (for our purposes, population > 250,000)

Let's go ahead and crunch that into SQL Server:



As you can see, young women are the most plentiful in Fort Worth, but not by that much - merely a 21:20 ratio. You may have better luck trying another state. However, you don't have to go far to find a heavily Hispanic area: of the major Texas cities, only Plano is under 25% Hispanic. Note that the results are sorted by a single criterion, and I chose descending female-to-male ratio. A more novel approach would be to assign weights to each category (let's say you care about the opposite gender only 3 on a scale of 10 and about the ethnicity of your neighborhood about 6 on a scale of 10) and compute a total score that more accurately reflects your needs. Unfortunately, the US Census either does not ask about or does not make readily available other important social markers which would really be of use. Some examples include median household income, job availability, air & water quality, or perhaps even a happiness index.
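Here is a rough sketch of both ideas, the filter-and-sort query and the weighted score, using Python's built-in sqlite3 module instead of SQL Server. Everything in it is an assumption for illustration: the city_stats table, its column names, and especially the rows, which are placeholders and NOT Census figures. Dividing each column by its maximum before weighting is just one simple way to put the two measures on the same scale.

```python
import sqlite3

# Placeholder rows, NOT Census data:
# (city, state, population, pct_hispanic, f_to_m_ratio)
ROWS = [
    ("City A", "TX", 900_000, 40.0, 1.02),
    ("City B", "TX", 300_000, 20.0, 1.05),
    ("City C", "TX", 120_000, 60.0, 1.10),  # too small, filtered out
    ("City D", "FL", 400_000, 50.0, 1.08),  # wrong state, filtered out
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city_stats (city TEXT, state TEXT, population INTEGER, "
             "pct_hispanic REAL, f_to_m_ratio REAL)")
conn.executemany("INSERT INTO city_stats VALUES (?, ?, ?, ?, ?)", ROWS)

# Texas cities over 250,000 people, best female-to-male ratio first
BY_RATIO = """
    SELECT city, pct_hispanic, f_to_m_ratio
    FROM city_stats
    WHERE state = 'TX' AND population > 250000
    ORDER BY f_to_m_ratio DESC
"""

def weighted_ranking(rows, w_ratio=3, w_hispanic=6):
    """Scale each measure by its maximum, then combine using the given weights."""
    max_pct = max(pct for _, pct, _ in rows)
    max_ratio = max(ratio for _, _, ratio in rows)
    scored = [(city, w_ratio * ratio / max_ratio + w_hispanic * pct / max_pct)
              for city, pct, ratio in rows]
    return sorted(scored, key=lambda item: item[1], reverse=True)

by_ratio = conn.execute(BY_RATIO).fetchall()
ranking = weighted_ranking(by_ratio)
print(by_ratio)
print(ranking)
```

With these placeholder rows the two orderings disagree, which is exactly why the choice of weights matters.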

Databases store any sort of useful data, and SQL helps us retrieve it. This can be anything from stock market history to advanced sports statistics.

If you ever need to make a complex life decision, crunching the numbers might not seem sexy, but you never know when it could be helpful.

Tuesday, July 1, 2014

An introduction to Linux - what and why?

If you're like me, you may have done some projects on Amazon's cloud service, Amazon EC2. Amazon's web services are rapidly increasing in popularity, mostly because of the wide availability of cheap hosting and computer workstations they offer. Most of the servers that can be rented run on Linux. So that raises the question - what is Linux? What is Unix? Heck, is it Unix or UNIX? Did someone mention Ubuntu? Okay, let's dive in.

UNIX is an operating system. An operating system, put simply, is software that manages the computer's hardware and how it interacts with other software. This includes scheduling tasks, resource management, and security features. Common examples are Microsoft's Windows 8 for computers and Apple's iOS for mobile phones. While UNIX itself is almost non-existent in the consumer realm for personal computers, it has many features and applications that have made it widely used in business computing, especially on servers and mainframes. Later operating systems modeled on UNIX include, among others, Linux.

Linux is open source, so it is free and works on a wide variety of systems, and users around the world can share and modify code for their own purposes, creating a huge community of developers. Compared to the dominant Windows platforms, some claim Linux also has superior performance and fewer viruses and other threats. Different flavors of Linux include Ubuntu, Debian, and Red Hat, to name a few... with Ubuntu probably the best known.

Many times you'll be working in the Linux terminal. Refer to the picture below -
Anyone born after 1990 will see this for the first time and think: "shit, now what?"

Linux does have a GUI you can work out of, but the terminal remains popular. It works like the Windows command line, but with a much richer command vocabulary. The program that reads and runs your commands inside the terminal is called "the shell". For my fellow struggling young developers, if this is a lot of information to take in all at once, don't worry. I was so confused by this at first that I thought Unix was the command line language for Linux, and that "bash" was something you did to the keyboard when too many errors came up.

I will explore more of this in a future post, including some popular terminal commands. If you have any comments let me know!

PS - shoutout to a good friend who helped explain some of this all to me earlier this week

Sunday, June 29, 2014

Why Americans don't eat horse meat, and the difference between C# and VB.NET

If you followed the news in 2013, you remember the horse meat scandal that reverberated mostly in Europe - meat labeled as 100% beef in supermarkets, restaurants and fast food eateries was found to contain horse meat, either in traces or in large percentages.

Why was this a big deal? Mislabeling food is a safety issue, and it also goes against basic modern consumer rights. However, the uproar at the time would have us think that horse meat is subpar, unhealthy, or even hazardous. This is of course untrue, as meat from horses is eaten in several parts of the world, and considered a delicacy in some.

Americans and Western Europeans largely view horses as powerful, intelligent animals that are meant to be kept as pets rather than ground up into sausages. But why is that? My father, a champion of common sense, put it as such: historically, cattle and pigs are much more docile and easier to herd, are cheaper to maintain, and overall produce more meat than other animals (these are the reasons we don't farm horses, rabbits, or grizzly bears). As such, we've become a society that overwhelmingly eats chicken, beef, and pork over other meats.

This brings me to another interesting distinction I had a hard time understanding lately. From many job postings I've explored, it seems that many of the Windows-based software shops prefer C# to VB.NET. Of course, these are both .NET languages and thus incredibly similar, apart from syntax and other minor differences (see this Stack Overflow article for more). Being a VB.NET developer, this irked me a bit. This question came up with a hiring manager recently, and he explained another facet of the C# preference very eloquently. C# grew out of the C/C++ family of languages, while VB.NET evolved from Visual Basic. Among colleges teaching computer science, C++ has historically been the language of choice, while Visual Basic was often taught in the business school. Therefore, seeing C# on a resume has a connotation of greater technical knowledge.

I can confirm this looking at the curriculum of the great University of Florida, my alma mater. My comp sci friends all took C++ (with a few stragglers who opted for Java), although I myself, as an industrial engineer, had VB on the track. Not many business majors I knew were required to take programming.

So, it would seem historical preference affects us in different ways. Bon appétit, and happy coding.


Saturday, June 28, 2014

Sometimes in life, we need to take a Risk

I'm a Florida boy, through and through. After graduating in late 2012, I settled into a comfortable, well-paying first job in Tampa. After one year, I decided to leave. A few months later, I'm in Boston living on savings (and occasionally a friend's couch). Why? Because I felt I could do more. I felt I needed to do more.

I don't romanticize struggle or hardship, like many young graduates do. Steve Jobs famously lived on the edge of poverty and homelessness before making it big with Apple, and his story serves as an inspiration to many for following their passion in startup ventures. That's not the life for me. But I do romanticize learning.

This blog serves partly as a digital portfolio of my projects, and partly as a home for my current thoughts. We begin with the classic board game Risk, which I recreated in Excel VBA - first as a school assignment, but then as an expansive personal project. You could say this was truly my first software project.

Initial board setup


The game is driven by clicks on shapes, which are connected to macros. Based on previous actions, the program tracks which phase of the game we are in and what each click should trigger.

Active gameplay

The attack/defend algorithm follows the classic dice-roll rules of the original Risk, and is quite simple. To simplify record keeping, we leverage Excel's spreadsheet structure as a primitive database to store records.
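For readers who don't know the dice rules, here is a minimal sketch of them in Python. This is not the VBA code from the game, just an illustration of the classic rule: both sides sort their dice from high to low, the top dice are compared pair by pair, and the defender wins ties. The function names resolve_battle and roll are made up for the example.

```python
import random

def resolve_battle(attack_rolls, defend_rolls):
    """Compare the highest dice pairwise; the defender wins ties.

    Returns (attacker_losses, defender_losses).
    """
    attacker_losses = defender_losses = 0
    pairs = zip(sorted(attack_rolls, reverse=True), sorted(defend_rolls, reverse=True))
    for attack_die, defend_die in pairs:
        if attack_die > defend_die:
            defender_losses += 1
        else:
            attacker_losses += 1
    return attacker_losses, defender_losses

def roll(n, rng=random):
    """Roll n six-sided dice."""
    return [rng.randint(1, 6) for _ in range(n)]

# Attacker rolls up to 3 dice, defender up to 2
print(resolve_battle(roll(3), roll(2)))
print(resolve_battle([6, 3, 1], [5, 3]))  # (1, 1): 6 beats 5, but 3 ties 3
```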

Hidden worksheet which tabulates player/territory data


I'm constantly updating this project, and am currently working on a simplified AI to make a single-player mode possible. If anyone wishes to challenge their friends, or take a look at some of my code, the game is available. Click here to download!

