Linear Classifiers

4. August 2010 22:47 by Seth in Machine Learning  //  Tags: ,   //   Comments (3)

Thinking in Points

Consider this little plot generated by python.

points

There are a series of example points (blue and red) along with three test points (green x). Consider the following question:

If you were to guess the color of 3 new points (1, 2, and 3) what would your guess be?

Instinctively we guess 1 belongs to the red class and 3 belongs to the blue class. 2 seems to be a toss-up. How did we come up with this?

Separators

I am guessing that we instinctively made a fake line through the points and decided that everything to the left of the line has to be blue and everything to the right of the line has to be red.

line

This line separates the data into two distinct classes (red and blue). Notice that in this case (if we were “strict” in our adherence to the separator, we would make some mistakes: even with the example points. Using this separator we would misclassify 3 or four red points as well as 4-ish blue points.

Generalization

Here then is the point. Given the examples, we came up with a line to represent the entire problem of separating blue and red points using their respective coordinates. In this case each example had two features (x and y coordinates) and a single label (red or blue). With these we generalized the problem into a single line which would be used to make all future decisions.

Linearly Separable

What happens if our problem looks like this:

non-sep

Notice that there is no line that will properly separate this data. Our linear model of binary classification kind of breaks down (kind of). Instinctively though, we could theoretically draw a circle around the red points and then say everything inside the circle is red and everything out of it is blue.

Tricks

In this particular case, we can trick it into being linearly separable. Let's do the following: take each (x, y) pair and make it 3d by plotting (x2, y2, x * y). This is what it looks like:

non-sep-3d1

Shifting the same image to the left a little:

non-sep-3d2

In this case we can separate the blue's and red's using a plane (which is also linear). When we squish it back down to 2d, we get a funny linear separator generated from a projection from 2d space to 3d space and then back down:

non-sep-circle

A circle!

In Summary

The main idea of this post is getting us to think about how humans think about separating things (in our case binary classification). If we can project our instincts into math, then we can make algorithms learn (or generalize) from data just like we do. The machine learning library I am building is an attempt to abstract the math behind it. The question now becomes: how can we change our object oriented examples into x and y?

Addendum

I thought it would be nice to include the python code (remebering that these points are randomly generated so they will be slightly different than my diagrams).

from numpy import *
from pylab import *
import mpl_toolkits.mplot3d.axes3d as p3

# Linearly Separable
def exampleGraph():
	X1 = rand(50,2) + .3
	X2 = rand(50,2) - .3
	plot(X1[:,0], X1[:,1], 'ro', X2[:,0], X2[:,1], 'bo')
	show()
	return X1, X2

# Not linearly separable (using squared features)
def exampleGraph2():
	X1 = (rand(100,2) - .5) * 5
	X2 = (rand(50,2) - .5)
	plot(X1[:,0], X1[:,1], 'bo', X2[:,0], X2[:,1], 'ro')
	show()
	# 3d stuff
	a = vstack([X1[:,0] ** 2, X1[:,1] ** 2, X1[:,0] * X1[:,1]]).T
	b = vstack([X2[:,0] ** 2, X2[:,1] ** 2, X2[:,0] * X2[:,1]]).T
	fig = figure()
	ax = p3.Axes3D(fig)
	ax.plot(a[:,0], a[:,1], a[:,2], 'bo')
	ax.plot(b[:,0], b[:,1], b[:,2], 'ro')
	show()
	return X1, X2

Supervised Learning - Classification

12. July 2010 23:17 by Seth in Machine Learning, .NET  //  Tags: ,   //   Comments (9)

Supervised Learning

In supervised learning, the algorithm is given labeled examples in order to come up with an appropriate model that defines the data and can also correctly label future examples correctly (or adequately). Supervised learning can be grouped into the following depending on the actual label type:

  1. Binary Classification (think yes/no)
  2. Multi-class classification (any answer from a finite set)
  3. Rgression (any answer from an infinite set)

In the machine library I am trying to put together, each of the three groups mentioned above can be separated into distinct .NET data types as follows:

  1. Binary Classification (bool)
  2. Multi-Class Classification (enum)
  3. Regression (double, float, int, decimal, long, etc...)

As mentioned in my earlier post (with a minor breaking change), classes (which is how we generally describe our data or examples) can be decorated as follows:

public class Student
{
	[Feature]
	public string Name { get; set; }

	[Feature]
	public Grade Grade { get; set; }

	[Feature]
	public double GPA { get; set; }

	[Feature]
	public int Age { get; set; }

	[Feature]
	public bool Tall { get; set; }

	[Feature]
	public int Friends { get; set; }

	[Label]
	public bool Nice { get; set; }
}

Why the breaking change from Learn to Label? In the machine learning literature, the examples all have features as well as a label. The features are the data that is used to generalize based upon the appropriate label (which turns out to be the answer). Notice in the case above, we are using 6 features to learn a boolean label. In the way its been set up, this would be an example of binary classification.

Binary Classification

In the case of our student class, we are trying to learn whether a particular student is nice or not given their Name, Grade, GPA, Age, Tallness, and number of Friends. Eventually, the library will automatically detect which type of learning it needs to do, but for now, here is how we generate the model:

Student[] students = Student.GetData();

// test point
Student s = new Student { Name = "Seth", Age = 30, Friends = 16, GPA = 4.0, Grade = Grade.A, Tall = true };

var model = new PerceptronModel();
var predictor = model.Generate(students);

s = predictor.Predict(s);

In essence, we get a bunch of students and spin up a new student on which we will run predictions. The classification algorithm used in this case is the Perceptron algorithm (more on this later). Once the model is generated, we can run a prediction by simply passing in the new student and the predictor fills in the appropriate property. Magic! This is coming from a guy whose magic repretoire only includes making a coin disappear by dropping it on the floor as well as the "I-can-pull-my-finger-off" trick that only amuses my 5 year old. It is actually using some really simple math to find a way to seperate the examples.

Reusing what you've learned

Once you've generated the model, it would be a waste to have to regenerate it for every subsequent run of the program. As such, there is a way to save the model and later reuse it:

var model = new PerceptronModel();
var predictor = model.Generate(students);
predictor.Save(path);
...
Student s = new Student { Name = "Seth", Age = 30, Friends = 16, GPA = 4.0, Grade = Grade.A, Tall = true };

var model = new PerceptronModel();
var predictor = model.Load(path);
predictor.Predict(s);

As one of my goals is to actually help out in the understanding of these models, the serialized xml also includes some information regarding your data (although it is not needed for the actual algorithm:



  
    
      435.552223888056
      -4.9275362318840576
      -123.6006996501749
      50.744252873563212
      -45.477261369315343
      -62.145927036481758
    
  
  -11.525237381309346
  
  
    
      Friends
      Tall
      Age
      GPA
      Grade
      Name
    
    Nice
  

Notice that in this particular model, the portion with the largest number (435.5522) corresponds to the Friends feature. This means that the number of friends (multiplied by 4, more on this too later) is a strong indicator of niceness.

In Summary

The neatest thing about these things is how creepily acurate they are! Next time, I will try to show exactly what the perceptron (or any linear classifier for that matter) is actually doing. Please drop me a line if you have any questions

What is Machine Learning?

30. June 2010 22:10 by Seth in .NET, Machine Learning  //  Tags: ,   //   Comments (0)

Introduction

I had the priviledge of presenting at CodeStock. It was absolutely great. I was surprised and humbled at the reception of my session regarding Machine Learning. As such, I wanted to do a series of posts regarding what it is I wish to accomplish.

Machine Learning is Hard

Because the stuff is so intriguing, I have spent the last number of years trying to figure the stuff out! I would certainly not classify myself as an expert (by any means), but I think I have a general idea of the field.

Machine learning can be seperated into roughly 3 classifications:

  1. Supervised Learning - learning from labeled examples
  2. Unsupervised Learning - learning from unlabeled examples
  3. Other - Hybrid of the above two, structured prediction, reinforcement learning, etc.

The (perceived) source of difficulty in these three areas is the enormous amount of math, probability, and statistics that is needed in order to begin solving problems via this approach. I remember sitting with my adviser where we both wondered alound: "Why aren't these things used more?" I thought I had an idea that might work. Before we get to that, I though I would share the general idea of machine learning.

A better way

Our standard approach to solving programming problems is to sit down and come up with a series of finite steps that lead to a solution (an algorithm so-to-speak). The idea behind machine learning is to simply provide the machine with data and let it decide how to solve the problem. Although this sounds mildly creepy, the principles behind this type of approach are not at all foreign to us.

In summary, (as stated by my adviser): we are "replacing humans writing code with humans supplying data" and letting the machine decide the best approach.

Skynet (or generalization)

If you are currently fearing for your life because you think we are sowing the seeds of our future destruction, fear not. The main impetus behind machine learning is generalization. In other words, can we find patterns that can be exploited in the data to produce the desired results? Absolutely! Will those patterns always have the ability to produce the desired results? Unfortunately not. It turns out the machine will only be as smart as you!. The machine cannot generalize useful information from unimportant data.

Show us something already!

The second source of difficulty is knowing how to use the data we have to generalize properly. This process is often referred to as feature selection. What pieces of data do we use to create the model? How do we convert these features into the corresponding mathematical representations?

This is were our ideas came in. As a developer, we always represent our data in classes. Always have and always will (well... see F#). Is there an easy way to represent things in classes whereby a machine learning algorithm can automatically make the appropriate conversions to the mathematical representations and also select the appropriate learning algorithm? I think so; take a look:

public class Student
{
	[Feature]
	public string Name { get; set; }

	[Feature]
	public Grade Grade { get; set; }

	[Feature]
	public double GPA { get; set; }

	[Feature]
	public int Age { get; set; }

	[Feature]
	public bool Tall { get; set; }

	[Feature]
	public int Friends { get; set; }

	[Learn]
	public bool Nice { get; set; }
}

The idea here is that we have a student with 6 features (Name, Grade, GPA, Age, Tall, Friends) and want to learn whether the features can predict whether he or she is Nice (target). In an effort to help the machine learn this, we will provide it with a list of students with all properties filled out, and ask it to predict the niceness of future students. This is the representation I have chosen for automatic feature selection. I would love feedback on this representation. The purpose of this representation is to attempt to reduce the friction in using ml algorithms because of the difficulty of feature selection and feature conversions.

Wrapping it up

The current incarnation of this code can be found at http://machine.codeplex.com/. There is also a drop with this very example. In future posts I will get into the general ideas behing supervised and unsupervised learning as well as the particular algorithms I have implemented to test this theory. I would love your feedback.

Does it work?

I got an interesting set of emails (only 3 days after my demo) where there were already some truly novel things being done using this approach to machine learning. Needless to say I was very impressed! Perhaps they will let me use their particular implementations as a case study. In short - YES! This stuff totally works! All I have done is create a new way of interacting with tried and proven machine algorithms.