Supervised Learning - Classification

by Seth 12. July 2010 23:17

Supervised Learning

In supervised learning, the algorithm is given labeled examples so that it can build a model that describes the data and correctly (or at least adequately) labels future examples. Supervised learning problems fall into the following groups, depending on the type of the label:

  1. Binary Classification (think yes/no)
  2. Multi-Class Classification (any answer from a finite set)
  3. Regression (any answer from an infinite set)

In the machine learning library I am trying to put together, each of the three groups mentioned above maps to distinct .NET data types as follows:

  1. Binary Classification (bool)
  2. Multi-Class Classification (enum)
  3. Regression (double, float, int, decimal, long, etc.)
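
To make that mapping concrete, here is a minimal sketch of how a label's .NET type could be turned into a learning task. The `LearningTask` enum and the `LabelTypes.Detect` helper are hypothetical names invented for this illustration; they are not taken from the library.

```csharp
using System;

// Hypothetical names for the three groups of supervised learning.
public enum LearningTask { BinaryClassification, MultiClassClassification, Regression }

public static class LabelTypes
{
    // Maps the .NET type of a label to a learning task (illustrative only).
    public static LearningTask Detect(Type labelType)
    {
        if (labelType == typeof(bool)) return LearningTask.BinaryClassification;

        // Check enums before numeric types: Type.GetTypeCode reports an
        // enum's underlying integral type, which would look like regression.
        if (labelType.IsEnum) return LearningTask.MultiClassClassification;

        switch (Type.GetTypeCode(labelType))
        {
            case TypeCode.Single:
            case TypeCode.Double:
            case TypeCode.Decimal:
            case TypeCode.Int16:
            case TypeCode.Int32:
            case TypeCode.Int64:
                return LearningTask.Regression;
            default:
                throw new NotSupportedException(
                    "No learning task for label type " + labelType.Name);
        }
    }
}
```

With this sketch, `LabelTypes.Detect(typeof(bool))` gives binary classification, any enum type gives multi-class classification, and `typeof(double)` gives regression.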

As mentioned in my earlier post (with a minor breaking change), classes (which is how we generally describe our data or examples) can be decorated as follows:

public class Student
{
	[Feature]
	public string Name { get; set; }

	[Feature]
	public Grade Grade { get; set; }

	[Feature]
	public double GPA { get; set; }

	[Feature]
	public int Age { get; set; }

	[Feature]
	public bool Tall { get; set; }

	[Feature]
	public int Friends { get; set; }

	[Label]
	public bool Nice { get; set; }
}

Why the breaking change from Learn to Label? In the machine learning literature, each example has features as well as a label. The features are the data the algorithm generalizes from, and the label is the answer it learns to predict. Notice that in the case above we are using 6 features to learn a boolean label. The way it's been set up, this is an example of binary classification.
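
For readers wondering how a library can find decorated properties, one plausible mechanism is reflection. The sketch below is an assumption about how this could work, not the library's actual implementation: `FeatureAttribute`, `LabelAttribute`, `Descriptor`, and the trimmed-down `MiniStudent` class are all defined here purely for illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Illustrative marker attributes (assumed, not the library's own types).
[AttributeUsage(AttributeTargets.Property)]
public sealed class FeatureAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Property)]
public sealed class LabelAttribute : Attribute { }

// A cut-down example class, decorated the same way as Student.
public class MiniStudent
{
    [Feature] public double GPA { get; set; }
    [Feature] public int Friends { get; set; }
    [Label] public bool Nice { get; set; }
}

public static class Descriptor
{
    // Names of all public properties marked with [Feature].
    // Reflection does not guarantee any particular ordering.
    public static string[] FeatureNames(Type type)
    {
        return type.GetProperties()
                   .Where(p => Attribute.IsDefined(p, typeof(FeatureAttribute)))
                   .Select(p => p.Name)
                   .ToArray();
    }

    // The single property marked with [Label]; throws if there is not exactly one.
    public static PropertyInfo LabelProperty(Type type)
    {
        return type.GetProperties()
                   .Single(p => Attribute.IsDefined(p, typeof(LabelAttribute)));
    }
}
```

Calling `Descriptor.FeatureNames(typeof(MiniStudent))` returns the two feature names, and `Descriptor.LabelProperty(typeof(MiniStudent)).PropertyType` is `bool`, which is what marks this as a binary classification problem.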

Binary Classification

In the case of our Student class, we are trying to learn whether a particular student is nice or not given their Name, Grade, GPA, Age, Tallness, and number of Friends. Eventually, the library will automatically detect which type of learning it needs to do, but for now, here is how we generate the model:

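// labeled training examples
Student[] students = Student.GetData();

// test point (the Nice label is left unset)
Student s = new Student { Name = "Seth", Age = 30, Friends = 16, GPA = 4.0, Grade = Grade.A, Tall = true };

// learn a model from the labeled students
var model = new PerceptronModel();
var predictor = model.Generate(students);

// predict the label of the test point
s = predictor.Predict(s);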

In essence, we get a bunch of students and spin up a new student on which we will run predictions. The classification algorithm used in this case is the Perceptron algorithm (more on this later). Once the model is generated, we can run a prediction by simply passing in the new student, and the predictor fills in the appropriate property. Magic! This is coming from a guy whose magic repertoire only includes making a coin disappear by dropping it on the floor as well as the "I-can-pull-my-finger-off" trick that only amuses my 5-year-old. In reality, it is using some really simple math to find a way to separate the examples.
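
That simple math can be sketched with the classic perceptron update rule: keep one weight per feature plus a bias, and whenever an example lands on the wrong side of the separating line, nudge the weights toward it. The code below is a minimal textbook version, assuming the examples have already been converted to arrays of doubles. `PerceptronSketch`, `LinearModel`, and the epoch limit are illustrative names and choices, not the library's `PerceptronModel`.

```csharp
using System;

// Illustrative container for what a linear classifier learns.
public class LinearModel
{
    public double[] Weights;
    public double Bias;
}

public static class PerceptronSketch
{
    // Trains on feature vectors xs with boolean labels ys.
    public static LinearModel Train(double[][] xs, bool[] ys, int maxEpochs = 100)
    {
        var model = new LinearModel { Weights = new double[xs[0].Length], Bias = 0.0 };

        for (int epoch = 0; epoch < maxEpochs; epoch++)
        {
            int mistakes = 0;
            for (int i = 0; i < xs.Length; i++)
            {
                double target = ys[i] ? 1.0 : -1.0;

                // Misclassified (or exactly on the boundary): move the line.
                if (target * Score(model, xs[i]) <= 0)
                {
                    for (int j = 0; j < model.Weights.Length; j++)
                        model.Weights[j] += target * xs[i][j];
                    model.Bias += target;
                    mistakes++;
                }
            }
            if (mistakes == 0) break; // every example is on the correct side
        }
        return model;
    }

    // Weighted sum of the features plus the bias.
    public static double Score(LinearModel model, double[] x)
    {
        double sum = model.Bias;
        for (int j = 0; j < model.Weights.Length; j++)
            sum += model.Weights[j] * x[j];
        return sum;
    }

    // Positive score means "true", otherwise "false".
    public static bool Predict(LinearModel model, double[] x)
    {
        return Score(model, x) > 0;
    }
}
```

For example, training on the four rows of logical AND (inputs (0,0), (0,1), (1,0), (1,1) with only the last labeled true) produces a model that predicts true for (1,1) and false for the other three. The loop only settles when the data can be separated by a line; otherwise it stops at the epoch limit.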

Reusing what you've learned

Once you've generated the model, it would be a waste to have to regenerate it for every subsequent run of the program. As such, there is a way to save the model and later reuse it:

// first run: generate the model and save it
var model = new PerceptronModel();
var predictor = model.Generate(students);
predictor.Save(path);
...
// later run: load the saved model and predict without retraining
Student s = new Student { Name = "Seth", Age = 30, Friends = 16, GPA = 4.0, Grade = Grade.A, Tall = true };

var model = new PerceptronModel();
var predictor = model.Load(path);
predictor.Predict(s);
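
A save-and-reload cycle like this could be backed by .NET's built-in `XmlSerializer`. The sketch below is only an assumption about one way to do it: the `SavedModel` class, its property names, and the `ModelStore` helper are hypothetical and do not describe the library's actual file format.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical shape of a persisted linear model.
public class SavedModel
{
    public double[] Weights { get; set; }
    public double Bias { get; set; }
    public string[] Features { get; set; } // descriptive only, not needed to predict
    public string Label { get; set; }      // descriptive only, not needed to predict
}

public static class ModelStore
{
    // Writes the model to an XML file, overwriting any existing file.
    public static void Save(SavedModel model, string path)
    {
        var serializer = new XmlSerializer(typeof(SavedModel));
        using (var stream = File.Create(path))
        {
            serializer.Serialize(stream, model);
        }
    }

    // Reads a model back from an XML file written by Save.
    public static SavedModel Load(string path)
    {
        var serializer = new XmlSerializer(typeof(SavedModel));
        using (var stream = File.OpenRead(path))
        {
            return (SavedModel)serializer.Deserialize(stream);
        }
    }
}
```

Storing the feature and label names alongside the numbers costs little and makes the file readable by a person, which is the same trade-off the post makes.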

Since one of my goals is to help people understand these models, the serialized XML also includes some information about your data (although it is not needed for the actual algorithm):



  
    
      435.552223888056
      -4.9275362318840576
      -123.6006996501749
      50.744252873563212
      -45.477261369315343
      -62.145927036481758
  -11.525237381309346
      Friends
      Tall
      Age
      GPA
      Grade
      Name
    Nice
  

Notice that in this particular model, the portion with the largest number (435.5522) corresponds to the Friends feature. This means that the number of friends (multiplied by 4, more on this too later) is a strong indicator of niceness.
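
Reading a model this way amounts to pairing each number with a feature name and sorting by absolute value. Here is a small sketch of that idea. `WeightReport` is a hypothetical helper, and pairing the numbers with the names by position is an assumption: the only pairing stated above is that 435.5522 belongs to Friends.

```csharp
using System;
using System.Linq;

public static class WeightReport
{
    // Returns feature names ordered by the absolute value of their weight,
    // largest first. Assumes names[i] belongs to weights[i].
    public static string[] RankByMagnitude(string[] names, double[] weights)
    {
        return names.Zip(weights, (n, w) => new { Name = n, Weight = w })
                    .OrderByDescending(p => Math.Abs(p.Weight))
                    .Select(p => p.Name)
                    .ToArray();
    }
}
```

Fed the six names and six numbers from the file in the order they appear, this ranks Friends first and Tall last. Keep in mind that magnitudes are only directly comparable when the features are on similar scales, and that a negative number pushes the prediction toward "not nice" rather than meaning the feature is unimportant.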

In Summary

The neatest thing about these models is how creepily accurate they are! Next time, I will try to show exactly what the perceptron (or any linear classifier for that matter) is actually doing. Please drop me a line if you have any questions.

Machine Learning | .NET

Comments

7/19/2010 8:56:23 PM #

Tyler

I heard your show on DNR, and am interested to see where you will take this project.  

Any chance you will be building a SVM as one of the classifiers?  I find this to be the hardest thing to find a good library for in .NET... (libsvm wrappers are ok, but not the best) even though it seems to be the best classifier to use.  Also, while many people easily build their own linear classifiers or perceptron NNs, the required math for an SVM seems really hairy for the non domain expert.  

7/19/2010 10:25:36 PM #

Seth

That's planned for the fall when I'm back in school. I wanted to do SVM Classification and SVM Regression. I wanted to first prove that my feature representation structures/conversions did the right thing (I think they do). So for the summer I want to finish decision trees (including boosting and bagging) and then move on to SVMs in the fall. Any suggestions/help would be much appreciated!

7/20/2010 10:28:14 AM #

anonymous

Is there a simple book that describes the algorithm behind the "nice student classification" example?

7/20/2010 5:51:59 PM #

seth

That is really the problem actually. There is no beginning book on this stuff (as far as I know). I personally use Pattern Recognition and Machine Learning and The Elements of Statistical Learning: Data Mining, Inference, and Prediction. I have a hard time understanding them (hence my desire to actually code the algorithms). I'd love to help though in whatever way I can.

7/24/2010 11:11:20 PM #

Svart

Hi Seth,
Great post, I first heard about the project on DNR (great show too!)

I'm working on an application for content discovery, where we fetch similar content (from various sources) based on the user's taste. I intend to use your library along with a string similarity algorithm.

thanks

7/26/2010 8:11:16 PM #

seth

Thanks! Let me know how this turns out.

8/13/2010 1:02:42 PM #

Robert Friberg

Nice article, thanks...


Good article and clean code!

I've done my share of ML coding using c#, java, perl and python. The 3 latter have some great libraries but .net is missing, imho, a somewhat complete and comprehensible machine learning library.

Have you considered opening up the project for contributions? I have some instance based learners, naive bayes, clusterers and some evolutionary stuff that I would consider sharing.

Anyone else interested in contributing? Contact me on twitter: @robertfriberg

8/14/2010 10:58:48 AM #

seth

I would love to take contributions! The basic premise behind the library is automatic feature selection using an object oriented approach: in other words, abstract out the learning mechanism and instead provide the representational mechanism.

8/13/2010 1:07:45 PM #

Robert Friberg


Some excellent introductory books for those asking:

"Data Mining: Practical Machine Learning Tools and Techniques"
http://www.cs.waikato.ac.nz/ml/weka/

"Programming Collective Intelligence"
by Toby Segaran


About the author

My name is Seth Juarez. I currently reside in Salt Lake City and develop web applications for my church.

I received my Bachelor's degree in Computer Science at UNLV with a minor in Mathematics. I recently completed my Master's degree at the University of Utah and am continuing on to a PhD in Computer Science. I am interested in Artificial Intelligence, specifically Machine Learning, and am working on a .NET library meant to simplify the use of common machine learning algorithms.

I've been married now for 8 years to a fabulously beautiful girl and have two wonderful daughters and a son.
