Supervised Learning - Classification

by Seth 12. July 2010 23:17

Supervised Learning

In supervised learning, the algorithm is given labeled examples and must come up with a model that describes the data and can also label future examples correctly (or at least adequately). Supervised learning can be grouped as follows, depending on the label type:

  1. Binary Classification (think yes/no)
  2. Multi-class classification (any answer from a finite set)
  3. Regression (any answer from an infinite set)

In the machine learning library I am trying to put together, each of the three groups mentioned above maps to distinct .NET data types as follows:

  1. Binary Classification (bool)
  2. Multi-Class Classification (enum)
  3. Regression (double, float, int, decimal, long, etc...)
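
The mapping from label type to learning task can be sketched in a few lines. This is only an illustration in JavaScript, not the library's code: `detectLearningType` and its return values are hypothetical names, and since JavaScript has no enum type, the finite set of allowed answers is assumed to be passed in as an array.

```javascript
// Hypothetical helper: infers the learning task from a sample label value.
// The names and return strings are illustrative, not part of the library.
function detectLearningType(label, allowedValues) {
	// yes/no answers
	if (typeof label === 'boolean') return 'binary';

	// any answer from a finite set (stand-in for a .NET enum)
	if (Array.isArray(allowedValues) && allowedValues.indexOf(label) > -1)
		return 'multiclass';

	// any numeric answer
	if (typeof label === 'number' && isFinite(label)) return 'regression';

	throw new Error('Unsupported label type: ' + typeof label);
}
```

For example, `detectLearningType(true)` would be treated as binary classification, while `detectLearningType('A', ['A', 'B', 'C'])` would be treated as multi-class.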

As mentioned in my earlier post (with a minor breaking change), classes (which is how we generally describe our data or examples) can be decorated as follows:

public class Student
{
	[Feature]
	public string Name { get; set; }

	[Feature]
	public Grade Grade { get; set; }

	[Feature]
	public double GPA { get; set; }

	[Feature]
	public int Age { get; set; }

	[Feature]
	public bool Tall { get; set; }

	[Feature]
	public int Friends { get; set; }

	[Label]
	public bool Nice { get; set; }
}

Why the breaking change from Learn to Label? In the machine learning literature, examples have features as well as a label. The features are the data used to generalize toward the appropriate label (which turns out to be the answer). Notice that in the case above we are using 6 features to learn a boolean label. The way it's been set up, this is an example of binary classification.

Binary Classification

In the case of our student class, we are trying to learn whether a particular student is nice or not given their Name, Grade, GPA, Age, Tallness, and number of Friends. Eventually, the library will automatically detect which type of learning it needs to do, but for now, here is how we generate the model:

Student[] students = Student.GetData();

// test point
Student s = new Student { Name = "Seth", Age = 30, Friends = 16, GPA = 4.0, Grade = Grade.A, Tall = true };

var model = new PerceptronModel();
var predictor = model.Generate(students);

s = predictor.Predict(s);

In essence, we get a bunch of students and spin up a new student on which we will run predictions. The classification algorithm used in this case is the Perceptron algorithm (more on this later). Once the model is generated, we can run a prediction by simply passing in the new student, and the predictor fills in the appropriate property. Magic! This is coming from a guy whose magic repertoire only includes making a coin disappear by dropping it on the floor as well as the "I-can-pull-my-finger-off" trick that only amuses my 5 year old. It is actually using some really simple math to find a way to separate the examples.
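
To give a feel for the "really simple math", here is a minimal sketch of the textbook perceptron update rule in JavaScript. It is not the library's implementation: the function names (`trainPerceptron`, `predict`, `dot`) and the example shape `{ x: [...], y: true/false }` are assumptions made for illustration, and the features are assumed to be already converted to numbers.

```javascript
// Dot product of two equal-length numeric arrays.
function dot(a, b) {
	var sum = 0;
	for (var i = 0; i < a.length; i++) sum += a[i] * b[i];
	return sum;
}

// Textbook perceptron: nudge the weights toward every misclassified example.
// examples: array of { x: [numbers], y: true/false } (assumed shape)
function trainPerceptron(examples, epochs) {
	var dims = examples[0].x.length;
	var w = [];
	for (var i = 0; i < dims; i++) w.push(0);
	var b = 0;

	for (var e = 0; e < epochs; e++) {
		for (var n = 0; n < examples.length; n++) {
			var ex = examples[n];
			var target = ex.y ? 1 : -1;

			// a non-positive product means the example is on the wrong side
			if (target * (dot(w, ex.x) + b) <= 0) {
				for (var j = 0; j < dims; j++) w[j] += target * ex.x[j];
				b += target;
			}
		}
	}
	return { w: w, b: b };
}

// Predict true when the point falls on the positive side of the line.
function predict(model, x) {
	return dot(model.w, x) + model.b > 0;
}
```

The learned `w` and `b` describe a line (or hyperplane) that separates the two groups of examples, provided such a line exists.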

Reusing what you've learned

Once you've generated the model, it would be a waste to have to regenerate it for every subsequent run of the program. As such, there is a way to save the model and later reuse it:

var model = new PerceptronModel();
var predictor = model.Generate(students);
predictor.Save(path);
...
Student s = new Student { Name = "Seth", Age = 30, Friends = 16, GPA = 4.0, Grade = Grade.A, Tall = true };

var model = new PerceptronModel();
var predictor = model.Load(path);
predictor.Predict(s);
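
The same save-and-reuse idea can be sketched in JavaScript. Note the swap: the library writes XML through `predictor.Save(path)`, whereas this sketch uses JSON strings, and `serializeModel`, `deserializeModel`, and the field names are hypothetical.

```javascript
// Turn a trained linear model into a string that can be stored anywhere.
// Field names (weights, bias, features, label) are assumed for illustration.
function serializeModel(model) {
	return JSON.stringify({
		weights: model.weights,
		bias: model.bias,
		features: model.features,
		label: model.label
	});
}

// Rebuild the model from the stored string, with a basic sanity check.
function deserializeModel(text) {
	var data = JSON.parse(text);
	if (!Array.isArray(data.weights) || typeof data.bias !== 'number')
		throw new Error('Not a valid serialized model');
	return data;
}
```

Only the weights and bias are needed to predict; the feature and label names ride along purely for readability.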

As one of my goals is to actually help out in the understanding of these models, the serialized XML also includes some information regarding your data (although it is not needed for the actual algorithm):



  
    
      435.552223888056
      -4.9275362318840576
      -123.6006996501749
      50.744252873563212
      -45.477261369315343
      -62.145927036481758
    
  
  -11.525237381309346
  
  
    
      Friends
      Tall
      Age
      GPA
      Grade
      Name
    
    Nice
  

Notice that in this particular model, the largest number (435.5522) corresponds to the Friends feature. This means that the number of friends (multiplied by 4, more on this later too) is a strong indicator of niceness.
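
One way to read the serialized model is to pair each number with a feature name and sort by magnitude. The sketch below does that in JavaScript with the values shown above. It assumes the weights are listed in the same order as the feature names (the post only confirms the Friends pairing), and `rankByInfluence` is a hypothetical helper.

```javascript
// Values copied from the serialized model above.
var weights = [435.552223888056, -4.9275362318840576, -123.6006996501749,
	50.744252873563212, -45.477261369315343, -62.145927036481758];

// Assumption: feature names appear in the same order as the weights.
var features = ['Friends', 'Tall', 'Age', 'GPA', 'Grade', 'Name'];

// Pair each feature with its weight and sort by absolute size, largest first.
function rankByInfluence(names, w) {
	return names
		.map(function(name, i) { return { feature: name, weight: w[i] }; })
		.sort(function(a, b) { return Math.abs(b.weight) - Math.abs(a.weight); });
}

var ranked = rankByInfluence(features, weights);
```

Under that ordering assumption, Friends comes out on top and Tall at the bottom. Keep in mind that raw weights are only directly comparable when the features are on similar scales.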

In Summary

The neatest thing about these things is how creepily accurate they are! Next time, I will try to show exactly what the perceptron (or any linear classifier for that matter) is actually doing. Please drop me a line if you have any questions.


Machine Learning | .NET

What is Machine Learning?

by Seth 30. June 2010 22:10

Introduction

I had the privilege of presenting at CodeStock. It was absolutely great. I was surprised and humbled at the reception of my session regarding Machine Learning. As such, I wanted to do a series of posts regarding what it is I wish to accomplish.

Machine Learning is Hard

Because the stuff is so intriguing, I have spent the last number of years trying to figure the stuff out! I would certainly not classify myself as an expert (by any means), but I think I have a general idea of the field.

Machine learning can be separated into roughly 3 classifications:

  1. Supervised Learning - learning from labeled examples
  2. Unsupervised Learning - learning from unlabeled examples
  3. Other - Hybrid of the above two, structured prediction, reinforcement learning, etc.

The (perceived) source of difficulty in these three areas is the enormous amount of math, probability, and statistics that is needed in order to begin solving problems via this approach. I remember sitting with my adviser where we both wondered aloud: "Why aren't these things used more?" I thought I had an idea that might work. Before we get to that, I thought I would share the general idea of machine learning.

A better way

Our standard approach to solving programming problems is to sit down and come up with a series of finite steps that lead to a solution (an algorithm, so to speak). The idea behind machine learning is to simply provide the machine with data and let it decide how to solve the problem. Although this sounds mildly creepy, the principles behind this type of approach are not at all foreign to us.

In summary, (as stated by my adviser): we are "replacing humans writing code with humans supplying data" and letting the machine decide the best approach.

Skynet (or generalization)

If you are currently fearing for your life because you think we are sowing the seeds of our future destruction, fear not. The main impetus behind machine learning is generalization. In other words, can we find patterns that can be exploited in the data to produce the desired results? Absolutely! Will those patterns always have the ability to produce the desired results? Unfortunately not. It turns out the machine will only be as smart as you! The machine cannot generalize useful information from unimportant data.

Show us something already!

The second source of difficulty is knowing how to use the data we have to generalize properly. This process is often referred to as feature selection. What pieces of data do we use to create the model? How do we convert these features into the corresponding mathematical representations?

This is where our ideas came in. As developers, we always represent our data in classes. Always have and always will (well... see F#). Is there an easy way to represent things in classes whereby a machine learning algorithm can automatically make the appropriate conversions to the mathematical representations and also select the appropriate learning algorithm? I think so; take a look:

public class Student
{
	[Feature]
	public string Name { get; set; }

	[Feature]
	public Grade Grade { get; set; }

	[Feature]
	public double GPA { get; set; }

	[Feature]
	public int Age { get; set; }

	[Feature]
	public bool Tall { get; set; }

	[Feature]
	public int Friends { get; set; }

	[Learn]
	public bool Nice { get; set; }
}

The idea here is that we have a student with 6 features (Name, Grade, GPA, Age, Tall, Friends) and want to learn whether the features can predict whether he or she is Nice (the target). To help the machine learn this, we will provide it with a list of students with all properties filled out, and ask it to predict the niceness of future students. This is the representation I have chosen for automatic feature selection. I would love feedback on this representation. Its purpose is to reduce the friction in using ML algorithms that comes from the difficulty of feature selection and feature conversions.
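
To make "feature conversion" concrete, here is a rough JavaScript sketch of turning one example into the list of numbers a learning algorithm works with. It is not how the library does it: the schema format, `toVector`, and the grade values A through F are all assumptions, and the string-valued Name feature is left out because the post does not say how strings are converted.

```javascript
// Hypothetical schema: says how each feature should be encoded.
var studentSchema = [
	{ name: 'Grade', kind: 'enum', values: ['A', 'B', 'C', 'D', 'F'] }, // assumed grade set
	{ name: 'GPA', kind: 'number' },
	{ name: 'Age', kind: 'number' },
	{ name: 'Tall', kind: 'bool' },
	{ name: 'Friends', kind: 'number' }
];

// Convert an example object into a flat array of numbers.
function toVector(example, schema) {
	var vector = [];
	schema.forEach(function(f) {
		var value = example[f.name];
		if (f.kind === 'number') {
			vector.push(value);               // numbers pass through
		} else if (f.kind === 'bool') {
			vector.push(value ? 1 : 0);       // true/false becomes 1/0
		} else if (f.kind === 'enum') {
			// one slot per possible value, 1 for the matching one
			f.values.forEach(function(v) { vector.push(v === value ? 1 : 0); });
		} else {
			throw new Error('Unknown feature kind: ' + f.kind);
		}
	});
	return vector;
}

var seth = { Grade: 'A', GPA: 4.0, Age: 30, Tall: true, Friends: 16 };
var sethVector = toVector(seth, studentSchema);
```

The attributes on the class play the role of the schema here: they tell the library which properties to convert, so the developer never has to build the vector by hand.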

Wrapping it up

The current incarnation of this code can be found at http://machine.codeplex.com/. There is also a drop with this very example. In future posts I will get into the general ideas behind supervised and unsupervised learning as well as the particular algorithms I have implemented to test this theory. I would love your feedback.

Does it work?

I got an interesting set of emails (only 3 days after my demo) where there were already some truly novel things being done using this approach to machine learning. Needless to say I was very impressed! Perhaps they will let me use their particular implementations as a case study. In short - YES! This stuff totally works! All I have done is create a new way of interacting with tried and proven machine learning algorithms.


.NET | Machine Learning

Creating Advanced ASP.NET MVC Controls (Part 3, A Scheduler)

by Seth 18. August 2009 20:16

Purpose

This is part 3 of a series going through the process of creating an advanced control for the ASP.NET MVC system. I've decided to create a schedule control that allows a user to schedule an item on a calendar control as well as add some meta-data information to the scheduled date. Together with the debugger we have built, this should not be too difficult.

Getting Started

Whenever I start building a new control, I simply go commando-style and write the HTML, CSS, and JavaScript first to make sure everything looks good on that end. This helps me with debugging.

The Markup

First things first: the html. I like to do calendars this way:

<table class="scheduler_month">
	<tr class="scheduler_month_header">
		<th colspan="7">September 2009</th>
	</tr>
	<tr class="scheduler_days_header">
		<td>Sun</td>
		<td>Mon</td>
		<td>Tue</td>
		<td>Wed</td>
		<td>Thu</td>
		<td>Fri</td>
		<td>Sat</td>
	</tr>
	<tr class="scheduler_month_days">
		<td class="scheduler_month_invalid_day"></td>
		<td class="scheduler_month_invalid_day"></td>
		<td class="scheduler_month_day">1</td>
		<td class="scheduler_month_day">2</td>
		<td class="scheduler_month_day">3</td>
		<td class="scheduler_month_day">4</td>
		<td class="scheduler_month_day">5</td>
	</tr>
	<tr class="scheduler_month_days">
		<td class="scheduler_month_day">6</td>
		<td class="scheduler_month_day">7</td>
		<td class="scheduler_month_day">8</td>
		<td class="scheduler_month_day">9</td>
		<td class="scheduler_month_day">10</td>
		<td class="scheduler_month_day">11</td>
		<td class="scheduler_month_day">12</td>
	</tr>
	<tr class="scheduler_month_days">
		<td class="scheduler_month_day">13</td>
		<td class="scheduler_month_day">14</td>
		<td class="scheduler_month_day">15</td>
		<td class="scheduler_month_day">16</td>
		<td class="scheduler_month_day">17</td>
		<td class="scheduler_month_day">18</td>
		<td class="scheduler_month_day">19</td>
	</tr>
	<tr class="scheduler_month_days">
		<td class="scheduler_month_day">20</td>
		<td class="scheduler_month_day">21</td>
		<td class="scheduler_month_day">22</td>
		<td class="scheduler_month_day">23</td>
		<td class="scheduler_month_day">24</td>
		<td class="scheduler_month_day">25</td>
		<td class="scheduler_month_day">26</td>
	</tr>
	<tr class="scheduler_month_days">
		<td class="scheduler_month_day">27</td>
		<td class="scheduler_month_day">28</td>
		<td class="scheduler_month_day">29</td>
		<td class="scheduler_month_day">30</td>
		<td class="scheduler_month_invalid_day"></td>
		<td class="scheduler_month_invalid_day"></td>
		<td class="scheduler_month_invalid_day"></td>
	</tr>
</table>

This comes out looking like:

1stPassCalendar

Styles...

It looks pretty good as a starting point. With a little css we can actually make it look good (Disclaimer: I write programs and thus cannot be trusted with what "looks good"):

.scheduler_month
{
	border: solid 1px #C0C0C0;
	border-collapse: collapse;
}

.scheduler_month_header th
{
	background: #714546;
	color: white;
	height: 20px;
	width: 210px;
	font-size: 12px;
}

.scheduler_days_header td
{
	background: #FFA54C;
	color: white;
	height: 20px;
	width: 30px;
	font-size: 12px;
	text-align: center;
	font-weight: bold;
}

.scheduler_month_invalid_day, .scheduler_month_day
{
	height: 20px;
	width: 30px;
	font-size: 12px;
	text-align: center;
	border: solid 1px #C0C0C0;
}

.scheduler_month_invalid_day
{
	border: none;
	background: #E2E2E2;
}

Here is the outcome:

2stPassCalendar

Functionality?

The goal of the control is to save/edit data for each day and mark the calendar if there is any associated data with the day. In order to do this, we need to have a mini-form to take and display data:

<div id="DayData">
	<div class="label">Date:</div>
	<div id="DayCurrent" class="input"></div>
	<div class="label">Title:</div>
	<div class="input"><input type="text" id="DayTitle" name="DayTitle"/></div>
	<div class="label">Description:</div>
	<div class="input"><textarea id="DayDescription" name="DayDescription"></textarea></div>
	<div class="button">
		<input type="button" value="Cancel" id="DayCancel" name="DayCancel" />
		<input type="button" value="Save" id="DaySave" name="DaySave" />
		<input type="hidden" id="DayId" name="DayId" />
	</div>
</div>

This little number has the visible textboxes where the interaction takes place as well as a hidden field that will allow us to maintain state (DayId). Adding some more styles we end up with:

3rdPassCalendar

Do some work already!

Now for some jQuery magic! We want to display the mini-form, gather data, and persist it (at least on the client side for now). Some JavaScript first:

$('.scheduler_month_day').click(function(event) {
	var id = this.id;

	// proper date object
	var d = convertDate(id);
	
	// put in value (if exists)
	var index = window.Changes.find(function(x) { return x.Id == id; });
	if(index > -1) {
		$('#DayTitle').val(window.Changes[index].Title);
		$('#DayDescription').val(window.Changes[index].Description);
	} else {
		$('#DayTitle').val('');
		$('#DayDescription').val('');
	}

	// set the id to proper cell reference
	$('#DayId').val(this.id);
	$('#DayCurrent').text(d.toDateString());

	// make it look nice when we show it
	if(!$('#DayData').is(':hidden'))
		$('#DayData').fadeOut('fast').hide();

	$('#DayData')
		.css({left: event.clientX + 10, top: event.clientY + 10})
		.fadeIn('slow').show();
});


window.Changes = new Array();
$('#DaySave').click(function() {
	// get values
	var id = $('#DayId').val();
	var title = $('#DayTitle').val();
	var desc = $('#DayDescription').val();
	
	// already in there?
	var index = window.Changes.find(function(x) { return x.Id == id; });

	// do appropriate thing if it already exists
	if(index == -1)
		window.Changes.push({ Id: id, Title: title, Description: desc });
	else
		window.Changes[index] = { Id: id, Title: title, Description: desc };

	$('#' + id).addClass('scheduler_month_day_data');

	// close win
	$('#DayData').fadeOut('fast').hide();

	// make sure everything is ok
	if(window.isDebug)
		_(window.Changes).clear()
			.write('Current Change Set');
});

Explanation

Note that the crux of the code does the following:

  1. Fill in form data (if it exists)
  2. Save or update data (depending on whether or not it exists in the first place)

The magic is on line 31 of the listing above, where there is a global array that maintains all of the changes. This is where the debugger comes in handy. It allows us to visualize what data has been persisted on the client side. On lines 8 and 39 there is an interesting function worth mentioning. What I have done is "extend" the functionality of the Array object by adding:

Array.prototype.find = function(x) {
    for (var i = 0; i < this.length; i++) {
        if (typeof (x) == 'function' && x(this[i]))
            return i;
        if (this[i] == x)
            return i;
    }
    return -1;
};

The primary job is to figure out if the array has an element in it or not. Either a value or a function can be passed in. Naturally I chose the function parameter since we are dealing with an array of objects. What it does is tell me if the save is an update or an addition. Looking at the control code again, we can see this happening on lines 9 and 42. It decides what to do if the item does not exist by using the DayId (line 18) that we save and later retrieve. Here is a picture of the whole thing (debugger and all):

finalPassCalendar
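
A note for anyone reusing the save-or-update logic today: current JavaScript engines ship their own `Array.prototype.find`, which returns the matching element rather than its index, so assigning a custom `find` on the prototype changes that behavior for every script on the page. A sketch of the same add-or-update step using the built-in `findIndex` instead (the `upsertChange` helper is a hypothetical name):

```javascript
// Add the entry if its Id is new, otherwise overwrite the existing one.
// Uses the built-in findIndex, so Array.prototype is left untouched.
function upsertChange(changes, entry) {
	var index = changes.findIndex(function(x) { return x.Id == entry.Id; });

	if (index === -1)
		changes.push(entry);
	else
		changes[index] = entry;

	// report which path was taken
	return index === -1 ? 'added' : 'updated';
}
```

Calling `upsertChange(window.Changes, { Id: id, Title: title, Description: desc })` would then stand in for the find-then-branch code in the save handler.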

Where to from here?

So now what? This control is all nice and all, but here are some things left to do:

  1. Make the thing more generic (i.e. it needs to be able to represent any month or number of months)
  2. Get initial data from persistent storage (i.e. auto populate values)
  3. Save changes to persistent storage
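
As a starting point for the first item, the week rows for any month can be computed instead of hand-written. This is a minimal sketch, not the control's eventual code: `buildMonthGrid` is a hypothetical name, and `null` stands in for the "invalid day" cells.

```javascript
// Returns an array of weeks; each week has 7 cells holding a day number,
// or null for cells that fall outside the month.
// monthIndex is zero-based (8 = September), as in the JavaScript Date API.
function buildMonthGrid(year, monthIndex) {
	var firstWeekday = new Date(year, monthIndex, 1).getDay();     // 0 = Sunday
	var daysInMonth = new Date(year, monthIndex + 1, 0).getDate(); // day 0 of next month

	var cells = [];
	for (var i = 0; i < firstWeekday; i++) cells.push(null);       // leading blanks
	for (var d = 1; d <= daysInMonth; d++) cells.push(d);
	while (cells.length % 7 !== 0) cells.push(null);               // trailing blanks

	var weeks = [];
	for (var s = 0; s < cells.length; s += 7) weeks.push(cells.slice(s, s + 7));
	return weeks;
}
```

Calling `buildMonthGrid(2009, 8)` gives five rows that line up with the hand-written September 2009 table above; each row could then be rendered as a `scheduler_month_days` row.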

A word about the debugger...

I have made some minor changes to the debugger. They revolve mainly around usability. It is also important to note that if you try to visualize deep objects using the debugger, you will stall your browser. It is designed for small-ish things (remember it was like a 2-4 hour thing I made), so if it hangs on you, you've been warned.

Code Already

I would love feedback on the code. Are there any omissions/improvements/rants/raves/etc.? Hope this has been helpful!


Ajax | ASP.NET | JQuery | MVC

About the author

My name is Seth Juarez. I currently reside in Salt Lake City and develop web applications for my church.

I received my Bachelor's degree in Computer Science at UNLV with a minor in Mathematics. I recently completed my Master's degree at the University of Utah and am continuing on to a PhD in the field of Computer Science. I am currently interested in Artificial Intelligence, specifically in the realm of Machine Learning, and am working on a .NET library meant to simplify the usage of the common machine learning algorithms.

I've been married now for 8 years to a fabulously beautiful girl and have two wonderful daughters and a son on the way!
