Making Use of Unstructured Data with Form Recognizer

Every business has its “junk” drawer (or the “share,” as Bill/Karen likes to call it). Often hidden in this data wasteland are nuggets of informational gold that could revolutionize the way you do business (we hope). It turns out that the most important information is usually hidden in paper forms (and by paper I mean digital paper, like an image or PDF file). Microsoft recently released a new service (still in preview) called Form Recognizer, designed to make short work of the structured data hidden in these gems. I shall endeavor to take you through the simple process here!
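To give a flavor of what calling the service looks like, here is a minimal sketch of posting a PDF to a Form Recognizer custom-model analyze endpoint. The endpoint host, key, model id, and API version string below are placeholders/assumptions — substitute the values from your own Azure resource, and check the current REST reference since the service is in preview and the routes may change.

```python
# Sketch: send raw PDF bytes to a Form Recognizer custom model for analysis.
# Endpoint, key, model id, and the API version segment are placeholders.
import urllib.request


def analyze_url(endpoint: str, model_id: str,
                api_version: str = "v2.0-preview") -> str:
    """Build the custom-model analyze URL for a Form Recognizer resource."""
    return (f"{endpoint}/formrecognizer/{api_version}"
            f"/custom/models/{model_id}/analyze")


def analyze_form(endpoint: str, key: str, model_id: str, pdf_bytes: bytes):
    """POST the PDF; the service responds asynchronously, returning an
    Operation-Location header you poll for the extracted fields."""
    req = urllib.request.Request(
        analyze_url(endpoint, model_id),
        data=pdf_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": key,   # your resource's key
            "Content-Type": "application/pdf",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)
```

The analyze call is asynchronous by design: you submit the document, then poll the returned operation URL until the extracted key/value pairs are ready.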
Read more →

3 Tips for Debugging Cloud Scale Machine Learning Workloads

Let’s say you built an amazingly wonderful hand-crafted artisanal convolutional neural network that works beautifully on your hard-drive-based dataset. You are now ready to take this masterpiece to the cloud to work on a much larger dataset on beefier machines, and you are not looking forward to it.

  • “How do I ship this patchwork conda (or venv) environment where I’ve installed everything the training/inference code needed AS WELL AS everything else I thought I needed along the way?”
  • “I don’t want to waste cloud compute money on things I’m not sure will work on the first try!!”
  • Basically, “I have already done the work and I’m not interested in the yak shaving portion of the job!”

No need to fear, dear reader, this article is designed to help you move your glorious work to the cloud (and beyond) by leveraging your local environment as if it were the cloud itself.
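As a down payment on that first bullet, one way to make a patchwork environment shippable is to pin the exact versions of everything currently installed so the cloud machine can recreate it with pip. This is a generic sketch using only the standard library, not tied to any particular cloud service:

```python
# Sketch: capture 'name==version' pins for every installed distribution,
# so the environment can be recreated elsewhere with `pip install -r`.
from importlib import metadata


def pin_installed_packages() -> list[str]:
    """Return sorted pip-style pins for all importable distributions."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip entries with broken metadata
    )


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pins to a requirements file for the cloud machine."""
    with open(path, "w") as fh:
        fh.write("\n".join(pin_installed_packages()) + "\n")
```

Freezing versions this way trades flexibility for reproducibility, which is usually the right trade when the goal is “run on the big machine exactly what worked on my laptop.”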

Read more →

Descriptors in numl

As some of you know, I have been working on a machine learning library for .NET called numl. The main purpose of the library is to abstract away some of the mundane issues surrounding setting up the learning problem in the first place. Additionally, the math in machine learning can seem a bit daunting (some of it is indeed daunting), so the library allows you to either dig into the math or trust that these things are implemented and run correctly.

To facilitate this type of abstraction, I came to realize that the best way to bridge the gap was to use a construction most developers have already used and understood: classes. The learning problem, as I understood it, is taking a set of things and trying to learn a way to predict a particular aspect of those things. The best approach, therefore, was to provide an easy way to mark up these things (or classes) in order to produce an efficient technique for setting up the learning problem.
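numl itself is a .NET library, so the following is only a Python sketch of the idea, not numl’s actual API: mark up a plain class with which members are features and which one is the label, then derive the learning problem (a feature matrix and a target vector) from instances automatically. All names here are illustrative.

```python
# Sketch of descriptor-style markup: roles declared next to the class tell
# a generic routine how to turn instances into a learning problem.
from dataclasses import dataclass, fields

FEATURE, LABEL = "feature", "label"


@dataclass
class Tennis:
    outlook: int    # encoded: 0=sunny, 1=overcast, 2=rain
    humidity: float
    windy: int
    play: int       # the aspect we want to predict


# role markup, standing in for attribute-style descriptors on the class
ROLES = {"outlook": FEATURE, "humidity": FEATURE,
         "windy": FEATURE, "play": LABEL}


def to_examples(items):
    """Split instances into (feature rows, labels) using the role markup."""
    X = [[getattr(it, f.name) for f in fields(it)
          if ROLES[f.name] == FEATURE]
         for it in items]
    y = [getattr(it, f.name)
         for it in items for f in fields(it) if ROLES[f.name] == LABEL]
    return X, y
```

The point of the pattern is that the person defining `Tennis` never builds matrices by hand; the markup is the single source of truth for how instances become training data.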

Read more →


nuML at CodeMash

It was absolutely a blast to present my new machine learning library at CodeMash this year. One of the key goals of the library is to ensure that it is readily accessible to all of its users. Machine learning can often be an intimidating subject, with its esoteric terms and complex math. This library is designed to ease the process of feature selection (more on that later) and training. This is obviously a work in progress, and any input is welcome (and wanted). If you’d like to get started, head on over to the site to learn how to use nuML.
Read more →

Linear Classifiers

Thinking in Points

Consider this little plot generated by Python.
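The exact data behind the post’s figure isn’t reproduced here, but a plot along these lines — two clusters of labeled points in the plane, the starting point for any linear-classifier discussion — can be sketched like so (cluster centers and counts are arbitrary choices):

```python
# Sketch: generate two Gaussian clusters of 2-D points, one per class,
# then scatter-plot them. Only the standard library is needed for the data.
import random


def make_clusters(n: int = 50, seed: int = 0):
    """Return n points per class: class 0 around (-2,-2), class 1 around (2,2)."""
    rng = random.Random(seed)
    class0 = [(rng.gauss(-2, 1), rng.gauss(-2, 1)) for _ in range(n)]
    class1 = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(n)]
    return class0, class1


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    class0, class1 = make_clusters()
    plt.scatter(*zip(*class0), marker="o", label="class 0")
    plt.scatter(*zip(*class1), marker="x", label="class 1")
    plt.legend()
    plt.show()
```

With clusters this well separated, a straight line through the origin already divides the classes — which is exactly the situation a linear classifier is built for.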


Read more →