Getting started with ML.NET in Jupyter Notebooks

For many years .NET developers have been building classic console, desktop or web applications through a stop-modify-recompile-restart cycle. In machine learning and other data-centric tasks this cycle creates so much overhead an delay that it simply doesn’t make sense. Data scientists involved in data analysis, data preparation, and model training prefer a fully interactive environment that allows mixing content, live source code, and program output in a single (web) page that gives immediate feedback when the data or the code changes.

The Jupyter ecosystem provides such an interactive environment, and there’s good news for .NET developers. The list of language kernels supported by Jupyter -including Python, R, Julia, Matlab and many others- has been extended with .NET Core.

The Jupyter Notebook app enables us today to run on-premise interactive machine learning scenarios with ML.NET using C# or F# in a web browser, without bringing software or hardware costs. In this article we try to get you started with developing and running C# machine learning scenarios in Jupyter notebooks. We’re going to assume that you already know the basics of ML.NET.

Installation

Download

There are many ways to install the environment, but you’ll always need these two ingredients:

  • the Jupiter Notebook (a Python program), and
  • dotnet interactive (formerly known as dotnet try), an version of the .NET Core runtime that allows you to run C# and F# as a scripting language.

Here’s probably the easiest way get operational – although leaner procedures may exist:

  1. Install the latest Anaconda:
    Anaconda
  2. Install the latest .NET Core SDK:
    NetSdk

Configure

On the command line, type the following to install dotnet interactive as a global tool:

dotnet tool install -g dotnet-try

Open the Anaconda Prompt and run the following command to register the .NET kernel in Jupyter:

dotnet try jupyter install

Validate

To verify the installation:

  • open the Anaconda3 menu from the start menu,
  • start the Jupyter Notebook app, and
  • press the ‘new’ button in the top right corner to create a new Notebook.

If .NET C# and .NET F# appear in the list of kernels, then you’re ready to go:

NetKernels

You find the most recent installation guide right here.

First steps

Fire it up

Starting a Notebook spawns a language kernel process with two clients

  • a browser with the IDE to edit and run notebooks, and
  • a terminal window displaying the logs of the associated kernel.

Here’ what to expect on your screen:

NotebookAndServer

Hello World

The browser IDE allows you to maintain a list of so-called cells. Each of these that can host documentation (markup or markdown) or source code (in the language of the kernel you selected).

Jupyter Notebooks code cells accept regular C#. Each cell can be individually ran, while the kernel keeps track of the state.

It’s safe to assume that the canonical “Hello World” would look like this:
HelloWorld1

The C# kernel hosts extra functions to control the output of cells, such as the ‘display()’ function:
HelloWorld2

Display() is more than just syntactic sugar for Console.WriteLine(). It can also render HTML, SVG, and Charts. There’s more info on this function right here.

Jupyter Notebook runs a Read-Eval-Print Loop (a.k.a. REPL). Instructions are executed, and expressions are evaluated against the current state that is maintained in and by the kernel. So instead of an instruction to print a string, you can simply type “Hello World” in a cell (without a semicolon). It will we treated as an expression:
HelloWorld3

Loading NuGet packages

Code cells in Jupyter Notebooks can host instructions, expressions, class definitions and functions, but you can also load dll’s from NuGet packages. The #r instruction loads a NuGet package into the kernel:

NuGet

 

#r and using statements do not have to be written at the top of the document – you can just add them whenever you need them in the notebook.

Doing diagrams

One of the NuGet packages that you definitely want to use is XPlot, a data visualization framework written in F#. It allows you create a huge number of chart types delegating the rendering to well-known open source graphing libraries such as Plotly and Google Charts.

Here’s how easy it is to define and render a chart in a C# Jupyter Notebook:

XPlotSample

You’ll find many more examples here.

Running a Canonical Machine Learning Scenario

Over the last couple of months we have been busy creating an elaborated set of machine learning scenario’s in a UWP sample app. In the last couple of weeks we managed to migrate most of these to the Jupyter Notebook, and added some new ones.

Let’s run through one of the new samples. It predicts the quality of white wine based on 11 physicochemical characteristics. The problem is solved as a regression – in our UWP sample app we solved it as a binary and a multiclass classification.

The sample runs a representative scenario of

  • reading the raw data,
  • preparing the data,
  • training the model,
  • calculating and visualizing the quality metrics,
  • calculating and visualizing the feature contributions, and finally
  • predicting.

In this article we won’t dive into the ML.NET details. Here’s how the full scenario looks like in a Jupyter Notebook. :

Regression1

(some helpers were omitted here)Regression2

Regression3

Regression4

Regression5

Regression6

Here’s how the full web page with source and results rendering looks like.

The Jupyter Notebook provides a much more productive environment to create such a scenario than classic .NET application development does. Let’s dive into some of the reasons and take a deeper look into some Jupyter features that make the ML.NET-Jupyter combo attractive to data scientists.

Let’s focus on Data Analysis

Interactive Diagrams

Data analysis requires many types of diagrams, and Jupyter notebooks makes it easy to define and modify these.

Here’s how to import the XPlot NuGet package and render a simple interactive boxplot chart. The rendered diagram highlights the details of the element under the mouse:

InteractiveDiagram

We created a more elaborated example of boxplot analysis right here. Here’s how the resulting diagram looks like:

ComplexBoxPlot

As part of Principal Component Analysis data scientists may want to draw a heat map with the correlation between different features. Here’s how easy it is to create such a chart:

HeatMap

We created a more elaborated example on the well-know Titanic data set right here. This is the resulting diagram:

TitanicHeatMap

These diagrams and many more come out-of-the-box with XPlot.

Object Formatters

Just as developers, data scientists spend most of their time debugging. They continuously need detailed feedback on the work in progress. We already encountered the new display() function that prints the value of an expression. Jupyter notebooks allow you to override the HTML that is printed for a specific class by registering an ObjectFormatter – something that sits between decorating a class with the DebuggerDisplay attribute and writing a custom debugger visualizer.

Let’s write a small –but very useful- example. Here’s how an instance of a ML.NET ConfusionMatrix is displayed by default:
ConfusionMatrix1

That’s pretty confusing, right? [Yes: pun intended!] Let’s fix this.

The few ObjectFormatter examples that we already encountered, were spawning IHtmlContent instances (from ASP.NET Core) that were created and styled through the so-called PocketView API – for which there is no documentation yet. Here’s how to fetch the list of HTML tags that you can create with it, and a small sample on how to apply a style to them:

var pocketViewTagMethods = typeof(PocketViewTags)
    .GetProperties()
    .Select(m => m.Name);
display(pocketViewTagMethods);

var pocketView = table[style: "width: 100%"](tr(td[style:"border: 1px solid black"]("Hello!")));
display(pocketView);

Here’s the result:

PocketView

Here’s how to register a formatter that nicely displays a table with the look-and-feel that a data analyst expects for a (binary!) confusion matrix:

Formatter.Register((df, writer) =>
{
    var rows = new List();

    var cells = new List();
    var n = df.Counts[0][0] + df.Counts[0][1] + df.Counts[1][0] + df.Counts[1][1];
    cells.Add(td[rowspan: 2, colspan: 2, style: "text-align: center; background-color: transparent"]("n = " + n));
    cells.Add(td[colspan: 2, style: "border: 1px solid black; text-align: center; padding: 24px; background-color: lightsteelblue"](b("Predicted")));
    rows.Add(tr[style: "background-color: transparent"](cells));

    cells = new List();
    cells.Add(td[style:"border: 1px solid black; padding: 24px; background-color: #E3EAF3"](b("True")));
    cells.Add(td[style:"border: 1px solid black; padding: 24px; background-color: #E3EAF3"](b("False")));
    rows.Add(tr[style: "background-color: transparent"](cells));

    cells = new List();
    cells.Add(td[rowspan: 2, style:"border: 1px solid black; text-align: center; padding: 24px;  background-color: lightsteelblue"](b("Actual")));
    cells.Add(td[style:"border: 1px solid black; text-align: center; padding: 24px; background-color: #E3EAF3"](b("True")));    
    cells.Add(td[style:"border: 1px solid black; padding: 24px"](df.Counts[0][0]));
    cells.Add(td[style:"border: 1px solid black; padding: 24px"](df.Counts[0][1]));
    rows.Add(tr[style: "background-color: transparent"](cells));

    cells = new List();
    cells.Add(td[style:"border: 1px solid black; text-align: center; padding: 24px; background-color: #E3EAF3"](b("False")));
    cells.Add(td[style:"border: 1px solid black; padding: 24px"](df.Counts[1][0]));
    cells.Add(td[style:"border: 1px solid black; padding: 24px"](df.Counts[1][1]));
    rows.Add(tr(cells));

    var t = table(
        tbody(
            rows));

    writer.Write(t);
}, "text/html");

Here’s how a confusion matrix now looks like – much more intuitive:
ConfusionMatrix2

DataFrame

It’s not always easy to prepare the data for a machine learning pipeline or a diagram, and this is where the new DataFrame API comes in. DataFrame allows you to manipulate tabular in-memory data in a spreadsheet way: you can select, add, and/or filter rows and columns, apply formulas and so on. Here’s how to pull in the NuGet package, and add a custom formatter for the base class. DataFrame is currently only in version 0.2 so you may expect some changes. You may also expect the object formatters to be embedded in future releases:

 

DataFrame1

The DataFrame class knows how to read input data from a CSV:

DataFrame2

In our binary classification sample we use some of the DataFrame methods to replace the “Quality” column holding the taster’s evaluation score (a number from 1 to 10) by a “Label” column with the Boolean indicating whether the wine is good or not (i.e. the score was 6 or higher). Here’s how amazingly easy this is:

var labelCol = trainingData["Quality"].ElementwiseGreaterThanOrEqual(6);
labelCol.SetName("Label");
trainingData.Columns.Add(labelCol);
trainingData.Columns.Remove(trainingData["Quality"]);

Here’s the result (compare the last column with the one in the previous screenshot):

DataFrame3

Since there’s no documentation available yet, we had to dig into the C# source code.

DataFrame is a very promising new .NET API for tabular data manipulation, useful in machine learning and other scenarios.

Sharing is Caring

The source code of our Jupyter notebooks lives here on GitHub (if you like it then you should have put a Star on it). The easiest way to explore the code and the rendered results is via nbViewer.

We also invite you to take a look at these other ML.NET-Jupyter samples from Microsoft and from fellow MVP Alexander Slotte.

Enjoy!

1 thought on “Getting started with ML.NET in Jupyter Notebooks

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s