Machine Learning with ML.NET in UWP: Automated Learning

In this article we take the ML.NET automated machine learning API for a spin to demonstrate how it can be used in a C# UWP app for discovering, training, and fine-tuning the most appropriate prediction model for a specific machine learning use case.

The Automated Machine Learning feature of ML.NET was announced at Build 2019 as a framework that automatically iterates over a set of algorithms and hyperparameters to select and create a prediction model. You only have to provide

  • the problem type (binary classification, multiclass classification, or regression),
  • the quality metric to optimize (accuracy, log loss, area under the curve, …), and
  • a dataset with training data.

The ML.NET automated machine learning functionality is exposed as an API, as a Command Line Tool, and as a Visual Studio extension (the Model Builder).

This article focuses on the automated ML API, which we'll refer to by its nickname 'AutoML'. In our UWP sample app we tried to implement a more or less realistic scenario for this feature. Here's the corresponding XAML page from that app. It shows the results of a so-called experiment:


Stating the problem

In the sample app we reused the white wine dataset from our binary classification sample. Its raw data contains the values for 11 physicochemical characteristics (the 'features') of white wines, together with an appreciation score from 0 to 10 (the 'label'):


We’ll rely on AutoML to build us a model that uses these physicochemical features to predict the score label. We’ll treat it as a multiclass classification problem where each distinct score value is considered a category.

Creating the DataView

AutoML requires you to provide an IDataView instance with the training data, and optionally one with test data. If the latter is not provided, it will split the training data itself. For the training data, a TextLoader on a .csv file would do the job: by default AutoML uses all non-label fields as features, and creates a pipeline with the necessary components to fill in missing values and transform everything to numeric fields. In a real-world scenario you would want to perform some of these tasks programmatically yourself, overruling the defaults. That's what we did in the sample app.

We used the LoadFromTextFile<T>() method to read the data into a new pipeline, so we needed a data structure to describe the incoming data with LoadColumn and ColumnName attributes:

public class AutomationData
{
    [LoadColumn(0), ColumnName("OriginalFixedAcidity")]
    public float FixedAcidity;

    [LoadColumn(1)]
    public float VolatileAcidity;

    [LoadColumn(2)]
    public float CitricAcid;

    [LoadColumn(3)]
    public float ResidualSugar;

    [LoadColumn(4)]
    public float Chlorides;

    [LoadColumn(5)]
    public float FreeSulfurDioxide;

    [LoadColumn(6)]
    public float TotalSulfurDioxide;

    [LoadColumn(7)]
    public float Density;

    [LoadColumn(8)]
    public float Ph;

    [LoadColumn(9)]
    public float Sulphates;

    [LoadColumn(10)]
    public float Alcohol;

    [LoadColumn(11)]
    public float Label;
}

We added a ReplaceMissingValues transformation on the FixedAcidity field to keep control over the ReplacementMode and the column names, and then removed the original column with a DropColumns transformation.

Here’s the pipeline that we used in the sample app to manipulate the raw data:

// Pipeline
IEstimator<ITransformer> pipeline =
    MLContext.Transforms.ReplaceMissingValues(
        outputColumnName: "FixedAcidity",
        inputColumnName: "OriginalFixedAcidity",
        replacementMode: MissingValueReplacingEstimator.ReplacementMode.Mean)
    .Append(MLContext.Transforms.DropColumns("OriginalFixedAcidity"));
    // No need to add this, it will be done automatically:
    // .Append(MLContext.Transforms.Concatenate(
    //     "Features",
    //     new[]
    //     {
    //         "FixedAcidity",
    //         "VolatileAcidity",
    //         "CitricAcid",
    //         "ResidualSugar",
    //         "Chlorides",
    //         "FreeSulfurDioxide",
    //         "TotalSulfurDioxide",
    //         "Density",
    //         "Ph",
    //         "Sulphates",
    //         "Alcohol"
    //     }));

A model is created from this pipeline using the Fit() method, and the Transform() call creates the IDataView that provides the training data to the experiment:

// Training data
var trainingData = MLContext.Data.LoadFromTextFile<AutomationData>(
        path: trainingDataPath,
        separatorChar: ';',
        hasHeader: true);
ITransformer model = pipeline.Fit(trainingData);
_trainingDataView = model.Transform(trainingData);
_trainingDataView = MLContext.Data.Cache(_trainingDataView);

// Check the content on a breakpoint:
var sneakPeek = _trainingDataView.Preview();

Here’s the result of the Preview() call that allows us to peek at the contents of the data view:


Keep in mind that AutoML only sees this resulting data view and has no knowledge of the pipeline that created it. It will for example struggle with data views that have duplicate column names, which are quite common in ML.NET pipelines.

Round 1: Algorithm Selection

Defining the experiment

In the first round of our scenario, we’ll run an AutoML experiment to find one or two candidate algorithms that we would like to explore further. Every experiment category (binary classification, multiclass classification, and regression) comes with its own ExperimentSettings class where you specify things like

  • a maximum duration for the whole experiment (AutoML will complete the test that’s running at the deadline),
  • the metric to optimize for (metrics depend on the category), and
  • the algorithms to use (by default all algorithms of the category are included in the experiment).

The experiment is then instantiated with a call to one of the Create() methods in the AutoCatalog. In the sample app we decided to optimize on Logarithmic Loss: it gives a more nuanced view into the performance than accuracy does, since it punishes uncertainty. We also decided to ignore the two FastTree algorithms that are not yet 100% UWP compliant. Here’s the experiment definition:

var settings = new MulticlassExperimentSettings
{
    MaxExperimentTimeInSeconds = 18,
    OptimizingMetric = MulticlassClassificationMetric.LogLoss,
    CacheDirectory = null
};

// These two trainers yield no metrics in UWP:
settings.Trainers.Remove(MulticlassClassificationTrainer.FastTreeOva);
settings.Trainers.Remove(MulticlassClassificationTrainer.FastForestOva);

_experiment = MLContext.Auto().CreateMulticlassClassificationExperiment(settings);
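As a side note on the chosen metric: multiclass log loss is the negative average of the logarithm of the probability that the model assigned to the true class, so a confident wrong answer is punished much harder than a hesitant one. This hypothetical helper (not part of ML.NET) shows the calculation:

```csharp
using System;

static class Metrics
{
    // Multiclass log loss: the negative average of the natural log of the
    // probability that the model assigned to the true class. Lower is better.
    public static double LogLoss(double[][] predictedProbabilities, int[] trueLabels)
    {
        double sum = 0;
        for (int i = 0; i < trueLabels.Length; i++)
        {
            // Clamp to avoid taking the log of zero.
            var p = Math.Max(predictedProbabilities[i][trueLabels[i]], 1e-15);
            sum += Math.Log(p);
        }
        return -sum / trueLabels.Length;
    }
}
```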

Running the experiment

To execute the experiment, just call Execute() on it, providing the data view and an optional progress handler to receive the trainer name and quality metrics after each individual test. The winning model is returned in the BestRun property of the experiment’s result:

var result = _experiment.Execute(
    trainData: _trainingDataView,
    labelColumnName: "Label",
    progressHandler: this);

return result.BestRun.TrainerName;

The progress handler must implement the IProgress<RunDetail<MulticlassClassificationMetrics>> interface, which declares a Report() method that is called each time an individual test in the experiment finishes. In the sample app we let the MVVM Model implement this interface and pass the algorithm name and the quality metrics to the MVVM ViewModel via an event. Eventually the diagram in the MVVM View (the XAML page) is updated.

Here’s the code in the Model:

internal class AutomationModel : IProgress<RunDetail<MulticlassClassificationMetrics>>
{
    // ...

    public event EventHandler<ProgressEventArgs> Progressed;

    // ...

    public void Report(RunDetail<MulticlassClassificationMetrics> value)
    {
        Progressed?.Invoke(this, new ProgressEventArgs
        {
            Model = new AutomationExperiment
            {
                Trainer = value.TrainerName,
                LogLoss = value.ValidationMetrics?.LogLoss,
                LogLossReduction = value.ValidationMetrics?.LogLossReduction,
                MicroAccuracy = value.ValidationMetrics?.MicroAccuracy,
                MacroAccuracy = value.ValidationMetrics?.MacroAccuracy
            }
        });
    }
}
The next screenshot shows the result of the algorithm selection phase in the sample app. The proposed model is not super good, but that’s mainly our own fault: the wine scoring problem is more a regression than a multiclass classification. If you consider that these models are unaware of the score order (they don’t realize that 8 is better than 7 is better than 6, and so on, so neither do they realize that a score of 4 may be appropriate if you hesitate between a 3 and a 5), then you will realize that they’re actually pretty accurate.

Here’s a graphical overview of the different models in the experiment. Notice the correlation (positive or negative) between the various quality metrics:


Some of the individual tests return really bad models. Here’s an example of an instance with a negative value for the Log Loss Reduction quality metric:


This model performs worse than just randomly selecting a score. Make sure to run experiments long enough to eliminate such candidates.

Round 2: Parameter Sweeping

When using AutoML, we propose to first run a set of high-level experiments to discover the algorithms that best suit your specific machine learning problem, and then run a second set of experiments with a limited number of algorithms (just one or two) to fine-tune their hyperparameters. Data scientists call this parameter sweeping. For developers, the source code for both sets is almost identical: in round 1 we start with all algorithms and Remove() some, while in round 2 we first Clear() the ICollection of trainers and then Add() the few that we want to evaluate.

Here’s the full parameter sweeping code in the sample app:

var settings = new MulticlassExperimentSettings
{
    MaxExperimentTimeInSeconds = 180,
    OptimizingMetric = MulticlassClassificationMetric.LogLoss,
    CacheDirectory = null
};

// Evaluate only the winner of the first round:
settings.Trainers.Clear();
settings.Trainers.Add(MulticlassClassificationTrainer.LightGbm);


var experiment = MLContext.Auto().CreateMulticlassClassificationExperiment(settings);

var result = experiment.Execute(
    trainData: _trainingDataView,
    labelColumnName: "Label",
    progressHandler: this);

var model = result.BestRun.Model as TransformerChain<ITransformer>;

Here’s the result of parameter sweeping on the LightGbmMulti algorithm, the winner of the first round in the sample app. If you compare the diagram to the Round 1 values, you’ll observe a general improvement of the quality metrics. The orange Log Loss curve consistently shows lower values:


Not all parameter sweeping experiments are equally useful. Here’s the result that compares different parameters for the LbfgsMaximumEntropy algorithm in the sample app. All tests return pretty much the same (bad) model. This experiment just confirms that this is not the right algorithm for the scenario:


Inspecting the Winner

After running some experiments, you’ll probably want to dive into the details of the winning model. Unfortunately this is where the API currently falls short: most if not all of the hyperparameters of the generated models are stored in private properties. Your options to drill down to the details are

  • open the model that you saved (it’s just a .zip file containing flat text files), or
  • rely on Reflection.

We decided to rely on the Reflection features in the Visual Studio debugger. In all multiclass classification experiments that we did, the prediction model was the last transformer in the first step of the generated pipeline. So in the sample app we assigned this to a variable to facilitate inspection via a breakpoint.
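In the sample app that looked roughly like the following sketch. The casts mirror the pipeline layout we observed in our own experiments; they are an assumption, not a guaranteed API contract, so check the actual chain in the debugger:

```csharp
using System.Linq;

// The winning model is a chain of transformers; in our multiclass experiments
// the prediction model was the last transformer in the chain's first step.
var chain = result.BestRun.Model as TransformerChain<ITransformer>;
var firstStep = chain?.First() as TransformerChain<ITransformer>;
var predictor = firstStep?.LastTransformer; // put a breakpoint here and inspect
```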

Here are the OneVersusAll parameters of the winning model. They’re the bias, the weights, splits and leaf values of the underlying RegressionTreeEnsemble for each possible score:


That sounds like a pretty complex structure, so let’s shed some light on it. For starters, LightGbmMulti is a so-called One-Versus-All (OVA) algorithm. OVA is a technique to solve a multiclass classification problem by a group of binary classifiers.

The following diagram illustrates using three binary classifiers to recognize squares, triangles or crosses:


When the model is asked to create a prediction, it delegates the question to all three classifiers and then deduces the result. If the answers are

  • I think it’s a square,
  • I think it’s not a triangle, and
  • I think it’s not a cross,

then you can be pretty sure that it’s a square, no?
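That deduction step can be sketched in a few lines of plain C#. This is an illustration of the OVA technique only, not ML.NET code:

```csharp
using System;

static class Ova
{
    // One-Versus-All prediction: ask each binary classifier how confident it is
    // that the sample belongs to its class, and pick the most confident one.
    public static int Predict(Func<double[], double>[] binaryScorers, double[] features)
    {
        int best = 0;
        double bestScore = double.NegativeInfinity;
        for (int c = 0; c < binaryScorers.Length; c++)
        {
            double score = binaryScorers[c](features);
            if (score > bestScore)
            {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }
}
```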

The winning model in the sample app detected 7 values for the label, so it created 7 binary classifiers. This means that not all the scores from 0 to 10 were given. [This observation made us realize that we should have treated this problem as a regression instead of as a classification.] Each of these 7 binary classifiers is a LightGBM trainer: a gradient boosting framework that uses tree-based learning algorithms. Gradient boosting is, just like OVA, a technique that solves a problem using multiple classifiers. LightGBM builds a strong learner by combining an ensemble of weak learners, typically decision trees. Apparently each of the 7 classifiers in the sample app scenario hosts an ensemble of 100 trees, each with a different weight, bias, and a set of leaf and split values for each branch.
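The ensemble scoring that gradient boosting relies on boils down to a weighted sum of the individual trees' outputs. Here's a minimal sketch of that idea (an illustration, not the actual LightGBM internals):

```csharp
using System;

static class Ensemble
{
    // A boosted tree ensemble's raw score is the weighted sum of the outputs
    // of its (weak) member trees for the same feature vector.
    public static double Score(Func<double[], double>[] trees, double[] weights, double[] features)
    {
        double score = 0;
        for (int t = 0; t < trees.Length; t++)
        {
            score += weights[t] * trees[t](features);
        }
        return score;
    }
}
```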

The following screenshot shows a simpler set of hyperparameters. It’s the result of a parameter sweeping round for the LbfgsMaximumEntropy algorithm, also known as multinomial logistic regression. This one is also a One-Versus-All trainer, so there are again 7 submodels. This time the models are simpler. The algorithm created a regression function for each of the score values. The parameters are the weight of each feature in that function:


At this point in time the API’s main target is to support the Command Line Tool and the Model Builder, and that’s probably why the model’s details are declared private. All of them already appear in the output of the CLI, however, so we assume that full programmatic access to the models (and the source code to generate them!) is just a matter of time.


Here’s the overview of a canonical machine learning use case. The phases that are covered by AutoML are colored:


To complete the scenario, all you need to do is

  • make your raw data available in an IDataView-friendly way: a .csv file or an IEnumerable (e.g. from a database query), and
  • consume the generated model in your apps; that’s just three lines of code: load the .zip file, create a prediction engine, and call that engine.

AutoML will do the rest. Impressive, no? There’s no excuse for not starting to embed machine learning in your .NET apps…
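Those three consumption lines look more or less like this; AutomationPrediction, sampleWine, and the file name are placeholders for your own output class, input instance, and saved model:

```csharp
var mlContext = new MLContext();

// 1. Load the saved model from its .zip file.
ITransformer model = mlContext.Model.Load("model.zip", out DataViewSchema inputSchema);

// 2. Create a prediction engine for one-off predictions.
var engine = mlContext.Model.CreatePredictionEngine<AutomationData, AutomationPrediction>(model);

// 3. Call the prediction engine.
AutomationPrediction prediction = engine.Predict(sampleWine);
```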

The Source

The sample app lives here on GitHub.