Monthly Archives: March 2019

Machine Learning with ML.NET in UWP: Feature Correlation Analysis

In this article we show how to perform Feature Correlation Analysis and display the results in a Heat Map in the context of Machine Learning in UWP. It’s the fourth in a series that started here, on implementing Machine Learning scenarios in UWP using Open Source frameworks and components such as ML.NET, Math.NET and OxyPlot.

All articles in the series revolve around a single UWP sample app that lives here on GitHub. Here’s what the Feature Correlation Analysis page looks like:


It displays the correlation between different properties in the popular Titanic passengers dataset: age, fare, ticket class, whether the passenger was accompanied by siblings, spouses, parents or children, and whether he or she survived the trip.

The darker red or blue squares on the heat map indicate that the corresponding properties on the X and Y axes have a stronger positive (red) or negative (blue) correlation with each other. A high correlation between two features is a warning sign: adding both of them to the training data may have a negative impact on the classification model.

Feature Correlation Analysis

Feature Correlation Analysis in Machine Learning

The topic of this article is Feature Correlation Analysis. Just like in the previous article (Feature Distribution Analysis), we are in the “data preparation” phase of a Machine Learning scenario. We’re not training or even defining models yet; we’re selecting the features to train them with. An ideal feature set contains features that are highly correlated with the classification (in ML.NET terminology: the Label Column), yet uncorrelated with each other.

Identifying the right feature set strongly affects the quality and performance of the subsequent learning and generalization steps. Here are two important reasons not to keep adding feature columns to a training data set:

  • while the predictive power of a classifier first increases with the number of dimensions/features used, there comes a break point where it decreases again (the so-called Curse of Dimensionality). The training data set is always a finite set of samples with discrete values, while the prediction space may be infinite and continuous in all dimensions. So the more features a data set with a fixed number of samples has, the less representative it may become.
  • secondly, there’s the cost incurred by adding features. Two features that are highly correlated with each other don’t add much more value to a classifier than one of them alone, but they sure add cost at training, persisting and/or inference time. Machine Learning activities can be pretty resource intensive on CPU, memory, and elapsed time, so it makes sense to limit the number of features.

The process of converting a set of observations of possibly correlated variables into a smaller set of values of linearly uncorrelated variables is called Principal Component Analysis. This technique was invented by Karl Pearson, who also defined its main instrument: the Pearson correlation coefficient, a measure of the linear correlation between two variables X and Y. Pearson’s correlation coefficient is the covariance of the two variables divided by the product of their standard deviations. It has a value between +1 and -1, where 1 is total positive linear correlation, 0 is no linear correlation, and -1 is total negative linear correlation.
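Written out for n paired samples (x_i, y_i) with sample means x̄ and ȳ, that definition reads:

```latex
r_{XY} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}
       = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The 1/n normalization factors of the covariance and of the two standard deviations cancel out, which is why the raw sums of (squared) deviations are enough.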

Both Principal Component Analysis and simpler correlation-based feature selection start by calculating a matrix with the correlation between each pair of features. In feature selection, highly correlated feature pairs are then ‘sanitized’ by removing one feature or combining both (e.g. by creating a new feature by multiplying the variables’ values). At the same time you also calculate the correlation between each attribute and the output variable (the Label), to keep only those attributes that have a moderate-to-high positive or negative correlation (closer to -1 or 1) and drop those with a low correlation (value close to zero).
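The screening step described above can be sketched in C# as follows. This is a hypothetical helper, not part of the sample app: FeatureScreening, RedundantPairs, WeakFeatures and both thresholds (0.8 for redundant pairs, 0.1 for weak label correlation) are illustrative assumptions. It only flags candidates; whether to drop, keep or combine them remains a judgment call.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the sample app): screens candidate features
// using a precomputed Pearson correlation matrix. Thresholds are assumptions.
public static class FeatureScreening
{
    // Feature pairs (label excluded) whose absolute correlation exceeds the threshold.
    public static List<(string A, string B, double R)> RedundantPairs(
        string[] names, double[,] corr, int labelIndex, double maxPairCorrelation = 0.8)
    {
        var pairs = new List<(string A, string B, double R)>();
        for (int i = 0; i < names.Length; i++)
        {
            if (i == labelIndex) continue;
            for (int j = i + 1; j < names.Length; j++)
            {
                if (j == labelIndex) continue;
                if (Math.Abs(corr[i, j]) > maxPairCorrelation)
                    pairs.Add((names[i], names[j], corr[i, j]));
            }
        }
        return pairs;
    }

    // Features whose absolute correlation with the label is below the threshold.
    public static List<string> WeakFeatures(
        string[] names, double[,] corr, int labelIndex, double minLabelCorrelation = 0.1)
    {
        var weak = new List<string>();
        for (int i = 0; i < names.Length; i++)
        {
            if (i != labelIndex && Math.Abs(corr[i, labelIndex]) < minLabelCorrelation)
                weak.Add(names[i]);
        }
        return weak;
    }
}
```

For the Titanic data, labelIndex would be the position of the Survived column.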

Feature Correlation Analysis with ML.NET and Math.NET

Data Preparation is outside the core business of ML.NET itself, but for retrieving and manipulating the candidate training data we can count on one of its most important spin-off components: the DataView API.


We can fetch the samples, optionally filter them and fill in missing values, and then pivot them into arrays of feature values (exactly what we did in the previous article). All we need is a TextLoader to create an IDataView from the data set, get the column names from its Schema, call GetColumn() to get each column as an array, and ‘upgrade’ the data type to double:

// Note: this uses the pre-1.0 ML.NET API (TextLoader.Arguments, DataKind.R4).
var reader = new TextLoader(_mlContext,
                            new TextLoader.Arguments()
                            {
                                Separator = ",",
                                HasHeader = true,
                                Column = new[]
                                {
                                    // Column name, data type (R4 = float), source column index.
                                    new TextLoader.Column("Survived", DataKind.R4, 1),
                                    new TextLoader.Column("PClass", DataKind.R4, 2),
                                    new TextLoader.Column("Age", DataKind.R4, 5),
                                    new TextLoader.Column("SibSp", DataKind.R4, 6),
                                    new TextLoader.Column("Parch", DataKind.R4, 7),
                                    new TextLoader.Column("Fare", DataKind.R4, 9)
                                }
                            });

var dataView = reader.Read(src);

// Pivot: one list of double values per column.
var result = new List<List<double>>();
for (int i = 0; i < dataView.Schema.ColumnCount; i++)
{
    var columnName = dataView.Schema.GetColumnName(i);
    result.Add(dataView.GetColumn<float>(_mlContext, columnName).Select(f => (double)f).ToList());
}

return result;
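To show the pivot step without ML.NET, here’s a minimal plain C# sketch of the same idea, assuming the samples are already available as one float[] per row (passenger). The Pivot class and its ToColumns method are hypothetical names; the float-to-double widening mirrors the Select(f => (double)f) call above.

```csharp
using System.Collections.Generic;

// Hypothetical helper: turns row-major samples (one array per passenger)
// into one list of values per feature column.
public static class Pivot
{
    public static List<List<double>> ToColumns(IReadOnlyList<float[]> rows)
    {
        var columns = new List<List<double>>();
        if (rows.Count == 0) return columns;

        int columnCount = rows[0].Length;
        for (int c = 0; c < columnCount; c++)
        {
            var column = new List<double>(rows.Count);
            foreach (var row in rows)
                column.Add(row[c]); // implicit float -> double widening
            columns.Add(column);
        }
        return columns;
    }
}
```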

Now that we have all the feature values in arrays, it’s time to calculate the correlations. The MathNet.Numerics.Statistics.Correlation class from Math.NET hosts implementations for several Pearson, Spearman, and other correlation calculations.

We decided to make a copy of the code that calculates the Pearson correlation between two IEnumerable<double> instances:

using System;
using System.Collections.Generic;

public static class Statistics
{
    /// <summary>
    /// Computes the Pearson Product-Moment Correlation coefficient.
    /// </summary>
    /// <param name="dataA">Sample data A.</param>
    /// <param name="dataB">Sample data B.</param>
    /// <returns>The Pearson product-moment correlation coefficient.</returns>
    /// <remarks>Original Source: </remarks>
    public static double Pearson(IEnumerable<double> dataA, IEnumerable<double> dataB)
    {
        var n = 0;
        var r = 0.0; // running co-moment

        var meanA = 0d;
        var meanB = 0d;
        var varA = 0d; // running sum of squared deviations of A
        var varB = 0d; // running sum of squared deviations of B

        // Single pass over both sequences, with running (Welford-style) updates.
        using (IEnumerator<double> ieA = dataA.GetEnumerator())
        using (IEnumerator<double> ieB = dataB.GetEnumerator())
        {
            while (ieA.MoveNext())
            {
                if (!ieB.MoveNext())
                {
                    throw new ArgumentOutOfRangeException(nameof(dataB), "Array too short.");
                }

                var currentA = ieA.Current;
                var currentB = ieB.Current;

                var deltaA = currentA - meanA;
                var scaleDeltaA = deltaA / ++n;

                var deltaB = currentB - meanB;
                var scaleDeltaB = deltaB / n;

                meanA += scaleDeltaA;
                meanB += scaleDeltaB;

                varA += scaleDeltaA * deltaA * (n - 1);
                varB += scaleDeltaB * deltaB * (n - 1);
                r += (deltaA * deltaB * (n - 1)) / n;
            }

            if (ieB.MoveNext())
            {
                throw new ArgumentOutOfRangeException(nameof(dataA), "Array too short.");
            }
        }

        return r / Math.Sqrt(varA * varB);
    }
}
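The copied method computes everything in a single pass, using running updates of the means and sums of squares. For comparison, here is a straightforward two-pass sketch that follows the textbook definition literally. PearsonReference is a hypothetical name, not part of Math.NET or the sample app; it can be handy as a cross-check in a unit test.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical two-pass reference implementation of the Pearson coefficient.
public static class PearsonReference
{
    public static double Pearson(IReadOnlyList<double> a, IReadOnlyList<double> b)
    {
        if (a.Count != b.Count)
            throw new ArgumentException("Both series must have the same length.");

        // Pass 1: the means.
        double meanA = a.Average();
        double meanB = b.Average();

        // Pass 2: co-moment and sums of squared deviations.
        double coMoment = 0, sumSqA = 0, sumSqB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            double dA = a[i] - meanA;
            double dB = b[i] - meanB;
            coMoment += dA * dB;
            sumSqA += dA * dA;
            sumSqB += dB * dB;
        }

        // The 1/n factors of covariance and standard deviations cancel out.
        return coMoment / Math.Sqrt(sumSqA * sumSqB);
    }
}
```

The single-pass version enumerates each sequence only once, so it also works on lazily evaluated IEnumerable<double> sources; the two-pass version needs materialized lists but is easier to verify by eye.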

For the sake of completeness: that same Math.NET class also hosts code to calculate the whole matrix. For the sample app this would require importing a lot more code (linear algebra classes such as Matrix) or adding the whole NuGet package to the project.
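Without Math.NET’s matrix types, the full matrix can also be assembled from pairwise calls. A minimal sketch, assuming the feature values are already pivoted into one sequence per feature: CorrelationMatrix and Compute are hypothetical names, and the correlation function is passed in so that the copied Statistics.Pearson method fits as a method group. Because Pearson correlation is symmetric, only the n(n-1)/2 pairs above the diagonal are computed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: builds a conventional symmetric correlation matrix,
// where result[i, j] is the correlation between feature i and feature j.
public static class CorrelationMatrix
{
    public static double[,] Compute(
        IReadOnlyList<IEnumerable<double>> columns,
        Func<IEnumerable<double>, IEnumerable<double>, double> correlation)
    {
        int n = columns.Count;
        var result = new double[n, n];
        for (int i = 0; i < n; i++)
        {
            result[i, i] = 1; // every feature is fully correlated with itself
            for (int j = i + 1; j < n; j++)
            {
                // Symmetric measure: compute each pair once, store it twice.
                double r = correlation(columns[i], columns[j]);
                result[i, j] = r;
                result[j, i] = r;
            }
        }
        return result;
    }
}
```

Assuming LoadCorrelationData returns the List<List<double>> built earlier, a call like CorrelationMatrix.Compute(matrix, Statistics.Pearson) would yield the unflipped 6 x 6 matrix.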

Here’s how the sample app calculates the whole correlation matrix:

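// Read data
var matrix = await ViewModel.LoadCorrelationData();

// Populate diagram data: data[x, y] holds the correlation between
// feature x (horizontal axis) and feature 5 - y (vertical axis, reversed order).
var data = new double[6, 6];
for (int x = 0; x < 6; ++x)
{
    for (int y = 0; y < 5 - x; ++y)
    {
        var seriesA = matrix[x];
        var seriesB = matrix[5 - y];

        var value = Statistics.Pearson(seriesA, seriesB);

        // Correlation is symmetric: fill the mirrored cell as well.
        data[x, y] = value;
        data[5 - y, 5 - x] = value;
    }

    // Diagonal: each feature is fully correlated with itself.
    data[x, 5 - x] = 1;
}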
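The index arithmetic in that loop comes from the axis layout: data[x, y] holds the correlation between feature x and feature 5 - y, so the vertical axis lists the features in reverse order. As a sketch (HeatMapLayout and ToDisplay are hypothetical names), the same layout can be derived from a conventional correlation matrix, where corr[i, j] is the correlation between features i and j:

```csharp
// Hypothetical helper: flips a conventional correlation matrix into the
// heat map layout, where row y of the vertical axis shows feature n - 1 - y.
public static class HeatMapLayout
{
    public static double[,] ToDisplay(double[,] corr)
    {
        int n = corr.GetLength(0);
        var display = new double[n, n];
        for (int x = 0; x < n; x++)
            for (int y = 0; y < n; y++)
                display[x, y] = corr[x, n - 1 - y];
        return display;
    }
}
```

Computing the conventional matrix first and flipping it afterwards keeps the statistics apart from the presentation, at the cost of one extra array.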

All we need now is a way to properly visualize it.

Correlation Heat Maps

A heat map is a representation of data in which the values are represented by colors. Heat maps are ideal for highlighting patterns and extreme values in rectangular data such as matrices.

Correlation Heat Maps in Machine Learning

In Machine Learning, heat maps are used to display correlations between feature values. The typical (“Pearson”) color scheme is a gradient that goes from

  • red for high positive correlation (value +1), over
  • white for no correlation (value 0), to
  • blue for high negative correlation (value –1).
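To make the color scheme concrete, here’s an illustrative C# helper that maps a correlation value to an RGB color by linear interpolation between blue, white and red. PearsonColors and ToRgb are hypothetical names; OxyPlot’s BlueWhiteRed31 palette, used later on, works (as its name suggests) with 31 discrete steps rather than a continuous gradient.

```csharp
using System;

// Hypothetical helper: maps a correlation value in [-1, 1] to a color on the
// blue (-1) / white (0) / red (+1) gradient by linear interpolation.
public static class PearsonColors
{
    public static (byte R, byte G, byte B) ToRgb(double r)
    {
        double v = Math.Max(-1.0, Math.Min(1.0, r)); // clamp to the valid range
        // The further from zero, the less of the "white" channels remain.
        byte fade = (byte)Math.Round(255 * (1 - Math.Abs(v)), MidpointRounding.AwayFromZero);
        return v >= 0
            ? ((byte)255, fade, fade)   // white towards red
            : (fade, fade, (byte)255);  // white towards blue
    }
}
```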

Sometimes the values are normalized (brought to a range from 0 to 1), like in the next image. When there are a lot of features, the value labels are omitted from the diagram. Since correlation is symmetric (the correlation between A and B is the same as the correlation between B and A), it suffices to display only half of the matrix, like this:


This diagram also omits the correlations on the diagonal: the red squares that indicate the full positive correlation between each feature and itself.

Here’s an example of a diagram (from here) showing fewer features. In that case it’s common to display the whole matrix and the value labels:


The above diagram was created with Python (Pandas and Seaborn) and shows the correlation between all the numerical values in the already mentioned Titanic Passengers dataset.

Here’s the UWP sample app version of the very same diagram, calculated with ML.NET and Math.NET and visualized with OxyPlot:


The small differences in the correlations for the Age feature are caused by the sample app not compensating for missing values. Now that we have the matrix, let’s plot the diagram.
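As for those missing values: pandas’ DataFrame.corr computes pairwise correlations excluding NA/null values, so each coefficient only uses the rows where both features are present. A minimal sketch of that pairwise deletion, assuming missing values are encoded as double.NaN (how the sample app’s loader represents an empty Age field is not covered here); PairwisePearson is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: Pearson correlation over complete pairs only,
// assuming missing values are encoded as double.NaN.
public static class PairwisePearson
{
    public static double Compute(IReadOnlyList<double> a, IReadOnlyList<double> b)
    {
        if (a.Count != b.Count)
            throw new ArgumentException("Both series must have the same length.");

        // Pass 1: means over pairs where both values are present.
        int n = 0;
        double sumA = 0, sumB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            if (double.IsNaN(a[i]) || double.IsNaN(b[i])) continue;
            sumA += a[i];
            sumB += b[i];
            n++;
        }
        if (n < 2) return double.NaN; // not enough complete pairs

        double meanA = sumA / n, meanB = sumB / n;

        // Pass 2: co-moment and sums of squares over the same pairs.
        double co = 0, ssA = 0, ssB = 0;
        for (int i = 0; i < a.Count; i++)
        {
            if (double.IsNaN(a[i]) || double.IsNaN(b[i])) continue;
            double dA = a[i] - meanA, dB = b[i] - meanB;
            co += dA * dB;
            ssA += dA * dA;
            ssB += dB * dB;
        }
        return co / Math.Sqrt(ssA * ssB);
    }
}
```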

Correlation Heat Maps with OxyPlot

To draw an OxyPlot diagram, you start by placing a PlotView element in your XAML:

<oxy:PlotView x:Name="Diagram"
                BorderThickness="0" />

Then you can declaratively or programmatically decorate it with a PlotModel and different Axis instances. A correlation heat map uses a CategoryAxis in both dimensions:

plotModel.Axes.Add(new CategoryAxis
{
    Position = AxisPosition.Bottom,
    Key = "HorizontalAxis",
    ItemsSource = new[]
    {
        // ... (labels for the other features are not shown in this excerpt)
        "Sib / Sp",
        "Par / Chi",
    },
    TextColor = foreground,
    TicklineColor = foreground,
    TitleColor = foreground
});

plotModel.Axes.Add(new CategoryAxis
{
    Position = AxisPosition.Left,
    Key = "VerticalAxis",
    ItemsSource = new[]
    {
        // ... (labels for the other features are not shown in this excerpt)
        "Parents / Children",
        "Siblings / Spouses",
    },
    TextColor = foreground,
    TicklineColor = foreground,
    TitleColor = foreground
});

The legend on top of the diagram is an extra LinearColorAxis with the appropriate OxyPalette:

plotModel.Axes.Add(new LinearColorAxis
{
    // Pearson color scheme from blue over white to red.
    Palette = OxyPalettes.BlueWhiteRed31,
    Position = AxisPosition.Top,
    Minimum = -1,
    Maximum = 1,
    TicklineColor = OxyColors.Transparent
});

If you’re not entirely satisfied with the color scheme, feel free to create your own custom OxyPalette instance: it’s just a 3-color gradient.

The matrix itself is a HeatMapSeries with 6 values in each dimension, rendered as rectangles:

var heatMapSeries = new HeatMapSeries
{
    X0 = 0,
    X1 = 5,
    Y0 = 0,
    Y1 = 5,
    XAxisKey = "HorizontalAxis",
    YAxisKey = "VerticalAxis",
    RenderMethod = HeatMapRenderMethod.Rectangles,
    LabelFontSize = 0.12,
    LabelFormatString = ".00"
};

// The series must be part of the model (it is retrieved later as Series[0]).
plotModel.Series.Add(heatMapSeries);

Diagram.Model = plotModel;

To display the value labels in the squares, you need to set the LabelFormatString. It is only applied if you also set a (non-zero) value for LabelFontSize.
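The ".00" value is a custom .NET numeric format: two decimals, and no leading zero for values between -1 and 1. Assuming OxyPlot passes LabelFormatString on to standard .NET number formatting (we haven’t checked its internals), the labels come out as in this small sketch; LabelFormatDemo is a hypothetical name:

```csharp
using System.Globalization;

// Hypothetical demo of the custom numeric format ".00": two decimals and
// no digit placeholder (so no leading zero) before the decimal point.
public static class LabelFormatDemo
{
    public static string Format(double r) =>
        r.ToString(".00", CultureInfo.InvariantCulture);
}
```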

OxyPlot does not support the triangular version of the heat map. Missing values always get the default value and you can’t make them transparent:


To populate the diagram, assign the Data to the series, and refresh the plot:

(plotModel.Series[0] as HeatMapSeries).Data = data;

// Update diagram
plotModel.InvalidatePlot(true);

Again, here’s the resulting heat map in the sample app:



The dark blue square on the diagram reveals a relatively high negative correlation between Passenger Class and Ticket Fare. The sign is negative because a lower class number (first class) goes with a higher fare: think “first class tickets cost more than second class tickets”. So the value of one feature can be estimated reasonably well from the other, and adding both as features adds little value compared to adding only one of them.

Data scientists would probably extract a new feature from these two (something like “Luxury”) or would break up “Passenger Class” and “Ticket Fare” into more basic components, like the locations on the ship that passengers had access to. Either way, the heat map clearly highlights the feature combinations that need further analysis.


In this article we used components from ML.NET, Math.NET and OxyPlot to calculate and visualize the correlation heat map on candidate training data for a classification model.

The UWP sample app hosts more Machine Learning scenarios. It lives here on GitHub.