In this article we’ll describe how to design and build explainable machine learning models with ML.NET. We used a UWP app to host these models, and OxyPlot to create the diagrams to visualize the importance of features for a model and/or a prediction.

We will discuss the following three tasks that you can execute with the ML.NET Explainability API:

- calculate the feature weights for a linear regression model,
- calculate the feature contributions for a prediction, and
- calculate Permutation Feature Importance (PFI) for a model.

The two corresponding pages in the UWP sample app look like this:

In many Machine Learning scenarios it is not only important to have a model that is accurate enough, but also to have one that is interpretable. In Health Care or Finance, models should be transparent enough to explain **why** they made a particular prediction. Conclusions like “You are 75% healthy” or “Your loan application was not approved” may require a better excuse than “because the computer says so”. Model explainability (knowing the importance of its features) is useful not only for justifying predictions, but also for refining the model itself through feature selection. Investigating explainability allows you to remove features that are not significant for a model, so that you will probably end up with shorter training times and less resource-intensive predictions.

The code in this article is built around the 11-features white wine quality dataset that we already covered several times in this article series. We solved it as a binary classification problem and as a multiclass classification problem for AutoML. This time we finally approach it the correct way: as a regression problem where the outcome is a (continuous) numerical value, the *score*, within a range. We’ll use a Stochastic Dual Coordinate Ascent (SDCA) regression trainer as the core of the model pipeline. Let’s first build and train that model.

Here’s the class to store the input data:

    public class FeatureContributionData
    {
        [LoadColumn(0)]
        public float FixedAcidity;

        [LoadColumn(1)]
        public float VolatileAcidity;

        [LoadColumn(2)]
        public float CitricAcid;

        // More features ...

        [LoadColumn(9)]
        public float Sulphates;

        [LoadColumn(10)]
        public float Alcohol;

        [LoadColumn(11), ColumnName("Label")]
        public float Label;
    }

As in any ML.NET scenario we need to instantiate an MLContext:

public MLContext MLContext { get; } = new MLContext(seed: null);

Here’s the code to build and train the model. We’ll refine it later in several ways to enhance its explainability:

    private IEnumerable<FeatureContributionData> _trainData;
    private IDataView _transformedData;
    private ITransformer _transformationModel;
    private RegressionPredictionTransformer<LinearRegressionModelParameters> _regressionModel;

    public List<float> BuildAndTrain(string trainingDataPath)
    {
        IEstimator<ITransformer> pipeline =
            MLContext.Transforms.ReplaceMissingValues(
                    outputColumnName: "FixedAcidity",
                    replacementMode: MissingValueReplacingEstimator.ReplacementMode.Mean)
                .Append(MLContext.Transforms.Concatenate("Features",
                    new[]
                    {
                        "FixedAcidity", "VolatileAcidity", "CitricAcid", "ResidualSugar",
                        "Chlorides", "FreeSulfurDioxide", "TotalSulfurDioxide", "Density",
                        "Ph", "Sulphates", "Alcohol"
                    }))
                .Append(MLContext.Transforms.NormalizeMeanVariance("Features"));

        var trainData = MLContext.Data.LoadFromTextFile<FeatureContributionData>(
            path: trainingDataPath,
            separatorChar: ';',
            hasHeader: true);

        // Keep the data available.
        _trainData = MLContext.Data.CreateEnumerable<FeatureContributionData>(trainData, true);

        // Cache the data view in memory. For an iterative algorithm such as SDCA this makes a huge difference.
        trainData = MLContext.Data.Cache(trainData);

        _transformationModel = pipeline.Fit(trainData);

        // Prepare the data for the algorithm.
        _transformedData = _transformationModel.Transform(trainData);

        // Choose a regression algorithm.
        var algorithm = MLContext.Regression.Trainers.Sdca();

        // Train the model and score it on the transformed data.
        _regressionModel = algorithm.Fit(_transformedData);

        // ...
    }

# Which features are hot, and which are not

## Feature Weights in linear models

Machine Learning has several techniques for calculating how important features are in explaining or justifying a prediction. When your main algorithm is a linear model (e.g. linear regression) it’s relatively easy to calculate feature contributions. The prediction is a linear combination of the feature values, weighted by the model coefficients. So at model level there’s already a notion of feature contribution. In ML.NET these Weights are found in the LinearModelParameters class. This is the base class for all linear model parameter classes, like LinearRegressionModelParameters (used by SDCA) and OlsModelParameters (used by the Ordinary Least Squares trainer).

For any linear model, you can fetch the overall feature weights like this:

    // Return the weights.
    return _regressionModel.Model.Weights.ToList();
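To make the "linear combination" concrete, here is a minimal sketch in plain C# without any ML.NET dependency. The class `LinearModelSketch` and its methods are hypothetical names used for illustration only. It assumes the score is the bias plus the sum of weight-times-value terms, and that the feature values are the ones the model actually sees (in our pipeline: the normalized "Features" vector, not the raw columns).

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET or the sample app.
public static class LinearModelSketch
{
    // Each term is weights[i] * features[i]: the share of the score that a
    // linear model attributes to feature i for this particular sample.
    public static float[] Terms(float[] weights, float[] features)
    {
        if (weights.Length != features.Length)
        {
            throw new ArgumentException("weights and features must have the same length.");
        }

        return weights.Zip(features, (w, x) => w * x).ToArray();
    }

    // The score of a linear model is the bias plus the sum of all terms.
    public static float Score(float[] weights, float bias, float[] features)
    {
        return bias + Terms(weights, features).Sum();
    }
}
```

With weights `{ 0.5f, -0.2f }`, a bias of `5.0f` and feature values `{ 2.0f, 1.0f }`, the two terms are 1.0 and -0.2, and the score is roughly 5.8. A large weight only dominates a prediction when the corresponding feature value is not close to zero, which is why the per-prediction view can differ from the model-level weights.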

Here’s what the weights for the wine quality model look like in a diagram:

The *alcohol* feature seems to dominate this particular model. Its weight is positive: a higher alcohol percentage results in a higher appreciation score. We may have just scientifically proven that alcohol is important in wine quality perception. In other news, there are also at least three characteristics that could be ignored in this model, and maybe up to seven.

## Feature Contribution Calculation

A second component in ML.NET’s Explainability API is a calculator that computes the list of feature contributions for a specific prediction: the FeatureContributionCalculator. Just like the overall feature weights from the previous section, the calculated feature contributions can be positive or negative. The calculator works for a large number of regression, binary classification, and ranking algorithms. The documentation page on the FeatureContributionCalculatingEstimator class contains a list of all compatible trainers. It includes

- all linear models – because they inherently come with feature weights,
- all Generalized Additive Models (GAM) – because their squiggly, wiggly shape functions are created by combining linear models, and
- all tree based models – because they can calculate feature importance based on the values in the decision paths in the tree(s).

Thanks to ML.NET’s modular approach it’s easy to plug a feature contribution calculator into a model pipeline, even if the model is already trained. Here’s how we did this in the sample app.

First we extended the prediction data structure (which originally had only the *Score*) with an array to hold the contribution values for each feature:

    public class FeatureContributionPrediction : FeatureContributionData
    {
        public float Score { get; set; }

        public float[] FeatureContributions { get; set; }
    }

Using the CalculateFeatureContribution() method we created an estimator, and trained it on just one input sample to become a transformer. This adds the calculation to the pipeline and the feature contributions to the output schema. The transformer was then appended to the trained model.

Here’s what that looks like in C#:

    private PredictionEngine<FeatureContributionData, FeatureContributionPrediction> _predictionEngine;

    public void CreatePredictionModel()
    {
        // Get one row of sample data.
        var regressionData = _regressionModel.Transform(MLContext.Data.TakeRows(_transformedData, 1));

        // Define a feature contribution calculator for all the features.
        // 'Train' it on the sample row.
        var featureContributionCalculator = MLContext.Transforms
            .CalculateFeatureContribution(_regressionModel, normalize: false)
            .Fit(regressionData);

        // Create the full transformer chain.
        var scoringPipeline = _transformationModel
            .Append(_regressionModel)
            .Append(featureContributionCalculator);

        // Create the prediction engine.
        _predictionEngine = MLContext.Model.CreatePredictionEngine<FeatureContributionData, FeatureContributionPrediction>(scoringPipeline);
    }

For testing the model, we didn’t dare to bother you with an input form for 11 features. Instead we added a button that randomly fetches one of the almost 4000 training samples and calculates the score and feature contributions for that sample.

Here’s the code behind this button:

    public FeatureContributionPrediction GetRandomPrediction()
    {
        return _predictionEngine.Predict(_trainData.ElementAt(new Random().Next(3918)));
    }

Here’s what the results look like in a plot. This is an example of a linear model, so we can compare the overall model weights with the feature contributions for the specific prediction. We did not normalize the results in the feature contribution calculator configuration (the second parameter in the call) to keep these in the same range as the model weights:

Feature Contribution Calculation also works for models that don’t come with overall weights. Here’s what the results look like for a LightGBM regression trainer, one of the decision tree based algorithms:

Check the comments in the sample app source code for the details. The code for using an Ordinary Least Squares linear regression trainer is also there, in comments. This 18th century algorithm is even more biased towards alcohol than our initial SDCA trainer.

For yet another example, check this official sample that adds explainability to a model covering the classic Taxi Fare Prediction scenario.

## Permutation Feature Importance

The last calculator in the ML.NET Explainability API is the most computationally expensive one. It calculates the Permutation Feature Importance (PFI). Here’s how PFI calculation works:

1. A baseline model is trained and its main quality metrics (accuracy, R squared, …) are recorded.
2. The values of one feature are shuffled or partly replaced by random values, to undermine the relationship between the feature and the score.
3. The modified data set is passed to the model to get new predictions and new values for the quality metrics. The result is expected to be worse than the baseline. If your model got better on random data then there was definitely something wrong with it.
4. The feature importance is calculated as the degradation of a selected quality metric versus the one in the baseline.
5. Steps 2, 3, and 4 are repeated for each feature so that the respective degradations can be compared: the more degradation for a feature, the more the model depends on that feature.
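The steps above can be sketched in plain C#, without ML.NET. This is an illustration of the general technique under stated assumptions, not the way ML.NET implements it: `PfiSketch`, `RSquared` and `Importance` are hypothetical names, the model is represented by a simple `predict` delegate, and the result is the mean drop in R squared (baseline minus shuffled), so a larger positive value means a more important feature.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of ML.NET or the sample app.
public static class PfiSketch
{
    // R squared: 1 - (residual sum of squares / total sum of squares).
    public static double RSquared(double[] labels, double[] predictions)
    {
        double mean = labels.Average();
        double residual = labels.Zip(predictions, (y, p) => (y - p) * (y - p)).Sum();
        double total = labels.Sum(y => (y - mean) * (y - mean));
        return 1.0 - residual / total;
    }

    // Returns, per feature, the mean drop in R squared after shuffling
    // that feature's column 'permutationCount' times.
    public static double[] Importance(
        Func<double[], double> predict,
        double[][] rows,
        double[] labels,
        int permutationCount,
        int seed)
    {
        var random = new Random(seed);

        // Step 1: record the baseline metric on the untouched data.
        double baseline = RSquared(labels, rows.Select(predict).ToArray());
        int featureCount = rows[0].Length;
        var importance = new double[featureCount];

        // Step 5: repeat for every feature.
        for (int f = 0; f < featureCount; f++)
        {
            double totalDrop = 0;
            for (int p = 0; p < permutationCount; p++)
            {
                // Step 2: copy the data and shuffle only column f (Fisher-Yates).
                double[][] shuffled = rows.Select(r => (double[])r.Clone()).ToArray();
                for (int i = shuffled.Length - 1; i > 0; i--)
                {
                    int j = random.Next(i + 1);
                    (shuffled[i][f], shuffled[j][f]) = (shuffled[j][f], shuffled[i][f]);
                }

                // Step 3: score the modified data with the unchanged model.
                double permuted = RSquared(labels, shuffled.Select(predict).ToArray());

                // Step 4: the degradation versus the baseline.
                totalDrop += baseline - permuted;
            }

            importance[f] = totalDrop / permutationCount;
        }

        return importance;
    }
}
```

Note that the model is never retrained in this loop. Only the input data changes, which is why the technique works for any model that can score a data set.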

In ML.NET you fire up this process with a call to the PermutationFeatureImportance() method. You need to provide the model, the baseline data set, and the number of permutations, i.e. the number of times the shuffle-and-evaluate round is performed for each feature:

    // ... (same as the previous sample)

    // Prepare the data for the algorithm.
    var transformedData = transformationModel.Transform(trainData);

    // Choose a regression algorithm.
    var algorithm = MLContext.Regression.Trainers.Sdca();

    // Train the model and score it on the transformed data.
    var regressionModel = algorithm.Fit(transformedData);

    // Calculate the PFI metrics.
    var permutationMetrics = MLContext.Regression.PermutationFeatureImportance(
        regressionModel,
        transformedData,
        permutationCount: 50);

The call returns an array of quality metric statistics for the model type. For a regression model it’s an array of RegressionMetricsStatistics instances, each holding summary statistics over multiple observations of RegressionMetrics. In the sample app we chose R Squared as the most important quality metric, so the decrease in this value determines feature importance.

We defined a data structure to hold the result:

    public class FeatureImportance
    {
        public string Name { get; set; }

        public double R2Decrease { get; set; }
    }

The list of feature importances is created from the result of the call and visualized in a diagram. Since we decided that R Squared is the main quality metric for our model, we used *RSquared.Mean* (the mean decrease in R Squared during the calculation) as the target value for feature importance:

    for (int i = 0; i < permutationMetrics.Length; i++)
    {
        result[i].R2Decrease = permutationMetrics[i].RSquared.Mean;
    }
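The loop above assumes that `result` already holds one entry per feature, in the same order as the columns passed to Concatenate. A minimal sketch of how such a list could be built and ranked is shown below. `FeatureImportanceSketch.Rank` is a hypothetical helper, the `FeatureImportance` class is repeated only to keep the block self-contained, and the mean values are passed in as plain numbers instead of ML.NET statistics. Sorting on the absolute value is an assumption that keeps the ranking independent of the sign convention of the reported change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Repeated here for self-containment; same shape as the class in the article.
public class FeatureImportance
{
    public string Name { get; set; } = "";

    public double R2Decrease { get; set; }
}

// Hypothetical helper, not part of ML.NET or the sample app.
public static class FeatureImportanceSketch
{
    // Pairs each feature name with its mean change in R squared and
    // puts the feature with the largest absolute change first.
    public static List<FeatureImportance> Rank(
        IReadOnlyList<string> names,
        IReadOnlyList<double> meanChanges)
    {
        if (names.Count != meanChanges.Count)
        {
            throw new ArgumentException("names and meanChanges must have the same length.");
        }

        return names
            .Select((name, i) => new FeatureImportance { Name = name, R2Decrease = meanChanges[i] })
            .OrderByDescending(f => Math.Abs(f.R2Decrease))
            .ToList();
    }
}
```

In the sample app the second argument would be filled from `permutationMetrics[i].RSquared.Mean`, and the first from the feature names used when building the "Features" column.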

Here’s what the plot looks like:

We used the same regression algorithm (SDCA) as in the previous samples, so the addiction to alcohol (the feature!) should not come as a surprise anymore.

Here’s a detailed view on one of the metric statistics when debugging:

That is an awful lot of information. Unfortunately the current API keeps most of the details private, but you can always use Reflection. Access to more details would allow us to create more insightful diagrams like this one (source). It shows not only the mean, but also the min and max values during the calculation:

For another example of explaining model predictions using Permutation Feature Importance, check this official ML.NET sample that covers house pricing.

# The Source

In this article we described three components of the ML.NET Explainability API that may help you to create machine learning models that have the ability to clarify their predictions. The UWP sample app hosts many more ML.NET scenarios. It lives here on GitHub.

Enjoy!