ML Interpretability – What it is, and how it affects us

Picture the scene. A young bank manager is starting their first day at Adatis Bank, having survived the gruelling interviews and assessment days to get there. To start the day, they meet a young couple wishing to buy their first home, a place to start a family. The meeting goes well, and the bank manager is ready to grant the couple a mortgage, but first they have to run the couple’s details through the Loan Assigner 3000, a machine learning application created by the bank’s data science team. The application must approve the couple before they can get their loan.

A few minutes pass, and suddenly a big red notification pops up on the screen – ‘LOAN REJECTED’. The bank manager starts sweating. They break the bad news to the couple: they won’t be able to buy the lovely 3 bed semi with a garage. The couple are devastated, and beg for advice on how to get a mortgage. Better credit score? A larger deposit? The bank manager has no idea, because the Loan Assigner 3000 doesn’t give an explanation!

In this blog post, we will be looking at what machine learning interpretability is, why it’s so relevant, and some techniques for putting it into practice.

Machine Learning Interpretability

Machine learning interpretability is concerned with how easily a human can understand a decision made by a machine learning model. In a world where black box models, such as neural networks, are becoming more and more widespread, the need to be able to explain their operation becomes greater and greater. These models take the input details, process them, and output a prediction, without giving any reasoning. The opening example is tongue-in-cheek, but it touches on the moral dilemmas that can arise when interpretability is not considered in the development of machine learning based applications.

The use of interpretability extends beyond sparing bank managers awkward conversations, and into the development of predictive analytics solutions. A machine learning model making discriminatory decisions based on a personal characteristic, such as race or sexual orientation, is not only morally wrong, but also suggests that the model could be overfitted to the training data and is ignoring the general trend that we wish to model (learn more here). Interpretability techniques can therefore be used not just for explaining the model’s actions to those affected, but also for making more effective and generalisable models.

General Data Protection Regulation (GDPR)

Issues surrounding the use of data have not been ignored by legislators, and in Europe the General Data Protection Regulation (GDPR) has introduced significant safeguards for citizens regarding their data and how it is processed, several of which are relevant to machine learning. Article 15, the right of access, gives citizens the right to access their personal data and information about how that data is being used. Furthermore, GDPR also allows citizens to contest decisions made solely by automated processing (Article 22), which in our opening example could lead to Adatis Bank facing litigation!

The new dimensions introduced by the development of machine learning, and the legislation surrounding it, have led to machine learning interpretability becoming an active research area, with many techniques being created. Two well-established methods, which we will be discussing in this post, are LIME and Shapley values.

Local Interpretable Model-Agnostic Explanations (LIME)

This technique was first described in 2016 and was created by researchers at the University of Washington in the United States. Its underlying mechanism is surprisingly simple but has been used to explain the behaviour of black box models in a wide range of applications, such as in neural networks for image classification.

The method takes a data point, along with the label predicted for it by the black box model, and assumes that the model can be queried as many times as desired. New data points are created by repeatedly taking the original data point and varying its values slightly. These data points are passed through the black box model to obtain its predictions, and together the perturbed points and their predictions form a new data set. This data set, with each point weighted by how close it lies to the original, is then used to train a more interpretable model that approximates the behaviour of the black box model in the neighbourhood of the original point. Such interpretable models would typically be a linear regression model for regression tasks, and a decision tree for classification tasks.
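To make the steps above more concrete, here is a minimal sketch of the idea in Python, using only the standard library. It is not the official LIME implementation: the black_box function, its two features (income and credit score), the feature scales and the exponential proximity kernel are all assumptions made purely for illustration. The sketch perturbs a point, queries the model, and fits a proximity-weighted linear surrogate to the results.

```python
import math
import random


def black_box(income, credit_score):
    """Hypothetical stand-in for the Loan Assigner 3000 (approval probability)."""
    z = 0.00004 * income + 0.01 * credit_score - 8.0
    return 1.0 / (1.0 + math.exp(-z))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def explain_locally(predict, point, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a weighted linear surrogate to `predict` around `point`.

    Returns (intercept, coefficients). Each coefficient is the change in the
    prediction per one `scale` unit of the corresponding feature.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # Vary each feature slightly around the original data point.
        offsets = [rng.gauss(0.0, 1.0) for _ in point]
        sample = [p + o * s for p, o, s in zip(point, offsets, scales)]
        rows.append([1.0] + offsets)  # leading 1.0 is the intercept column
        targets.append(predict(*sample))  # query the black box
        # Assumed kernel: nearby samples count more than distant ones.
        dist_sq = sum(o * o for o in offsets)
        weights.append(math.exp(-dist_sq / kernel_width ** 2))

    # Weighted least squares via the normal equations (X^T W X) beta = X^T W y.
    k = len(point) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    return beta[0], beta[1:]


if __name__ == "__main__":
    intercept, coefs = explain_locally(
        black_box, point=[30000.0, 550.0], scales=[5000.0, 50.0]
    )
    print("local approval probability:", round(intercept, 3))
    print("effect per scale unit (income, credit score):", coefs)
```

The coefficients describe the model only near the couple’s own data point, which is the "local" part of LIME. In practice one would more likely reach for an existing library than hand-roll the regression as done here.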

In the mortgage example, our manager would now be able to provide an approximation of the decision process that the previously unexplainable model took when processing the couple’s data, providing them with some solace. However, what if the couple wished to know what was the main reason for their loan rejection, in order to allow them to pass the Loan Assigner 3000? This is where techniques based on Shapley values become relevant.

Shapley Values

Shapley values come from game theory, a branch of mathematics concerned with how rational decision makers should act, and are named after Lloyd Shapley, a mathematician and economist. Within machine learning, Shapley values can be used to assign an importance metric to the value of each feature for a data point, and they have also seen applications in a wide range of other fields, from biology through to economics.

The mathematical mechanism behind Shapley values is relatively complex, so we will not be discussing it here (more information here). The general intuition behind the technique, however, is relatively straightforward to understand, and follows the mathematical process fairly closely. This example will replace the black box model with a room, and the features describing each data point with people.

The room is able to display a score, which depends on the combination of people inside it, but the process for assigning this score is unknown. We allow people to enter and exit the room one by one, and monitor the score that the room assigns to each combination of people. This process carries on for a long time, and eventually every possible combination of people, along with its score, has been noted. The relevance of a person to the room’s scoring mechanism (the feature’s importance to the black box model) can then be found by taking the average change in score when that person joins the room, across all the orders in which people could have entered.
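The room analogy can be sketched directly in Python. The room_score function below and its three "people" (credit score, deposit and income) are made up for illustration and are not taken from any real model. The sketch tries every order in which the people could enter the room and averages the change in score each person causes on entry, which is only practical for a handful of features because the number of orderings grows factorially.

```python
from itertools import permutations


def room_score(people):
    """Hypothetical room: the score depends on who is inside."""
    score = 0
    if "credit_score" in people:
        score += 40
    if "deposit" in people:
        score += 20
    if "credit_score" in people and "deposit" in people:
        score += 10  # bonus that only appears when both are present
    if "income" in people:
        score += 5
    return score


def shapley_values(players, score):
    """Average each player's marginal contribution over all entry orders."""
    totals = {player: 0.0 for player in players}
    orderings = list(permutations(players))
    for order in orderings:
        room = frozenset()
        for person in order:
            joined = room | {person}
            # Change in the room's score caused by this person entering.
            totals[person] += score(joined) - score(room)
            room = joined
    return {player: total / len(orderings) for player, total in totals.items()}


if __name__ == "__main__":
    values = shapley_values(["credit_score", "deposit", "income"], room_score)
    for person, value in sorted(values.items(), key=lambda kv: -kv[1]):
        print(person, value)
```

In this toy room, the bonus of 10 that needs both credit score and deposit ends up shared equally between the two, and the individual values add up to the score of the full room. Libraries that apply Shapley values to real models generally rely on approximations rather than enumerating every ordering.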

Returning to our bank manager, they would now be able to tell the couple that the Loan Assigner 3000’s main objection to them was their low credit score, and could sensibly advise them to buy a luxury car on finance and take out several credit cards in order to be granted a mortgage.


In this post, we have explored some of the issues surrounding machine learning, and some of the legal and technical approaches that have been used to address them. As machine learning becomes more widespread, we should expect to see both further problems and further solutions emerge, and machine learning interpretability become more and more relevant.

This blog was heavily influenced by the e-Book Interpretable Machine Learning by Christoph Molnar, which includes many more examples and the mathematical details of the techniques described (find more here).