In this talk, Rsqrd AI welcomes Javier Antorán, PhD student at the University of Cambridge! Javier talks about the existing approaches to interpretability and how a new approach using counterfactuals can be used to explain uncertainty.
What is Interpretability?
Machine Learning interpretability is a broad term which is used loosely, but generally refers to the ability to explain or present something in understandable terms to a human. Someone could say an ML model is interpretable if humans are able to understand the reasoning behind its decisions. Others might go a step further and argue that a model is interpretable when humans are able to reproduce its reasoning and thus predict the model’s outputs given its inputs (this is known as forward simulability). A really good overview of ML interpretability is given by Finale Doshi-Velez and Been Kim.
Existing Approaches to Interpretability
Interpretable Data Driven Decision Making
There are many existing interpretable “data-driven-decision-making” models, like linear regression or decision trees. In these models, it’s very simple to find the reasoning behind a decision. A linear regression’s output is a weighted sum of its input features, so features with larger weights directly have a bigger impact on the output. A decision tree is dictated by yes-no questions; once all the questions have been asked, a prediction is made that is constant within the leaf node. It’s fairly simple to trace a decision back up the tree.
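For a linear model, the “explanation” really is just the weighted sum itself. Here is a minimal sketch of reading feature contributions directly off a linear regression; the feature names, weights, and house values are made up for illustration:

```python
# Hypothetical linear model for house prices: weights and bias are invented.
weights = {"sq_footage": 120.0, "bedrooms": 9000.0, "age_years": -800.0}
bias = 50000.0
house = {"sq_footage": 1500.0, "bedrooms": 3.0, "age_years": 20.0}

# Each feature's contribution to the prediction is simply weight * value,
# so the explanation is the list of contributions.
contributions = {f: weights[f] * house[f] for f in weights}
prediction = bias + sum(contributions.values())

# Sorting by magnitude gives an immediate ranking of what drove the output.
for feature, contrib in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {contrib:+.0f}")
print(f"prediction: {prediction:.0f}")
```

The same trace-back works for a decision tree: follow the yes-no questions from the root to the leaf that produced the prediction.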
Not Very Interpretable Data Driven Decision Making
If one sacrifices interpretability and the ability to easily determine the provenance of an output (i.e. to trace where a decision comes from), a wider range of predictive models become available, like neural networks. They can do some cool things like take in the image of a road and output driving commands. However, it’s not as easy to follow the reasoning behind a neural network’s output as it is with linear regression or decision trees.
Once interpretability is sacrificed in applications such as neural networks, how does someone get it back? The most commonly used approach is feature importance, and within the family of feature importance techniques is LIME (Local Interpretable Model-agnostic Explanations). The idea behind LIME is that someone can approximate their nonlinear model with an interpretable linear function. For a more in-depth look at how LIME works, you can go to 5m 05s.
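In essence, LIME perturbs the input, queries the black-box model, and fits a proximity-weighted linear model to the responses; the surrogate’s coefficients then serve as local feature importances. A toy numpy sketch of that core idea (the black-box function, kernel width, and sample counts below are invented for illustration, not the real LIME library):

```python
import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # A nonlinear function standing in for an opaque trained model.
    return np.sin(X[:, 0]) + X[:, 1] ** 2

x = np.array([0.5, 1.0])                          # instance to explain
Z = x + rng.normal(scale=0.1, size=(500, 2))      # perturbed neighbours
y = black_box(Z)

# Weight neighbours by proximity to x with an exponential kernel.
dist = np.linalg.norm(Z - x, axis=1)
w = np.exp(-(dist ** 2) / 0.01)

# Weighted least squares: fit a local linear surrogate with an intercept.
A = np.hstack([Z, np.ones((len(Z), 1))])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)

# Locally, the true gradients are cos(0.5) and 2 * x1, and the surrogate's
# coefficients approximate them.
print("local feature importances:", coef[:2])
```

The surrogate is only trustworthy near `x`; as the next section shows, this is exactly where the approach breaks down for strongly non-linear models.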
Feature Importance on Images
Approximating a non-linear model with a local linear surrogate can produce meaningless explanations when the underlying function is strongly non-linear. Javier goes into detail on an example where LIME falsely classifies the pixels of an image of the number ‘2’ as positive evidence for the number ‘1’. The example can be found at 6m 52s.
The alternative to feature importance is counterfactual explanations. The idea behind counterfactuals is that they capture the notion of how things would have worked out if some factor would’ve been different – the “what-ifs” if a factor were to change. An example that Javier gives is in the medical field when giving a medical diagnosis. Once a final outcome and diagnosis has been resolved from running several tests and procedures, it’s commonplace to ask how an outcome would have been different had the medical staff done something differently. Another example of a counterfactual would be to think about “what if fish could fly? How would our ecosystem be different?”. Here’s an article that goes further in depth on counterfactuals.
The interpretability community has started using counterfactuals to ask questions like what sort of interventions does someone need in order to get a desirable result – what would need to change? In the context of classifiers, it’s questions like:
- What features would I need to remove such that my model’s confidence decreases?
- What features would I need to remove such that my model’s prediction changes?
The cool thing about counterfactuals is that they’re model-agnostic – their explanations have a completely clear meaning, with no need for approximation like the local linear surrogates in LIME.
Counterfactuals in Image Classification
Javier gives an example of counterfactual explanations in image classification.
Given a picture of a bird, each pixel can be removed and replaced. With each step, there’s a question that is asked: is it still a picture of a bird? This process can keep going until the model doesn’t think there’s a bird in the picture anymore, creating a collection of pixels that identified the animal. Full explanation can be found at 10m 09s.
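The deletion procedure described above can be sketched as a greedy loop: repeatedly blank out the pixel whose removal hurts the model’s confidence the most, until the model no longer makes the prediction. The tiny image and linear “classifier” below are invented stand-ins, not a real bird detector:

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((4, 4))                        # tiny stand-in image
relevance = np.linspace(0.1, 1.0, 16).reshape(4, 4)  # toy per-pixel relevance

def confidence(img):
    # Stand-in for p(bird | image): higher when relevant pixels are present.
    score = (img * relevance).sum()
    return 1.0 / (1.0 + np.exp(-(score - 2.0)))  # squash to (0, 1)

removed = []
img = image.copy()
while confidence(img) > 0.5 and img.any():
    # Greedily delete the pixel whose removal drops confidence the most.
    best_drop, best_idx = -1.0, None
    for idx in np.argwhere(img > 0):
        trial = img.copy()
        trial[tuple(idx)] = 0.0
        drop = confidence(img) - confidence(trial)
        if drop > best_drop:
            best_drop, best_idx = drop, tuple(idx)
    img[best_idx] = 0.0
    removed.append(best_idx)

print("pixels that explained the prediction:", removed)
```

The set of removed pixels is the counterfactual explanation: delete them, and the model stops seeing a bird.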
Uncertainty in ML
Despite neural networks having gained a lot of popularity recently, a well-known issue is that they are often overconfident and tend to behave erratically in scenarios different from the ones they’ve been trained on.
As mentioned, neural networks can be confidently wrong, placing high confidence in decisions a human wouldn’t make. The solution to this is to build systems that are uncertainty aware - systems that can recognize where they could be confused.
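The simplest way to act on such awareness is a reject option: only act on a prediction when its confidence clears a threshold, and defer to a human otherwise. A minimal sketch, with invented class probabilities and threshold:

```python
# Reject-option decision rule: the probabilities and the 0.9 threshold
# below are made up for illustration.
def predict_or_reject(probs, threshold=0.9):
    confidence = max(probs)
    if confidence < threshold:
        return "reject: ask a human"
    return f"predict class {probs.index(confidence)}"

print(predict_or_reject([0.05, 0.92, 0.03]))   # confident -> act on it
print(predict_or_reject([0.40, 0.35, 0.25]))   # uncertain -> defer
```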
Sources of Uncertainty
There are 2 reasons why a model can be uncertain:
- There’s noise in the data (Aleatoric Uncertainty)
- Asking about points very different from the ones in the training set (Epistemic Uncertainty)
Javier goes further in explaining what uncertainty looks like in a detailed example at 12m 53s.
Most existing ML models have ways of expressing noise uncertainty: the probabilities output by a regular NN’s softmax activation layer represent noise uncertainty. However, it is much more difficult to capture epistemic or “model uncertainty”. This latter form of uncertainty captures the notion that the data the model saw during training was insufficient for it to make a confident prediction about the test point with which it is being queried.
In practice, the more complex the model, the harder it is to capture epistemic uncertainty. Neural networks are very complicated, which makes it nearly impossible to quantify exactly what these models “don’t know”. Instead, ML practitioners resort to approximations, like MC dropout. There is often a tradeoff between using simpler approximations, which are less reliable but scale to big networks, and using more exact approximations that provide better uncertainty estimates.
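MC dropout approximates epistemic uncertainty by leaving dropout switched on at test time and measuring how much several stochastic forward passes disagree. A toy numpy sketch of this idea; the two-layer network and its random weights are invented, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, drop_rate=0.5):
    h = np.maximum(x @ W1, 0.0)                   # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate        # dropout stays ON at test time
    h = h * mask / (1.0 - drop_rate)
    return softmax(h @ W2)

x = np.array([1.0, -0.5, 0.3, 2.0])
probs = np.stack([forward(x) for _ in range(100)])  # 100 stochastic passes

mean_p = probs.mean(axis=0)
# Entropy of the averaged prediction: total uncertainty.
entropy = -(mean_p * np.log(mean_p)).sum()
# Disagreement across passes: a proxy for epistemic (model) uncertainty.
epistemic = probs.std(axis=0).mean()
print(f"mean prediction: {mean_p.round(3)}, entropy: {entropy:.3f}")
```

If the passes all agree, the spread collapses and only noise uncertainty remains; large disagreement signals that the model “doesn’t know”.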
Uncertainty in Practice
Here are some applications of how uncertainty is used in practice today:
- Robustness: no one would want self-driving cars or medical diagnoses to behave erratically when faced with something unfamiliar (One way to handle a situation like this would be to reject anything unfamiliar and ask for the intervention of a human driver or doctor, respectively)
- Dataset building, safety, and awareness: if there’s a model with uncertainty awareness, it can identify uncertainty in certain groups of data and reevaluate potential biases.
- Active learning: settings where labeled data is scarce and obtaining labels is really expensive. In these situations, we want to identify the points which will be most useful to train our model. It turns out that the most telling points are the points with high estimates of epistemic uncertainty.
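That active learning acquisition rule is simple to state in code: rank the unlabeled pool by epistemic uncertainty and query the top of the ranking. A minimal sketch with invented uncertainty scores (in practice these would come from something like MC dropout):

```python
import numpy as np

# Hypothetical per-point epistemic uncertainty over an unlabeled pool.
epistemic_uncertainty = np.array([0.02, 0.91, 0.15, 0.77, 0.05, 0.64])
budget = 2                                   # how many labels we can afford

# Query the `budget` most uncertain pool indices.
query = np.argsort(-epistemic_uncertainty)[:budget]
print("points to send for labelling:", query.tolist())
```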
To play around with these ideas and implement your own Bayesian neural network, you can go to the GitHub repo here.
Are Models that are Aware of Uncertainty Interpretable?
Uncertainty-aware models are as interpretable as regular ML models. If using linear regression as mentioned before, the model will be very interpretable. On the other hand, if using something like a Bayesian neural network, it probably won’t be interpretable.
However, uncertainty can also be used as a tool to help users understand their models’ predictions. Javier gives an example of a neural network identifying polyps in an image found at 19m 50s. Here, the Bayesian NN’s uncertainty around the detected region is more telling than the output from an interpretability technique (LIME).
What Happens when a Model Doesn’t Know the Answer?
Until now, we have just considered interpreting ML predictions where our model is confident. However, interpretability may be needed the most when this is not the case.
If a model decides to reject an option and doesn’t know what to predict: what happens next?
CLUE: Counterfactual Latent Uncertainty Explanations
CLUE is a method that answers the question “what is the smallest change we need to make to an input, while staying in-distribution, such that our model produces more certain predictions?”.
How CLUE would be used in practice is given by the following example:
Given an uncertain image (here the ambiguous image could be a ‘6’ or ‘8’), the reject option can be triggered. Then, CLUE would run and give the nearest certain image. In this example, CLUE takes out a line in the original image and now identifies the image as a ‘6’ with confidence, highlighting what the issue was.
Javier goes further into explaining the algorithm at 27m.
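At its core, CLUE searches the latent space of a generative model for a point whose decoding the classifier is confident about, while penalizing distance from the original input. The decoder, uncertainty function, and distance weight in this sketch are all invented toys standing in for CLUE’s deep generative model and predictive entropy:

```python
import numpy as np

def decode(z):
    return np.array([z, z ** 2])              # toy stand-in for a DGM decoder

def uncertainty(x):
    # Toy stand-in for predictive entropy: lowest near x = (1, 1).
    return ((x - np.array([1.0, 1.0])) ** 2).sum()

x0 = decode(0.2)                              # the ambiguous original input
lam = 0.5                                     # weight on staying close to x0

def clue_objective(z):
    x = decode(z)
    return uncertainty(x) + lam * np.abs(x - x0).sum()

# Minimise over the latent code with simple numerical gradient descent.
z, step = 0.2, 0.05
for _ in range(200):
    grad = (clue_objective(z + 1e-4) - clue_objective(z - 1e-4)) / 2e-4
    z -= step * grad
counterfactual = decode(z)
print(f"z* = {z:.3f}, counterfactual input = {counterfactual.round(3)}")
```

Optimising in latent space rather than pixel space is what keeps the counterfactual in-distribution: every candidate is something the generative model can actually produce.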
Presenting CLUE to Users
CLUE is able to show the difference between the original input and its counterfactual output – essentially, what changed in the original input to make it look like CLUE’s output. In the case of handwritten digits, CLUE shows which strokes can be erased to create a more certain prediction.
CLUE can also be applied to classification, regression, and image data.
CLUE vs Feature Importance (LIME/SHAP)
CLUE’s counterfactual nature allows it to add more information than is present in the original input. CLUE can point out pieces (like different strokes in drawn numbers) that can be helpful or harmful to prediction. This is in contrast with feature importance techniques, like LIME or SHAP, which simply highlight features as evidence for or against a class and are incapable of adding new information.
Javier goes over a great example where each of the 3 methods are used to interpret a model’s predictions for some uncertain inputs. Here, traditional approaches to interpretability struggle due to a lack of strong evidence for any class or conflicting evidence for multiple classes. A counterfactual approach allows CLUE to give the most useful observations at 29m 41s.
User Study to Show the Value of Uncertainty Estimates
The authors of CLUE wanted to test whether any of this information on uncertainty is useful and actually makes a difference to practitioners. A human simulability task was set up with the idea that an explanation should give a user enough insight to be able to predict how a model is going to behave in an unforeseen circumstance.
The setup can be found at 31m 10s. There are 4 approaches (groups) tested: CLUE, Random (randomly select points), Sensitivity (similar to CLUE but doesn’t take the step of restricting the hypothesis space of explanations to plausible inputs), and Human (actual people selecting relevant counterfactual explanations). Each approach has 10 “users” that will be shown its explanations. Each group is given some uncertain inputs and their explanations, and then asked to predict if their model will be certain or uncertain on some new inputs.
The results show that users provided with CLUEs did significantly better, with an accuracy of 82.22% compared to an average accuracy of 58.89% across the other approaches.
A similar experiment was held using images and can be found at 35m 36s.
Feature importance methods such as LIME have been valuable in interpretability but, as demonstrated, have their drawbacks, such as an inability to faithfully represent strongly non-linear functions. An alternative to feature importance methods is counterfactual explanations, which aim to tell how a decision would change if some input factor were different.
Recently, the field of uncertainty-aware NNs has gained a lot of traction. Uncertainty awareness opens up a whole new family of questions relating to ML interpretability which traditional methods are not well suited to answer. CLUE represents a first step towards reconciling ML interpretability and uncertainty. This method has shown success in helping users identify points for which their model is uncertain and shows great potential in the growing field of interpretability.
Cool Stuff to Check Out
Towards A Rigorous Science of Interpretable Machine Learning by Finale Doshi-Velez and Been Kim
Interesting questions from the video:
- Can you talk a little bit about how CLUE compares to other explainability techniques and what gives it an advantage over others? 38m 38s
- Could counterfactuals be used to understand the effect of missing data and maybe optimal replacement missing data? 45m 16s
- Can you speak a little more on determining distribution data on CLUE? 50m 03s
All information and ideas presented in this post are those of the speaker and the talk.