In this talk, Rsqrd AI welcomes Umang Bhatt, a Ph.D. student in the Machine Learning Group at the University of Cambridge and a Research Fellow at the Partnership on AI! Umang speaks about the challenges in ML explainability and his work, in collaboration with the Partnership on AI, on ML explainability in industry.
How are existing approaches to explainability used in practice? [Answer: only used by developers]
The following is the work of Umang and multiple collaborators and was also presented at the ACM FAccT 2020 Conference as the paper Explainable Machine Learning in Deployment.
Umang mentions how there’s been massive growth in the literature surrounding XAI (explainable AI). In particular, there has been an increase in proposed algorithms that aim to “explain” machine learning output. With this observation, one of the things Umang’s team set out to do was study how organizations use these algorithms.
For their study, the team conducted semi-structured interviews with 50 people from 30 different organizations, with each interview lasting between 30 minutes and 2 hours.
The team realized that a shared, established language for thinking about explainable AI is needed before these conversations can happen. Umang provides the following two definitions:
- Transparency: Providing stakeholders with relevant information about how the model works: this includes documentation of the training procedure, analysis of training data distribution, code releases, feature level explanations, etc.
- (Local) Explainability: Providing insights into a model’s behavior for speciﬁc datapoint(s)
The key point is that explainability is an algorithmic approach to transparency.
Questions That Were Asked
Once the shared language had been established in the interview, the interviewees were asked the following questions:
- What type of explanations have you used (e.g., feature-based, sample-based, counterfactual, or natural language)?
- Who is the audience for the model explanation (e.g., research scientists, product managers, domain experts, or users)?
- In what context have you deployed the explanations (e.g., informing the development process, informing human decision makers about the model, or informing the end user on how actions were taken based on the model’s output)?
Types of Explanations
Umang and his team found certain types of explanations that were very popular:
- Feature importance: attempts to answer the question “which input features mattered most for this prediction?”, distinguishing relevant (important) features from irrelevant (unimportant) ones and helping avoid misleading conclusions like spurious correlations
- Sample importance: attempts to answer the question “which training data points are the most influential when predicting on a new test point?”
- Counterfactuals: attempts to answer the question “what do you need to change about yourself in order to change your outcome with respect to this predictor?”
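To make the first category concrete, here is a minimal sketch of permutation feature importance, one common feature-importance technique. The data, the linear "model," and its weights are all hypothetical stand-ins, not anything from the talk:

```python
import numpy as np

# Synthetic data: feature 2 carries no signal by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Stand-in "trained" linear model whose weights match the generating rule.
w = np.array([1.0, 0.5, 0.0])
predict = lambda X: (X @ w > 0).astype(int)
base_acc = (predict(X) == y).mean()  # 1.0 by construction

# Permutation importance: shuffle one column at a time and measure
# how much accuracy drops when that feature's signal is destroyed.
importances = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    importances.append(base_acc - (predict(Xp) == y).mean())

# Expected ordering: feature 0 > feature 1 > feature 2 (~0 drop)
```

A developer sanity-checking a model (the dominant use case the study found) would flag a large importance on a feature that should be irrelevant as a sign of spurious correlation.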
Who Cares about Explainability?
Next, the team identified the stakeholders who have a vested interest in ML explainability:
- Executives: the team found that engineering managers, chief data scientists, and others across corporations had experiences where their executives asked them to start using explainable ML so the company could say it uses explainable ML
- Engineers: the most common stakeholder. ML engineers and data scientists use explainability techniques to sanity-check their models.
- End users: developing consumable explanations creates trust between end users and AI
- Regulators: regulators, including lawmakers, policymakers, and the research community, have a vested interest in understanding how XAI works and how it affects the surrounding community.
The team distilled three key findings from the study.
Explainability is used for debugging internally
As seen earlier, explainable AI is used primarily by ML engineers, for ML engineers. They add a layer of XAI on top of their existing pipelines to check their models, and these explanations rarely find their way to the end user, contrary to what some XAI literature claims. The primary uses the team found were in content moderation and in finance.
Goals of explainability aren’t defined internally
They found that important stakeholders are often not involved from the beginning, which ultimately makes it hard for explanations to be consumed by the right people. It’s important to engage with these stakeholders, define the purpose and role of explanations, and establish explainability goals.
Technical limitations make explainability hard to deploy in real-time
There were some frequent limitations that came up in the study.
- Spurious correlations exposed by feature level explanations: if I find a problem with the feature attribution, how do I fix it?
- No causal underpinnings to the models themselves: execs and regulators want to feed inputs into a model and see their impact on the outcome; they expect causal relationships, but the models may not be causal at all and may merely capture correlations. From this stem causal-sounding explanations for non-causal models, and disinterest from execs and regulators because there isn’t causal understanding.
- Sample importance is computationally infeasible to deploy at scale: which training points most influenced the model’s prediction on a given input? That’s hard to answer, especially at scale.
- Privacy concerns of model inversion exist: if a system gives out explanations, an attacker may be able to invert the model, reconstruct its training data, and create a serious privacy problem.
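The scale problem with sample importance is easy to see in code. Below is a naive leave-one-out sketch (all data and names are illustrative, not from the talk): retrain the model without each training point and measure how a test prediction moves. Exact influence costs one full retrain per training point, which is precisely why approximations are needed in practice:

```python
import numpy as np

# Tiny synthetic regression problem.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)
x_test = np.array([1.0, 1.0])

# Full-data least-squares fit and its prediction on the test point.
w_full, *_ = np.linalg.lstsq(X, y, rcond=None)
pred_full = x_test @ w_full

# Leave-one-out influence: one complete refit per training point, O(n).
influence = []
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    w_i, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    influence.append(abs(x_test @ w_i - pred_full))

most_influential = int(np.argmax(influence))
```

With 50 points and a closed-form solver this is instant; with millions of points and a deep model, the same loop is hopeless, which is the limitation the study's interviewees ran into.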
Can existing explainability tools be used to detect model unfairness? [Answer: not in their current form]
The work Umang discusses that answers this question comes from another paper he worked on, You Shouldn’t Trust Me: Learning Models Which Conceal Unfairness From Multiple Explanation Methods.
The key (and slightly depressing) takeaway from this section that Umang emphasizes is:
Feature importance reveals nothing reliable about model fairness.
Why should people care about good explanations?
Heed the following example from Umang.
Here, there are two mock models, each with three features: wealth, age, and time spent on some downstream task. Say someone applies the same explanation technique to both.
The explanation on the first model ranks the importance of the features as time > wealth > age. The explanation on the second model ranks them as time > wealth, with age unimportant.
These models can achieve the same accuracy, yet their explanations are vastly different. In theory, a regulator could come in and say that model A is unfair because it uses a discriminatory attribute, age.
This raises the question: how can someone turn model A into model B without losing accuracy?
Attempting to manipulate explanations
Some work has already gone into manipulating explanations. The approach explored in this paper is to degrade explanations with adversarial perturbations on the model’s parameters so that the unfair or discriminatory feature is hidden. Simply put: attack the parameters of a model so that the attribution given to a certain feature (like age) decreases.
The setup and method used in the paper are covered in more detail at 25:50. For a model parameterized by θ, the goal is to find some perturbation δ that achieves the results described above. Here, fθ is model A, and fθ+δ is model B.
What Umang and the authors examine is importance rankings. Looking at 14 features, they take feature importance rankings by ranking features on the magnitude of attribution given to them. The paper reports the magnitude and ranking of the discriminatory features with respect to different explanation techniques.
The thought behind this is that a regulator can look at the rankings for a discriminatory feature. If the ranking is very high in an explanation such as in model A, the regulator can say that the discriminatory feature is being used too much and that another model would be better. In model B, the authors show that they can flatten out the gradients with respect to multiple explanation techniques and scale back the magnitude of attribution given to a feature.
TLDR: the paper’s adversarial explanation attack does 3 key things: it significantly decreases the relative importance of the targeted feature, it generalizes to test points, and it transfers across explanation methods.
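The following toy illustrates the spirit of such an attack, though it is not the paper's actual method (the paper optimizes δ adversarially; here the shift is hand-picked, and all data is synthetic): when a discriminatory feature has a correlated proxy, weight can be moved onto the proxy, shrinking the feature's attribution while predictions barely change.

```python
import numpy as np

# Synthetic features: "proxy" is a near-duplicate of "age".
rng = np.random.default_rng(2)
n = 1000
age = rng.normal(size=n)
proxy = age + 0.05 * rng.normal(size=n)
wealth = rng.normal(size=n)
X = np.column_stack([wealth, age, proxy])

theta = np.array([1.0, 0.8, 0.0])    # model A: visibly relies on age
delta = np.array([0.0, -0.8, 0.8])   # hand-picked shift of age's weight onto the proxy

pred_a = (X @ theta > 0).astype(int)            # f_theta  (model A)
pred_b = (X @ (theta + delta) > 0).astype(int)  # f_{theta+delta} (model B)

# Predictions agree almost everywhere, yet for a linear scorer the
# gradient attribution of a feature is just its weight: model A
# attributes 0.8 to age, model B attributes 0.0.
agreement = (pred_a == pred_b).mean()
```

A regulator inspecting only model B's feature importances would see age as unused, which is exactly the failure mode behind the takeaway that feature importance reveals nothing reliable about model fairness.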
How can we create explainability tools for external stakeholders? [Answer: community engagement and thoughtful deployment]
This work is called Machine Learning Explainability for External Stakeholders, and it was based on a closed convening hosted by the Centre for the Future of Intelligence, Partnership on AI, and IBM.
In this convening, 33 participants from 5 countries came together in New York in early February: 15 ML experts, 3 designers, 6 legal experts, and 9 policymakers. They brought a wide variety of domain expertise, with backgrounds in finance, media, healthcare, and social services. The goal was to facilitate an inter-stakeholder conversation around explainable machine learning.
There are two main takeaways:
- There is a need for community engagement in the development of explainable machine learning.
- There are nuances in the deployment of explainable machine learning.
Umang proposes to tackle these takeaways as questions to be answered by the community and further research.
The following questions should be asked when engaging with the community.
- In which context will this explanation be used? Does the context change the properties of the explanations we expose?
- How should the explanation be evaluated, both quantitatively and qualitatively?
- Can we prevent data misuse and preferential treatment by involving aﬀected groups in the development process?
- Can we educate external stakeholders (and data scientists) regarding the functionalities and limitations of explainable machine learning?
Just as with community engagement, the following important questions should be asked when deploying explainability.
- How does uncertainty, both in the model and introduced by the (potentially approximate) explanation technique, aﬀect the resulting explanations?
- How can stakeholders interact with the resulting explanations? Can explanations be a conduit for interacting with the model?
- How, if at all, will stakeholder behavior change as a result of the explanation shown?
- Over time, how will the explanation technique adapt to changes in stakeholder behavior?
All of these things are completely context dependent, and that’s why it’s important to have community engagement from creating the model to iterating on it.
It’s become critical to be able to explain machine learning and to be able to answer questions on where outputs come from. There’s increasing interest from stakeholders in explainable AI, but it’s a lot easier said than done. While current explainability tools have a long way to go and are not sufficient in their current form, AI practitioners can continue to create meaningful explainability tools for stakeholders through community engagement and thoughtful deployment.
Cool Stuff to Check Out
Interesting questions from the video:
- How strongly do regulators insist on explainable models? Are they open to non-explainable models? And if so, what are the desiderata? 20m 50s
- Is there an agreed method to explain a given model regardless of where it’s deployed? 32m 30s
- What are your thoughts on liability, policy, and safety standards in spaces like self-driving cars in regards to explainable ML? 38m 34s
- What are the overall incentives of using explainable AI? 55m 43s
All information and ideas presented in this post are those of the speaker and the talk.