Bias vs Fairness vs Explainability in AI
Last Updated on August 27, 2021 by Editorial Team
Author(s): Ed Shee
Over the last few years, there has been a distinct focus on building machine learning systems that are, in some way, responsible and ethical. The terms "Bias", "Fairness" and "Explainability" come up all over the place, but their definitions are usually pretty fuzzy and they are widely misunderstood to mean the same thing. This blog aims to clear that up…
Bias
Before we look at how bias appears in machine learning, let's start with the dictionary definition for the word:
"inclination or prejudice for or against one person or group, especially in a way considered to be unfair"
Look! The definition of bias includes the word "unfair". It's easy to see why the terms bias and fairness get confused for each other a lot.
Bias can impact machine learning systems at pretty much every stage. Here's an example of how historical bias from the world around us can creep into your data:
Imagine you're building a model to predict the next word in a sequence of text. To make sure you've got lots of training data, you give it every book written in the last 50 years. You then ask it to predict the next word in this sentence:
"The CEO's name is ____."
You then notice, perhaps unsurprisingly, that your model is much more likely to predict male names for the CEO than female ones. What has happened is that you've unintentionally taken the historical stereotypes that exist in our society and baked them into your model.
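The mechanics can be sketched with a toy continuation counter (the corpus and names below are invented purely for illustration; a real language model is vastly more complex, but the skew propagates the same way):

```python
from collections import Counter

# Invented corpus standing in for "every book from the last 50 years":
# historical text mentions male CEO names far more often.
corpus = ["the ceo's name is john"] * 8 + ["the ceo's name is mary"] * 2

# A minimal "language model": count which word follows the context.
context = "the ceo's name is"
next_words = Counter(
    sentence.split()[-1] for sentence in corpus if sentence.startswith(context)
)

total = sum(next_words.values())
probs = {word: count / total for word, count in next_words.items()}
print(probs)  # {'john': 0.8, 'mary': 0.2} -- the data's skew becomes the model's
```

Nothing in the model is "prejudiced"; it simply reproduces the frequencies it was shown.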
Bias doesn't just occur in the data, though; it can appear in the model too. If the data used to test a model doesn't accurately represent the real world, you end up with what's called evaluation bias.
A good example of this would be training a facial recognition system and then using photos from Instagram to test it. Your model might have really high accuracy on the test set, but it is likely to underperform in the real world because the majority of Instagram users are between the ages of 18 and 35. Your model is now biased towards that age group and will perform worse on the faces of older or younger people.
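To see why this matters, here is a sketch with invented numbers showing how a skewed test set lets a single overall accuracy figure mask per-group failure:

```python
# Invented evaluation results: the test set over-represents 18-35s,
# so one overall accuracy number hides failure on other age groups.
results = (
    [("18-35", True)] * 86 + [("18-35", False)] * 4
    + [("55+", True)] * 5 + [("55+", False)] * 5
)

overall = sum(correct for _, correct in results) / len(results)

# Break the same predictions down by age group.
by_group = {}
for group, correct in results:
    by_group.setdefault(group, []).append(correct)
per_group = {g: sum(v) / len(v) for g, v in by_group.items()}

print(overall)    # 0.91 -- looks great on the skewed test set
print(per_group)  # but accuracy on 55+ faces is only 0.5
```

Reporting metrics per group, not just in aggregate, is one simple guard against evaluation bias.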
There are actually loads of different types of bias in machine learning; I'll cover those in a separate blog.
The word bias almost always comes with negative connotations, but it's important to note that this isn't always the case in machine learning. Having prior knowledge of the problem you're trying to solve can help you to select relevant features during modeling. This introduces human bias but can often speed up or improve the modeling process.
Explainability
Sometimes referred to as interpretability, explainability attempts to explain how a machine learning model makes predictions. It is about interrogating a model, gathering information on why a particular prediction (or series of predictions) was made, and then presenting this information back to humans in a comprehensible manner.
There are typically two situations you'll be in when trying to explain how a model works:
- Black Box: You have no access to, or information about, the underlying model. The model's inputs and outputs are all you can use to generate an explanation.
- White Box: You have access to the underlying model, so it's easier to provide information about exactly why a certain prediction was made.
On the whole, "white box" models tend to be simpler in design, sometimes deliberately, so that explanations can be easily generated. The downside is that a simpler, more interpretable model might fail to capture the complexity of the relationships in your data, which means you could be faced with a tradeoff between interpretability and model performance.
When doing explainability, we're typically interested in one of two things:
- Model View: Overall, which features are more important than others to the model?
- Instance View: For a particular prediction, which factors contributed?
The techniques used for explainability depend on whether your model is a black box or a white box, whether you're interested in the model view or the instance view, and on the type of data you're exploring. The open-source library Alibi does a great job of explaining these techniques in further detail.
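To give a flavour of one black-box, model-view technique, here is a minimal permutation-importance-style sketch. The model and data are invented, and for determinism the column is reversed rather than randomly shuffled; libraries like Alibi or scikit-learn implement the real thing:

```python
# Hypothetical black-box model: we can only call it, never inspect it.
# (It secretly relies on feature 0 and ignores feature 1.)
def black_box(row):
    return 1 if row[0] > 0.5 else 0

X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9], [0.6, 0.4], [0.3, 0.7]]
y = [1, 0, 1, 0, 1, 0]

def accuracy(data, labels):
    return sum(black_box(r) == t for r, t in zip(data, labels)) / len(labels)

baseline = accuracy(X, y)

# Break the link between one column and the labels (here by reversing the
# column; real implementations shuffle randomly and average over repeats)
# and measure how much the score drops. A big drop means an important feature.
importances = {}
for col in range(2):
    permuted = [row[col] for row in X][::-1]
    broken = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, permuted)]
    importances[col] = baseline - accuracy(broken, y)

print(importances)  # {0: 1.0, 1: 0.0}: only feature 0 matters to this model
```

Notice the technique never opens the model: it only compares inputs and outputs, which is exactly the black-box situation described above.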
Personally, I like to think of white-box methods as "Interpretability" (because of the requirement for an interpretable model) and black-box methods as "Explainability" (because we are attempting to explain the unknown). Sadly, however, there is no official definition and the words are often used interchangeably.
Fairness
Fairness is by far the most subjective of the three terms. As we did for bias, let's glance at its everyday definition before looking at how it's applied in machine learning:
"impartial and just treatment or behaviour without favouritism or discrimination"
Applying this to the context of machine learning, the definition I like to use is:
"An algorithm is fair if it makes predictions that do not favour or discriminate against certain individuals or groups based on sensitive characteristics."
Most definitions you'll see (including mine above) tend to narrow the scope to machine learning that affects humans. Typically this is where AI can have disastrous consequences, and so fairness is super important. Something like a mortgage approval or a healthcare diagnosis is such a life-changing event that it's critical we handle predictions in a fair and responsible way.
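One way to make the definition concrete is a simple group-level check such as demographic parity, which compares approval rates across a sensitive attribute. It is only one of many competing fairness metrics, and the decisions below are invented:

```python
# Invented mortgage-style decisions, tagged with a sensitive group.
decisions = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]

# Demographic parity: compare approval rates across groups.
by_group = {}
for group, approved in decisions:
    by_group.setdefault(group, []).append(approved)
rates = {g: sum(v) / len(v) for g, v in by_group.items()}

parity_gap = abs(rates["group_a"] - rates["group_b"])
print(rates)       # {'group_a': 0.75, 'group_b': 0.25}
print(parity_gap)  # 0.5 -- a large gap is a red flag under this definition
```

Which metric is the right one, and what gap counts as "unfair", is exactly the subjective policy question this section is about.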
You're probably asking yourself "What's a 'sensitive characteristic' though?", which is a very good question. The interpretation of the definition depends heavily on what you class as sensitive. Some obvious examples tend to be things like race, gender, sexual orientation, disability, etc.
One approach is to just remove all "sensitive" attributes when building a model. This seems like a sensible thing to do at first, but there are actually multiple issues with this:
- The sensitive features might actually be critical to the model. Imagine you're trying to predict the height a child will be when they are fully grown. Removing sensitive attributes like age and sex will make your predictions useless.
- Fairness is not necessarily about being agnostic. Sometimes it's important to include sensitive features in order to favor those who might be discriminated against elsewhere. An example of this is university admissions, where raw grades alone may not be the best way to find the brightest pupils: those who had access to fewer resources or a lower quality of education might otherwise have had better scores.
- Sensitive features might be hidden in other attributes. It is often possible to determine the values of sensitive features using a combination of non-sensitive ones. For example, an applicant's full name might allow a machine learning model to infer their race, nationality, or gender.
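The third point, proxy features, can be sketched with invented records: even after dropping the sensitive column, a "harmless" column can reconstruct it almost perfectly.

```python
from collections import Counter

# Invented records: zip code is "non-sensitive", but in this toy data it
# almost perfectly reveals the sensitive group -- a proxy feature.
records = [
    ("90210", "group_a"), ("90210", "group_a"), ("90210", "group_a"),
    ("10001", "group_b"), ("10001", "group_b"), ("10001", "group_a"),
]

# How often does the majority group within each zip code match the
# sensitive attribute? High values mean the dropped column leaks back in.
by_zip = {}
for zip_code, group in records:
    by_zip.setdefault(zip_code, []).append(group)

correct = sum(Counter(groups).most_common(1)[0][1] for groups in by_zip.values())
leakage = correct / len(records)
print(leakage)  # 5/6: removing the sensitive column didn't remove the signal
```

A model trained on zip code alone would therefore still be making group-correlated decisions, despite never seeing the sensitive attribute.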
The reality is that AI fairness is an incredibly difficult field. It requires policymakers to define what "fair" looks like for each use case, which can sometimes be very subjective. Often there is also a trade-off between group fairness and individual fairness. Using the university admissions example from earlier, making your algorithm fairer for an underprivileged group who didn't have the same educational resources (group fairness) comes at the cost of those who had a good educational background and whose grades are now no longer quite good enough (individual fairness).
Summary
In summary, bias, explainability, and fairness are not the same thing. Whilst trying to explain all or part of a machine learning model, you might find that the model contains bias. The existence of that bias might even mean that your model is unfair. That doesn't, however, mean that explainability, bias, and fairness are the same thing.
TL;DR
Bias is a preference or prejudice for or against a particular group, individual, or feature, and comes in many forms.
Explainability is the ability to explain how or why a model makes a prediction.
Fairness is the subjective practice of using AI without favoritism or discrimination, particularly pertaining to humans.
Bias vs Fairness vs Explainability in AI was originally published in Towards AI on Medium.