Bias vs Fairness vs Explainability in AI
Last Updated on August 27, 2021 by Editorial Team
Author(s): Ed Shee

Over the last few years, there has been a distinct focus on building machine learning systems that are, in some way, responsible and ethical. The terms "Bias", "Fairness" and "Explainability" come up all over the place, but their definitions are usually pretty fuzzy and they are widely misunderstood to mean the same thing. This blog aims to clear that up…
Bias
Before we look at how bias appears in machine learning, let's start with the dictionary definition for the word:
"inclination or prejudice for or against one person or group, especially in a way considered to be unfair"
Look! The definition of bias includes the word "unfair". It's easy to see why the terms bias and fairness get confused for each other a lot.
Bias can impact machine learning systems at pretty much every stage. Here's an example of how historical bias from the world around us can creep into your data:
Imagine you're building a model to predict the next word in a sequence of text. To make sure you've got lots of training data, you give it every book written in the last 50 years. You then ask it to predict the next word in this sentence:
"The CEO's name is ____".
You then notice, perhaps unsurprisingly, that your model is much more likely to predict male names for the CEO than female ones. What has happened is that you've unintentionally taken the historical stereotypes that exist in our society and baked them into your model.
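To make this concrete, here's a minimal sketch of the idea: a toy next-word model built from a deliberately skewed, invented corpus. Because male names dominate the counts, the model's top prediction inherits that imbalance.

```python
from collections import Counter

# A tiny, deliberately skewed "corpus": most CEO mentions use male names,
# mirroring the historical imbalance described above. All data is invented.
corpus = (
    "the ceo name is john . " * 6
    + "the ceo name is james . " * 3
    + "the ceo name is mary . "
).split()

# Count which word follows "is" anywhere in the corpus.
next_word = Counter()
for prev, word in zip(corpus, corpus[1:]):
    if prev == "is":
        next_word[word] += 1

# The model simply predicts the most frequent continuation it has seen,
# so the skewed counts make a male name the top prediction.
prediction = next_word.most_common(1)[0][0]
print(prediction, dict(next_word))
```

Nothing about the counting logic is "wrong" here; the bias comes entirely from the data the model was given.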
Bias doesn't just occur in the data, though; it can appear in the model too. If the data used to test a model doesn't accurately represent the real world, you end up with what's called evaluation bias.
A good example of this would be training a facial recognition system and then using photos from Instagram to test it. Your model might have really high accuracy on the test set, but it is likely to underperform in the real world because the majority of Instagram users are between the ages of 18 and 35. Your model is now biased towards that age group and will perform worse on the faces of older or younger people.
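A minimal sketch of how you might spot this (with made-up results): breaking test accuracy down by age group can reveal evaluation bias that a single overall number hides.

```python
# Invented per-example results: (age_group, was_the_prediction_correct?).
# The test set is dominated by the 18-35 group, just like the Instagram example.
test_set = [
    ("18-35", True), ("18-35", True), ("18-35", True), ("18-35", True),
    ("18-35", True), ("18-35", True), ("18-35", True), ("18-35", False),
    ("55+", True), ("55+", False),
]

# Overall accuracy looks healthy because the majority group drags it up.
overall = sum(ok for _, ok in test_set) / len(test_set)

# Accuracy per group tells a very different story.
by_group = {}
for group, ok in test_set:
    by_group.setdefault(group, []).append(ok)
per_group = {g: sum(v) / len(v) for g, v in by_group.items()}

print(f"overall accuracy: {overall:.0%}")
print(per_group)
```

The headline number hides the fact that the under-represented group fares far worse, which is exactly what evaluation bias looks like in practice.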
There are actually loads of different types of bias in machine learning; I'll cover those in a separate blog.
The word bias almost always comes with negative connotations, but it's important to note that this isn't always the case in machine learning. Having prior knowledge of the problem you're trying to solve can help you to select relevant features during modeling. This introduces human bias but can often speed up or improve the modeling process.

Explainability
Sometimes referred to as interpretability, explainability attempts to explain how a machine learning model makes predictions. It is about interrogating a model, gathering information on why a particular prediction (or series of predictions) was made, and then presenting this information back to humans in a comprehensible manner.
There are typically two situations you'll be in when trying to explain how a model works:
- Black Box: You have no access to, or information about, the underlying model. The inputs and outputs of the model are all you can use to generate an explanation.
- White Box: You have access to the underlying model, so it's easier to provide information about exactly why a certain prediction was made.
On the whole, "white box" models tend to be simpler in design, sometimes deliberately so, so that explanations can be easily generated. The downside is that a simpler, more interpretable model might fail to capture the complexity of the relationships in your data, which means you could be faced with a trade-off between interpretability and model performance.
When doing explainability, we're typically interested in one of two things:
- Model View: Overall, which features are more important than others to the model?
- Instance View: For a particular prediction, which factors contributed?
The techniques used for explainability depend on whether your model is a black box or a white box, whether you're interested in the model view or the instance view, and on the type of data you're exploring. The open source library Alibi does a great job of explaining these techniques in further detail.
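As a small illustrative sketch (plain Python, not Alibi's API), here is permutation feature importance, a classic black-box, model-view technique: shuffle one feature at a time and measure how much the error grows. The "model" below is a hand-written stand-in rather than a trained one.

```python
import random

# A hand-written stand-in for a black-box model: we can only call it,
# but by construction feature 0 matters a lot and feature 1 barely at all.
def model(row):
    return 3.0 * row[0] + 0.1 * row[1]

random.seed(0)
X = [[random.random(), random.random()] for _ in range(200)]
y = [model(row) for row in X]  # targets the model fits perfectly

def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(targets)

baseline = mse(X, y)

# Shuffle each feature column in turn; the bigger the error increase,
# the more the model relied on that feature.
importances = []
for feature in range(2):
    shuffled = [row[:] for row in X]
    column = [row[feature] for row in shuffled]
    random.shuffle(column)
    for row, value in zip(shuffled, column):
        row[feature] = value
    importances.append(mse(shuffled, y) - baseline)

print(f"feature importances: {importances}")
```

Shuffling feature 0 hurts the error far more than shuffling feature 1, matching how the stand-in model weights them; crucially, the technique never needed to look inside the model.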
Personally, I like to think of white-box models as "Interpretability" (because of the requirement for an interpretable model) and black-box models as "Explainability" (because we are attempting to explain the unknown). Sadly, however, there is no official definition and the words are often used interchangeably.

Fairness
Fairness is by far the most subjective of the three terms. As we did for bias, let's glance at its everyday definition before looking at how it's applied in machine learning:
"impartial and just treatment or behaviour without favouritism or discrimination."
Applying this to the context of machine learning, the definition I like to use is:
"An algorithm is fair if it makes predictions that do not favour or discriminate against certain individuals or groups based on sensitive characteristics."
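One common way to make this definition measurable is a group-fairness check such as demographic parity: compare the rate of favourable outcomes across groups. A minimal sketch, using invented predictions and group labels:

```python
# Invented model outputs: 1 = favourable outcome (e.g. loan approved),
# paired with each applicant's group membership.
predictions = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"]

def approval_rate(group):
    # Fraction of favourable outcomes within one group.
    outcomes = [p for p, g in zip(predictions, groups) if g == group]
    return sum(outcomes) / len(outcomes)

# Demographic parity difference: a large gap suggests the model
# favours one group over the other.
disparity = approval_rate("a") - approval_rate("b")
print(f"group a: {approval_rate('a'):.0%}, group b: {approval_rate('b'):.0%}")
print(f"demographic parity difference: {disparity:.0%}")
```

Demographic parity is only one of several competing fairness metrics, and which one is appropriate depends heavily on the use case.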
Most definitions you'll see (including mine above) tend to narrow the scope to machine learning that affects humans. Typically this is where AI can have disastrous consequences, and so fairness is super important. Something like a mortgage approval or a healthcare diagnosis is such a life-changing event that it's critical we handle predictions in a fair and responsible way.
You're probably asking yourself "What's a 'sensitive characteristic' though?", which is a very good question. The interpretation of the definition depends heavily on what you class as sensitive. Some obvious examples tend to be things like race, gender, sexual orientation, disability, etc…
One approach is to just remove all "sensitive" attributes when building a model. This seems like a sensible thing to do at first, but there are actually multiple issues with this:
- The sensitive features might actually be critical to the model. Imagine you're trying to predict the height a child will be when they are fully grown. Removing sensitive attributes like age and sex will make your predictions useless.
- Fairness is not necessarily about being agnostic. Sometimes it's important to include sensitive features in order to favor those who might be discriminated against in other features. An example of this is university admissions, where raw grades alone may not be the best way to find the brightest pupils. Those who had access to fewer resources or a lower quality of education might have scored higher given the same opportunities.
- Sensitive features might be hidden in other attributes. It is often possible to determine the values of sensitive features from a combination of non-sensitive ones. For example, an applicant's full name might allow a machine learning model to infer their race, nationality, or gender.
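The third point can be sketched in a few lines: even after the sensitive column is dropped, a proxy like a first name can reconstruct it. The name/gender pairs below are invented for illustration.

```python
# Invented training records: a "non-sensitive" name feature alongside
# the sensitive attribute we intend to remove.
records = [
    ("mary", "f"), ("susan", "f"), ("emma", "f"),
    ("john", "m"), ("james", "m"), ("peter", "m"),
]

# "Train" a trivial proxy model: remember which gender each name
# co-occurred with. A real model would learn similar associations.
lookup = {name: gender for name, gender in records}

# The sensitive column is gone from the inputs, yet the proxy
# recovers it perfectly on this toy data.
recovered = [lookup[name] for name, _ in records]
accuracy = sum(r == g for r, (_, g) in zip(recovered, records)) / len(records)
print(f"gender recovered from name alone: {accuracy:.0%}")
```

Simply deleting the sensitive column therefore gives no guarantee the model is blind to it.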
The reality is that AI fairness is an incredibly difficult field. It requires policymakers to define what "fair" looks like for each use case, which can sometimes be very subjective. Often there is also a trade-off between group fairness and individual fairness. Using the university admissions example from earlier, making your algorithm fairer for an underprivileged group who didn't have the same educational resources (group fairness) comes at the cost of those who had a good educational background and whose grades are now no longer quite good enough (individual fairness).
Summary
In summary, bias, explainability, and fairness are not the same thing. Whilst trying to explain all or part of a machine learning model, you might find that the model contains bias. The existence of that bias might even mean that your model is unfair. That doesn't, however, mean that explainability, bias, and fairness are the same thing.
TL;DR
Bias is a preference or prejudice against a particular group, individual, or feature, and comes in many forms.
Explainability is the ability to explain how or why a model makes a prediction.
Fairness is the subjective practice of using AI without favoritism or discrimination, particularly pertaining to humans.
Bias vs Fairness vs Explainability in AI was originally published in Towards AI on Medium.