Bias vs Fairness vs Explainability in AI

Last Updated on August 27, 2021 by Editorial Team

Author(s): Ed Shee

Machine Learning

Over the last few years, there has been a distinct focus on building machine learning systems that are, in some way, responsible and ethical. The terms “Bias”, “Fairness” and “Explainability” come up all over the place but their definitions are usually pretty fuzzy and they are widely misunderstood to mean the same thing. This blog aims to clear that up…


Before we look at how bias appears in machine learning, let’s start with the dictionary definition for the word:

“inclination or prejudice for or against one person or group, especially in a way considered to be unfair”

Look! The definition of bias includes the word “unfair”. It’s easy to see why the terms bias and fairness are so often confused with each other.

Bias can impact machine learning systems at pretty much every stage. Here’s an example of how historical bias from the world around us can creep into your data:

Imagine you’re building a model to predict the next word in a sequence of text. To make sure you’ve got lots of training data, you give it every book written in the last 50 years. You then ask it to predict the next word in this sentence:

“The CEO’s name is ____”.

You then notice, perhaps unsurprisingly, that your model is much more likely to predict male names for the CEO than female ones. What has happened is you’ve unintentionally taken the historical stereotypes that exist in our society and baked them into your model.
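To make that concrete, here is a minimal sketch of the same effect using simple counting rather than a real language model. The five-sentence corpus and the `next_word_distribution` helper are hypothetical and only there for illustration; a real next-word model is far more complex, but it picks up skewed frequencies in much the same way.

```python
from collections import Counter

# Hypothetical toy corpus standing in for "every book written in the last 50 years".
corpus = [
    "the ceo's name is john",
    "the ceo's name is david",
    "the ceo's name is mary",
    "the ceo's name is john",
    "the nurse's name is mary",
]

def next_word_distribution(sentences, prompt):
    """Estimate P(next word | prompt) by counting what follows the prompt."""
    prompt_tokens = prompt.split()
    n = len(prompt_tokens)
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n):
            if tokens[i:i + n] == prompt_tokens:
                counts[tokens[i + n]] += 1
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()} if total else {}

ceo_names = next_word_distribution(corpus, "the ceo's name is")
print(ceo_names)  # the skew in the corpus becomes the skew in the predictions
```

Nothing in the code mentions gender at all. The imbalance comes entirely from the text the "model" was given.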

Bias doesn’t just occur in the training data, though; it can creep into the modeling process too. If the data used to test a model doesn’t accurately represent the real world, you end up with what’s called evaluation bias.

A good example of this would be training a facial recognition system and then using photos from Instagram to test it. Your model might have really high accuracy on the test set but it is likely to underperform in the real world because the majority of Instagram users are between the ages of 18 and 35. Your model is now biased towards that age group and will perform worse on the faces of older or younger people.
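A quick back-of-the-envelope calculation shows how misleading a skewed test set can be. All of the numbers below (per-group accuracy, the age mix of the test set, and the age mix in deployment) are made up for illustration and are not measurements of any real system.

```python
# Hypothetical accuracy of a facial recognition model on each age group.
accuracy_by_group = {"under_18": 0.70, "18_to_35": 0.95, "over_35": 0.75}

# Hypothetical share of each age group in the test set vs. in real-world use.
test_mix = {"under_18": 0.05, "18_to_35": 0.85, "over_35": 0.10}
real_world_mix = {"under_18": 0.20, "18_to_35": 0.30, "over_35": 0.50}

def overall_accuracy(accuracy, mix):
    """Weighted average of per-group accuracy, weighted by each group's share."""
    return sum(accuracy[group] * mix[group] for group in accuracy)

test_accuracy = overall_accuracy(accuracy_by_group, test_mix)
real_accuracy = overall_accuracy(accuracy_by_group, real_world_mix)
print(f"test set: {test_accuracy:.4f}, real world: {real_accuracy:.4f}")
```

Under these assumed numbers the test set reports roughly 92% accuracy while the deployed model would achieve about 80%, purely because the test set over-represents the group the model handles best.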

There are actually loads of different types of bias in machine learning; I’ll cover those in a separate blog.

The word bias almost always comes with negative connotations but it’s important to note that this isn’t always the case in machine learning. Having prior knowledge of the problem you’re trying to solve can help you to select relevant features during modeling. This introduces human bias but can often speed up or improve the modeling process.



Sometimes referred to as interpretability, explainability attempts to explain how a machine learning model makes predictions. It is about interrogating a model, gathering information on why a particular prediction (or series of predictions) was made, and then presenting this information back to humans in a comprehensible manner.

There are typically two situations you’ll be in when trying to explain how a model works:

  • Black Box — You have no access or information about the underlying model. The inputs and outputs of the model are all you can use to generate an explanation.
  • White Box — You have access to the underlying model so it’s easier to provide information about exactly why a certain prediction was made.

On the whole, “white box” models tend to be simpler in design, sometimes deliberately, so that explanations can be easily generated. The downside is that using a simpler, more interpretable model might fail to capture the complexity of the relationships in your data which means you could be faced with a tradeoff between interpretability and model performance.
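As a rough illustration of why white-box models are easy to explain, consider a linear scoring model. The weights, intercept, and feature names below are hypothetical; the point is only that, when you can read the model’s parameters, each feature’s contribution to a prediction is simply its weight multiplied by its value.

```python
# Hypothetical parameters of an already-trained linear scoring model.
weights = {"income": 0.8, "age": 0.2, "shoe_size": 0.0}
intercept = -50.0

def explain(instance):
    """Return the model's score and each feature's additive contribution to it."""
    contributions = {name: weights[name] * value for name, value in instance.items()}
    score = intercept + sum(contributions.values())
    return score, contributions

score, contributions = explain({"income": 70.0, "age": 30.0, "shoe_size": 9.0})
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.1f}")
print(f"score: {score:.1f}")
```

A deep neural network offers no such direct reading of its parameters, which is where the tradeoff with model performance comes from.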

When doing explainability, we’re typically interested in one of two things:

  • Model View — Overall, what features are more important than others to the model?
  • Instance View — For a particular prediction, what factors contributed?
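For the black-box case, one common way to get a model view is permutation importance: shuffle one feature at a time and see how much the model’s outputs change. The sketch below is an assumption-laden toy, not a method taken from any particular library. The `black_box` function, its three features, and the synthetic data are all hypothetical, and the explanation code only ever calls the model; it never looks inside it.

```python
import random

def black_box(row):
    """Hypothetical model we can only query: approves when a weighted score passes 50."""
    income, age, shoe_size = row  # shoe_size is deliberately ignored by the model
    return 1 if 0.8 * income + 0.2 * age > 50 else 0

def accuracy(predict, rows, labels):
    """Fraction of rows where the prediction matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(predict, rows, labels, seed=0):
    """Drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return drops

# Synthetic rows of (income, age, shoe_size). The labels are the model's own outputs,
# so the baseline accuracy is 1.0 and each drop shows how much the model relies on a feature.
data_rng = random.Random(42)
rows = [(data_rng.uniform(0, 100), data_rng.uniform(18, 80), data_rng.uniform(4, 13))
        for _ in range(200)]
labels = [black_box(r) for r in rows]

importance = permutation_importance(black_box, rows, labels)
for name, drop in zip(["income", "age", "shoe_size"], importance):
    print(f"{name}: {drop:.3f}")
```

Shuffling `shoe_size` changes nothing because the model never uses it, so its importance is zero. An instance view can be approached in a similar spirit by perturbing the features of a single row and watching whether the prediction flips.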

The techniques used for explainability depend on whether your model is a black box or white box, whether you’re interested in the model view or instance view, and the type of data you’re exploring. The open source library Alibi does a great job of explaining these techniques in further detail.

Personally, I like to think of white-box models as “Interpretability” (because of the requirement for an interpretable model) and black-box models as “Explainability” (because we are attempting to explain the unknown). Sadly, however, there is no official definition and the words are often used interchangeably.



Fairness is by far the most subjective of the three terms. As we did for bias, let’s glance at its everyday definition before looking at how it’s applied in machine learning:

“impartial and just treatment or behaviour without favouritism or discrimination.”

Applying this to the context of machine learning, the definition I like to use is:

“An algorithm is fair if it makes predictions that do not favour or discriminate against certain individuals or groups based on sensitive characteristics.”
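One way to turn that definition into something measurable is to compare how often each group receives a positive prediction, a metric usually called demographic parity. It is only one of several competing fairness metrics, and it is my choice for this sketch rather than part of the definition above. The predictions and group labels are hypothetical.

```python
def selection_rates(predictions, groups):
    """Share of positive predictions (1s) received by each group."""
    totals, positives = {}, {}
    for pred, group in zip(predictions, groups):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + pred
    return {group: positives[group] / totals[group] for group in totals}

# Hypothetical model outputs (1 = approved) and the group each applicant belongs to.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rates(predictions, groups)
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

A gap of zero would mean both groups are approved at the same rate. Whether a non-zero gap is actually unfair depends on the use case, which is exactly why the term is so subjective.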

Most definitions you’ll see (including mine above) tend to narrow the scope to machine learning that affects humans. Typically this is where AI can have disastrous consequences, and so fairness is super important. Something like a mortgage approval or a healthcare diagnosis is such a life-changing event that it’s critical we handle predictions in a fair and responsible way.

You’re probably asking yourself, “What’s a ‘sensitive characteristic’, though?”, which is a very good question. The interpretation of the definition depends heavily on what you class as sensitive. Some obvious examples are race, gender, sexual orientation, and disability.

One approach is to just remove all “sensitive” attributes when building a model. This seems like a sensible thing to do at first but there are actually multiple issues with this:

  • The sensitive features might actually be critical to the model. Imagine you’re trying to predict the height a child will be when they are fully grown. Removing sensitive attributes like age and sex will make your predictions useless.
  • Fairness is not necessarily about being agnostic. Sometimes it’s important to include sensitive features in order to favor those who might be discriminated against in other features. An example of this is university admissions, where raw grades alone may not be the best way to find the brightest pupils. Those who had access to fewer resources or a lower quality of education might have achieved better scores under different circumstances.
  • Sensitive features might be hidden in other attributes. It is often possible to determine the values for sensitive features using a combination of non-sensitive ones. For example, an applicant’s full name might allow a machine learning model to infer their race, nationality, or gender.
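The last point is easy to check for on your own data: try to predict the sensitive attribute from the features you kept. Below is a minimal sketch with hypothetical records, where `postcode` stands in for any non-sensitive feature and `group` for the sensitive one you removed. If a simple lookup recovers the group much better than guessing the most common value, the sensitive information is still in your dataset.

```python
from collections import Counter, defaultdict

# Hypothetical records: (postcode, group). The group column has been "removed"
# from the model's inputs, but the postcode column is still there.
records = [
    ("N1", "x"), ("N1", "x"), ("N1", "x"), ("N1", "y"),
    ("S2", "y"), ("S2", "y"), ("S2", "y"), ("S2", "x"),
]

# Learn the most common group for each postcode.
by_postcode = defaultdict(Counter)
for postcode, group in records:
    by_postcode[postcode][group] += 1
lookup = {pc: counts.most_common(1)[0][0] for pc, counts in by_postcode.items()}

# How often does the postcode alone reveal the group?
recovered = sum(lookup[pc] == group for pc, group in records) / len(records)

# Baseline: always guess the single most common group.
overall = Counter(group for _, group in records)
baseline = overall.most_common(1)[0][1] / len(records)

print(f"recovered from postcode: {recovered:.2f}, baseline guess: {baseline:.2f}")
```

For brevity the sketch scores the lookup on the same records it was built from; with real data you would measure this on held-out records.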

The reality is that AI fairness is an incredibly difficult field. It requires policymakers to define what “fair” looks like for each use case which can sometimes be very subjective. Often there is also a trade-off between group fairness and individual fairness. Using the university admissions example from earlier, making your algorithm fairer for an underprivileged group who didn’t have the same educational resources (group fairness) comes at the cost of those who had a good educational background and whose grades are now no longer quite good enough (individual fairness).
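Here is a small sketch of that trade-off. The applicants, scores, and thresholds are all hypothetical. With a single cut-off, nobody is rejected in favour of a lower-scoring applicant, but the two groups are admitted at different rates. With group-specific cut-offs the rates are equal, but an individual can now lose out to someone with a lower score.

```python
# Hypothetical applicants as (group, score) pairs.
applicants = [("a", 82), ("a", 78), ("a", 74), ("b", 78), ("b", 72), ("b", 65)]

def admit(applicants, thresholds):
    """Admit each applicant whose score meets their group's threshold."""
    return [score >= thresholds[group] for group, score in applicants]

def admission_rates(applicants, decisions):
    """Share of each group that was admitted."""
    rates = {}
    for group in {g for g, _ in applicants}:
        outcomes = [d for (g, _), d in zip(applicants, decisions) if g == group]
        rates[group] = sum(outcomes) / len(outcomes)
    return rates

def inversions(applicants, decisions):
    """Count pairs where a higher-scoring applicant is rejected and a lower one admitted."""
    return sum(
        1
        for (_, s_i), d_i in zip(applicants, decisions)
        for (_, s_j), d_j in zip(applicants, decisions)
        if s_i > s_j and not d_i and d_j
    )

single = admit(applicants, {"a": 75, "b": 75})
adjusted = admit(applicants, {"a": 75, "b": 70})

print(admission_rates(applicants, single), inversions(applicants, single))
print(admission_rates(applicants, adjusted), inversions(applicants, adjusted))
```

Neither policy is “correct” on its own; which one counts as fair is the judgment call that policymakers have to make for each use case.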


In summary, bias, explainability, and fairness are related but they are not the same thing. Whilst trying to explain all or part of a machine learning model, you might find that the model contains bias. The existence of that bias might even mean that your model is unfair. That overlap, however, doesn’t make the three terms interchangeable.


Bias is a preference for or prejudice against a particular group, individual, or feature, and it comes in many forms.

Explainability is the ability to explain how or why a model makes a prediction.

Fairness is the subjective practice of using AI without favoritism or discrimination, particularly pertaining to humans.

Bias vs Fairness vs Explainability in AI was originally published in Towards AI on Medium.

Published via Towards AI
