A Unique Way of Visualising Confusion Matrix — Sankey Chart

Last Updated on July 11, 2022 by Editorial Team

Author(s): Hrishikesh Patel

Originally published on Towards AI.

Go Sankey for Less Confusion!

Image by the author

A confusion matrix conveniently summarizes a machine learning model’s performance. However, it can seem unintuitive to non-technical stakeholders 🤔. One fix is to present it as a Sankey diagram.

Sankey diagram representing a binary confusion matrix (image by the author)

The above image illustrates the Sankey diagram for a typical binary confusion matrix. In the diagram,

  • The rectangular boxes on the left show True classes, whereas those on the right show Predicted classes.
  • The green color highlights correct classifications and the red color is for misclassifications.

Story outline

  1. What’s a Sankey diagram in a nutshell?
  2. How to Create a Sankey diagram from a Confusion Matrix?
  3. Bonus 🎁

What’s a Sankey diagram in a nutshell?

A Sankey diagram is used to visualize flow or connections from source to sink. Let’s understand its application with a simple example.

Suppose we have a dataset of enrolments👨‍🎓👩‍🎓 in data science and business analytics courses at three universities🏫. Here the universities can be treated as the sources and the courses as the sinks. The number of enrolments sets the weight of the connection from a source to a sink. Some of these connections are heavier than others, e.g. the connection from University A to Data Science is heavier than its connection to Business Analytics.

Sankey diagram created from https://sankeymatic.com/build/

A Sankey diagram for a confusion matrix has the following components:

  • Source: True Classes
  • Target (Sink): Predicted Classes
  • Connection/flow: Number of instances

How to Create a Sankey diagram from a Confusion Matrix?

We’ll follow a three-step approach, illustrated in the image below, to create the Sankey diagram.

Three-step approach to plotting a Sankey diagram from a confusion matrix (image by the author)

Step-1: Get Confusion Matrix

In this step, we’ll generate a confusion matrix. This can be the output of scikit-learn’s confusion_matrix function. For simplicity, we’ll use the following confusion matrix.

https://medium.com/media/4f13b9f93acc3b567d0f94aad0dcfcef/href
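
As a minimal sketch of this step, a small binary confusion matrix can be produced with scikit-learn. The label vectors below are hypothetical and are not the article's data, so the resulting numbers will differ from the original gist.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for a binary classifier.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]

# Rows correspond to true classes, columns to predicted classes.
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

The diagonal holds the correct classifications; everything off the diagonal is a misclassification.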

Step-2: Transform Confusion Matrix to DataFrame

We’ll divide this step into several small steps.

2.1 — Create a dataframe from the confusion matrix

https://medium.com/media/b29c7b97f64daa09cd7563647a174c32/href
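
One possible way to do this with pandas is sketched below. The matrix values are hypothetical, and the "True ..." and "Predicted ..." label prefixes are an assumption based on the node names mentioned later in the article.

```python
import pandas as pd

# Hypothetical 2x2 confusion matrix: rows = true classes, columns = predicted.
cm = [[3, 1], [2, 4]]
class_names = ["Class 1", "Class 2"]

# Prefix the labels so that true and predicted classes become distinct nodes.
df = pd.DataFrame(
    cm,
    index=[f"True {name}" for name in class_names],
    columns=[f"Predicted {name}" for name in class_names],
)
print(df)
```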

2.2 — Restructure the dataframe

https://medium.com/media/f7abb513ba6b2479d5b3833d859d61ab/href
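
A Sankey chart needs one row per flow, so the wide matrix has to become a long table. The sketch below uses melt for this; the column names source, target and value are assumptions chosen to match the Sankey terminology, and the data is hypothetical.

```python
import pandas as pd

# Hypothetical confusion-matrix dataframe (rows = true, columns = predicted).
df = pd.DataFrame(
    [[3, 1], [2, 4]],
    index=["True Class 1", "True Class 2"],
    columns=["Predicted Class 1", "Predicted Class 2"],
)

# Move the true-class index into a column, then unpivot so that each
# (true class, predicted class) pair becomes a single row.
long_df = (
    df.rename_axis("source")
    .reset_index()
    .melt(id_vars="source", var_name="target", value_name="value")
)
print(long_df)
```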

2.3 — Add a new column ‘color’

Now we’ll add a new column ‘color’ to mark whether each prediction is correct. Correct predictions are highlighted in green, rgba(211,255,216,0.6), and incorrect predictions in red, rgba(245,173,168,0.6).

https://medium.com/media/c028985f3f631ac0eed4510f22a71f00/href
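
A sketch of the coloring rule, assuming a long-format dataframe whose labels carry "True " and "Predicted " prefixes (an assumption, as is the sample data): a flow is correct when the class names match once the prefixes are removed.

```python
import pandas as pd

GREEN = "rgba(211,255,216,0.6)"  # correct predictions
RED = "rgba(245,173,168,0.6)"    # misclassifications

# Hypothetical long-format confusion matrix.
long_df = pd.DataFrame({
    "source": ["True Class 1", "True Class 2", "True Class 1", "True Class 2"],
    "target": ["Predicted Class 1", "Predicted Class 1",
               "Predicted Class 2", "Predicted Class 2"],
    "value": [3, 2, 1, 4],
})

# Strip the prefixes so the underlying class names can be compared.
true_class = long_df["source"].str.replace("True ", "", regex=False)
pred_class = long_df["target"].str.replace("Predicted ", "", regex=False)

is_correct = true_class == pred_class
long_df["color"] = is_correct.map({True: GREEN, False: RED})
print(long_df)
```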

2.4 — Map source and target columns to a numeric index

https://medium.com/media/052a1f13d78abda924ccac0800f47172/href
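
Plotly refers to nodes by integer position rather than by name, which is why this mapping is needed. A minimal sketch follows; the column names source_id and target_id are hypothetical, and the original gist may overwrite the existing columns instead.

```python
import pandas as pd

# Hypothetical long-format confusion matrix.
long_df = pd.DataFrame({
    "source": ["True Class 1", "True Class 2", "True Class 1", "True Class 2"],
    "target": ["Predicted Class 1", "Predicted Class 1",
               "Predicted Class 2", "Predicted Class 2"],
    "value": [3, 2, 1, 4],
})

# Node list: all true classes first, then all predicted classes.
labels = list(long_df["source"].unique()) + list(long_df["target"].unique())
label_to_id = {label: i for i, label in enumerate(labels)}

long_df["source_id"] = long_df["source"].map(label_to_id)
long_df["target_id"] = long_df["target"].map(label_to_id)
print(long_df)
```

The labels list is worth keeping, since it is also what the chart uses to name its nodes.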

Let’s add a new column for the text to show when we hover over the chart.

2.5 — Add New Column “tooltip”

https://medium.com/media/d87dabfe18b76cd5384092146a929a31/href
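
The exact tooltip wording used in the original gist is not shown here, so the format below is only an assumed example of what the hover text could look like.

```python
import pandas as pd

# Hypothetical long-format confusion matrix.
long_df = pd.DataFrame({
    "source": ["True Class 1", "True Class 2", "True Class 1", "True Class 2"],
    "target": ["Predicted Class 1", "Predicted Class 1",
               "Predicted Class 2", "Predicted Class 2"],
    "value": [3, 2, 1, 4],
})

# One human-readable sentence per flow, shown when hovering over a link.
long_df["tooltip"] = [
    f"{source} -> {target}: {value} instances"
    for source, target, value in zip(
        long_df["source"], long_df["target"], long_df["value"]
    )
]
print(long_df["tooltip"].tolist())
```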

Now we are ready to plot the chart.

Step-3: Create a Sankey Chart

The plotting function go.Sankey takes two main arguments: node and link. Nodes are the classes, i.e. True Class 1, Predicted Class 2, etc., and links are the connections (flows) between True and Predicted classes.

https://medium.com/media/7561a4b800899f97becbbca44a58ba6b/href

Bonus

Thanks for finishing all the steps😀 Going through them every time is tedious, so why not wrap them in a function? I have created a handy function that plots a Sankey chart for any confusion matrix (binary or multi-class).

Feel free to check out this notebook on GitHub to learn more about the function.

Before you go!

I hope you enjoyed the story and found it useful. Follow me on Medium if you’d like more stories like this, and subscribe to get my new stories directly in your inbox.


A Unique Way of Visualising Confusion Matrix — Sankey Chart was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
