How to Improve Your Analytical Report With Conditional Formatting In Pandas

Last Updated on July 3, 2022 by Editorial Team

Author(s): Hrishikesh Patel

Originally published on Towards AI.

Excel-like Conditional Formatting in Pandas Using the Styler API

Image by author

Clearly communicating analytical insights with stakeholders is crucial for data scientists/analysts.

Conditional formatting in analytical reports helps readers quickly spot insights in a data frame (table).

Let's start with an example. The following pivot table shows the total sales of different products from 2016 to 2022.

Pivot table showing total sales of products from 2016 to 2022. Can you identify the maximum sales in 2016? (image by author)
  • Can you identify the largest-selling 💰 product in 2016? Yes, it is Product_B with total sales of 169, but that is difficult to see just by looking 👀 at the table.

Now let's color the largest-selling 💰 product for each year. After highlighting, it becomes much easier to answer the question above, doesn't it?

Highlighting the largest-selling product in each year. Now it's easy to see that Product_B was the largest-selling product of 2016, with sales of 169 (image by author)

Let me show you how to do this in Pandas.

Topics:

1. Highlight missing values
2. Highlight the maximum (or minimum) value in each row/column
3. Highlight values within a range
4. Plot in-column bar chart
5. Highlight values using a color gradient
6. Bonus 🎁

Note: I strongly recommend using the latest version of Pandas. You can run pip install --upgrade pandas to get the latest stable release.

1. Highlight missing values

Using dataframe.style.highlight_null(), you can color null values as shown below. I stored the pivot table in the variable df_pivoted.

Highlighting NaN values in red using `.highlight_null` (image by author)

It's okay 😀 if you don't like red. Let's customize the text and background color of missing values using the argument props='color:white;background-color:black'.

The `props` argument allows customizing the text and background color of highlights (image by author)

After highlighting, we can quickly see that Product_H was not sold in 2018.

2. Highlight maximum (or minimum) values

To highlight the maximum value in each column, you can use dataframe.style.highlight_max(). By default, the method colors the maximum value in each column, as illustrated in the image below.

`.highlight_max` by default colors max values in each column (image by author)

To color the max value in each row, you can specify the argument axis=1.

Setting axis=1 in `.highlight_max` colors max values in each row (image by author)

Note: Similarly, you can use the method dataframe.style.highlight_min() with the appropriate arguments to color minimum values in rows/columns.

3. Highlight values within a range

Suppose we want to highlight values between 100 and 200. This is quite easy using dataframe.style.highlight_between(left=100, right=200).

Values between 100 and 200 are highlighted in yellow using the `.highlight_between` method (image by author)

4. Plot in-column bar chart

A bar chart plotted within a column can be visually appealing and useful. Such bar charts can be created using the dataframe.style.bar() method, as shown below.

Bar chart plotted within each column using the `dataframe.style.bar` method (image by author)

Let's customize the bar chart to change its color and size.

The customized bar charts in columns (image by author)

5. Highlight values using a color gradient

What if you want to highlight an entire column with a color gradient? It can be done using dataframe.style.background_gradient(), as depicted below. In the image, the color changes from red to green as the value increases. You can set subset=None to apply the gradient to the entire data frame.

Column 'Product_C' is colored using a gradient of red, yellow, and green colors (image by author)

6. Bonus 🎁

How can we highlight min, max, and missing values together in the data? Well, you can define a function as illustrated below. The function highlights min, max, and NaN values in the column 'Product_C'. By setting subset=None, it highlights the values in the entire data frame. Isn't this function really cool? Let me know your thoughts in the comments!

Defining and using a function to highlight minimum, maximum, and missing values in the data frame (image by author)

Please feel free to explore the other highlighting methods in the Pandas documentation.

Before you go!

I hope you have enjoyed the story and found it useful. Follow me on Medium if you'd like more stories like this, and subscribe to get my new stories directly in your inbox.

How to Improve Your Analytical Report With Conditional Formatting In Pandas was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
