
DATA SCIENCE, EDITORIAL, PROGRAMMING
Handling Missing Values in Pandas
A hands-on visual tutorial on how to detect and handle missing data in pandas
Author(s): Pratik Shukla, Roberto Iriondo
Data cleaning and preparation is often the most crucial and time-consuming part of a data science project. Thankfully, there are many powerful tools that help us speed up this process.
The pandas library is one of the most widely used data analysis libraries in Python. Before we fit models or run an analysis on our data, it is critical to find any missing values that may affect the results.
Missing data arises, for example, when a survey respondent does not answer a question. This tutorial covers several methods that help us identify, remove, and fill in such missing data with pandas.
The companion materials for this tutorial can be found under our resources section.
pd.isna( ):
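We use the pd.isna() function of the pandas library to detect missing values in a scalar or an array-like object. It takes a single argument, pd.isna(obj), and returns a single boolean for a scalar input or a boolean mask of the same shape for an array-like input. Let us work through some examples.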

Before we look at how the pd.isna() function works, let us first import the required libraries.

A. Example — 1:

B. Example — 2:

C. Example — 3:
Please note that empty strings are not considered NA values. That is why pd.isna("") returns False. The same holds for a whitespace-only string such as " ".
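The scalar cases above can be sketched as follows. The particular values are illustrative choices, not the article's original examples.

```python
import numpy as np
import pandas as pd

# Scalars commonly used to mark a missing value, plus a few that are not.
# The labels and values here are illustrative only.
scalar_checks = {
    "integer 7": pd.isna(7),
    "None": pd.isna(None),
    "np.nan": pd.isna(np.nan),
    "pd.NaT": pd.isna(pd.NaT),
    "empty string": pd.isna(""),
    "blank string": pd.isna(" "),
}

for label, is_missing in scalar_checks.items():
    print(f"{label:>12}: {is_missing}")
```

Only None, np.nan, and pd.NaT are reported as missing here. The integer and both strings are treated as valid values.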

D. Example — 4:

E. Example — 5:

F. Example — 6:

G. Example — 7:

H. Example — 8:

N-Dimensional Arrays:
I. Example — 9:
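For an n-dimensional array, pd.isna() returns a boolean array of the same shape. Here is a minimal sketch, assuming a small made-up 2 x 3 NumPy array:

```python
import numpy as np
import pandas as pd

# Hypothetical 2 x 3 array with two missing entries.
grid = np.array([[1.0, np.nan, 3.0],
                 [np.nan, 5.0, 6.0]])

missing_mask = pd.isna(grid)  # boolean array, same shape as grid
print(missing_mask)

# Summing the mask counts the missing entries in each column.
print("missing per column:", missing_mask.sum(axis=0))
```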

J. Example — 10:

Index Values:
K. Example — 11:
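Index objects work the same way. This sketch assumes two made-up indexes, one numeric and one datetime, where a None entry in the datetime index becomes NaT:

```python
import numpy as np
import pandas as pd

# Hypothetical indexes, each with one missing label.
numeric_index = pd.Index([10.0, np.nan, 30.0])
date_index = pd.DatetimeIndex(["2021-01-01", None, "2021-01-03"])

numeric_mask = pd.isna(numeric_index)  # NumPy boolean array
date_mask = pd.isna(date_index)        # None was stored as NaT

print(numeric_mask)
print(date_mask)
```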

L. Example — 12:

Pandas Series:
M. Example — 13:

Pandas DataFrame:
N. Example — 14:
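For a Series or a DataFrame, pd.isna() returns an object of the same type and shape filled with booleans. The data below is a hypothetical stand-in for the article's tables:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing age and one missing name.
ages = pd.Series([25, np.nan, 31], name="Age")
people = pd.DataFrame({
    "Person": ["Ann", None, "Raj"],
    "Age": [25, np.nan, 31],
})

series_mask = pd.isna(ages)   # boolean Series
frame_mask = pd.isna(people)  # boolean DataFrame

print(series_mask)
print(frame_mask)
print(frame_mask.sum())       # number of missing values per column
```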

pd.notna( ):
The pd.notna() function of the pandas library detects non-missing (valid) values in a scalar or an array-like object. Note that pd.notna() is the boolean inverse of pd.isna(). Let us first look at the syntax of pd.notna() and then work through some examples.

Before we look at how the pd.notna() function works, let us first import the required libraries.
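As a minimal sketch of the inverse relationship, the following applies both functions to a made-up object array that mixes valid and missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical mixed array: None, np.nan, and pd.NaT count as missing.
mixed = np.array([1, None, np.nan, "text", pd.NaT], dtype=object)

present = pd.notna(mixed)
absent = pd.isna(mixed)

print(present)
print(absent)

# Every element is flagged by exactly one of the two functions.
print((present == ~absent).all())
```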

A. Example — 1:

B. Example — 2:

C. Example — 3:

D. Example — 4:

E. Example — 5:

F. Example — 6:

G. Example — 7:

H. Example — 8:

I. Example — 9:

J. Example — 10:

K. Example — 11:

L. Example — 12:

M. Example — 13:

N. Example — 14:

Important Note:
1. The pd.isnull() function is an alias of the pd.isna() function, so it gives exactly the same results. We recommend using pd.isna() rather than pd.isnull().

2. The pd.notnull() function is an alias of the pd.notna() function, so it gives exactly the same results. We recommend using pd.notna() rather than pd.notnull().
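A quick way to convince ourselves that the aliases behave identically is to compare their outputs on the same input. The Series below is a made-up example:

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value.
sample = pd.Series([1.0, np.nan, 3.0])

same_missing = pd.isnull(sample).equals(pd.isna(sample))
same_present = pd.notnull(sample).equals(pd.notna(sample))

print(same_missing, same_present)
```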

To replicate this tutorial, please run the Google Colab notebook.
pd.DataFrame.dropna( ):
We use the dropna() method of a pandas DataFrame to remove missing values. Note that it is called on a DataFrame (or Series); there is no top-level pd.dropna() function. Let us first look at its syntax and parameters to get a better idea of what it can do.

Let's take a few examples to understand exactly how the parameters of the dropna() method affect the output.
Before diving deeper into dropna(), let us first create a DataFrame to work with.
A. Create a DataFrame:

Python Implementation:


B. Example — 1:
If we do not specify any parameters, dropna() deletes every row that has at least one missing element.
Parameters Used:
None

Python Implementation:
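The article's original DataFrame is not reproduced here, so this sketch assumes a small hypothetical one with the columns Person, Age, and Country:

```python
import numpy as np
import pandas as pd

# Hypothetical data: only the first row is complete.
people = pd.DataFrame({
    "Person": ["Ann", "Raj", None, "Li"],
    "Age": [25, np.nan, 41, 36],
    "Country": ["US", "IN", "BR", None],
})

complete_rows = people.dropna()          # default: drop rows with any NaN
same_result = people.dropna(axis=0)      # explicit form of the default

print(complete_rows)
print(complete_rows.equals(same_result))
```

The original people DataFrame is left unchanged, because dropna() returns a new object by default.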

C. Example — 2:
If we specify the parameter axis=0, it will delete all the rows with at least one missing element. This is the default behavior.
Parameters Used:
axis = 0

Python Implementation:

D. Example — 3:
Instead of specifying axis=0, we can also pass a string name for the row axis. The documented name is axis="index", and axis="rows" is accepted as well. It works the same way.
Parameters Used:
axis = "rows"

Python Implementation:

E. Example — 4:
If we specify the parameter axis=1, it will delete all the columns with at least one missing element.
Parameters Used:
axis = 1

Python Implementation:
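A minimal sketch of column-wise deletion, assuming a hypothetical DataFrame in which only the Math column has a gap:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Math" is the only column with a missing value.
scores = pd.DataFrame({
    "Person": ["Ann", "Raj", "Li"],
    "Math": [90.0, np.nan, 75.0],
    "Physics": [88.0, 92.0, 79.0],
})

complete_columns = scores.dropna(axis=1)
same_result = scores.dropna(axis="columns")  # string form of axis=1

print(complete_columns)
```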

F. Example — 5:
Instead of specifying axis=1, we can also specify axis="columns" as a parameter. It works the same way.
Parameters Used:
axis = "columns"

Python Implementation:

G. Example — 6:
If we specify how="any" as a parameter, it will remove every row that has at least one missing element. This is the default value of how. To apply the same rule to columns, combine it with the axis parameter.
Parameters Used:
how = "any"

Python Implementation:

H. Creating a DataFrame:

Python Implementation:


I. Example — 7:
If we specify how="all" as a parameter, it will remove only the rows in which every element is missing. To apply the same rule to columns, combine it with the axis parameter.
Parameters Used:
how = "all"

Python Implementation:
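To contrast how="any" with how="all", this sketch assumes a hypothetical DataFrame with one complete row, one partially missing row, and one fully missing row:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 complete, row 1 partly missing, row 2 all missing.
readings = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 5.0, np.nan],
})

any_dropped = readings.dropna(how="any")  # drops rows 1 and 2
all_dropped = readings.dropna(how="all")  # drops only row 2

print(any_dropped)
print(all_dropped)
```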

J. Example — 8:
If we specify the thresh parameter, dropna() keeps only the rows that have at least that many non-missing elements. In the following example we specify thresh=5, which means that only the rows with 5 or more non-missing elements are kept.
Parameters Used:
thresh = 5

Python Implementation:
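The threshold rule can be sketched with a hypothetical six-column DataFrame whose rows have 6, 5, and 4 non-missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data with six columns A..F.
survey = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6],            # 6 non-missing values
        [1, 2, 3, 4, 5, np.nan],       # 5 non-missing values
        [1, 2, np.nan, np.nan, 5, 6],  # 4 non-missing values
    ],
    columns=list("ABCDEF"),
)

non_missing_counts = survey.notna().sum(axis=1)
kept = survey.dropna(thresh=5)  # keep rows with at least 5 valid values

print(non_missing_counts.tolist())
print(kept)
```

Only the last row is dropped, since it is the only one with fewer than 5 valid values.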

K. Create a DataFrame:

Python Implementation:


L. Example — 9:
If we only want to consider a subset of columns when looking for missing elements, we can use the subset parameter to list the column names to check. In the following example, dropna() only looks in the "Person", "Degree", and "Country" columns. Missing values in other columns do not affect the final output.
Parameters Used:
subset = ["Person", "Degree", "Country"]

Python Implementation:
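Here is a minimal sketch of the subset parameter. The column names Person, Degree, and Country come from the article; the Age column and all of the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "Age" is deliberately left out of the subset.
people = pd.DataFrame({
    "Person": ["Ann", "Raj", None, "Li"],
    "Age": [25, np.nan, 41, 36],
    "Degree": ["MSc", "BSc", "PhD", None],
    "Country": ["US", "IN", "BR", "CN"],
})

filtered = people.dropna(subset=["Person", "Degree", "Country"])

print(filtered)
```

Raj's row survives even though his Age is missing, because Age is not one of the inspected columns.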

M. Create a DataFrame:

Python Implementation:


N. Example — 10:
If we want the changes to apply to our original DataFrame, we have to specify inplace=True as a parameter. Note that the call then does not return a new DataFrame. After execution, the original DataFrame itself holds the result of dropna().
Parameters Used:
inplace = True

Python Implementation:
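A minimal sketch of in-place deletion, assuming a hypothetical two-row DataFrame:

```python
import pandas as pd

# Hypothetical data: the second row has no name.
people = pd.DataFrame({"Person": ["Ann", None], "Age": [25, 30]})
rows_before = len(people)

# Modify `people` itself instead of building a new DataFrame.
people.dropna(inplace=True)

print(rows_before, "->", len(people))
print(people)
```

Without inplace=True we would instead reassign the result, for example people = people.dropna().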

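pd.DataFrame.fillna( ):
The fillna() method of a pandas DataFrame is used to fill the missing values, either with given values or with a fill method. Like dropna(), it is called on a DataFrame (or Series); there is no top-level pd.fillna() function. Let us first look at its syntax and parameters to understand it better.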

Let’s take a few examples to understand how the parameter values affect the output.
Before we dive deeper into the fillna() method, let us first create a DataFrame to work with.
A. Create a DataFrame:

Python Implementation:


B. Example — 1:
We can use the value parameter to specify the value with which to fill the missing elements. In the following example we specify value=0, so all the missing elements are filled with 0.
Parameters Used:
value = 0

Python Implementation:
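A minimal sketch of filling with a single value, assuming a hypothetical DataFrame of quarterly figures:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column.
sales = pd.DataFrame({"Q1": [10.0, np.nan], "Q2": [np.nan, 7.0]})

filled = sales.fillna(value=0)  # returns a new DataFrame

print(filled)
```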


C. Example — 2:
We can also fill the missing elements of different columns with different values by passing a dictionary that maps column names to fill values as the value parameter. The following example demonstrates this.
Parameters Used:
value = dictionary

Python Implementation:
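The per-column fill can be sketched like this. The column names and default values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a numeric column and a text column, each with a gap.
people = pd.DataFrame({
    "Age": [25.0, np.nan],
    "Country": ["US", None],
})

# Hypothetical defaults: one fill value per column name.
defaults = {"Age": 0, "Country": "Unknown"}
filled = people.fillna(value=defaults)

print(filled)
```

Columns that do not appear in the dictionary are left untouched.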


D. Example — 3:
To fill the missing elements, we can also use the method parameter. If we specify method="ffill", each gap is filled with the last valid observation. If we do not specify the axis, the fill runs along axis=0, that is, down each column from one row to the next. By default there is no limit on how far the last valid observation is propagated: if there are multiple consecutive missing elements, they are all filled with it.
Important Note:
If we specify method="ffill" with axis=0 and an element in the first row is missing, it will never get filled, because there is no earlier observation to copy.
Parameters Used:
method = "ffill"

Python Implementation:
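Recent pandas releases deprecate the method argument of fillna(), so this sketch calls DataFrame.ffill() instead, which the article describes as equivalent. The sensor data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; Sensor1 starts with a gap that cannot be filled.
temps = pd.DataFrame({
    "Sensor1": [np.nan, 20.0, np.nan, np.nan],
    "Sensor2": [15.0, np.nan, 17.0, np.nan],
})

# Forward fill down each column (axis=0 is the default).
forward = temps.ffill()

print(forward)
```

The first value of Sensor1 stays missing, and both trailing gaps in that column are filled with 20.0 because no limit is set.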


E. Example — 4:
If we specify method="pad", it works the same way as method="ffill".
Parameters Used:
method = "pad"

Python Implementation:


F. Example — 5:
By default, the missing elements are filled along axis=0, that is, down each column. Passing axis=0 explicitly gives the same result as leaving it out.
Important Note:
If we specify method="ffill" with axis=0 and an element in the first row is missing, it will never get filled.
Parameters Used:
method = "ffill"
axis = 0

Python Implementation:


G. Example — 6:
If we want to fill the missing elements across each row instead, from one column to the next, we can set the axis parameter to axis=1.
Important Note:
If we specify method="ffill" with axis=1 and an element in the first column is missing, it will never get filled.
Parameters Used:
method = "ffill"
axis = 1

Python Implementation:
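A minimal sketch of a forward fill across columns, again using DataFrame.ffill() in place of the method argument and a hypothetical table of monthly values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row starts with a gap in the first column.
grid = pd.DataFrame({
    "Jan": [np.nan, 5.0],
    "Feb": [2.0, np.nan],
    "Mar": [np.nan, np.nan],
})

# Forward fill across each row, from left to right.
across = grid.ffill(axis=1)

print(across)
```

The Jan value in the first row stays missing because there is no column to its left to copy from.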


H. Example — 7:
To fill the missing elements, we can also use the method parameter with a backward fill. If we specify method="bfill", each gap is filled with the next valid observation. If we do not specify the axis, the fill runs along axis=0, that is, up each column from the next row. By default there is no limit on how far the next valid observation is propagated: if there are multiple consecutive missing elements, they are all filled with it.
Important Note:
If we specify method="bfill" with axis=0 and an element in the last row is missing, it will never get filled, because there is no later observation to copy.
Parameters Used:
method = "bfill"
axis = 0

Python Implementation:
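The backward fill can be sketched with DataFrame.bfill(), which the article describes as equivalent to fillna() with method="bfill". The sensor data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical readings; both columns end with a gap.
temps = pd.DataFrame({
    "Sensor1": [np.nan, 20.0, np.nan, np.nan],
    "Sensor2": [15.0, np.nan, 17.0, np.nan],
})

# Backward fill: each gap takes the next valid value below it.
backward = temps.bfill()

print(backward)
```

The trailing gaps in both columns stay missing because no later observation exists.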


I. Example — 8:
If we specify method="backfill", it works the same way as method="bfill".
Parameters Used:
method = "backfill"

Python Implementation:


J. Example — 9:
By default, the missing elements are filled along axis=0, that is, along each column. Passing axis=0 explicitly gives the same result as leaving it out.
Important Note:
If we specify method="bfill" with axis=0 and an element in the last row is missing, it will never get filled.
Parameters Used:
method = "bfill"
axis = 0

Python Implementation:


K. Example — 10:
If we want to fill the missing elements across each row instead, taking each value from the next column, we can set the axis parameter to axis=1.
Important Note:
If we specify method="bfill" with axis=1 and an element in the last column is missing, it will never get filled.
Parameters Used:
method = "bfill"
axis = 1

Python Implementation:


L. Example — 11:
If we specify the limit parameter, it caps the number of consecutive missing values that a forward or backward fill will fill. If a gap of consecutive missing elements is longer than limit, it is only filled partially. Here we use the forward fill method with axis=0 and a limit of 1 element.
Parameters Used:
method = "ffill"
axis = 0
limit = 1

Python Implementation:
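To see the partial fill, this sketch assumes a hypothetical single column with a gap of three consecutive missing values. It uses DataFrame.ffill() and DataFrame.bfill(), which accept the same limit parameter.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a gap of three missing values between 4.0 and 9.0.
levels = pd.DataFrame({"Tank": [4.0, np.nan, np.nan, np.nan, 9.0]})

forward_one = levels.ffill(limit=1)   # fills only the first value of the gap
backward_one = levels.bfill(limit=1)  # fills only the last value of the gap

print(forward_one)
print(backward_one)
```

In both cases two of the three missing values remain, because each fill is allowed to advance by only one position.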


M. Example — 12:
In this example, we will use the fill forward method with axis=1 and a limit of 1 element.
Parameters Used:
method = “ffill”
axis = 1
limit = 1

Python Implementation:


N. Example — 13:
In this example, we will use the backward fill method with axis=0 and a limit of 1 element.
Parameters Used:
method = “bfill”
axis = 0
limit = 1

Python Implementation:


O. Example — 14:
In this example, we will use the backward fill method with axis=1 and a limit of 1 element.
Parameters Used:
method = “bfill”
axis = 1
limit = 1

Python Implementation:


P. Creating a DataFrame:

Python Implementation:



Q. Example — 15:
We can use the downcast parameter to downcast the data type if possible. The string value "infer" tries to downcast to an appropriate equivalent type, for example from float64 to int64 when every value is a whole number.
Parameters Used:
downcast = "infer"

Python Implementation:

R. Example — 16:
If we want the changes to apply to our original DataFrame, we have to specify inplace=True as a parameter. Note that the call then does not return a new DataFrame. After execution, the original DataFrame itself holds the result of fillna().
Parameters Used:
inplace = True

Python Implementation:


pd.DataFrame.bfill( ):
The pd.DataFrame.bfill() method works exactly the same way as the fillna() method with the parameter method="bfill".

Let us take an example to understand it.
A. Create a DataFrame:

Python Implementation:


B. Example — 1:
Parameters Used:
None

Python Implementation:


pd.DataFrame.backfill( ):
The pd.DataFrame.backfill() method works the same way as the fillna() method with the parameter method="backfill". Note that recent pandas releases deprecate backfill() in favor of bfill().

Let us take an example to understand how it works.
A. Create a DataFrame:

Python Implementation:


B. Example — 1:
Parameters Used:
None

Python Implementation:


pd.DataFrame.ffill( ):
The pd.DataFrame.ffill() method works exactly the same way as the fillna() method with the parameter method="ffill".

Let’s take an example to understand it better.
A. Create a DataFrame:

Python Implementation:


B. Example — 1:
Parameters Used:
None

Python Implementation:


pd.DataFrame.pad( ):
The pd.DataFrame.pad() method works the same way as the fillna() method with the parameter method="pad". Note that recent pandas releases deprecate pad() in favor of ffill().

Let’s take an example to understand it better.
A. Create a DataFrame:

Python Implementation:


B. Example — 1:
Parameters Used:
None

Python Implementation:


Closing Remarks:
We hope you enjoyed reading this piece and learned something new about handling missing data.

DISCLAIMER: The views expressed in this article are those of the author(s) and do not represent the views of any company (directly or indirectly) associated with the author(s). This work does not intend to be a final product, yet rather a reflection of current thinking, along with being a catalyst for discussion and improvement.
All images are from the author(s) unless stated otherwise.
Published via Towards AI
Resources