How to Efficiently Structure Your Data Processing Code

Last Updated on July 25, 2023

Author(s): Byron Dolon

Originally published on Towards AI.

An end-to-end example of pre-processing data using method chaining with the pipe method in Pandas

While a lot of attention goes into making machine-learning pipelines readable and reusable, your data pre-processing pipelines deserve the same care.

Before you train a machine-learning model, you’ll usually need to do some exploratory data analysis, data cleaning, feature engineering, and other data transformation steps. Together, these steps get your data into a format ready for training. Skipping them can decrease the accuracy of the model you end up training.
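To make the idea of chaining pre-processing steps concrete, here is a minimal sketch of method chaining with `DataFrame.pipe`. The function names, column names, tax rate, and sample data are all hypothetical, invented only for illustration. They are not taken from the dataset used in this article.

```python
import pandas as pd


def drop_missing_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Data cleaning step: drop rows that contain any missing value."""
    return df.dropna()


def add_total_price(df: pd.DataFrame, tax_rate: float) -> pd.DataFrame:
    """Feature engineering step: derive a tax-inclusive price column."""
    return df.assign(total_price=df["price"] * (1 + tax_rate))


# Hypothetical raw data, made up purely for this example.
raw = pd.DataFrame({"item": ["a", "b", "c"], "price": [10.0, None, 30.0]})

# Each .pipe() call hands the current DataFrame to the next function,
# so the pipeline reads top to bottom in the order the steps run.
processed = (
    raw
    .pipe(drop_missing_rows)
    .pipe(add_total_price, tax_rate=0.5)
)
```

Because each step is a plain function that takes a DataFrame and returns a new one, the steps can be tested on their own and reordered or reused across projects, and `raw` itself is left unchanged.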

