Encoding Categorical Data: A Step-by-Step Guide
Last Updated on September 3, 2024 by Editorial Team
Author(s): Souradip Pal
Originally published on Towards AI.
Imagine you're baking a cake, but instead of sugar, flour, and eggs, you have words like "vanilla," "chocolate," and "strawberry" on your countertop. As much as you'd like to start, there's a problem: your recipe can only follow numeric measurements, not words. This is exactly what happens when you try to feed categorical data into a machine-learning model. The model needs numbers to work its magic, not strings of text.
Image generated by DALL-E
In this hands-on tutorial, we'll unravel the mystery of encoding categorical data so your models can process it with ease. We'll break down the types of categorical data, discuss when and why each encoding method is used, and dive into Python code examples that show exactly how to get the job done.
Before we start transforming data, let's get our definitions straight. In the world of data, you generally have two types: numerical and categorical. Machine learning models can easily understand numbers, no surprise there! But when it comes to words or labels, we need to convert these into numbers to help our models "understand" the data.
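To make the idea of turning labels into numbers concrete, here is a minimal sketch in plain Python. It uses one-hot encoding, which is only one common way to do the conversion and is an assumption here, not necessarily the method this guide reaches first. The flavor list and the `one_hot` helper are hypothetical names chosen to echo the cake analogy.

```python
# Hypothetical toy data: flavor labels a model cannot use directly.
flavors = ["vanilla", "chocolate", "strawberry", "vanilla"]

# Build a stable vocabulary: each unique category gets a column index.
categories = sorted(set(flavors))
index = {category: i for i, category in enumerate(categories)}


def one_hot(value):
    """Return a list with 1 in the category's column and 0 elsewhere."""
    row = [0] * len(categories)
    row[index[value]] = 1
    return row


# Every label becomes a numeric vector of the same length.
encoded = [one_hot(flavor) for flavor in flavors]

print(categories)
for flavor, row in zip(flavors, encoded):
    print(flavor, row)
```

In practice you would probably reach for a library routine such as `pandas.get_dummies` or scikit-learn's `OneHotEncoder` rather than hand-rolling this; the sketch only shows what such a routine produces.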
Ordinal Data: Ordinal data is like your favorite Netflix ranking list…
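As a rough illustration of encoding categories that carry an order, the sketch below maps each label to its position in an explicit ranking. The rating labels, their order, and the variable names are all hypothetical and chosen only for this example; they are not taken from the original article.

```python
# Hypothetical ordered categories, listed from lowest to highest.
rating_order = ["bad", "okay", "good", "great"]

# Map each label to its position so the numbers preserve the ranking.
rank = {label: position for position, label in enumerate(rating_order)}

# Hypothetical observations to encode.
ratings = ["good", "bad", "great", "okay", "good"]
encoded = [rank[label] for label in ratings]

print(encoded)
```

Writing the order out by hand matters here: sorting the labels alphabetically would generally scramble the ranking. scikit-learn's `OrdinalEncoder` accepts an explicit `categories` argument for the same reason.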