Exploratory Data Analysis: Baby Steps

Last Updated on November 18, 2020 by Editorial Team

Author(s): Swetha Lakshmanan


Steps in Data Exploration and Preprocessing:

Dataset:

Variable identification:

Classification of Variables

Numerical: Unique ID, disbursed_amount, asset_cost, ltv, Current_pincode_ID, PERFORM_CNS.SCORE, PERFORM_CNS.SCORE.DESCRIPTION, PRI.NO.OF.ACCTS, PRI.ACTIVE.ACCTS, PRI.OVERDUE.ACCTS, PRI.CURRENT.BALANCE, PRI.SANCTIONED.AMOUNT, PRI.DISBURSED.AMOUNT, NO.OF_INQUIRIES

Categorical: branch_id, supplier_id, manufacturer_id, Date.of.Birth, Employment.Type, DisbursalDate, State_ID, Employee_code_ID, MobileNo_Avl_Flag, Aadhar_flag, PAN_flag, VoterID_flag, Driving_flag, Passport_flag, loan_default

Importing Libraries:

#importing libraries 
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 

Importing Dataset:

train = pd.read_csv("train.csv")

Identification of data types:

train.dtypes
[Figure: a snippet of the output of the code above]
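
As a complement to train.dtypes, here is a minimal sketch of splitting columns by dtype with select_dtypes. The three-row inline CSV is invented for illustration (only the column names are borrowed from the dataset); the real data would come from train.csv.

```python
import io
import pandas as pd

# Hypothetical sample rows; the values are made up for illustration.
csv_text = """disbursed_amount,ltv,Employment.Type,loan_default
50578,89.55,Salaried,0
47145,73.23,Self employed,1
53278,89.63,Self employed,0
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Columns stored as numbers vs. everything else (text, dates, ...)
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
other_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(sample.dtypes)
print("numeric:", numeric_cols)
print("non-numeric:", other_cols)
```

Note that the dtype reflects how a column is stored, not what it means: a flag such as loan_default is stored as an integer but is categorical in nature, so the dtype split is only a starting point for variable identification.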

Size of the dataset:

train.shape

Statistical Summary of Numeric Variables:

train.describe()
[Figure: a snippet of the output of the code above]
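
By default, describe() summarizes only the numeric columns. A small sketch of how the non-numeric columns could be summarized as well, using a made-up four-row frame (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; column names borrowed from the dataset.
sample = pd.DataFrame({
    "ltv": [89.55, 73.23, 89.63, 88.48],
    "Employment.Type": ["Salaried", "Self employed", "Self employed", None],
})

# Numeric columns: count, mean, std, min, quartiles, max
numeric_summary = sample.describe()

# Non-numeric columns: count (non-null), unique, top (most frequent), freq
text_summary = sample.describe(exclude=np.number)

print(numeric_summary)
print(text_summary)
```

The count row excludes missing values, so comparing it against the number of rows is a quick first hint that a column has nulls.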

Non-Graphical Univariate Analysis:

To get the count of each unique value:

train['loan_default'].value_counts()
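
For a target column such as loan_default, the proportions are often easier to read than the raw counts. A minimal sketch with an invented series (the values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical target values: 0 = no default, 1 = default
loan_default = pd.Series([0, 0, 0, 1, 0, 1, 0, 0], name="loan_default")

counts = loan_default.value_counts()                # rows per class
shares = loan_default.value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(shares)
```

A large gap between the class shares points to an imbalanced target, which is worth knowing before any modelling.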

To get the number and the list of unique values:

train['branch_id'].nunique()
train['branch_id'].unique()
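
nunique() also works on a whole DataFrame, which gives a quick way to spot ID-like columns (one distinct value per row) and constant columns (a single distinct value). A sketch on a made-up frame; the column name UniqueID and all values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    "UniqueID": [101, 102, 103, 104],
    "branch_id": [67, 67, 78, 34],
    "MobileNo_Avl_Flag": [1, 1, 1, 1],
})

n_unique = sample.nunique()  # distinct values per column

# Columns where every row has a different value behave like identifiers
id_like = n_unique[n_unique == len(sample)].index.tolist()
# Columns with a single value carry no information
constant = n_unique[n_unique == 1].index.tolist()

print(n_unique)
print("ID-like:", id_like)
print("constant:", constant)
```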

Filtering based on Conditions:

train[(train['Employment.Type'] == "Salaried")]
[Figure: a snippet of the output of the code above]
train[(train['Employment.Type'] == "Salaried") & (train['branch_id'] == 100)]
[Figure: a snippet of the output of the code above]
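
The same kind of filtering can be written with named boolean masks, which tends to stay readable as conditions pile up. A minimal sketch on an invented frame (the values are hypothetical):

```python
import pandas as pd

# Hypothetical sample rows
sample = pd.DataFrame({
    "Employment.Type": ["Salaried", "Self employed", "Salaried", "Salaried"],
    "branch_id": [100, 100, 67, 100],
    "ltv": [89.55, 73.23, 89.63, 88.48],
})

is_salaried = sample["Employment.Type"] == "Salaried"
in_branches = sample["branch_id"].isin([100, 67])  # match any value in a list

# .loc takes a row mask and, optionally, the columns to keep
subset = sample.loc[is_salaried & in_branches, ["branch_id", "ltv"]]

# ~ negates a mask
not_salaried = sample[~is_salaried]

print(subset)
print(not_salaried)
```

When conditions are written inline, each comparison needs its own parentheses because & and | bind more tightly than == in Python.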

Finding null values:

train.isnull().sum()
[Figure: a snippet of the output of the code above]
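
On a wide dataset it can help to see the share of missing values and to list only the columns that actually have any. A small sketch on a made-up frame (the values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows with a few missing values
sample = pd.DataFrame({
    "Employment.Type": ["Salaried", None, "Self employed", None],
    "ltv": [89.55, 73.23, np.nan, 88.48],
    "branch_id": [67, 67, 78, 34],
})

null_counts = sample.isnull().sum()   # missing values per column
null_share = sample.isnull().mean()   # fraction of rows missing per column

# Keep only columns with at least one null, worst first
with_nulls = null_counts[null_counts > 0].sort_values(ascending=False)

print(null_share)
print(with_nulls)
```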
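
Converting Data Types:

# Parse the date-of-birth strings into datetime values
train['Date.of.Birth'] = pd.to_datetime(train['Date.of.Birth'])
# Cast ltv to integer (this drops the fractional part)
train['ltv'] = train['ltv'].astype('int64')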
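
Two details are worth checking when converting types. The sketch below assumes the dates are stored as day-month-two-digit-year strings (an assumption; the source does not state the format) and uses invented values and a hypothetical cutoff date.

```python
import pandas as pd

# Hypothetical raw values; the dd-mm-yy format is an assumption
dob = pd.Series(["01-01-84", "31-07-64", "not a date"], name="Date.of.Birth")

# An explicit format avoids day/month ambiguity; unparseable strings become NaT
parsed = pd.to_datetime(dob, format="%d-%m-%y", errors="coerce")

# With %y, years 00-68 are read as 2000-2068, so a birth year of 64 becomes 2064.
# Shift anything after a chosen cutoff (hypothetical here) back by a century.
cutoff = pd.Timestamp("2018-12-31")
in_future = parsed > cutoff
parsed = parsed.where(~in_future, parsed - pd.DateOffset(years=100))

# astype('int64') truncates; round first if the nearest integer is wanted
ltv = pd.Series([89.55, 73.23])
truncated = ltv.astype("int64")
rounded = ltv.round().astype("int64")

print(parsed)
print(truncated.tolist(), rounded.tolist())
```

Whether truncating or rounding ltv is appropriate depends on the analysis; keeping the original float column is also a reasonable choice.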

Graphical Univariate Analysis:

Histogram:

train['ltv'].hist(bins=25)
train['asset_cost'].hist(bins=200)
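
A histogram splits the value range into equal-width bins and counts the rows that fall into each one. The numbers behind the bars can be inspected with np.histogram, as in this sketch on invented ltv values (hypothetical data, and 4 bins instead of 25 to keep the output short):

```python
import numpy as np
import pandas as pd

# Hypothetical ltv values
ltv = pd.Series([55.0, 62.5, 70.0, 71.0, 74.5, 78.0, 80.0, 84.0, 88.5, 95.0])

# counts[i] is the number of values between edges[i] and edges[i + 1]
counts, edges = np.histogram(ltv, bins=4)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {n}")
```

The bin count changes the picture considerably: too few bins hide structure, while too many (for example 200 on a small sample) leave most bins nearly empty, so it is worth trying a few values.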

Box Plots:

train.boxplot(column='disbursed_amount')
train.boxplot(column='disbursed_amount', by='Employment.Type')
sns.boxplot(x=train['asset_cost'])
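
The points a box plot draws beyond its whiskers are, with the default settings, the values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. A minimal sketch that computes the same fences by hand on invented amounts (hypothetical data):

```python
import pandas as pd

# Hypothetical disbursed amounts with one extreme value
amounts = pd.Series([40000, 45000, 47000, 50000, 52000,
                     55000, 58000, 60000, 150000])

q1 = amounts.quantile(0.25)
q3 = amounts.quantile(0.75)
iqr = q3 - q1

# Fences used by the default 1.5 * IQR whisker rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = amounts[(amounts < lower) | (amounts > upper)]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower} to {upper}")
print(outliers.tolist())
```

Flagging a value this way only marks it for a closer look; whether it is an error or a genuine large loan is a judgment call that depends on the data.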

Count Plots:

sns.countplot(x=train['loan_default'])

sns.countplot(x=train['manufacturer_id'])

Note: Article content contains the views of the contributing authors and not Towards AI.

