Applying Classification Algorithms to Past Loan Data
Last Updated on July 5, 2022 by Editorial Team
Author(s): Gencay I.
Originally published on Towards AI.
KNN, Decision Tree, Support Vector Machine, Logistic Regression
In this article, I apply four classification algorithms to past loan data:
- K Nearest Neighbor(KNN)
- Decision Tree
- Support Vector Machine
- Logistic Regression
Table of Contents
· Data Visualization
· One hot encoding
· Feature Selection
· Normalize Data
· Classification
∘ K Nearest Neighbor
∘ Evaluation Metrics of KNN
∘ Decision Tree
∘ Evaluation Metrics of Decision Tree
∘ Support Vector Machine
∘ Evaluation Metrics of SVM
∘ Logistic Regression
∘ Evaluation Metrics of Logistic Regression
∘ Model Evaluation using a Test set
∘ Jaccard Scores
∘ F1 Scores
∘ Final Evaluation
Let's load the necessary libraries:
The Loan_train.csv data set includes details of 346 customers whose loans are already paid off or defaulted.
Let's load the data:
It is always useful to check the shape of the data to see the big picture.
Now let's fix the data frame's column types.
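The loading and type-fixing steps could look roughly like the sketch below. The real `Loan_train.csv` is not reproduced here, so a few made-up rows stand in for it; the column names (`effective_date`, `due_date`, and so on) and all values are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical rows standing in for Loan_train.csv. The column names and
# values are assumptions made for illustration, not the real data.
sample_csv = io.StringIO(
    "loan_status,Principal,terms,effective_date,due_date,age,education,Gender\n"
    "PAIDOFF,1000,30,9/8/2016,10/7/2016,45,High School or Below,male\n"
    "PAIDOFF,1000,30,9/8/2016,10/7/2016,33,Bachelor,female\n"
    "COLLECTION,800,15,9/11/2016,9/25/2016,27,college,male\n"
)
df = pd.read_csv(sample_csv)

# Shape is (number of rows, number of columns).
print(df.shape)

# Dates are read as plain strings, so convert them to datetime columns.
df["effective_date"] = pd.to_datetime(df["effective_date"])
df["due_date"] = pd.to_datetime(df["due_date"])
print(df.dtypes)
```

With the real file, `pd.read_csv("Loan_train.csv")` would replace the in-memory sample.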
Data Visualization
Let's see how many of each class are in our data set.
Let's plot some columns to understand the data better.
Let's look at the day of the week on which people get the loan.
We see that people who get the loan at the end of the week tend not to pay it off, so let's use feature binarization with day 4 as the threshold.
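A minimal sketch of this binarization, assuming the loan date lives in a column called `effective_date` and that days below 4 map to 0 while the rest map to 1 (both are assumptions; the dates are made up):

```python
import pandas as pd

# Hypothetical loan dates; "effective_date" is an assumed column name.
df = pd.DataFrame({
    "effective_date": pd.to_datetime(
        ["2016-09-08", "2016-09-09", "2016-09-10", "2016-09-11", "2016-09-12"]
    )
})

# pandas numbers the days from Monday = 0 to Sunday = 6.
df["dayofweek"] = df["effective_date"].dt.dayofweek

# Binarize: days below 4 (Monday to Thursday) become 0, the rest become 1.
df["weekend"] = (df["dayofweek"] >= 4).astype(int)
print(df)
```

The new `weekend` column (a hypothetical name) can then be used as a feature in place of the raw day number.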
Now it is time to convert categorical features to numerical values, because the machine learning algorithms we will use need numeric input.
86% of females pay off their loans, while only 73% of males do.
Let's convert male to 0 and female to 1:
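One way to do this conversion is a dictionary mapping, sketched below. The column name `Gender` and the sample values are assumptions.

```python
import pandas as pd

# Hypothetical sample; "Gender" is an assumed column name.
df = pd.DataFrame({"Gender": ["male", "female", "female", "male"]})

# Map each category to its numeric code: male -> 0, female -> 1.
df["Gender"] = df["Gender"].map({"male": 0, "female": 1})
print(df["Gender"].tolist())
```

Note that `map` turns any value missing from the dictionary into `NaN`, so it is worth checking for unexpected spellings first.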
One hot encoding
Now let's look at the education column.
We use dummies to transform education from categorical to numerical.
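As a rough illustration, `pandas.get_dummies` creates one indicator column per education level. The category names and the `Principal` column below are made up for the example.

```python
import pandas as pd

# Hypothetical rows; the education levels shown are assumptions.
df = pd.DataFrame({
    "Principal": [1000, 800, 1000],
    "education": ["college", "High School or Below", "Bachelor"],
})

# One indicator (0/1) column per education level.
dummies = pd.get_dummies(df["education"], dtype=int)

# Attach the indicator columns to the numeric features.
features = pd.concat([df[["Principal"]], dummies], axis=1)
print(features)
```

Each row has exactly one indicator set to 1, so the original category can always be recovered.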
Feature Selection
Let's define the features:
Now it is time to define our label:
Normalize Data
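Normalization here most likely means standardizing each feature to zero mean and unit variance. A minimal sketch with scikit-learn's `StandardScaler`, using a small made-up feature matrix (the values are assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: principal, terms, age.
X = np.array([
    [1000.0, 30.0, 45.0],
    [800.0, 15.0, 27.0],
    [1000.0, 30.0, 33.0],
    [500.0, 7.0, 50.0],
])

# Each column is rescaled to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

Distance-based methods such as KNN and SVM are sensitive to feature scale, which is why this step comes before classification.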
Classification
These are the classification techniques that I will apply to this data set:
- K Nearest Neighbor(KNN)
- Decision Tree
- Support Vector Machine
- Logistic Regression
K Nearest Neighbor
Now it is time to split the data into train and test sets, using the usual split of 0.8 for training and 0.2 for testing.
Now it is time to look at the accuracy on the train and test data.
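The split and the accuracy check could be sketched as follows. Synthetic data from `make_classification` stands in for the loan features, and `n_neighbors=4` and `random_state=4` are arbitrary assumptions, so the printed numbers will not match the article's results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the loan features and labels (346 rows).
X, y = make_classification(n_samples=346, n_features=8, random_state=4)

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit KNN with an arbitrary starting value of K.
knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)

train_acc = accuracy_score(y_train, knn.predict(X_train))
test_acc = accuracy_score(y_test, knn.predict(X_test))
print(f"train accuracy: {train_acc:.3f}, test accuracy: {test_acc:.3f}")
```

A large gap between train and test accuracy is a sign of overfitting.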
To find the best K:
As the results show, K = 7 is the best value for our data.
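A search over K can be sketched as a simple loop that keeps the value with the highest test accuracy. The data below is synthetic and the range of 1 to 10 is an assumption, so the winning K here need not be 7.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the loan features and labels.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Test accuracy for each candidate K.
k_values = range(1, 11)
accuracies = [
    KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_test, y_test)
    for k in k_values
]

# Keep the K with the highest accuracy (first one wins on ties).
best_k = k_values[int(np.argmax(accuracies))]
print(f"best K: {best_k}, accuracy: {max(accuracies):.3f}")
```

Choosing K on the same test set that is later used for reporting gives an optimistic estimate; cross-validation on the training data is a common alternative.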
Evaluation Metrics of KNN
Decision Tree
Now let's try the Decision Tree algorithm.
To find the best depth:
According to the accuracy scores, 5 is the best depth.
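The depth search can be sketched the same way: train one tree per candidate depth and compare test accuracy. The synthetic data, the `entropy` criterion, and the depth range of 1 to 10 are all assumptions, so the best depth found here need not be 5.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan features and labels.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Test accuracy for each candidate maximum depth.
depth_scores = {}
for depth in range(1, 11):
    tree = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, random_state=4
    )
    tree.fit(X_train, y_train)
    depth_scores[depth] = tree.score(X_test, y_test)

best_depth = max(depth_scores, key=depth_scores.get)
print(f"best depth: {best_depth}, accuracy: {depth_scores[best_depth]:.3f}")
```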
Let's train the model with that depth and evaluate it:
Evaluation Metrics of Decision Tree
Support Vector Machine
Now let's use SVM.
To find the best SVM model:
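One plausible reading of "best model" is the best kernel, so the sketch below compares scikit-learn's four built-in kernels on test accuracy. That reading, the synthetic data, and the default hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the loan features and labels.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Test accuracy for each built-in kernel.
kernels = ["linear", "poly", "rbf", "sigmoid"]
kernel_scores = {
    kernel: SVC(kernel=kernel).fit(X_train, y_train).score(X_test, y_test)
    for kernel in kernels
}

best_kernel = max(kernel_scores, key=kernel_scores.get)
print(f"best kernel: {best_kernel}, accuracy: {kernel_scores[best_kernel]:.3f}")
```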
Evaluation Metrics of SVM
Logistic Regression
Now it is time to use Logistic Regression.
Let's lock and load:
Train-test split:
Find the best solver:
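A solver comparison might look like the sketch below, which fits one model per solver and compares test accuracy. The synthetic data, `C=0.01`, `max_iter=1000`, and the use of accuracy as the comparison metric are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan features and labels.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Test accuracy for each solver.
solvers = ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
solver_scores = {}
for solver in solvers:
    model = LogisticRegression(C=0.01, solver=solver, max_iter=1000)
    model.fit(X_train, y_train)
    solver_scores[solver] = model.score(X_test, y_test)

best_solver = max(solver_scores, key=solver_scores.get)
print(f"best solver: {best_solver}, accuracy: {solver_scores[best_solver]:.3f}")
```

On a small data set like this one the solvers often land on nearly the same coefficients, so the differences in accuracy tend to be small.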
Evaluation Metrics of Logistic Regression
Model Evaluation using a Test set
Data processing:
Jaccard Scores
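For a binary problem, the Jaccard score of the positive class is TP / (TP + FP + FN). The worked sketch below computes it by hand on made-up labels; `jaccard_index` is a hypothetical helper, not part of any library.

```python
def jaccard_index(y_true, y_pred, positive=1):
    """Jaccard score of the positive class: TP / (TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = tp + fp + fn
    # Return 0.0 when the positive class never appears in either list.
    return tp / denominator if denominator else 0.0


# Made-up labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

jaccard = jaccard_index(y_true, y_pred)
print(jaccard)  # 3 / (3 + 1 + 1) = 0.6
```

scikit-learn provides the same metric as `sklearn.metrics.jaccard_score`.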
F1 Scores
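The F1 score is the harmonic mean of precision and recall. The worked sketch below computes it by hand on made-up labels; `f1_binary` is a hypothetical helper, not part of any library.

```python
def f1_binary(y_true, y_pred, positive=1):
    """F1 of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: precision = 3/4 and recall = 3/4.
y_true = [1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

f1 = f1_binary(y_true, y_pred)
print(f1)  # 2 * 0.75 * 0.75 / (0.75 + 0.75) = 0.75
```

scikit-learn provides this metric as `sklearn.metrics.f1_score`, including averaged variants for reporting across both classes.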
Final Evaluation
Thanks to IBM for the Machine Learning tutorial that got me here.