Applying Classification Algorithms to Past Loan Data

Last Updated on July 5, 2022 by Editorial Team

Author(s): Gencay I.

Originally published on Towards AI.

KNN, Decision Tree, Support Vector Machine, Logistic Regression

Photo by Scott Graham on Unsplash

In this article, I run a classification analysis on past loan data with four machine learning algorithms. The steps are listed below.

Table of Contents
· Data Visualization
· One hot encoding
· Feature Selection
· Normalize Data
· Classification
    K Nearest Neighbor
    Evaluation Metrics of KNN
    Decision Tree
    Evaluation Metrics of Decision Tree
    Support Vector Machine
    Evaluation Metrics of SVM
    Logistic Regression
    Evaluation Metrics of Logistic Regression
· Model Evaluation using a Test set
    Jaccard Scores
    F1 Scores
    Final Evaluation

Let's load the necessary libraries:

Image by Author

The Loan_train.csv data set includes details of 346 customers whose loans are already paid off or defaulted.

Image by Author

Let's load the data:

Image by Author

It is always useful to look at the shape of the data to see the big picture.

Image by Author
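Loading the file and checking its shape might look like the sketch below. The file name Loan_train.csv comes from the text above; the column names and the three sample rows are my assumptions for illustration, and they are read from an in-memory string so the snippet runs without the file.

```python
import io

import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not confirmed by the article.
SAMPLE_CSV = """loan_status,Principal,terms,effective_date,due_date,age,education,Gender
PAIDOFF,1000,30,9/8/2016,10/7/2016,45,High School or Below,male
PAIDOFF,1000,30,9/8/2016,10/7/2016,33,Bachelor,female
COLLECTION,800,15,9/11/2016,9/25/2016,29,college,male
"""

# With the real file this would be: df = pd.read_csv("Loan_train.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# shape is a (rows, columns) tuple; dtypes shows how pandas parsed each column.
print(df.shape)
print(df.dtypes)
```

On the real file, the first element of the shape should be the 346 customers mentioned above.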

Now let's fix the data frame's column types.

Image by Author
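The text does not say which columns are converted. A common case, assumed here, is that dates arrive as strings and need to become datetime values. The column names and rows below are hypothetical.

```python
import pandas as pd

# Hypothetical rows; the date column names are assumptions.
df = pd.DataFrame({
    "effective_date": ["9/8/2016", "9/11/2016"],
    "due_date": ["10/7/2016", "9/25/2016"],
})

# Parse month/day/year strings into datetime64 values.
# This enables date arithmetic and the .dt accessor later on.
df["effective_date"] = pd.to_datetime(df["effective_date"], format="%m/%d/%Y")
df["due_date"] = pd.to_datetime(df["due_date"], format="%m/%d/%Y")

print(df.dtypes)
```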

Data Visualization

Let's see how many of each class are in our data set.

Image by Author
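Counting the classes can be done with value_counts. This is a minimal sketch on four made-up rows; the column name loan_status and the label strings are assumptions.

```python
import pandas as pd

# Hypothetical labels; the column name and label strings are assumptions.
df = pd.DataFrame({"loan_status": ["PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF"]})

# value_counts returns one row per class, sorted by frequency.
counts = df["loan_status"].value_counts()
print(counts)
```

A large gap between the classes is worth noting early, because plain accuracy can look good on imbalanced data even when the minority class is predicted poorly.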

Let's plot some columns to understand the data better.

Image by Author
Image by Author

Let's look at the day of the week on which people get the loan.

Image by Author

We see that people who get the loan at the end of the week tend not to pay it off, so let's use feature binarization with a threshold at day 4 to separate end-of-week loans from the rest.

Image by Author
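Here is a sketch of the day-of-week extraction and the binarization. I am assuming the loan date lives in a column called effective_date and that the new flag is called weekend; both names are hypothetical, and so is the direction of the flag (1 for day index above 3).

```python
import pandas as pd

# Hypothetical dates: a Thursday, a Saturday, and a Monday.
df = pd.DataFrame({
    "effective_date": pd.to_datetime(["2016-09-08", "2016-09-10", "2016-09-12"]),
})

# pandas counts Monday as 0 and Sunday as 6.
df["dayofweek"] = df["effective_date"].dt.dayofweek

# Binarize: day index above 3 (Friday to Sunday) becomes 1, everything else 0.
df["weekend"] = (df["dayofweek"] > 3).astype(int)

print(df)
```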

Now it is time to convert categorical features to numerical values, because the machine learning algorithms we will use expect numeric input.

Image by Author

86% of females pay off their loans, while only 73% of males do.
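Percentages like these can be computed with a grouped, normalized value_counts. The rows below are made up, so the rates differ from the ones above; the column names are assumptions.

```python
import pandas as pd

# Hypothetical rows; column names and labels are assumptions.
df = pd.DataFrame({
    "Gender": ["male", "male", "male", "male", "female", "female"],
    "loan_status": ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "PAIDOFF", "PAIDOFF"],
})

# normalize=True turns counts into shares within each gender group.
rates = df.groupby("Gender")["loan_status"].value_counts(normalize=True)
print(rates)
```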

Let's convert male to 0 and female to 1:

Image by Author
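The conversion itself is a one-line mapping. The encoding (male to 0, female to 1) is the one stated above; the column name Gender is an assumption.

```python
import pandas as pd

# Hypothetical rows; the column name is an assumption.
df = pd.DataFrame({"Gender": ["male", "female", "male"]})

# Map each category to the integer code named in the text.
df["Gender"] = df["Gender"].map({"male": 0, "female": 1})
print(df["Gender"].tolist())  # [0, 1, 0]
```

Note that map returns NaN for any value missing from the dictionary, so it is worth checking for unexpected spellings first.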

One hot encoding

Now let’s look at the education column.

Image by Author
Image by Author: These are the features that we will use in our prediction.

We use dummy variables (one hot encoding) to transform education from categorical to numerical.

Image by Author
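A minimal sketch of the dummy-variable step with pandas.get_dummies follows. The education levels and the Principal column are made-up examples, not the article's actual values.

```python
import pandas as pd

# Hypothetical rows; column names and education levels are assumptions.
df = pd.DataFrame({
    "Principal": [1000, 800, 1000],
    "education": ["college", "Bachelor", "High School or Below"],
})

# One 0/1 column per education level, appended to the numeric features.
dummies = pd.get_dummies(df["education"], dtype=int)
features = pd.concat([df[["Principal"]], dummies], axis=1)
print(features)
```

Each row has exactly one 1 among the dummy columns, so no artificial ordering is imposed on the education levels.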

Feature Selection

Let’s define the features:

Image by Author

Now it is time to define our label:

Image by Author
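Separating features from the label could look like this. The feature list is a guess for illustration; the columns actually selected are only visible in the screenshots.

```python
import pandas as pd

# Hypothetical, already-encoded rows; all column names are assumptions.
df = pd.DataFrame({
    "Principal": [1000, 800, 1000],
    "terms": [30, 15, 30],
    "age": [45, 29, 33],
    "Gender": [0, 0, 1],
    "weekend": [0, 1, 0],
    "loan_status": ["PAIDOFF", "COLLECTION", "PAIDOFF"],
})

# X holds the input features, y holds the label we want to predict.
feature_columns = ["Principal", "terms", "age", "Gender", "weekend"]
X = df[feature_columns]
y = df["loan_status"].values

print(X.shape, y.shape)
```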

Normalize Data

Image by Author
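One common way to normalize, assumed here, is scikit-learn's StandardScaler, which gives every column zero mean and unit variance. The numbers are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: two columns on very different scales.
X = np.array([[1000.0, 30.0], [800.0, 15.0], [1000.0, 7.0]])

# fit_transform learns each column's mean and standard deviation, then rescales.
X_scaled = StandardScaler().fit_transform(X)

print(X_scaled.mean(axis=0))  # close to 0 for each column
print(X_scaled.std(axis=0))   # close to 1 for each column
```

Scaling matters most for distance-based models such as KNN and SVM. To avoid leaking test information, the scaler can also be fitted on the training split only and then applied to the test split.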

Classification

These are the classification techniques that I will use on this data set.

  • K Nearest Neighbor (KNN)
  • Decision Tree
  • Support Vector Machine
  • Logistic Regression

K Nearest Neighbor

Now it is time to split the data into train and test sets, using the usual split of 80% for training and 20% for testing.

Image by Author
Image by Author
Image by Author

Now it is time to look at the accuracy on the train and test data.

Image by Author
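The split, fit, and accuracy check can be sketched as follows. I use synthetic data from make_classification as a stand-in for the loan features, and the starting value of 4 neighbors and the random_state are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the loan data (346 rows, as in the article).
X, y = make_classification(n_samples=346, n_features=8, random_state=4)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit KNN with an arbitrary starting K and compare train and test accuracy.
knn = KNeighborsClassifier(n_neighbors=4).fit(X_train, y_train)
train_acc = accuracy_score(y_train, knn.predict(X_train))
test_acc = accuracy_score(y_test, knn.predict(X_test))

print("Train accuracy:", train_acc)
print("Test accuracy:", test_acc)
```

A train accuracy far above the test accuracy is a hint that the model is overfitting.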

To find the best K:

Image by Author
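A simple way to search for K is to fit one model per candidate and keep the one with the highest test accuracy. The sketch below uses synthetic data, so its winner will not necessarily match the article's result; the range of 1 to 10 is my assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the loan data.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Try each K and record its accuracy on the held-out data.
ks = list(range(1, 11))
scores = []
for k in ks:
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

# argmax returns the position of the first best score.
best_k = ks[int(np.argmax(scores))]
print("Best K:", best_k, "with accuracy", max(scores))
```

Picking K on the same data used for the final score gives an optimistic estimate; cross-validation on the training split is a stricter alternative.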

As the results show, K = 7 is the best value for our data.

Image by Author
Image by Author
Image by Author: Fit the model

Evaluation Metrics of KNN

Image by Author

Decision Tree

Now let's try the Decision Tree algorithm.

Image by Author
Image by Author

To find the best depth:

Image by Author
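The depth search follows the same pattern as the search for K. This is a sketch on synthetic data; the entropy criterion and the depth range of 1 to 10 are assumptions on my part.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit one tree per depth and record its accuracy on the held-out data.
depth_scores = {}
for depth in range(1, 11):
    tree = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, random_state=4
    )
    tree.fit(X_train, y_train)
    depth_scores[depth] = accuracy_score(y_test, tree.predict(X_test))

# Keep the depth with the highest accuracy (the smallest depth wins ties).
best_depth = max(depth_scores, key=depth_scores.get)
print("Best depth:", best_depth)
```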

According to the accuracy scores, 5 is the best depth.

Image by Author

Let’s run the algorithm with that depth and evaluate it:

Evaluation Metrics of Decision Tree

Image by Author

Support Vector Machine

Now let’s use SVM.

Image by Author

To find the best SVM model:

Image by Author
Image by Author
Image by Author
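The text does not say what is varied between the SVM models. One common choice, assumed in this sketch, is to compare kernels. The data is synthetic, so the winner here says nothing about the loan data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the loan data.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit one SVM per kernel and record its accuracy on the held-out data.
kernels = ["linear", "poly", "rbf", "sigmoid"]
kernel_scores = {}
for kernel in kernels:
    model = SVC(kernel=kernel).fit(X_train, y_train)
    kernel_scores[kernel] = accuracy_score(y_test, model.predict(X_test))

best_kernel = max(kernel_scores, key=kernel_scores.get)
print("Best kernel:", best_kernel)
```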

Evaluation Metrics of SVM

Image by Author

Logistic Regression

Now it is time to use Logistic Regression.

Let's lock and load:

Image by Author

Train-test split:

Image by Author

Find the best solver:

Image by Author
Image by Author
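Comparing solvers can be sketched as a loop over the solver names that scikit-learn's LogisticRegression accepts. The data is synthetic, and the max_iter value and the use of accuracy as the selection metric are my assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data.
X, y = make_classification(n_samples=346, n_features=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# Fit one model per solver and record its accuracy on the held-out data.
solvers = ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
solver_scores = {}
for solver in solvers:
    model = LogisticRegression(solver=solver, max_iter=1000)
    model.fit(X_train, y_train)
    solver_scores[solver] = accuracy_score(y_test, model.predict(X_test))

best_solver = max(solver_scores, key=solver_scores.get)
print("Best solver:", best_solver)
```

On a small data set the solvers often land on nearly the same solution, so the differences in score can be tiny.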

Evaluation Metrics of Logistic Regression

Image by Author

Model Evaluation using a Test set

Image by Author
Image by Author

Data preprocessing:

Image by Author

Jaccard Scores

Image by Author
Image by Author
Image by Author
Image by Author
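To show what the Jaccard score measures, here is a tiny hand-made example with scikit-learn's jaccard_score. The labels and predictions are made up and are not the article's results.

```python
from sklearn.metrics import jaccard_score

# Hypothetical true labels and predictions for six loans.
y_true = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF"]
y_pred = ["PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF", "PAIDOFF"]

# Jaccard for the PAIDOFF class: TP / (TP + FP + FN) = 3 / (3 + 1 + 1).
jaccard = jaccard_score(y_true, y_pred, pos_label="PAIDOFF")
print(jaccard)  # 0.6
```

The score is 1.0 when the predicted and true sets for the class match exactly, and it drops as false positives and false negatives accumulate.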

F1 Scores

Image by Author
Image by Author
Image by Author
Image by Author
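The F1 score combines precision and recall. Here is a small sketch with scikit-learn's f1_score on made-up labels, showing both the single-class score and the weighted average over classes. Which averaging the article used is not stated, so both are shown as options.

```python
from sklearn.metrics import f1_score

# Hypothetical true labels and predictions for six loans.
y_true = ["PAIDOFF", "PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF"]
y_pred = ["PAIDOFF", "PAIDOFF", "COLLECTION", "COLLECTION", "PAIDOFF", "PAIDOFF"]

# F1 for the PAIDOFF class only: precision 3/4 and recall 3/4 give 0.75.
f1_paidoff = f1_score(y_true, y_pred, pos_label="PAIDOFF")

# Weighted F1: per-class F1 (0.75 and 0.5) averaged by class support (4 and 2).
f1_weighted = f1_score(y_true, y_pred, average="weighted")

print(f1_paidoff)   # 0.75
print(f1_weighted)  # approximately 0.667
```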

Final Evaluation

Image by Author

Thanks to IBM for the Machine Learning tutorial that got me here.


Applying Classification Algorithms to Past Loan Data was originally published in Towards AI on Medium.

Published via Towards AI