# Design a Multi-Layer Perceptron (MLP) Neural Network for Classification

Last Updated on May 14, 2024 by Editorial Team

**Author(s): Ayo Akinkugbe**

Originally published on Towards AI.

## Overview

This project solves a classification problem with a multilayer perceptron designed from the ground up. The model is used to predict if a customer is likely to exit a bank service subscription. Below are highlights covered in each section:

- Introduction
- Model Architecture
- Dataset
- Code Implementation
- Model Evaluation using a Confusion matrix, Accuracy, Precision, Recall and F1-Score
- Model Comparison Using Scikit Learn

## Introduction

In a perceptron, inputs are combined with weights and a bias to form a weighted sum. That weighted sum is passed through a linear activation function or a step function to generate an output. However, a single perceptron doesn't scale to many real problems. In their 1969 book *Perceptrons: An Introduction to Computational Geometry*, Marvin Minsky and Seymour Papert showed that a simple perceptron can only solve linearly separable problems. Most real-world problems aren't linearly separable.
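As a minimal sketch of the computation described above, the snippet below implements a single perceptron with a step activation. The function name `perceptron` and the hand-picked weights are illustrative assumptions, not part of the original project. With these values the unit reproduces the logical AND gate, which is linearly separable.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs followed by a step activation."""
    weighted_sum = np.dot(inputs, weights) + bias
    return (weighted_sum >= 0).astype(int)

# Hand-picked (hypothetical) parameters that implement logical AND
and_weights = np.array([1.0, 1.0])
and_bias = -1.5

gate_inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_outputs = perceptron(gate_inputs, and_weights, and_bias)
print(and_outputs)  # [0 0 0 1]
```

No choice of two weights and one bias makes this unit output XOR (0, 1, 1, 0) for the same inputs, which is the limitation Minsky and Papert describe.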

A multilayer perceptron provides the nuance required to solve more complex problems and find patterns in data that are not linearly separable. The default design of a neural network includes:

- an input layer: the layer containing the preprocessed feature data
- hidden layer(s): each hidden layer contains neurons that take in weighted inputs and produce an output using an activation function
- an output layer: the layer containing the desired prediction. For classification problems, predictions are often probabilities or numbers that depict the likelihood of occurrence. They are further encoded to the desired output based on a threshold or maximum. For example, applying *np.argmax* to the output of a multi-class neural network returns the index of the largest value, which serves as the predicted class label.
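The two encoding strategies mentioned in the list above can be sketched as follows. The arrays are made-up values for illustration only; `axis=1` is used so that `np.argmax` picks one class per row.

```python
import numpy as np

# Hypothetical output matrix: one row per sample, one column per class
class_probs = np.array([
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
])
multi_class_labels = np.argmax(class_probs, axis=1)
print(multi_class_labels)  # [1 0 2]

# Hypothetical binary outputs converted with a 0.5 threshold
binary_probs = np.array([0.12, 0.50, 0.93])
binary_labels = (binary_probs >= 0.5).astype(int)
print(binary_labels)  # [0 1 1]
```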

Unlike the perceptron, the hidden layers in a multilayer perceptron use a nonlinear activation function. Examples of nonlinear activation functions include the sigmoid function (used in this project), the Rectified Linear Unit (ReLU), Leaky ReLU, and Softmax.
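For reference, here is a minimal NumPy sketch of the four activation functions named above. The function names and the `slope=0.01` default for Leaky ReLU are assumptions made for illustration; only the sigmoid is used in this project.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, slope=0.01):
    # slope is an assumed default for the negative side
    return np.where(x > 0, x, slope * x)

def softmax(x):
    # Subtract the maximum for numerical stability; the result sums to 1
    shifted = np.exp(x - np.max(x))
    return shifted / np.sum(shifted)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))
print(relu(z))
print(leaky_relu(z))
print(softmax(z))
```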

## Architecture

This project uses a fully connected MLP architecture. In a fully connected MLP, also known as a dense MLP, each neuron in one layer is connected to every neuron in the next layer. This type of architecture allows for complex nonlinear mappings but runs the risk of overfitting, especially when the network has many parameters relative to the amount of training data.

The MLP network in this case has an input layer, 2 hidden layers and an output layer. For the forward pass, the activation function used in each layer and neuron is the Sigmoid function.

The sigmoid function takes in *x*, which is the weighted sum of the inputs to the neuron in every case.
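For reference, the standard form of the sigmoid, which matches the `sigmoid` function defined in Step 2, is:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

It maps any real-valued *x* to a number strictly between 0 and 1, with $\sigma(0) = 0.5$.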

Backpropagation is not used in this case. Instead, the weights and biases are optimized by minimizing a cross-entropy loss, which serves as the objective function. The network uses a total of 21 weights and 3 biases, with one bias shared by the neurons of each layer. The output *y* is a number between 0 and 1. A threshold is used in the implementation to convert *y* to the desired output.
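As a quick check of the parameter count, the sketch below recomputes it from the layer sizes. The sizes are read off the forward pass in Step 2 (three neurons in each hidden layer), and the variable names are illustrative.

```python
# Layer sizes: 3 inputs, two hidden layers of 3 neurons each, 1 output
layer_sizes = [3, 3, 3, 1]

# One weight per connection between consecutive layers
n_weights = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

# This design shares a single bias across each non-input layer
n_biases = len(layer_sizes) - 1

print(n_weights, n_biases, n_weights + n_biases)  # 21 3 24
```

The total of 24 matches the length of the parameter vector initialized in Step 4.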

## Dataset

This project uses the customer churn dataset for a bank referred to as *ABC multi-state bank*. Each row of the dataset holds the details of one customer. The original dataset has 11 features and 1 label; the features are reduced to 3 (*Tenure*, *NumOfProducts*, *HasCrCard*). The label *Exited* is 1 if the customer stops using the bank subscription and 0 if the customer is still a customer of the bank. In the CSV file loaded in Step 1, these columns appear as `tenure`, `products_number`, `credit_card` and `churn`. The task is to predict whether a customer will stay or leave the bank given the 3 selected features. Learn more about the dataset here

## Implementation

*This section walks through the Python implementation of the MLP. The neural network is built from scratch with NumPy, optimized with SciPy's `minimize` function, and compared with results from the scikit-learn library.*

*A copy of the code and data files for this project can be found **here**.*

To implement the MLP design for classification with Python:

## Step 1 — Import and Process Data

The first step of the process involves importing and preprocessing the data. In this phase, the features used for prediction are selected. The data is also converted to a *numpy* array to allow for easier selection and computation in the network.

    # Import required Python libraries
    import numpy as np
    import pandas as pd
    from scipy.optimize import minimize

    # Read data from the CSV file
    sample_data = pd.read_csv('CustomerChurn.csv')
    sample_data

    # Make a copy of the data
    data = sample_data.copy()

    # Choose the features used for prediction, plus the label column
    data = data[['tenure', 'products_number', 'credit_card', 'churn']]

    # Convert the data to a NumPy array
    data = data.values
    data

    # Split the columns into features (X) and label (Y)
    X = data[:, 0:3]
    Y = data[:, -1]

## Step 2 — Create Forward Pass

This step computes each layer of the network. The weighted sum of the inputs from one layer is passed to the next. Each neuron in the hidden layers is activated using the Sigmoid function.

The output *y* is a value between 0 and 1.

    # Define the sigmoid function
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def output(inputs, weights):
        # Unpack the 21 weights and 3 biases
        (w11, w12, w13, w21, w22, w23, w31, w32, w33,
         w41, w42, w43, w51, w52, w53, w61, w62, w63,
         w4, w5, w6, b1, b2, b3) = weights
        x1, x2, x3 = inputs.T

        # First hidden layer (bias b1 is shared by its three neurons)
        h1 = sigmoid(w11 * x1 + w12 * x2 + w13 * x3 + b1)
        h2 = sigmoid(w21 * x1 + w22 * x2 + w23 * x3 + b1)
        h3 = sigmoid(w31 * x1 + w32 * x2 + w33 * x3 + b1)

        # Second hidden layer (bias b2 is shared by its three neurons)
        h4 = sigmoid(w41 * h1 + w42 * h2 + w43 * h3 + b2)
        h5 = sigmoid(w51 * h1 + w52 * h2 + w53 * h3 + b2)
        h6 = sigmoid(w61 * h1 + w62 * h2 + w63 * h3 + b2)

        # Output layer
        y = sigmoid(w4 * h4 + w5 * h5 + w6 * h6 + b3)
        return y
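The same forward pass can also be written with matrix products. The sketch below is one possible equivalent, assuming the same 24-value parameter layout as the `output` function above; the name `output_matrix` and the random demo data are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def output_matrix(inputs, weights):
    """Matrix form of the forward pass (hypothetical helper, same 24-value layout)."""
    weights = np.asarray(weights, dtype=float)
    W1 = weights[0:9].reshape(3, 3)    # rows: h1, h2, h3; columns: x1, x2, x3
    W2 = weights[9:18].reshape(3, 3)   # rows: h4, h5, h6; columns: h1, h2, h3
    w_out = weights[18:21]             # w4, w5, w6
    b1, b2, b3 = weights[21:24]        # one shared bias per layer

    hidden1 = sigmoid(inputs @ W1.T + b1)
    hidden2 = sigmoid(hidden1 @ W2.T + b2)
    return sigmoid(hidden2 @ w_out + b3)

# Random demo data, for illustration only
rng = np.random.default_rng(0)
demo_inputs = rng.random((5, 3))
demo_weights = rng.random(24)
demo_y = output_matrix(demo_inputs, demo_weights)
print(demo_y.shape)  # (5,)
```

Writing the layers as matrices makes it easier to change the number of neurons per layer, at the cost of hiding the individual weight names used in the scalar version.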

## Step 3 — Create Objective Function (Cross Entropy)

This architecture does not use backpropagation. Instead, the cross-entropy loss serves as the objective function that is minimized to find the optimal weights and biases. The loss compares two sets of values: the predicted outputs (*y*, the output of the forward pass) and the true labels (*Y*).

    # Objective function (Cross Entropy)
    def cross_ent(weights):
        predictions = output(X, weights)
        return -np.mean(Y * (np.log(predictions)) + ((1 - Y) * np.log(1 - predictions)))
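One practical caveat: if a prediction saturates at exactly 0 or 1, `np.log` returns `-inf` and the loss is no longer finite. The sketch below shows one common way to guard against this by clipping the predictions first. The function name `binary_cross_entropy` and the `eps` value are assumptions, not part of the original code.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; eps is an assumed clipping constant."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # keep log() away from 0
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Made-up labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])
uncertain = np.array([0.5, 0.5, 0.5, 0.5])
saturated = np.array([1.0, 0.0, 1.0, 0.0])

loss_confident = binary_cross_entropy(y_true, confident)
loss_uncertain = binary_cross_entropy(y_true, uncertain)  # ln(2), about 0.693
loss_saturated = binary_cross_entropy(y_true, saturated)  # finite thanks to clipping
print(loss_confident, loss_uncertain, loss_saturated)
```

Predictions that are confident and correct give a lower loss than uninformative predictions of 0.5, which is what the optimizer exploits.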

## Step 4 — Initialize Weights and Biases

This step randomly initializes the first set of weights and biases to be passed to the objective function. The *minimize* function from the *scipy.optimize* module then minimizes the objective function and returns the optimized weights.

    initial_weights = np.random.rand(24)

    # Optimizing the weights
    result = minimize(cross_ent, initial_weights, method='BFGS')

    # Optimized weights
    optimized_weights = result.x
    optimized_weights, result.fun
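Because the starting point is random, results can vary between runs. The sketch below shows, on a toy objective rather than the network itself, how a seeded generator makes the starting point repeatable and which fields of the result object are worth inspecting. `toy_objective` is a hypothetical stand-in for `cross_ent`.

```python
import numpy as np
from scipy.optimize import minimize

def toy_objective(params):
    """Simple convex bowl with its minimum at (1, -2); stands in for cross_ent."""
    a, b = params
    return (a - 1) ** 2 + (b + 2) ** 2

rng = np.random.default_rng(42)   # seeded so the starting point is repeatable
start = rng.random(2)

toy_result = minimize(toy_objective, start, method='BFGS')
print(toy_result.success)   # whether the optimizer reports convergence
print(toy_result.x)         # close to [1, -2]
print(toy_result.fun)       # close to 0
```

When no gradient function is supplied, `minimize` with `method='BFGS'` estimates the gradient numerically, which is why the approach works here without backpropagation.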

## Step 5 — Make Predictions

This step generates predictions using the inputs *X* and the optimized weights (the output from minimizing the objective function).

    predictions = output(X, optimized_weights)
    predictions

## Step 6 — Select a Threshold and Convert Predictions to Classes

The predictions from the *output* function are numbers between 0 and 1. This step applies a threshold (0.5 in this case): predictions at or above the threshold are converted to 1, and predictions below it to 0.

    t = 0.5
    Y_Pred = (predictions >= t).astype(int)
    Y_Pred

## Evaluation

This section evaluates the model using core classification metrics. The metrics considered for evaluation include:

- **Confusion matrix**: the counts of true positives (TP: instances predicted positive that are actually positive), true negatives (TN: instances predicted negative that are actually negative), false positives (FP: instances predicted positive that are actually negative) and false negatives (FN: instances predicted negative that are actually positive).
- **Accuracy** = (TP + TN) / (TP + TN + FP + FN): the percentage of predictions that are correct.
- **Precision** = TP / (TP + FP): the percentage of positive predictions that are correct.
- **Recall** = TP / (TP + FN): how often the model identifies the actual positive instances.
- **F1 Score** = 2 x ((precision x recall) / (precision + recall)): the harmonic mean of precision and recall. A high score indicates a well-balanced model.
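The formulas above can be collected into one small helper. This is a sketch with made-up counts; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Made-up counts, for illustration only
accuracy, precision, recall, f1 = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(accuracy, precision, recall, f1)
```

With these counts, accuracy is 0.85, precision is 40/45, recall is 0.8 and F1 is 80/95.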

For this use case, the label *Exited* is 1 if the customer left and 0 if the customer is still a customer of the bank. The code below builds the confusion matrix with the true label as the row index and the predicted label as the column index, and treats class 0 (the customer stays) as the positive class. TP + FN is therefore the number of actual 0s, and TN + FP is the number of actual 1s.

## Model Evaluation

    # Create the confusion matrix (rows: true label, columns: predicted label)
    conf_matrix = np.zeros((2, 2))
    for i in range(len(Y)):
        conf_matrix[Y[i], Y_Pred[i]] += 1
    conf_matrix

Confusion matrix:

    TP = conf_matrix[0, 0]  # True positives: actual 0, predicted 0
    TN = conf_matrix[1, 1]  # True negatives: actual 1, predicted 1
    FP = conf_matrix[1, 0]  # False positives: actual 1, predicted 0
    FN = conf_matrix[0, 1]  # False negatives: actual 0, predicted 1

    # Calculate accuracy
    accuracy = (TP + TN) / np.sum(conf_matrix)

Accuracy: 81.97%

    # Calculate precision
    precision = TP / (TP + FP)
    precision

Precision: 81.83%

    # Calculate recall
    recall = TP / (TP + FN)
    recall

Recall: 99.42%

    # Calculate F1 score
    f1_score = 2 * ((precision * recall) / (precision + recall))
    f1_score

F1 Score: 89.78%

## Comparison

## Using Scikit Learn Library

This section trains a comparable neural network with the scikit-learn library and reports the same metrics as for the homegrown model. The settings differ from the network above: this `MLPClassifier` has four hidden layers, uses tanh activations and the Adam solver, and is trained on standardized features.

    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
    from sklearn.preprocessing import StandardScaler

    # It's a good practice to scale the data for neural network training
    scaler = StandardScaler()
    X = scaler.fit_transform(X)

    # Create a neural network model
    # Architecture - four hidden layers with 4, 3, 3 and 3 neurons
    mlp = MLPClassifier(hidden_layer_sizes=(4,3,3,3), activation='tanh', solver='adam', max_iter=500, random_state=42)

    # Train the model
    mlp.fit(X, Y)

    # Predict on the same data used for training (no train/test split is made)
    predictions = mlp.predict(X)

    # Evaluate the model
    cm = confusion_matrix(Y, predictions)
    print("Confusion Matrix:")
    print(cm)

    scikit_accuracy = accuracy_score(Y, predictions)
    scikit_accuracy

Scikit Accuracy = 81.97%

    # Read the counts from scikit-learn's confusion matrix
    # (rows: true label, columns: predicted label; class 0 is the positive class)
    scikit_precision = cm[0, 0] / (cm[0, 0] + cm[1, 0])
    scikit_precision

Scikit Precision: 81.83%

    scikit_recall = cm[0, 0] / (cm[0, 0] + cm[0, 1])
    scikit_recall

Scikit Recall: 99.42%

    scikit_f1 = 2 * ((scikit_precision * scikit_recall) / (scikit_precision + scikit_recall))
    scikit_f1

Scikit F1 Score: 89.78%

## Conclusion

This project builds an MLP with 2 hidden layers from scratch, using the sigmoid as the activation function in every layer. It does not use backpropagation; instead, it minimizes a cross-entropy objective with SciPy's BFGS optimizer. This project is entirely experimental. The model is then used to predict customer churn for a bank, achieving the same classification metrics as the *Scikit learn* library MLP model.
