Productivity Prediction of Employees using Machine Learning Python

Last Updated on September 24, 2022 by Editorial Team

Author(s): Muttineni Sai Rohith

Originally published on Towards AI.

In industry, it is often important to analyze, track, and predict employee productivity, because companies rely on the productivity and performance of their workers. Many factors affect productivity: the incentives offered, the department, working hours, the day of the week (which people often believe plays a large role), the team, and more. Companies that want to keep productivity high need to analyze these factors and act on them.

In this article, we are going to predict the productivity of Employees based on various features.

Dataset

The dataset used in this article is taken from Kaggle. We can find the dataset here. It consists of 1197 records from the garment industry; as the attribute descriptions indicate, each record describes one team on one day rather than a single employee.

The dataset contains 1197 rows and 15 columns.

import pandas as pd

df = pd.read_csv("garments_worker_productivity.csv")
df.head(5)

Attribute Information:

date: Date in MM-DD-YYYY

day: Day of the Week

quarter: A portion of the month. A month was divided into four quarters

department: Associated department with the instance

team: Associated team number with the instance

no_of_workers: Number of workers in each team

no_of_style_change: Number of changes in the style of a particular product

targeted_productivity: Targeted productivity set by the Authority for each team for each day.

smv: Standard Minute Value, it is the allocated time for a task

wip: Work in progress. Includes the number of unfinished items for products

over_time: Represents the amount of overtime by each team in minutes

incentive: Represents the amount of financial incentive (in BDT) that enables or motivates a particular course of action.

idle_time: The amount of time when the production was interrupted due to several reasons

idle_men: The number of workers who were idle due to production interruption

actual_productivity: The actual % of productivity that was delivered by the workers. It ranges from 0–1.

EDA

Let’s perform some Data Analysis

Convert date string column to Date object —

df["date"] = pd.to_datetime(df["date"])

Let’s see the types of departments —

df['department'].value_counts()

Here we can see that a stray space in some of the 'finishing' labels splits that department into two different categories. Let’s merge them.

df['department'] = df['department'].apply(lambda x: 'finishing' if x.replace(" ","") == 'finishing' else 'sewing' )

df.department.value_counts().plot.pie(autopct='%.2f %%')

As we can see, about 58% of the records belong to sewing and 42% to finishing.
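The merge above relabels every value that is not 'finishing' as 'sewing'. As a more conservative sketch, assuming the labels differ only by stray whitespace, we can strip the whitespace and leave every other value untouched. The small sample frame below is hypothetical and only stands in for the real department column.

```python
import pandas as pd

# Hypothetical sample mimicking labels that differ only by a trailing space
sample = pd.DataFrame({"department": ["sewing", "finishing ", "finishing", "sewing"]})

# Strip surrounding whitespace so "finishing " and "finishing" become one label
sample["department"] = sample["department"].str.strip()

counts = sample["department"].value_counts()
print(counts)
```

With this approach, an unexpected label stays visible in value_counts() instead of being silently counted as sewing.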

Let’s compare the actual productivity and target productivity to see the performance of employees.

import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize = (15,5))

# label= gives each line a legend entry so the two series can be told apart
ax=sns.lineplot(y='targeted_productivity',x='date' ,color = "red", data =df, label='targeted')

ax=sns.lineplot(y= 'actual_productivity',x='date',data=df, color="green", label='actual')

ax.set(ylabel = 'Productivity')

plt.show()

As we can see, actual productivity does not track the target consistently from day to day, but overall it stays close to the target line.

Now Let’s analyze whether the particular day of the week or team or department has any significant effect on productivity.

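means = []
keys = []
column_name = "day"

for value in df[column_name].unique():
    # Mean actual productivity over the rows that have this value
    mean_productivity = df.loc[df[column_name] == value, "actual_productivity"].mean()
    print(f"productivity on {value} is", mean_productivity)
    means.append(mean_productivity)
    keys.append(value)

# Wrap the results in a DataFrame so seaborn can look the columns up by name
summary = pd.DataFrame({"data": means, "keys": keys})

sns.barplot(x="keys", y="data", data=summary)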

We can see that productivity is roughly constant across the days of the week. Let’s repeat the same process for the other features by replacing column_name in the code above with the name of the column of interest.

Repeating this for the other columns shows that mean productivity does not vary much with the team, department, quarter, or day.
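The per-value means computed by the loop can also be obtained with a single groupby call. The sketch below uses a small hypothetical sample in place of the real data, and mean_productivity_by is an illustrative helper name rather than anything from the original code.

```python
import pandas as pd

# Hypothetical sample standing in for the real dataset
sample = pd.DataFrame({
    "day": ["Monday", "Monday", "Tuesday", "Tuesday"],
    "team": [1, 2, 1, 2],
    "actual_productivity": [0.70, 0.80, 0.60, 0.90],
})

def mean_productivity_by(frame, column_name):
    """Return the mean actual productivity for each value of column_name."""
    return frame.groupby(column_name)["actual_productivity"].mean()

by_day = mean_productivity_by(sample, "day")
by_team = mean_productivity_by(sample, "team")
print(by_day)
print(by_team)
```

The returned Series can be plotted directly, for example with by_day.plot.bar().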

Let’s plot the correlation Matrix to see the amount of correlation —

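# numeric_only=True skips text columns such as department and day (required in pandas 2.x)
corrMatrix = df.corr(numeric_only=True)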

fig, ax = plt.subplots(figsize=(15,15)) # Sample figsize in inches

sns.heatmap(corrMatrix, annot=True, linewidths=.5, ax=ax)

plt.show()

From the correlation matrix, actual productivity is most strongly correlated with targeted productivity. One plausible explanation is that having a target motivates employees, although correlation alone cannot confirm this.
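A heatmap can be hard to read when the only question is which features relate most to the target. As a small sketch on hypothetical sample values (not the real data), we can pull out the target's column of the correlation matrix and sort it by absolute value.

```python
import pandas as pd

# Hypothetical sample values, for illustration only
sample = pd.DataFrame({
    "targeted_productivity": [0.5, 0.6, 0.7, 0.8, 0.9],
    "incentive": [10, 0, 30, 5, 20],
    "department": ["sewing", "finishing", "sewing", "finishing", "sewing"],
    "actual_productivity": [0.52, 0.58, 0.71, 0.79, 0.88],
})

# Correlation of every numeric feature with the target, strongest first
corr_with_target = (
    sample.corr(numeric_only=True)["actual_productivity"]
    .drop("actual_productivity")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Text columns such as department are left out here because Pearson correlation is only defined for numeric data.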

Let’s Prepare the final data and start the prediction.

Preprocessing Data

Let’s do some data cleaning and preprocessing before moving on to the prediction.

df.date

So the data we have covers about 3 months. Since we already have a day column, a month column will suffice in place of the complete date.

df['month']=df['date'].dt.month

df.drop(['date'],axis=1, inplace=True)
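The attribute list describes the date as month first, then day, then year. As a hedged sketch, passing an explicit format to pd.to_datetime removes any day/month ambiguity when parsing. The format string "%m/%d/%Y" and the sample values below are assumptions about how the raw file is written, so adjust them to match the actual data.

```python
import pandas as pd

# Hypothetical sample; the "%m/%d/%Y" format is an assumption about the raw file
sample = pd.DataFrame({"date": ["1/1/2015", "2/15/2015", "3/11/2015"]})

# An explicit format avoids day/month ambiguity and raises on malformed rows
sample["date"] = pd.to_datetime(sample["date"], format="%m/%d/%Y")

# Keep only the month, as in the article, and drop the full date
sample["month"] = sample["date"].dt.month
sample = sample.drop(columns=["date"])
print(sample)
```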

Now let’s see whether we have any missing values —

# This will Display the percentage of missing values per column
df.isnull().sum() / len(df) * 100

So only one column, wip, has missing values, and about 42% of its values are missing. For now, instead of filling them in, let’s remove this column.

df.drop(['wip'],axis=1, inplace=True)
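Dropping the column is the simplest option. If we wanted to keep wip instead, one common alternative (not the approach used in this article) is to fill the gaps with the median and add a flag that records which rows were filled. The sketch below uses hypothetical values, and wip_missing is an illustrative column name.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps in wip
sample = pd.DataFrame({"wip": [1000.0, np.nan, 1200.0, np.nan, 800.0]})

# Percentage of missing values per column, as computed above
missing_pct = sample.isnull().sum() / len(sample) * 100

# Record which rows were missing before filling them
sample["wip_missing"] = sample["wip"].isnull().astype(int)

# Fill the gaps with the median of the observed values
sample["wip"] = sample["wip"].fillna(sample["wip"].median())
print(sample)
```

Whether imputation helps depends on why the values are missing, so it is worth comparing model scores with and without the column.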

In the data, you can see a few non-numerical columns. So let’s encode them as most machine learning algorithms work only with numerical data.

Let’s encode the data with MultiColumnLabelEncoder —

!pip install MultiColumnLabelEncoder

We use MultiColumnLabelEncoder here because it makes it easy to invert the encoding later.

import MultiColumnLabelEncoder

Mcle = MultiColumnLabelEncoder.MultiColumnLabelEncoder()

df = Mcle.fit_transform(df)
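If installing an extra package is not an option, a similar encode-and-invert round trip can be sketched with scikit-learn's OrdinalEncoder. This is a swapped-in alternative, not the encoder used in this article, and the sample frame below is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical sample with two text columns and one numeric column
sample = pd.DataFrame({
    "department": ["sewing", "finishing", "sewing"],
    "day": ["Monday", "Tuesday", "Monday"],
    "smv": [26.16, 3.94, 11.41],
})
categorical_columns = ["department", "day"]

# Each category is mapped to an integer code, assigned in sorted order
encoder = OrdinalEncoder()
encoded = sample.copy()
encoded[categorical_columns] = encoder.fit_transform(sample[categorical_columns])

# inverse_transform recovers the original labels from the codes
decoded = encoder.inverse_transform(encoded[categorical_columns])
print(encoded)
print(decoded)
```

The integer codes imply an order between categories, which tree-based models tolerate well but linear models may misread; one-hot encoding is a common alternative in that case.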

So our Data is ready. Let’s split the data into independent and dependent columns —

x=df.drop(['actual_productivity'],axis=1)

y=df['actual_productivity']

Predicting the Productivity

Let’s predict productivity using regression algorithms in Python. Before that, let’s prepare training and testing data —

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y,train_size=0.8,random_state=0)

Using LinearRegression

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score

model_lr=LinearRegression()

model_lr.fit(x_train,y_train)

pred_test=model_lr.predict(x_test)

print("test_MSE:",mean_squared_error(y_test, pred_test))
print("test_MAE:",mean_absolute_error(y_test, pred_test))
print("R2_score:{}".format(r2_score(y_test, pred_test)))
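Error values such as these are easier to judge next to a naive baseline. The sketch below, on synthetic data that merely stands in for the real features, compares LinearRegression with scikit-learn's DummyRegressor, which always predicts the training mean. The data-generating formula is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in: one feature playing the role of targeted_productivity
rng = np.random.default_rng(0)
X = rng.uniform(0.3, 0.9, size=(200, 1))
y = 0.9 * X[:, 0] + rng.normal(0, 0.05, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)

# Baseline that ignores the features and always predicts the training mean
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
model = LinearRegression().fit(X_train, y_train)

baseline_mae = mean_absolute_error(y_test, baseline.predict(X_test))
model_mae = mean_absolute_error(y_test, model.predict(X_test))
print("baseline MAE:", baseline_mae)
print("model MAE:", model_mae)
```

A model is only useful to the extent that it beats this baseline on the test set.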

Let’s see whether Random Forest Regression improves on this.

Using Random Forest Regressor

from sklearn.ensemble import RandomForestRegressor

model_rfe = RandomForestRegressor(n_estimators=200,max_depth=5)

model_rfe.fit(x_train, y_train)

pred = model_rfe.predict(x_test)
print("test_MSE:",mean_squared_error(y_test, pred))
print("test_MAE:",mean_absolute_error(y_test, pred))
print("R2_score:{}".format(r2_score(y_test, pred)))
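A fitted random forest also reports how much each feature contributed to its splits through the feature_importances_ attribute. The sketch below uses synthetic data in which only one column carries signal; the column names and the data-generating formula are assumptions for illustration, not results from the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in: one informative feature and one pure-noise feature
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "targeted_productivity": rng.uniform(0.3, 0.9, 300),
    "noise_feature": rng.normal(0, 1, 300),
})
target = 0.9 * features["targeted_productivity"] + rng.normal(0, 0.02, 300)

forest = RandomForestRegressor(n_estimators=200, max_depth=5, random_state=0)
forest.fit(features, target)

# Importances sum to 1; larger values mean the feature drove more of the splits
importances = pd.Series(
    forest.feature_importances_, index=features.columns
).sort_values(ascending=False)
print(importances)
```

On the real data, the same two lines applied to model_rfe and x_train.columns would show which columns the forest relies on.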

Using XGBoost

import xgboost as xgb

model_xgb = xgb.XGBRegressor(n_estimators=200, max_depth=5, learning_rate=0.1)

model_xgb.fit(x_train, y_train)

pred3=model_xgb.predict(x_test)

print("test_MSE:",mean_squared_error(y_test, pred3))
print("test_MAE:",mean_absolute_error(y_test, pred3))
print("R2_score:{}".format(r2_score(y_test, pred3)))

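So we have achieved a Mean Absolute Error of about 0.07 and a Mean Squared Error of about 0.01. Since actual_productivity is on a 0 to 1 scale, the predictions are off by about 7 percentage points on average, which suggests the model is performing reasonably well.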

Of the three algorithms, XGBoost performed the best. In this way, we can predict the productivity of employees.
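A single train/test split can favor one model by chance, so a comparison like this is more reliable with cross-validation. The sketch below compares two scikit-learn models on synthetic data using cross_val_score; the data and the choice of models are assumptions for illustration. Because XGBRegressor follows the scikit-learn estimator interface, it could be added to the same dictionary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with a non-linear (V-shaped) relationship
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
y = np.abs(X[:, 0] - 0.5) + rng.normal(0, 0.01, size=300)

models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, max_depth=5, random_state=0),
}

cv_mae = {}
for name, model in models.items():
    # scikit-learn returns negated errors so that higher is always better
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()
    print(name, "mean MAE over 5 folds:", cv_mae[name])
```

Averaging over five folds gives each model the same data and smooths out the luck of any one split.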

Happy Coding…….

Productivity Prediction of Employees using Machine Learning Python was originally published in Towards AI on Medium.
