Productivity Prediction of Employees using Machine Learning Python
Last Updated on September 24, 2022 by Editorial Team
Author(s): Muttineni Sai Rohith
Originally published on Towards AI the World’s Leading AI and Technology News and Media Company. If you are building an AI-related product or service, we invite you to consider becoming an AI sponsor. At Towards AI, we help scale AI and technology startups. Let us help you unleash your technology to the masses.
Often in industries, it is important to analyze, track and predict the productivity of employees as the companies rely on the productivity and performance of their workers. Also, various factors play a key role in affecting the productivity of employees like incentives given, the domain in which they are working, working hours, dayβββas people often believe it plays a huge role, the team they are working in and many other features. As companies need good productivity of employees, they need to analyze and take care of these features.
In this article, we are going to predict the productivity of Employees based on various features.
Dataset
The Dataset used in this article is taken from Kaggle. We can find the dataset here. This Dataset consists of information on 1197 employees working in the Garment Industry. The features used in this Dataset areΒ β
The dataset contains 1197 rows and 15Β columns
import pandas as pd
df = pd.read_csv("garments_worker_productivity.csv")
df.head(5)
Attribute Information:
date: Date in MM-DD-YYYY
day: Day of theΒ Week
quarter: A portion of the month. A month was divided into fourΒ quarters
department: Associated department with theΒ instance
teamno: Associated team number with theΒ instance
noofworkers: Number of workers in eachΒ team
noofstylechange: Number of changes in the style of a particular product
targetedproductivity: Targeted productivity set by the Authority for each team for eachΒ day.
smv: Standard Minute Value, it is the allocated time for aΒ task
WIP: Work in progress. Includes the number of unfinished items forΒ products
overtime: Represents the amount of overtime by each team inΒ minutes
incentive: Represents the amount of financial incentive (in BDT) that enables or motivates a particular course ofΒ action.
idletime: The amount of time when the production was interrupted due to severalΒ reasons
idlemen: The number of workers who were idle due to production interruption
actual_productivity: The actual % of productivity that was delivered by the workers. It ranges fromΒ 0β1.
EDA
Letβs perform some DataΒ Analysis
Convert date string column to Date objectΒ β
df["date"] = pd.to_datetime(df["date"])
Letβs see the types of departments β
df['department'].value_counts()
Here we can see that space in the finishing split it into two different categories. Now letβs mergeΒ them.
df['department'] = df['department'].apply(lambda x: 'finishing' if x.replace(" ","") == 'finishing' else 'sewing' )
df.department.value_counts().plot.pie(autopct='%.2f %%')
As we can see, 58% of employees work in sewing while 42% are in finishing.
Letβs compare the actual productivity and target productivity to see the performance of employees.
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize = (15,5))
ax=sns.lineplot(y='targeted_productivity',x='date' ,color = "red", data =df,legend='brief')
ax=sns.lineplot(y= 'actual_productivity',x='date',data=df, color="green", legend = 'brief')
ax.set(ylabel = 'Productivity')
plt.show()
As we can see, the tradeoff is not that consistent, but overall productivity is on theΒ line.
Now Letβs analyze whether the particular day of the week or team or department has any significant effect on productivity.
l = []
l1=[]
column_name = "day"
for i in list(df[column_name].unique()):
print( f"productivity on {i} is ",df[df[column_name] == i]["actual_productivity"].mean())
l.append(df[df[column_name] == i]["actual_productivity"].mean())
l1.append(i)
dictionary = {"data":l,"keys":l1}
sns.barplot( x = "keys" , y = "data", data = dictionary)
We can see productivity is constant across the number of days. Letβs repeat the same process for other features by replacing column_name with the targeted column name in the above codeΒ β
As we can see above, productivity does not depend on the team, category, Quarter, orΒ day.
Letβs plot the correlation Matrix to see the amount of correlation β
corrMatrix = df.corr()
fig, ax = plt.subplots(figsize=(15,15)) # Sample figsize in inches
sns.heatmap(corrMatrix, annot=True, linewidths=.5, ax=ax)
plt.show()
So from these data, it is quite evident productivity mainly depends on the target productivity as having a target will motivate and boost the employees.
Letβs Prepare the final data and start the prediction.
Preprocessing Data
Letβs make some data cleaning and preprocessing before going for the prediction
df.date
So the data we have is for 3 months. In the data, we already have a day column, so having a month column will suffice instead of the completeΒ date.
df['month']=df['date'].dt.month
df.drop(['date'],axis=1, inplace=True)
Now letβs see whether we have any missing valuesΒ β
# This will Display the percentage of missing values per column
df.isnull().sum() / len(df) * 100
So we have only one columnβββwip and it has 42% missing values. As of now, Instead of filling it, letβs remove thisΒ column.
df.drop(['wip'],axis=1, inplace=True)
In the data, you can see a few non-numerical columns. So letβs encode them as most machine learning algorithms work only with numerical data.
Letβs encode the data with MultiColumnLabelEncoder β
!pip install MultiColumnLabelEncoder
Here we have used MultiColumnLabelEncoder as it is most helpful in inversing the encoding.
import MultiColumnLabelEncoder
Mcle = MultiColumnLabelEncoder.MultiColumnLabelEncoder()
df = Mcle.fit_transform(df)
So our Data is ready. Letβs split the data into independent and dependent columnsΒ β
x=df.drop(['actual_productivity'],axis=1)
y=df['actual_productivity']
Predicting the Productivity
Letβs predict productivity using regression algorithms in Python. Before that, letβs prepare training and testing dataΒ β
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y,train_size=0.8,random_state=0)
Using LinearRegression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
model_lr=LinearRegression()
model_lr.fit(x_train,y_train)
pred_test=model_lr.predict(x_test)
print("test_MSE:",mean_squared_error(y_test, pred_test))
print("test_MAE:",mean_absolute_error(y_test, pred_test))
print("R2_score:{}".format(r2_score(y_test, pred_test)))
Letβs improve the performance using Random Forest Regression.
Using Random Forest Regressor
from sklearn.ensemble import RandomForestRegressor
model_rfe = RandomForestRegressor(n_estimators=200,max_depth=5)
model_rfe.fit(x_train, y_train)
pred = model_rfe.predict(x_test)
print("test_MSE:",mean_squared_error(y_test, pred))
print("test_MAE:",mean_absolute_error(y_test, pred))
print("R2_score:{}".format(r2_score(y_test, pred)))
using XGBoost
import xgboost as xgb
model_xgb = xgb.XGBRegressor(n_estimators=200, max_depth=5, learning_rate=0.1)
model_xgb.fit(x_train, y_train)
pred3=model_xgb.predict(x_test)
print("test_MSE:",mean_squared_error(y_test, pred3))
print("test_MAE:",mean_absolute_error(y_test, pred3))
print("R2_score:{}".format(r2_score(y_test, pred3)))
So we have achieved 0.07βββMean Absolute Error and 0.01 Mean Square error which says our model is performing veryΒ well.
So Out of all algorithms, XGBoost has performed well. In this way, we can predict the productivity of employees.
Happy Codingβ¦β¦.
Productivity Prediction of Employees using Machine Learning Python was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Join thousands of data leaders on the AI newsletter. Itβs free, we donβt spam, and we never share your email address. Keep up to date with the latest work in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI