

Lazypredict: Run All Sklearn Algorithms With a Line Of Code

Last Updated on July 25, 2023 by Editorial Team

Author(s): Travis Tang

Originally published on Towards AI.

How to (and why you shouldn’t) use it

An output of lazypredict.

Here are two pain points of data scientists:

Pain Point 1: Limited time in the data science lifecycle

Data scientists have to prioritize. This may mean spending more time on understanding the business problem and identifying the most appropriate approach rather than focusing solely on developing machine learning algorithms.

Pain Point 2: Machine learning modeling can be time-consuming

Fine-tuning a machine learning algorithm involves finding the optimal values for its hyperparameters, which is often a trial-and-error process. This takes a long time.

Automated machine learning can help data scientists tremendously. Image by stable diffusion.

AutoML saves the day

AutoML can address both pain points. One nascent AutoML library is lazypredict. In this post, I run through the following:

  • What is lazypredict
  • Installing lazypredict
  • How to use it to automatically fit scikit-learn regression algorithms
  • How to use it to automatically fit classification algorithms
  • Why you shouldn’t use it (and what else you can use)

Note: I’m not affiliated with lazypredict.

What is Lazypredict

Lazypredict is a Python package that aims to automate the machine learning modeling process. It works on both regression and classification tasks.

Its key feature is its ability to automate the training and evaluation of machine learning models. It provides a simple interface that trains a wide range of models on your training data and evaluates each of them on a test set, so you can compare algorithms side by side.
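
To make the idea concrete, here is a minimal sketch of that kind of "fit everything and compare" loop written with plain scikit-learn. It is not lazypredict's implementation: the compare_models helper and the three-model shortlist are my own illustrative choices, and every model simply runs with its default settings.

```python
import time

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and collect its test-set R^2 and fit time."""
    rows = []
    for name, model in models.items():
        start = time.perf_counter()
        model.fit(X_train, y_train)
        elapsed = time.perf_counter() - start
        score = r2_score(y_test, model.predict(X_test))
        rows.append({"model": name, "r2": score, "seconds": elapsed})
    # Best R^2 first, like a leaderboard
    return sorted(rows, key=lambda row: row["r2"], reverse=True)


X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13
)

# Hypothetical shortlist for illustration; lazypredict covers far more estimators.
candidates = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
    "RandomForestRegressor": RandomForestRegressor(random_state=13),
}

leaderboard = compare_models(candidates, X_train, X_test, y_train, y_test)
for row in leaderboard:
    print(f"{row['model']:<24} R2={row['r2']:.2f}  time={row['seconds']:.2f}s")
```

Writing this loop by hand is not hard, but it grows quickly once you add more models, metrics, and preprocessing. That bookkeeping is what a library like lazypredict takes off your hands.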

Installing Lazypredict

On your terminal, run the following:

pip install lazypredict

However, you might need to manually install some of lazypredict’s dependencies. If you see errors saying that scikit-learn, xgboost, or lightgbm is missing, install the necessary libraries with pip install.

Personally, I got it to work on Python 3.9.13 with the following requirements.txt:

pandas==1.4.4
numpy==1.21.5
scikit-learn==1.0.2
lazypredict==0.2.12

I installed these libraries by running this command on the terminal: pip install -r requirements.txt

It’s even better to use a virtual environment in this case.
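
If installation problems persist, it helps to see which versions are actually present in the active environment. The snippet below is a small sketch that uses only the standard library; the installed_versions helper is my own name, and the package list is simply the set of libraries mentioned in this section.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Libraries mentioned in this section; extend the list as needed.
report = installed_versions(
    ["lazypredict", "scikit-learn", "pandas", "numpy", "xgboost", "lightgbm"]
)
for name, ver in report.items():
    print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Comparing this output against the pinned versions in requirements.txt is a quick way to spot a mismatch before digging into error messages.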

Using Lazypredict for Regression

Let’s walk through the code. (If you just want the complete code, search “Full code” in this article.)

We’ll first import the necessary libraries.

from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np

Next, we’ll load the Diabetes dataset. Here’s what it contains.

Ten baseline variables, age, sex, body mass index, average blood pressure, and six blood serum measurements were obtained for each of n = 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.

# Import the Diabetes Dataset
diabetes = datasets.load_diabetes()

Next, we shuffle the dataset and split it into train and test sets.

# Shuffle the dataset 
X, y = shuffle(diabetes.data, diabetes.target, random_state=13)

# Cast the numerical values into a numpy float.
X = X.astype(np.float32)

# Split the dataset into 90% and 10%.
offset = int(X.shape[0] * 0.9)

# Split into train and test
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]
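
As an aside, the manual shuffle-and-slice above can also be written with scikit-learn's train_test_split. This is an alternative sketch, not what the walkthrough uses: it yields the same split sizes, but different rows land in each set, so the scores will not match the results table exactly.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# shuffle=True is the default, so no separate shuffle step is needed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13
)

print(X_train.shape, X_test.shape)
```

With 442 rows, both approaches leave 397 rows for training and 45 for testing.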

Next, we initialize the LazyRegressor object.

# Initialize LazyRegressor, which fits multiple regression models
# on the same dataset
reg = LazyRegressor(verbose=0,
                    ignore_warnings=False,
                    custom_metric=None,
                    predictions=False,
                    random_state=13)

# Parameters
# ----------
# verbose : int, optional (default=0)
#     For the liblinear and lbfgs solvers set verbose to any positive
#     number for verbosity.
# ignore_warnings : bool, optional (default=True)
#     When set to True, warnings about algorithms that are not able
#     to run are ignored.
# custom_metric : function, optional (default=None)
#     When a function is provided, models are evaluated based on the custom
#     evaluation metric provided.
# predictions : bool, optional (default=False)
#     When set to True, the predictions of all the models are
#     returned as a dataframe.
# regressors : list, optional (default="all")
#     When a list is provided, trains only the chosen regressor(s).
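
To illustrate the custom_metric parameter, here is a sketch of a metric function. I am assuming that lazypredict calls it with the true test values first and the predictions second, as scikit-learn metrics do; the function name is my own. The LazyRegressor call is left as a comment so that the snippet runs even without lazypredict installed.

```python
import numpy as np


def mean_absolute_error_metric(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Quick check on toy numbers: the errors are 10, 0 and 20, so the mean is 10.
toy_score = mean_absolute_error_metric([100, 150, 200], [110, 150, 180])
print(toy_score)

# Assumed usage, based on the parameter description above:
# reg = LazyRegressor(custom_metric=mean_absolute_error_metric)
```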

Now, we will fit multiple regression algorithms with the lazypredict library. This step took 3 seconds in total.

Under the hood, the fit method does the following:

  1. Split all features into two categories: numerical (features which are numbers) and categorical (features which are text).
  2. Further split categorical features into two: ‘high’ categorical features (which have more unique values than the total number of features) and ‘low’ categorical features (which have fewer unique values than the total number of features).
  3. Preprocess each feature in this manner:
    • Numerical features: Impute missing values with the mean, then standardize the feature (subtracting the mean and dividing by the standard deviation).
    • ‘High’ categorical features: Impute missing values with the value ‘missing’, then perform one-hot encoding.
    • ‘Low’ categorical features: Impute missing values with the value ‘missing’, then perform ordinal encoding (convert each unique string value into an integer; in the example of a Gender column, ‘Male’ is encoded as 0 and ‘Female’ as 1).
  4. Fit each algorithm on the training dataset.
  5. Test each algorithm on the testing set. By default, the metrics are adjusted R-squared, R-squared, root-mean-squared error, and the time taken.
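
The three preprocessing branches listed above map naturally onto a scikit-learn ColumnTransformer. The following is a rough sketch of that idea, not lazypredict's actual source: build_preprocessor and the toy DataFrame are hypothetical, and the choice of which categorical columns get one-hot or ordinal encoding is passed in by hand instead of being decided by a cardinality rule.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler


def build_preprocessor(numeric_cols, onehot_cols, ordinal_cols):
    """Combine the three preprocessing branches into one transformer."""
    numeric = Pipeline([
        ("imputer", SimpleImputer(strategy="mean")),
        ("scaler", StandardScaler()),
    ])
    onehot = Pipeline([
        ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
        ("encoder", OneHotEncoder(handle_unknown="ignore")),
    ])
    ordinal = Pipeline([
        ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
        ("encoder", OrdinalEncoder()),
    ])
    return ColumnTransformer(
        [
            ("numeric", numeric, numeric_cols),
            ("onehot", onehot, onehot_cols),
            ("ordinal", ordinal, ordinal_cols),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# Toy data with one missing value per column (made up for illustration).
toy = pd.DataFrame({
    "age": [20.0, 30.0, np.nan, 40.0],
    "city": pd.Series(["a", "b", np.nan, "c"], dtype=object),
    "gender": pd.Series(["Male", "Female", "Male", np.nan], dtype=object),
})

features = build_preprocessor(["age"], ["city"], ["gender"]).fit_transform(toy)
print(features.shape)
```

The scaled age column comes first, followed by one column per city value (including the ‘missing’ placeholder), and finally a single integer-coded gender column.
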
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
model_dictionary = reg.provide_models(X_train, X_test, y_train, y_test)
models

Here is the result.

| Model | Adjusted R-Squared | R-Squared | RMSE | Time Taken |
|:------------------------------|---------------------:|------------:|-------:|-------------:|
| ExtraTreesRegressor | 0.38 | 0.52 | 54.22 | 0.17 |
| OrthogonalMatchingPursuitCV | 0.37 | 0.52 | 54.39 | 0.01 |
| Lasso | 0.37 | 0.52 | 54.46 | 0.01 |
| LassoLars | 0.37 | 0.52 | 54.46 | 0.01 |
| LarsCV | 0.37 | 0.51 | 54.54 | 0.02 |
| LassoCV | 0.37 | 0.51 | 54.59 | 0.07 |
| PassiveAggressiveRegressor | 0.37 | 0.51 | 54.74 | 0.01 |
| LassoLarsIC | 0.36 | 0.51 | 54.83 | 0.01 |
| SGDRegressor | 0.36 | 0.51 | 54.85 | 0.01 |
| RidgeCV | 0.36 | 0.51 | 54.91 | 0.01 |
| Ridge | 0.36 | 0.51 | 54.91 | 0.01 |
| BayesianRidge | 0.36 | 0.51 | 54.94 | 0.01 |
| LassoLarsCV | 0.36 | 0.51 | 54.96 | 0.02 |
| LinearRegression | 0.36 | 0.51 | 54.96 | 0.01 |
| TransformedTargetRegressor | 0.36 | 0.51 | 54.96 | 0.01 |
| Lars | 0.36 | 0.50 | 55.09 | 0.01 |
| ElasticNetCV | 0.36 | 0.50 | 55.20 | 0.06 |
| HuberRegressor | 0.36 | 0.50 | 55.24 | 0.02 |
| RandomForestRegressor | 0.35 | 0.50 | 55.48 | 0.25 |
| AdaBoostRegressor | 0.34 | 0.49 | 55.88 | 0.08 |
| LGBMRegressor | 0.34 | 0.49 | 55.93 | 0.05 |
| HistGradientBoostingRegressor | 0.34 | 0.49 | 56.08 | 0.20 |
| PoissonRegressor | 0.32 | 0.48 | 56.61 | 0.01 |
| ElasticNet | 0.30 | 0.46 | 57.49 | 0.01 |
| KNeighborsRegressor | 0.30 | 0.46 | 57.57 | 0.01 |
| OrthogonalMatchingPursuit | 0.29 | 0.45 | 57.87 | 0.01 |
| BaggingRegressor | 0.29 | 0.45 | 57.92 | 0.04 |
| XGBRegressor | 0.28 | 0.45 | 58.18 | 0.11 |
| GradientBoostingRegressor | 0.25 | 0.42 | 59.70 | 0.12 |
| TweedieRegressor | 0.24 | 0.42 | 59.81 | 0.01 |
| GammaRegressor | 0.22 | 0.40 | 60.61 | 0.01 |
| RANSACRegressor | 0.20 | 0.38 | 61.40 | 0.12 |
| LinearSVR | 0.12 | 0.32 | 64.66 | 0.01 |
| ExtraTreeRegressor | 0.00 | 0.23 | 68.73 | 0.01 |
| NuSVR | -0.07 | 0.18 | 71.06 | 0.01 |
| SVR | -0.10 | 0.15 | 72.04 | 0.02 |
| DummyRegressor | -0.30 | -0.00 | 78.37 | 0.01 |
| QuantileRegressor | -0.35 | -0.04 | 79.84 | 1.42 |
| DecisionTreeRegressor | -0.47 | -0.14 | 83.42 | 0.01 |
| GaussianProcessRegressor | -0.77 | -0.37 | 91.51 | 0.02 |
| MLPRegressor | -1.87 | -1.22 | 116.51 | 0.21 |
| KernelRidge | -5.04 | -3.67 | 169.06 | 0.01 |
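
Adjusted R-squared sits well below R-squared in this table because the test set is small relative to the number of features. The sketch below applies the standard adjustment formula; I am assuming it is evaluated on the 45 test rows and 10 features, which is consistent with the numbers reported above.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# 442 rows with a 90/10 split leave 45 test rows; the dataset has 10 features.
n_test = 442 - int(442 * 0.9)

# R-squared of 0.52 (the ExtraTreesRegressor row) becomes 0.38 after adjustment.
print(round(adjusted_r2(0.52, n_test, 10), 2))
```

With only 45 test rows, the penalty factor (n - 1) / (n - p - 1) is 44 / 34, or about 1.29, so every model loses a sizeable chunk of its score.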

Here’s the full code for regression on the Diabetes dataset.

 
from lazypredict.Supervised import LazyRegressor
from sklearn import datasets
from sklearn.utils import shuffle
import numpy as np

# Import the Diabetes Dataset
diabetes = datasets.load_diabetes()

# Shuffle the dataset
X, y = shuffle(diabetes.data, diabetes.target, random_state=13)

# Cast the numerical values
X = X.astype(np.float32)
offset = int(X.shape[0] * 0.9)

# Split into train and test
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

# Initialize LazyRegressor and fit multiple regression models
# on the same dataset
reg = LazyRegressor(verbose=0, ignore_warnings=False, custom_metric=None)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
model_dictionary = reg.provide_models(X_train, X_test, y_train, y_test)
models

Using Lazypredict for Classification

Let’s use Lazypredict for classification. (If you just want the full code, search “full code” in this article.)

First, import the necessary libraries.

from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

Next, we load the data, the Iris dataset, and split it into train and test sets. Here’s what it contains.

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=123)
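
One optional variation, which the walkthrough does not use: passing stratify=y keeps the three species equally represented in both halves. This is a sketch of that option; the scores in the results table come from the unstratified split above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y preserves the class proportions in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=123, stratify=y
)

# Count how many samples of each species ended up in each split.
print(np.bincount(y_train), np.bincount(y_test))
```

With 50 samples per species and a 50:50 split, each half ends up with 25 flowers of every species.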

Next, we initialize the LazyClassifier object.

# Initialize LazyClassifier, which fits multiple classification models
# on the same dataset
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)

"""
Parameters
----------
verbose : int, optional (default=0)
    For the liblinear and lbfgs solvers set verbose to any positive
    number for verbosity.
ignore_warnings : bool, optional (default=True)
    When set to True, warnings about algorithms that are not able to run are ignored.
custom_metric : function, optional (default=None)
    When a function is provided, models are evaluated based on the custom evaluation metric provided.
predictions : bool, optional (default=False)
    When set to True, the predictions of all the models are returned as a dataframe.
classifiers : list, optional (default="all")
    When a list is provided, trains only the chosen classifier(s).
"""

Then, we call the LazyClassifier’s fit method, which fits multiple classification algorithms with the lazypredict library. This step took 1 second in total for this small dataset.

(Search the keyword “under the hood, the fit method” to jump to the section where I explain what fit does.)

models, predictions = clf.fit(X_train, X_test, y_train, y_test)

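Lastly, we can see how each model performs by displaying models, the DataFrame returned by fit. It reports the accuracy, balanced accuracy, ROC AUC, and F1 score on the test set. The provide_models method returns the fitted models themselves as a dictionary keyed by model name.

# Retrieve the fitted models, then display the metrics table
model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)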
models

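Here is the full result. The ROC AUC column is empty, most likely because the metric as computed here applies to binary targets and the Iris dataset has three classes.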

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|:------------------------------|-----------:|--------------------:|:----------|-----------:|-------------:|
| LinearDiscriminantAnalysis | 0.99 | 0.99 | | 0.99 | 0.01 |
| AdaBoostClassifier | 0.97 | 0.98 | | 0.97 | 0.13 |
| PassiveAggressiveClassifier | 0.97 | 0.98 | | 0.97 | 0.01 |
| LogisticRegression | 0.97 | 0.98 | | 0.97 | 0.01 |
| GaussianNB | 0.97 | 0.98 | | 0.97 | 0.01 |
| SGDClassifier | 0.96 | 0.96 | | 0.96 | 0.01 |
| RandomForestClassifier | 0.96 | 0.96 | | 0.96 | 0.19 |
| QuadraticDiscriminantAnalysis | 0.96 | 0.96 | | 0.96 | 0.01 |
| Perceptron | 0.96 | 0.96 | | 0.96 | 0.01 |
| LGBMClassifier | 0.96 | 0.96 | | 0.96 | 0.30 |
| ExtraTreeClassifier | 0.96 | 0.96 | | 0.96 | 0.01 |
| BaggingClassifier | 0.95 | 0.95 | | 0.95 | 0.03 |
| ExtraTreesClassifier | 0.95 | 0.95 | | 0.95 | 0.13 |
| XGBClassifier | 0.95 | 0.95 | | 0.95 | 0.19 |
| DecisionTreeClassifier | 0.95 | 0.95 | | 0.95 | 0.01 |
| LinearSVC | 0.95 | 0.95 | | 0.95 | 0.01 |
| CalibratedClassifierCV | 0.95 | 0.95 | | 0.95 | 0.04 |
| KNeighborsClassifier | 0.93 | 0.94 | | 0.93 | 0.01 |
| NuSVC | 0.93 | 0.94 | | 0.93 | 0.01 |
| SVC | 0.93 | 0.94 | | 0.93 | 0.01 |
| RidgeClassifierCV | 0.91 | 0.91 | | 0.91 | 0.01 |
| NearestCentroid | 0.89 | 0.90 | | 0.89 | 0.01 |
| LabelPropagation | 0.89 | 0.90 | | 0.90 | 0.01 |
| LabelSpreading | 0.89 | 0.90 | | 0.90 | 0.01 |
| RidgeClassifier | 0.88 | 0.89 | | 0.88 | 0.01 |
| BernoulliNB | 0.79 | 0.75 | | 0.77 | 0.01 |
| DummyClassifier | 0.27 | 0.33 | | 0.11 | 0.01 |
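
The DummyClassifier row is a useful sanity check on how the metrics relate to each other. The sketch below reproduces numbers of that shape with a baseline that always predicts one class. The per-class counts are hypothetical (I picked them so that 20 of the 75 test flowers belong to the predicted class), and I am assuming the F1 score is the weighted average across classes.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

# Hypothetical test set of 75 flowers; the per-class counts are assumed.
y_true = [0] * 20 + [1] * 27 + [2] * 28
y_pred = [0] * 75  # a baseline that always predicts class 0

accuracy = accuracy_score(y_true, y_pred)
balanced = balanced_accuracy_score(y_true, y_pred)
weighted_f1 = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print(round(accuracy, 2), round(balanced, 2), round(weighted_f1, 2))
```

Accuracy is just the share of the predicted class (20 / 75), balanced accuracy is the average recall over the three classes (1, 0, and 0), and the weighted F1 is low because two of the three classes are never predicted.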

Here is the full code for classification.


from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data into train and test with a 50:50 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=123)

# Initialize the Lazypredict library
clf = LazyClassifier(verbose=0,ignore_warnings=True, custom_metric=None)

# Fit all classification algorithms on training dataset
models,predictions = clf.fit(X_train, X_test, y_train, y_test)

# Retrieve the fitted models, then display the metrics table
model_dictionary = clf.provide_models(X_train,X_test,y_train,y_test)
models

Do I recommend lazypredict?

If you manage to install it, lazypredict is very simple to use. Its syntax is very close to scikit-learn’s, making the learning curve very gentle.

But it has some critical weaknesses.

  1. Difficult installation: Many users have reported difficulties installing the library because the developers did not provide a requirements.txt that documents the required dependencies.
  2. Limited documentation: I had to comb through the source code to find out how the preprocessing runs. This is not ideal. I also do not know the hyperparameters used for each of the classification and regression algorithms.
  3. Limited customizability: I have yet to find a way to customize the preprocessing steps.
  4. Unclear how to use the model after lazypredict: Once you’re done with the lazypredict library, you’d ideally want to select the best algorithm and keep using it. Lazypredict does not make this easy, since there is no straightforward way of exporting the best algorithm.
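
One workaround for the last point is to treat the lazypredict table as a shortlist and rebuild the winner directly in scikit-learn. The sketch below does this for ExtraTreesRegressor, the top row of the regression results. The pipeline is my own approximation of the preprocessing (scaling only), so the score may differ somewhat from the one lazypredict reported.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle

# Same data preparation as in the regression walkthrough.
diabetes = load_diabetes()
X, y = shuffle(diabetes.data, diabetes.target, random_state=13)
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

# Rebuild the shortlisted model as an ordinary scikit-learn pipeline.
best_model = make_pipeline(StandardScaler(), ExtraTreesRegressor(random_state=13))
best_model.fit(X_train, y_train)

predictions = best_model.predict(X_test)
rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
print(f"Test RMSE: {rmse:.2f}")
```

From here, best_model is a regular scikit-learn object that you can tune, inspect, or save like any other estimator.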

Main takeaway

Lazypredict’s critical weaknesses limit its utility. It is nice, but it’s still underdeveloped.

I’d strongly recommend you check out other AutoML libraries that are superior in terms of documentation and customizability.

Here are some alternatives.

  1. TPOT (Check out how to use TPOT here)
  2. Auto-Sklearn
  3. Auto-ViML
  4. H2O AutoML
  5. Auto-Keras
  6. MLBox
  7. Hyperopt Sklearn
  8. AutoGluon
Data scientist + Robots = Magic. Photo by Andy Kelly on Unsplash

I’m Travis Tang, a data scientist in Tech. I share how you can use open-sourced libraries on Medium. I also share data analytics and science tips on LinkedIn daily. Follow me if you like this content.


Published via Towards AI
