Tabular Classification and Regression Made Easy with Lightning Flash

Last Updated on November 25, 2021 by Editorial Team

Author(s): Jirka Borovec

Originally published on Towards AI, the world's leading AI and technology news and media company.

Illustration photo by Oleg Magni from Pexels

Machine Learning

This post shows how to solve the two most common Machine Learning (ML) tasks on tabular data, classification and regression, with Lightning Flash, which makes it very simple.

When it comes to articles on deep learning, advances in Computer Vision (CV) or Natural Language Processing (NLP) receive the lion's share of the attention. Advancement in CV and NLP is fantastic and super exciting; however, many data scientists' day-to-day tasks revolve around tabular data processing.

Tabular data classification and regression are essential tasks. They are often modeled with classical methods such as Random Forests, Support Vector Machines, Linear/Logistic Regressions, and Naive Bayes, implemented in one of many standard libraries such as scikit-learn, XGBoost, etc.

Still, it is beneficial to experiment with newer Deep Learning methods when modeling more complex data.

Screenshot from https://playground.tensorflow.org running simple classification with a NN.

In this post, we present how to prepare data and train models with just a few lines of code using Lightning Flash.

This open-source AI Factory built on top of PyTorch Lightning provides out-of-the-box solutions for several domains such as tabular, image, text, etc., and all basic tasks. We showcase the solution on two simple Kaggle competitions (and link the particular kernels below):

🚒Titanic crash with Lightning⚡Flash

🏠House 💵prices predictions with Lightning⚡Flash

In the following sections, I will walk through the four stages (plus two bonuses) of tabular modeling, including:
1. Data preparation
2. Model creation
3. Training the model
4. Evaluation/Inference

The Lightning Flash API unifies a variety of data loading and task definitions, ensuring that classification and regression code is similar and easy to read.

1. Data Preparation

Data preparation, in general, is a broad subject, so let us narrow it down for this tutorial. For this post, we will use the House Prices dataset.

We assume that the data is clean and has been checked for the task we are solving. For classification, the prediction is a discrete value that maps to one of the pre-defined labels. For regression, the prediction is a floating-point value without any bounds.

The initial step of any training pipeline is loading the data and identifying the (per-column) data types. We need to differentiate between continuous and categorical inputs. The continuous (numerical) values can be used as they are, but the categorical ones (mostly strings) need to be converted to numerical values with some internal mapping.

Don't worry. All of this is done inside Flash! As a user, you do not need to think about it unless you want to.

To sort inputs into numerical and categorical, you can cast them manually or use some heuristic/statistic to infer the type.

Code snippet for casting numerical and categorical columns.
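As a minimal sketch, assuming the Kaggle House Prices data with a SalePrice target column and an arbitrary cardinality threshold of 15 (both assumptions), such a heuristic could look like this:

import pandas as pd

# Load the Kaggle House Prices training data (file path and target name are assumptions).
df = pd.read_csv("house-prices/train.csv")
target_col = "SalePrice"

# Heuristic: object dtype or low cardinality -> categorical, everything else -> numerical.
categorical_cols, numerical_cols = [], []
for col in df.columns.drop(target_col):
    if df[col].dtype == "object" or df[col].nunique() < 15:
        categorical_cols.append(col)
    else:
        numerical_cols.append(col)

print(f"{len(categorical_cols)} categorical and {len(numerical_cols)} numerical columns")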

Next, we create a DataModule using the from_csv method. To do so, we specify the input CSV file, set the batch size and the train/validation split, list the numerical and categorical columns we want to use as features, and name the target column we want to predict.

Code snippet for creating the Flash DataModule. A rule of thumb is to use a validation split between 20% and 40% of the provided data.
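A minimal sketch of this step, reusing the column lists from the snippet above and assuming the TabularRegressionData.from_csv signature of recent Flash releases (argument names may differ between versions):

from flash.tabular import TabularRegressionData

# Build the DataModule straight from the CSV file; Flash handles the
# categorical-to-numerical mapping internally.
datamodule = TabularRegressionData.from_csv(
    categorical_fields=categorical_cols,  # from the heuristic above
    numerical_fields=numerical_cols,
    target_fields="SalePrice",            # assumed target column
    train_file="house-prices/train.csv",
    val_split=0.25,                       # rule of thumb: 20-40% for validation
    batch_size=64,
)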

2. Model Creation

The next step is creating the task model. In this case, we will create a tabular regression model to which we provide our DataModule, in addition to a few other model-specific properties, such as the optimizer and learning rate.

Code snippet for creating the Flash model.
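A minimal sketch, assuming the TabularRegressor.from_data constructor of recent Flash releases; the optimizer and learning rate simply mirror the values used later in the Flash Zero CLI example:

from flash.tabular import TabularRegressor

# Create the regression task from the DataModule so the model knows the input
# columns and embedding sizes; the optimizer and learning rate can be overridden.
model = TabularRegressor.from_data(
    datamodule,
    learning_rate=0.01,
    optimizer="AdamW",
)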

3. Training the Model

Training a model is usually a very complex task, but Lightning Flash makes it straightforward. Since Flash is powered by PyTorch Lightning (PL), you can leverage all PL callbacks and features to train your model.

In this case, we will use a CSV logger to conveniently plot training statistics to the IPython notebook.

Code snippet for training the Flash model. We train for 75 epochs and leverage all the GPUs on our machine.
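A minimal sketch of the training call, assuming a PyTorch Lightning CSVLogger writing to a logs/ directory:

import torch
import flash
from pytorch_lightning.loggers import CSVLogger

# Train for 75 epochs on all available GPUs and log every metric to CSV
# so we can plot the curves in the notebook afterwards.
trainer = flash.Trainer(
    max_epochs=75,
    gpus=torch.cuda.device_count(),
    logger=CSVLogger(save_dir="logs/"),
)
trainer.fit(model, datamodule=datamodule)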

When training is done, we plot all metrics and losses collected during the process with the seaborn package:

Code snippet for plotting collected metrics.
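A minimal sketch of such a plot, assuming the default lightning_logs/version_0 layout produced by the CSV logger (the exact path is an assumption):

import pandas as pd
import seaborn as sns

# CSVLogger writes a metrics.csv under <save_dir>/<name>/<version>/.
metrics = pd.read_csv("logs/lightning_logs/version_0/metrics.csv")

# Reshape to long format and draw one curve per logged metric/loss.
long_metrics = (
    metrics.drop(columns=["epoch"])
    .melt(id_vars=["step"], var_name="metric", value_name="value")
    .dropna()
)
sns.relplot(
    data=long_metrics, x="step", y="value",
    col="metric", col_wrap=3, kind="line",
    facet_kws={"sharey": False},
)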

If you do not know the best learning rate for your model/data, you can use the PyTorch Lightning learning-rate finder. You need to enable it in the Trainer and, before calling the fit method, call the tune method:

Code snippet for finding the best learning rate.
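A minimal sketch using PyTorch Lightning's auto_lr_find flag together with trainer.tune:

import torch
import flash

# Enable the learning-rate finder in the Trainer and call tune() before fit();
# the suggested value overwrites model.learning_rate.
trainer = flash.Trainer(
    max_epochs=75,
    gpus=torch.cuda.device_count(),
    auto_lr_find=True,
)
trainer.tune(model, datamodule=datamodule)
trainer.fit(model, datamodule=datamodule)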

When you run this code, you should see training curves similar to the image below that show that your model is converging and learning.

Plotting the training metrics.

4. Model Inference

The final part of tabular modeling is running inference on new data. Once again, Flash makes this straightforward, as the model remembers which columns were used during training and which of them were numerical versus categorical.

We pass a loaded table or a path to the CSV file we want to evaluate, and Flash will give us model predictions:
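A minimal sketch of the prediction call; the exact API depends on the Flash version, so the newer DataModule-based variant is only indicated in comments:

# Older Flash releases (around 0.5.x) let you call predict() directly on the task
# with a CSV path or a loaded DataFrame:
predictions = model.predict("house-prices/test.csv")
print(predictions[:5])

# Newer releases instead expect a predict DataModule, roughly:
# predict_dm = TabularRegressionData.from_csv(
#     predict_file="house-prices/test.csv",
#     parameters=datamodule.parameters,
#     batch_size=64,
# )
# predictions = flash.Trainer().predict(model, datamodule=predict_dm)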

Let's see what the distribution of the predicted prices looks like:

Price distribution on train vs. test data.
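A minimal sketch of such a comparison with seaborn, assuming the predictions from the previous step can be flattened into a plain array of prices:

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Compare training prices with the predicted test prices; flattening the
# prediction list to a 1-D array is an assumption about the output format.
predicted_prices = np.asarray(predictions, dtype=float).ravel()
sns.kdeplot(df["SalePrice"], label="train")
sns.kdeplot(predicted_prices, label="test predictions")
plt.legend()
plt.show()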

Next Steps

Flash Zero for Zero-Code CLI Training

Flash Zero is a zero-code machine learning extension of Lightning Flash, which offers Lightning Flash functionality without requiring a single line of Python code.

Flash Zero is useful for fast prototyping and for hyperparameter searches that define an external loop over given options, whether run locally or on a cloud platform such as grid.ai.

HyperParameter Optimization with Grid.ai and No Code Change

Let's demonstrate Flash Zero on another simple tabular classification task from Kaggle: Tabular Playground Series, Nov 2021.

Playing 📋tabular with Lightning⚡Flash

In this case, we will replace the Python training script described above, which consecutively created the data, model, and trainer:

With a single CLI call:

flash tabular_classification \
--model.learning_rate=0.01 \
--model.optimizer="AdamW" \
--trainer.max_epochs 20 \
--trainer.accumulate_grad_batches=12 \
--trainer.gradient_clip_val=0.1 \
from_csv \
--train_file=/home/jirka/Downloads/train.csv \
--numerical_fields="['f0', 'f1', ..., 'f99']" \
--target_fields="target" \
--batch_size=512

In the end, we can browse the training progress with TensorBoard, as it is also the default Lightning Flash logger:

tensorboard --logdir ./lightning_logs

Tabular forecasting of time-series data

Recently, Lightning Flash also introduced tabular forecasting for time-series data, which we showcase on a currently running competition predicting crypto target values.

Sample crypto time series drawn with the mplfinance package.

The ongoing Kaggle kernel with the crypto demo can be found here:

🪙Crypto 📈forecasting with Lightning⚡Flash
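As a rough sketch of the tabular forecasting API (the data frame, column names, window sizes, and backbone arguments below are illustrative assumptions, and the exact signatures may differ between Flash releases):

import numpy as np
import pandas as pd
import torch
import flash
from flash.tabular.forecasting import TabularForecastingData, TabularForecaster

# Tiny synthetic long-format frame standing in for the real crypto data:
# one row per (asset, time step) with the value we want to forecast.
train_df = pd.DataFrame({
    "time_idx": np.tile(np.arange(300), 2),
    "asset": np.repeat(["BTC", "ETH"], 300),
    "target": np.random.randn(600).cumsum(),
})

datamodule = TabularForecastingData.from_data_frame(
    time_idx="time_idx",
    target="target",
    group_ids=["asset"],
    max_encoder_length=60,      # history window fed to the model
    max_prediction_length=15,   # forecast horizon
    train_data_frame=train_df,
    batch_size=64,
)

# The forecaster wraps PyTorch Forecasting backbones such as N-BEATS.
model = TabularForecaster(
    datamodule.parameters,
    backbone="n_beats",
    backbone_kwargs={"widths": [32, 512]},
)

trainer = flash.Trainer(max_epochs=3, gpus=torch.cuda.device_count())
trainer.fit(model, datamodule=datamodule)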

Are you interested in more cool PyTorch Lightning integrations? Follow me and join our fantastic Slack community!

About the Author

Jirka Borovec holds a Ph.D. in Computer Vision from CTU in Prague. He has been working in Machine Learning and Data Science for several years at a number of IT startups and companies. He enjoys exploring interesting world problems, solving them with state-of-the-art techniques, and developing open-source projects.


Tabular Classification and Regression Made Easy with Lightning Flash was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.


Published via Towards AI
