Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!

Publication

Speed up EDA With the Intelligent Lux
Data Analysis

Speed up EDA With the Intelligent Lux

Last Updated on January 6, 2023 by Editorial Team

Author(s): Pranavi Duvva

Image by Colin Behrens from Pixabay

Data Analysis

Automate your visual data exploration with the new python library, Lux 💡.

Have you ever been tired of writing multiple lines of code even for a simple graph during EDA?

Did you ever wish for recommendation-based interactive graphs within the jupyter notebook itself?

If that’s a big yes!

Thankfully! We now have the new python library, Lux.

This article is based on Doris Jung-Lin Lee’s session in WiCDS 2021.

Lux is a python API for intelligent visual discovery, which comes with an inbuilt interactive jupyter widget.

  • Lux could be your intelligent assistant which can automate the visual aspects of the exploratory data analysis.
  • It provides powerful abstractions of the visualizations soon after the data frame has been displayed in the jupyter notebook with just a click.
  • Lux is a very rich user intent-based language.

The main intention of the Lux Library is,to make the visualizations as simple as loading a dataframe.

The interactive Lux widget assists the user to quickly browse through the data and view important trends and patterns. It provides recommendations for the user to analyze further. Lux, can also create visualizations for those sections of the data, you have no clear idea about.

Source: Image by Author

Lux works pretty well with the pandas and you do not have to worry about modifying the code. In fact, Lux was developed in such a way that, it preserves the pandas data frame semantics. This means it synchronizes its behavior with the pandas instructions itself.

That’s Awesome Right!

Let’s get started and bring in our intelligent visual assistant powered by Lux.

Installation requirements

  1. Lux can be installed through PyPI.
pip install lux-api

2. If you use conda, Lux can be installed by,

conda install -c conda-forge lux-api

3. For the setup in the jupyter notebook, you need to add the following extensions as well.

jupyter nbextension install --py luxwidget
jupyter nbextension enable --py luxwidget

That’s it! we are ready to go…

Case Study

Let’s consider an example dataset to explore the features of the Lux library.

I would be using the Graduate Admission dataset taken from the Kaggle data repository.

This dataset contains several parameters which are considered important during the application for Masters Programs.

Data Dictionary

  1. GRE Scores ( out of 340 )
  2. TOEFL Scores ( out of 120 )
  3. University Rating ( out of 5 )
  4. Statement of Purpose and Letter of Recommendation Strength ( out of 5 )
  5. Undergraduate GPA ( out of 10 )
  6. Research Experience ( either 0 or 1 )
  7. Chance of Admit ( ranging from 0 to 1 )

1. Importing all the necessary libraries

Now that the package has been successfully installed. We just have to import the lux library into our jupyter notebook.

2. Loading the data set and checking the concise summary

Let’s load the dataset and check the top 5 rows.

Source: Image by Author

Checking the shape of the data set.

(400, 9)

There are a total of 400 rows and 9 columns.

Removing the first column Serial No. and checking the concise summary of the data set with the info()

Source: Image by Author

We observe that the data type of all the 8 columns in the dataset is numeric.

3. Visual Data Exploration with Lux 💡

Let’s now display the data frame and explore the Lux widget.

Source: Image by Author

When the data frame is displayed Lux by default provides us with 3 tabs which are Correlation, Distribution, and Occurrence.

Let’s get to know each of these

  1. Correlation
Source: Image by Author

The correlation tab displays the relationship between the quantitative variables present in the dataset.

The order in which it displays is the most correlated ones to the least correlated ones.

Source: Image by Author

2. Distribution

Source: Image by Author

The distribution tab displays the histograms of the quantitative variables in the dataset.

The order in which it displays is the highly skewed ones to the least skewed ones.

Source: Image by Author

3. Occurrence

Source: Image by Author

The occurrence tab displays the bar charts of the categorical attributes.

The order it follows is the most uneven distribution to the even distribution.

Although our dataset did not contain any features with the categorical data type. It did recommend bar charts for those features which it thinks might be useful for our analysis.

4. Visualizations and Recommendations based on user intent.

Let’s say you want to know more about a specific feature or multiple features together. You can get all the visualizations related to those attributes with the help of intent

The lux widget not only displays the visualization for that feature intended. But will also provide you with additional recommendations for further analysis with the help of Filter and Enhance options.

  1. Enhance

The Enhance feature of lux adds an additional attribute to the intended attributes specified by the user for visualization.

It lets the user compare the effect of the added attribute to the intended visualization. This is similar to adding a hue.

2. Filter

The filter lets the user visualize the intended attributes for different subsets of the data.

Let’s understand better with the following examples.

Consider one attribute CGPA,

df.intent=[“CGPA”]
df

1.Enhance Recommendations for one attribute

Enhance Tab recommendations when the intended attribute is CGPA, Source: Image by Author

The Enhance tab when the given input is one feature “CGPA” fixes the intended variable “CGPA” on the x-axis and gives us recommendations by comparing it with different attributes.

2. Filter Recommendations for one attribute

Filter Tab recommendations when the intended attribute is CGPA, Source: Image by Author

The Filter tab fixes the intended variable “CGPA” on the x-axis and gives us recommendations by comparing it with different subparts of the data set.

Consider two attributes “TOEFL Score” and “GRE Score”,

df.intent=[“TOEFL Score”,”GRE Score”]
df

1.Enhance Recommendations for two attributes

Enhance Tab recommendations when the intended attributes are TOEFEL Score and GRE Score, Source: Image by Author

The Enhance tab when the given input is two attributes “TOEFL Score”, “GRE Score”. It fixes the intended variables “TOEFL Score” on the x-axis and the “GRE Score” on the y-axis. It then gives us recommendations by comparing with different attributes.

2. Filter Recommendations for two attributes

Filter Tab recommendations when the intended attributes are TOEFEL Score and GRE Score, Source: Image by Author

The Filter tab when the given input is two attributes “TOEFL Score”, “GRE Score”. It fixes the intended variable “TOEFL Score” on the x-axis and the “GRE Score” on the y-axis. It then gives us recommendations by comparing both together with different subparts of the data.

Source: Image by Author

5. Exporting Visualizations.

Lux makes it very easy to share the visualizations. To export visualizations into a static HTML the following command has to be used.

df.save_as_html(“File name.html”)

Conclusion

Lux the new python open-source library is definitely making data exploration a lot easier. This article has demonstrated how Lux was able to automate most of our visualizations with very minimal code. It also explained some of the prominent features of the Lux library.

Status of Project Lux: Currently, Lux is in its early development stage.

Resources

To know more about the Lux library you can find the details at lux-API.

You can also try their Hands-on exercises or tutorials on Binder.

Hope you enjoyed reading this article!

Please feel free to check my other articles on pranaviduvva at medium.

Thanks for reading!


Speed up EDA With the Intelligent Lux was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI

Feedback ↓