GPT-3: A Data Scientist in the Making
Last Updated on June 23, 2021 by Editorial Team
Author(s): Shubham Saboo
Autopilot exploratory data analysis in pandas by leveraging the capabilities of the world’s most sophisticated language model GPT-3…
I have collected the dots in the form of articles. Please go through the articles below, in the same order, to connect the dots and understand the key tech stack behind the GPT-3 powered pandas assistant:
- FastAPI — The Spiffy Way Beyond Flask!
- Streamlit — Revolutionizing Data App Creation
- A Brief Introduction to GPT-3
Introduction to Pandas
Pandas is a fast, powerful, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. It is widely accepted among the Python community and is used in many other packages, frameworks, and modules. Pandas is an extremely flexible framework and has a wide range of use-cases for preparing the data for machine learning and deep learning models.
Pandas is available as a Python library on PyPI and can be installed with either pip (pip install pandas) or conda (conda install pandas), depending on the Python environment. Because of its popularity, Pandas has its own conventional abbreviation, pd, so the following statement is typically used to import it:
import pandas as pd
What kind of data can pandas handle?
If you work with tabular data, such as data in spreadsheets or databases, pandas is the right tool for you. With Pandas, you can explore, clean, and process your data. In pandas, a data table is called a DataFrame.
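As a minimal sketch of what a DataFrame looks like in practice, the snippet below builds one from a Python dictionary and inspects it. The column names and values are hypothetical sample data, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Ada", "Grace", "Linus"],
        "age": [36, 45, 28],
        "city": ["London", "New York", "Helsinki"],
    }
)

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
print(df.head(2))  # first two rows of the table
```

Each dictionary key becomes a column, and each list supplies that column's values row by row.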
How to read and write tabular data with pandas?
Pandas supports many file formats and data sources out of the box (such as CSV, Excel, SQL, JSON, and Parquet). Importing data from these sources is straightforward using the functions prefixed with read_*. Similarly, the to_* methods export data to the respective formats.
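A short sketch of the read_* and to_* pattern follows. To keep it self-contained, an in-memory buffer stands in for a file on disk, and the CSV content is hypothetical; with real data you would pass a file path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; the StringIO buffer stands in for a file on disk.
csv_text = "name,age\nAda,36\nGrace,45\n"

# read_* functions load data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# to_* methods export the DataFrame; with no path given they return a string.
csv_out = df.to_csv(index=False)
json_out = df.to_json(orient="records")

# The exported JSON can be read back with the matching read_* function.
round_trip = pd.read_json(io.StringIO(json_out))

print(df)
print(json_out)
```

The same pairing applies to the other formats, for example read_excel with to_excel or read_parquet with to_parquet, although some of those need an extra dependency installed.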
Now I will walk you through the GPT-3 powered pandas assistant application step by step:
When creating any GPT-3 application, the first thing to consider is the design and content of the training prompt. Prompt design is the most significant step in priming the GPT-3 model to give a favorable and contextual response.
As a rule of thumb, when designing the training prompt you should first aim for a zero-shot response from the model. If that isn’t possible, move on to a few examples rather than providing an entire corpus. The standard flow for training prompt design should look like: Zero-Shot → Few Shots → Corpus-based Priming.
For the pandas assistant application, I have used the following structure for the training prompt:
- Description: An initial description of what the pandas assistant is supposed to do, plus a line or two about its functionality.
- Natural Language (English): This component includes a minimal one-liner description of the task that will be performed by the pandas assistant. It helps GPT-3 understand the context in order to generate proper pandas code in Python.
- Pandas Code: This component includes the pandas code corresponding to the English description provided as an input to the GPT-3 model.
Input → Natural Language ; Output → Pandas Code
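The structure above can be sketched as a small prompt builder. This is only an illustration under stated assumptions: the description text, the example pairs, the "English:" and "Pandas Code:" labels, and the build_prompt helper are all hypothetical and are not the exact prompt used in the application. The call that sends the prompt to GPT-3 is left out.

```python
# Hypothetical description of what the assistant does.
DESCRIPTION = (
    "The pandas assistant translates plain English instructions "
    "into pandas code in Python."
)

# Hypothetical few-shot examples: (English instruction, pandas code).
EXAMPLES = [
    ("Import the pandas library", "import pandas as pd"),
    ("Read the file data.csv into a DataFrame called df",
     "df = pd.read_csv('data.csv')"),
    ("Show the first five rows of df", "df.head()"),
]


def build_prompt(instruction: str) -> str:
    """Join the description, the examples, and a new instruction into one prompt."""
    parts = [DESCRIPTION, ""]
    for english, code in EXAMPLES:
        parts.append(f"English: {english}")
        parts.append(f"Pandas Code: {code}")
        parts.append("")
    # The prompt ends with an open "Pandas Code:" line for the model to complete.
    parts.append(f"English: {instruction}")
    parts.append("Pandas Code:")
    return "\n".join(parts)


prompt = build_prompt("Drop rows with missing values from df")
print(prompt)
```

Ending the prompt on an open "Pandas Code:" line signals to the model that the next thing to produce is code for the final instruction, following the pattern set by the earlier pairs.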
Let’s see an example in action to truly understand the power of GPT-3 in generating pandas code from plain English. In the example below, we generate pandas code by providing minimal instructions to the AI pandas assistant.
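As a hand-written illustration of the idea (not actual GPT-3 output), an English instruction and the kind of pandas code it maps to might look like the following. The DataFrame, its column names, and the instruction are hypothetical.

```python
import pandas as pd

# Hypothetical data standing in for whatever DataFrame the user is exploring.
df = pd.DataFrame(
    {
        "department": ["sales", "sales", "hr", "hr"],
        "salary": [50000, 60000, 45000, 55000],
    }
)

# English: "Compute the average salary for each department in df"
# Pandas code of the kind the assistant is meant to return (written by hand here):
avg_salary = df.groupby("department")["salary"].mean()

print(avg_salary)
```

Code returned by a language model should be reviewed before it is run, since nothing guarantees that it matches the instruction.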
If you would like to learn more or want me to write more on this subject, feel free to reach out.
Published via Towards AI