GPT-3: A Data Scientist in the Making
Last Updated on January 6, 2023 by Editorial Team
Last Updated on June 23, 2021 by Editorial Team
Author(s): Shubham Saboo
Natural Language Processing
Autopilot exploratory data analysis in pandas by leveraging the capabilities of the world's most sophisticated language model, GPT-3…
Pre-Requisites
I have collected the dots in the form of articles. Please read the articles below, in the same order, to connect the dots and understand the key tech stack behind the intelligent Kube Bot:
- FastAPI: The Spiffy Way Beyond Flask!
- Streamlit: Revolutionizing Data App Creation
- A Brief Introduction to GPT-3
Introduction to Pandas
Pandas is a fast, powerful, and easy-to-use open-source data analysis and manipulation tool built on top of the Python programming language. It is widely accepted among the Python community and is used in many other packages, frameworks, and modules. Pandas is an extremely flexible framework and has a wide range of use-cases for preparing the data for machine learning and deep learning models.
Installing pandas
Pandas is available on PyPI and can be installed with either pip (pip install pandas) or conda (conda install pandas), depending on your Python environment. Because of its popularity, pandas has its own conventional abbreviation, pd, and once installed it is usually imported as follows:
import pandas as pd
What kind of data can pandas handle?
If you work with tabular data, such as data in spreadsheets or databases, pandas is the right tool for you. With Pandas, you can explore, clean, and process your data. In pandas, a data table is called a DataFrame.
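To make the explore, clean, and process steps concrete, here is a minimal sketch. The column names and values are made up purely for illustration; they do not come from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Pune", None],
        "sales": [120.0, 95.5, None, 40.0],
    }
)

# Explore: table dimensions and missing values per column.
n_rows, n_cols = df.shape
missing = df.isna().sum()

# Clean: drop every row that has at least one missing value.
clean = df.dropna()

# Process: total sales per city on the cleaned table.
totals = clean.groupby("city")["sales"].sum()
print(totals)
```

Dropping rows is only one way to handle missing values; filling them with fillna may be a better fit depending on the data.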
How to read and write tabular data with pandas?
Pandas supports many file formats and data sources out of the box (CSV, Excel, SQL, JSON, Parquet, and more). Importing data from these sources is straightforward with the functions prefixed with read_* (for example, read_csv). Similarly, the to_* methods export the data to the respective formats.
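As a small sketch of the read_* and to_* pattern, the snippet below reads CSV text, writes it back out, and exports the same table as JSON. It uses an in-memory buffer instead of a file on disk so that it runs anywhere; the sample rows are invented for illustration.

```python
import io
import json

import pandas as pd

# Hypothetical CSV content, used in place of a real file on disk.
csv_text = "name,score\nada,91\nlinus,84\n"

# read_* functions load data into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# to_* methods export the DataFrame; index=False skips the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the exported text back gives an identical table.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))

# The same DataFrame exported as a list of JSON records.
records = json.loads(df.to_json(orient="records"))
print(records)
```

With real files, you would pass a path such as "data.csv" to read_csv and to_csv instead of a buffer.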
Application walkthrough
Now I will walk you through the GPT-3 powered pandas assistant application step by step:
When creating any GPT-3 application, the first thing to consider is the design and content of the training prompt. Prompt design is the most significant step in priming the GPT-3 model to give a favorable and contextual response.
As a rule of thumb, when designing the training prompt you should aim for a zero-shot response from the model. If that isn't possible, move forward with a few examples rather than providing an entire corpus. The standard flow for training prompt design should look like: Zero-Shot → Few-Shot → Corpus-based Priming.
For the pandas assistant application, I have used the following structure for the training prompt:
- Description: An initial description of what the pandas assistant is supposed to do, plus a line or two about its functionality.
- Natural Language (English): A minimal one-line description of the task to be performed by the pandas assistant. It helps GPT-3 understand the context in order to generate proper pandas code in Python.
- Pandas Code: The pandas code corresponding to the English description provided as input to the GPT-3 model.
Input → Natural Language; Output → Pandas Code
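The structure above can be sketched as a small prompt builder. This is only an illustration and not the exact prompt used in the application: the description text, the labels "English:" and "Pandas Code:", the example pairs, and the helper name build_prompt are all assumptions. It only assembles the prompt string and does not call the GPT-3 API.

```python
# Hypothetical description; the wording is an assumption, not the original prompt.
DESCRIPTION = (
    "Pandas assistant: converts a plain-English instruction "
    "into pandas code in Python."
)

# Hypothetical few-shot pairs of (natural language, pandas code).
EXAMPLES = (
    ("Read the file data.csv into a DataFrame", "df = pd.read_csv('data.csv')"),
    ("Show the first five rows", "df.head()"),
)


def build_prompt(query, examples=EXAMPLES, description=DESCRIPTION):
    """Assemble description, example pairs, and the new query into one prompt."""
    parts = [description, ""]
    for english, code in examples:
        parts.append(f"English: {english}")
        parts.append(f"Pandas Code: {code}")
        parts.append("")
    # The prompt ends on an open "Pandas Code:" label for the model to complete.
    parts.append(f"English: {query}")
    parts.append("Pandas Code:")
    return "\n".join(parts)


# Few-shot prompt: description, two examples, then the new instruction.
few_shot = build_prompt("Count the missing values in each column")

# Zero-shot prompt: description and instruction only, with no examples.
zero_shot = build_prompt("Count the missing values in each column", examples=())

print(few_shot)
```

Starting with the zero-shot variant and adding example pairs only when the responses are not good enough follows the Zero-Shot to Few-Shot flow described earlier.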
Let's see an example in action to truly understand the power of GPT-3 in generating pandas code from plain English. In the example below, we will generate the pandas code by providing minimal instructions to the AI pandas assistant.
GPT-3: A Data Scientist in the Making was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
Published via Towards AI