GPT-3: A Data Scientist in the Making

Last Updated on January 6, 2023 by Editorial Team

Last Updated on June 23, 2021 by Editorial Team

Autopilot exploratory data analysis in pandas by leveraging the capabilities of the world’s most sophisticated language model GPT-3…

“Errors using inadequate data are much less than those using no data at all” — **Charles Babbage**

Pre-Requisites

I have collected the background material in the form of articles. Please go through the articles below in the same order to connect the dots and understand the key tech stack behind the intelligent pandas assistant:

Introduction to Pandas

Pandas is a fast, powerful, and easy-to-use open-source data analysis and manipulation library built on top of the Python programming language. It is widely adopted in the Python community and is used by many other packages, frameworks, and modules. Pandas is extremely flexible and has a wide range of use cases in preparing data for machine learning and deep learning models.

“Torture the data to the right extent and it will confess to anything” — **Ronald Coase**

Installing pandas

Pandas is available as a package on PyPI and can be installed with either pip (pip install pandas) or conda (conda install pandas), depending on the Python environment. Pandas is popular enough to have its own conventional abbreviation, pd, so once it is installed it is usually imported as follows:

import pandas as pd

What kind of data can pandas handle?

If you work with tabular data, such as data in spreadsheets or databases, pandas is the right tool for you. With Pandas, you can explore, clean, and process your data. In pandas, a data table is called a DataFrame.
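As a minimal sketch of what a DataFrame looks like in practice, the snippet below builds one from a Python dictionary and filters it. The column names and values are made up for illustration and do not come from the original application.

```python
import pandas as pd

# Hypothetical sample data, invented purely for illustration
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "age": [34, 27, 45],
        "city": ["Paris", "Delhi", "Lima"],
    }
)

# Explore: number of rows and columns, and the type of each column
print(df.shape)
print(df.dtypes)

# Process: keep only the rows where age is greater than 30
older = df[df["age"] > 30]
print(older)
```

Each dictionary key becomes a column and each list becomes that column's values, which is why a DataFrame maps so naturally onto a spreadsheet or database table.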

How to read and write tabular data with pandas?

Pandas supports integration with many file formats and data sources out of the box (such as CSV, Excel, SQL, JSON, and Parquet). Importing data from these sources is straightforward using the functions prefixed with read_*. Similarly, the to_* methods export data to the respective formats.

Fig: Illustration of import and export sources in pandas
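The read_* and to_* pattern can be sketched with a small round trip. To keep the example self-contained it reads from in-memory text rather than files on disk; the sample data is an assumption made for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path
csv_text = "name,age\nAlice,34\nBob,27\n"

# read_* functions import data into a DataFrame
df = pd.read_csv(io.StringIO(csv_text))

# to_* methods export the DataFrame to another format
json_text = df.to_json(orient="records")
csv_out = df.to_csv(index=False)

# The exported JSON can be read back with the matching read_* function
restored = pd.read_json(io.StringIO(json_text))
print(restored)
```

With real files the calls look the same, except that a path such as a CSV or JSON file name is passed instead of the in-memory buffer.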

Application walkthrough

Now I will walk you through the GPT-3 powered pandas assistant application step by step:

When creating any GPT-3 application, the first thing to consider is the design and content of the training prompt. Prompt design is the most significant step in priming the GPT-3 model to give a favorable and contextual response.

As a rule of thumb, when designing the training prompt you should first aim for a zero-shot response from the model. If that isn’t possible, move forward with a few examples rather than providing an entire corpus. The standard flow for training prompt design should look like: Zero-Shot → Few-Shot → Corpus-Based Priming.

For the pandas assistant application, I have used the following structure for the training prompt:

Description: An initial description of what the pandas assistant is supposed to do, with a line or two about its functionality.
Natural Language (English): This component includes a minimal one-liner description of the task that will be performed by the pandas assistant. It helps GPT-3 to understand the context in order to generate proper pandas code in python.
Pandas Code: This component includes the pandas code corresponding to the English description provided as an input to the GPT-3 model.

Input → Natural Language ; Output → Pandas Code
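The three-part structure above can be sketched as a small prompt builder. This is an illustration under stated assumptions, not the author's actual prompt: the description text, the example pairs, the "English:" and "Pandas Code:" labels, and the build_prompt helper are all hypothetical. The resulting string is what would be sent to the model; the API call itself is omitted.

```python
# Hypothetical description of the assistant (not the original prompt text)
DESCRIPTION = "Pandas assistant: translate an English instruction into pandas code in Python."

# Hypothetical few-shot examples: (English instruction, pandas code) pairs
EXAMPLES = [
    ("Import the pandas library", "import pandas as pd"),
    ("Read the file data.csv into a DataFrame called df", "df = pd.read_csv('data.csv')"),
    ("Show the first five rows of df", "df.head()"),
]


def build_prompt(instruction, examples=EXAMPLES, description=DESCRIPTION):
    """Assemble description, example pairs, and the new instruction into one prompt."""
    parts = [description, ""]
    for english, code in examples:
        parts.append(f"English: {english}")
        parts.append(f"Pandas Code: {code}")
        parts.append("")
    # The prompt ends with an open "Pandas Code:" line for the model to complete
    parts.append(f"English: {instruction}")
    parts.append("Pandas Code:")
    return "\n".join(parts)


prompt = build_prompt("Count the missing values in each column of df")
print(prompt)
```

Passing an empty list for examples yields a zero-shot prompt, so the same helper covers the first two stages of the Zero-Shot → Few-Shot flow described earlier.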

The magic of FastAPI → On-the-fly API documentation

Let’s see an example in action to truly understand the power of GPT-3 in generating pandas code from plain English. In the example below, we generate pandas code by providing minimal instructions to the AI pandas assistant.
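To give a feel for the kind of mapping involved, here is a sketch that pairs short English instructions with pandas expressions and runs them on a small DataFrame. The pairs are hand-written for illustration and are not actual GPT-3 output; the data and column names are also assumptions.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame(
    {
        "city": ["Paris", "Delhi", "Paris", "Lima"],
        "sales": [120, 80, 200, 50],
    }
)

# Hand-written (instruction, pandas expression) pairs; NOT model output
pairs = [
    ("Show the number of rows and columns", "df.shape"),
    ("Compute total sales per city", "df.groupby('city')['sales'].sum()"),
    ("Find the row with the highest sales", "df.loc[df['sales'].idxmax()]"),
]

results = {}
for instruction, code in pairs:
    # eval is acceptable here only because the snippets are trusted and hand-written;
    # code produced by a model should be reviewed before it is executed
    results[instruction] = eval(code, {"df": df, "pd": pd})
    print(f"# {instruction}\n{code}\n{results[instruction]}\n")
```

Each instruction is a single line of English, which matches the minimal one-liner input format the training prompt is designed around.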

References

If you would like to learn more or want me to write more on this subject, feel free to reach out.

My social links: LinkedIn | Twitter | GitHub

If you liked this post or found it helpful, please take a minute to press the clap button. It increases the post's visibility for other Medium users.

GPT-3: A Data Scientist in the Making was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI


Author(s): Shubham Saboo
