PySpark AI | Complete Guide to Using the English SDK
Last Updated on April 5, 2024 by Editorial Team
Author(s): Naveen Nelamali
Originally published on Towards AI.
PySpark has released an English SDK package called pyspark-ai, which allows you to write transformations in English instead of using the DataFrame API or SQL. In other words, it enables you to use natural language for data processing and analysis more elegantly. Using English to get analytical results reduces, to some extent, the need to learn the DataFrame API and SQL.
In this article, I will explore the pyspark-ai English SDK by installing it and using English instructions to apply DataFrame transformations, run some simple analytical queries, and explore the results in tables and charts.
This package makes PySpark simpler and more user-friendly. You don't have to learn to code or write complex queries, and you can focus your effort on using the data to get insights.
The pyspark-ai package leverages the langchain and openai frameworks to use generative AI Large Language Models (LLMs) to simplify the usage of PySpark. The openai library is an open-source client used to interact with the OpenAI GPT models, whereas langchain is an open-source framework that lets you interact with several LLMs seamlessly; it abstracts away the complex code otherwise needed to call LLM APIs.
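To make that concrete, here is a minimal sketch (ahead of the installation steps below) of how the pieces fit together: a LangChain chat model wraps the OpenAI API, and SparkAI is constructed on top of it. The model name, temperature setting, and verbose flag are illustrative choices on my part, not requirements of the SDK.

import os

from langchain.chat_models import ChatOpenAI
from pyspark_ai import SparkAI

# pyspark-ai talks to OpenAI through langchain, so an OpenAI API key is required.
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"

# Wrap an OpenAI chat model with langchain and hand it to SparkAI.
# gpt-4 and temperature=0 are example settings, not defaults mandated by the SDK.
llm = ChatOpenAI(model_name="gpt-4", temperature=0)
spark_ai = SparkAI(llm=llm, verbose=True)

# activate() attaches the English-language helpers (df.ai.*) to DataFrames.
spark_ai.activate()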
You can use the Python pip command to install the English SDK pyspark-ai from PyPI.
# Install pyspark-ai
pip install pyspark-ai
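Once the package is installed, a typical first session looks roughly like the sketch below. The sample data, column names, and English prompts are my own illustrations; the rest of the article works through its own dataset and queries.

from pyspark.sql import SparkSession
from pyspark_ai import SparkAI

# Assumes OPENAI_API_KEY is already set in the environment (see the earlier sketch).
spark = SparkSession.builder.appName("pyspark-ai-demo").getOrCreate()

spark_ai = SparkAI()   # uses the default OpenAI-backed LLM via langchain
spark_ai.activate()    # enables the df.ai.* helpers on DataFrames

# A small, made-up DataFrame purely for illustration.
data = [("Java", 20000), ("Python", 100000), ("Scala", 3000)]
df = spark.createDataFrame(data, ["language", "users_count"])

# Apply a transformation described in plain English.
result = df.ai.transform("keep only languages with more than 10000 users")
result.show()

# Ask the SDK to explain the logic it generated, and to chart the result.
result.ai.explain()
result.ai.plot("bar chart of users_count by language")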