PySpark Snowflake Data Warehouse Read Write Operations — Part 1 (Read Only)
Programming


Last Updated on February 8, 2021 by Editorial Team

Author(s): Vivek Chaudhary


The objective of this story is to build an understanding of read and write operations on a Snowflake data warehouse table using the Apache Spark API, PySpark.

Snowflake is a cloud-based data warehousing solution designed for scalability and performance. I will share a detailed look at Snowflake and its components in upcoming blogs; for the moment, treat it as a cloud-based database system with massively parallel processing and enormous compute power. For now, I will restrict myself to the Snowflake and Spark related material.

[Image: PySpark-Snowflake]
1. Load Data from Amazon S3 Bucket to Snowflake DB Table

Snowflake provides a 30-day free trial. Create your free tier account and enjoy the benefits of learning.

Data present in AWS S3 bucket:
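The screenshot of the source file is not reproduced here. For illustration, assume emp.csv in the S3 bucket carries the columns of the EMP table created below; the rows are made up:

empno,ename,sal,deptno,comm
7369,SMITH,800,20,
7499,ALLEN,1600,30,300
7521,WARD,1250,30,500
Sample emp.csv (illustrative)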

Setup required before loading data into the Snowflake table:

  1. Create a Snowflake free tier account and create a database as below:
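The worksheet screenshot is not reproduced here; a minimal SnowSQL sketch, using the learning_db name referenced later in the copy command, would be:

create database learning_db;
use database learning_db;
Database DDL (illustrative)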

2. Create a Stage that will point to the AWS S3 bucket:
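The stage creation screen is likewise not shown. A SnowSQL sketch follows, assuming the stage name csv_data_loads used in the copy command later; the bucket URL and AWS credentials are placeholders to replace with your own:

create stage csv_data_loads
url = 's3://<your-bucket>/<path>/'
credentials = (aws_key_id = '<your_key_id>' aws_secret_key = '<your_secret_key>');
Stage DDL (illustrative)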

3. Under the File Formats tab, create a CSV file format to be used while loading data from AWS S3 into the Snowflake table:
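A plausible definition, assuming a comma-delimited file with a single header row; adjust the options to match the actual dataset:

create file format csv_loads
type = 'csv'
field_delimiter = ','
skip_header = 1;
File format DDL (illustrative)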

4. The next step is to create the Snowflake table “EMP”: go to the Worksheets tab and execute the SnowSQL DDL command below.

create table emp(
    empno integer,
    ename string,
    sal integer,
    deptno integer,
    comm integer
);
Snow DDL

5. Next, copy the data from AWS S3 to the Snowflake table.

copy into learning_db.public.emp
from @csv_data_loads/emp.csv
file_format = (format_name = 'csv_loads')
on_error = 'skip_file';
Data copy command

Here, csv_data_loads is the stage that points to the S3 bucket, and csv_loads is the file format we created for the CSV dataset.

Verify the dataset using SnowSQL:
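For example, run a quick sample and row count from the Worksheets tab:

select * from emp;
select count(*) from emp;
Verification queries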

Data successfully loaded to Snowflake.

2. Read Snowflake Table using PySpark.

Import prerequisites and set Spark configurations:

import os
import findspark

# Point findspark at the local Spark installation (raw string avoids backslash escapes on Windows)
findspark.init(r'D:\spark-2.3.3-bin-hadoop2.7')

from pyspark.sql import SparkSession

# Pull in the Snowflake JDBC driver and the Spark-Snowflake connector when the JVM starts
os.environ['PYSPARK_SUBMIT_ARGS'] = ('--packages '
    'net.snowflake:snowflake-jdbc:3.11.1,'
    'net.snowflake:spark-snowflake_2.11:2.5.7-spark_2.4 pyspark-shell')

spark = SparkSession.builder.appName('Pyspark_snowflake').getOrCreate()

# Enable query pushdown so that filters and projections execute inside Snowflake
spark._jvm.net.snowflake.spark.snowflake.SnowflakeConnectorUtils.enablePushdownSession(
    spark._jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate())

Set Snowflake connection properties:

# Snowflake connection options (credentials masked)
sfOptions = {
    "sfURL": "wa29709.ap-south-1.aws.snowflakecomputing.com",
    "sfAccount": "xxxxxxx",
    "sfUser": "xxxxxxxx",
    "sfPassword": "xxxxxxx",
    "sfDatabase": "learning_db",
    "sfSchema": "public",
    "sfWarehouse": "compute_wh",
    "sfRole": "sysadmin",
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

# Run the query inside Snowflake and load the result into a Spark DataFrame
df = (spark.read.format(SNOWFLAKE_SOURCE_NAME)
          .options(**sfOptions)
          .option("query", "select * from emp")
          .load())
df.show()

Data validation is successful: we are able to connect to Snowflake and read the table data into Spark.

Summary:

· Loaded AWS S3 data into a Snowflake DB table.

· Created a Snowflake stage and file format.

· Ran SnowSQL commands such as the DDL and the data copy.

· Used PySpark to read Snowflake table data.

Thanks to all for reading my blog. Do share your views and feedback.


PySpark Snowflake Data Warehouse Read Write operations — Part1 (Read Only) was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
