AWS S3 Read and Write Operations Using the Pandas API
Last Updated on January 25, 2021 by Editorial Team
Author(s): Vivek Chaudhary
The objective of this blog is to build an understanding of basic read and write operations on the Amazon Web Services storage service, S3. More specifically, we will read a CSV file using Pandas, write the resulting DataFrame to an AWS S3 bucket, and then perform the reverse operation: read the same file back from the S3 bucket using the Pandas API.
1. Prerequisite libraries
import boto3         # AWS SDK for Python
import pandas as pd  # DataFrame API
import io            # in-memory file-like buffers
2. Read a CSV file using pandas
emp_df = pd.read_csv(r'D:\python_coding\GitLearn\python_ETL\emp.dat')
emp_df.head(10)
3. Write the Pandas DataFrame to AWS S3
from io import StringIO

REGION = 'us-east-2'
ACCESS_KEY_ID = 'xxxxxxxxxxxxx'
SECRET_ACCESS_KEY = 'xxxxxxxxxxxxxxxx'
BUCKET_NAME = 'pysparkcsvs3'
FileName = 'pysparks3/emp.csv'

csv_buffer = StringIO()                 # in-memory text buffer
emp_df.to_csv(csv_buffer, index=False)  # serialize the DataFrame as CSV into the buffer

s3csv = boto3.client('s3',
                     region_name=REGION,
                     aws_access_key_id=ACCESS_KEY_ID,
                     aws_secret_access_key=SECRET_ACCESS_KEY)

response = s3csv.put_object(Body=csv_buffer.getvalue(),
                            Bucket=BUCKET_NAME,
                            Key=FileName)
As per the Boto3 documentation: "Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3."
StringIO is an in-memory, file-like object. It provides a convenient means of working with text in memory using the file API (read, write, etc.), and it can be used as input or output to functions that expect a standard file object. When a StringIO object is created, it can be initialized by passing a string to the constructor; if no string is passed, it starts empty.
The getvalue() method returns the entire content of the buffer as a single string.
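A minimal sketch of StringIO on its own (the sample rows here are purely illustrative):
from io import StringIO

buf = StringIO()         # created empty, since no string was passed
buf.write('id,name\n')   # file-like write calls
buf.write('1,Alice\n')
print(buf.getvalue())    # returns the entire buffer content as a single string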
Let's check whether the file is available in the AWS S3 bucket "pysparkcsvs3". One way to confirm this programmatically is sketched below.
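This is a minimal sketch that reuses the s3csv client created above; list_objects_v2 lists the keys stored under a given prefix:
# list the keys under the 'pysparks3/' prefix to confirm the upload
resp = s3csv.list_objects_v2(Bucket=BUCKET_NAME, Prefix='pysparks3/')
for item in resp.get('Contents', []):
    print(item['Key'], item['Size'])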
The CSV file was successfully uploaded to the S3 bucket.
4. Read the AWS S3 file to Pandas DataFrame
REGION = 'us-east-2'
ACCESS_KEY_ID = 'xxxxxxxxx'
SECRET_ACCESS_KEY = 'xxxxxxxxx'
BUCKET_NAME = 'pysparkcsvs3'
KEY = 'pysparks3/emp.csv'  # file path in S3

s3c = boto3.client('s3',
                   region_name=REGION,
                   aws_access_key_id=ACCESS_KEY_ID,
                   aws_secret_access_key=SECRET_ACCESS_KEY)

obj = s3c.get_object(Bucket=BUCKET_NAME, Key=KEY)
emp_df = pd.read_csv(io.BytesIO(obj['Body'].read()), encoding='utf8')
emp_df.head(5)
obj is the HTTP response, returned as a Python dictionary; the sketch below shows one way to inspect it.
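A minimal sketch, assuming the get_object call above succeeded (the exact set of keys can vary by object, but ResponseMetadata and Body are always present):
print(list(obj.keys()))                           # e.g. 'ResponseMetadata', 'Body', 'ContentLength', ...
print(obj['ResponseMetadata']['HTTPStatusCode'])  # 200 on a successful GET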
get_object() retrieves objects from Amazon S3 buckets. To get an object, the user must have read access to it.
io.BytesIO() keeps data as bytes in an in-memory buffer; the io module's byte I/O operations let such a buffer be read and written like a binary file.
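Here is a minimal, self-contained sketch of the same pattern, with illustrative sample bytes standing in for obj['Body'].read():
import io
import pandas as pd

raw = b'id,name\n1,Alice\n'        # stands in for obj['Body'].read()
df = pd.read_csv(io.BytesIO(raw))  # BytesIO wraps the bytes in a file-like object
print(df)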
Verify the data retrieved from S3.
The data retrieved from the CSV file in the AWS S3 bucket looks good, and the byte-to-string conversion was handled correctly.
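For an extra sanity check, the round trip can also be verified programmatically. The minimal sketch below re-reads the local source file from step 2 and compares it with the frame fetched from S3:
# re-read the local file and compare it with the DataFrame read back from S3;
# assert_frame_equal raises an AssertionError on any mismatch
local_df = pd.read_csv(r'D:\python_coding\GitLearn\python_ETL\emp.dat')
pd.testing.assert_frame_equal(local_df, emp_df)
print('Round trip verified: local and S3 copies match')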
Summary:
· Pandas API connectivity with AWS S3
· Read and Write Pandas DataFrame to S3 Storage
· Boto3 for connectivity with S3
Thanks to all for reading my blog. Do share your views or feedback.