
AWS S3 Read and Write Operations Using the Pandas API

Last Updated on January 25, 2021 by Editorial Team

Author(s): Vivek Chaudhary


The objective of this blog is to build an understanding of basic read and write operations on Amazon Web Services' Simple Storage Service (S3). More specifically, we will read a CSV file into a Pandas DataFrame, write that DataFrame to an AWS S3 bucket, and then read the same file back from S3 using the Pandas API.

1. Prerequisite libraries

import boto3
import pandas as pd
import io

2. Read a CSV file using Pandas

emp_df = pd.read_csv(r'D:\python_coding\GitLearn\python_ETL\emp.dat')
emp_df.head(10)

3. Write the Pandas DataFrame to AWS S3

from io import StringIO

REGION = 'us-east-2'
ACCESS_KEY_ID = 'xxxxxxxxxxxxx'
SECRET_ACCESS_KEY = 'xxxxxxxxxxxxxxxx'
BUCKET_NAME = 'pysparkcsvs3'
FileName = 'pysparks3/emp.csv'

# Serialize the DataFrame to an in-memory CSV buffer
csv_buffer = StringIO()
emp_df.to_csv(csv_buffer, index=False)

# Create the S3 client and upload the buffer contents
s3csv = boto3.client('s3',
                     region_name=REGION,
                     aws_access_key_id=ACCESS_KEY_ID,
                     aws_secret_access_key=SECRET_ACCESS_KEY)
response = s3csv.put_object(Body=csv_buffer.getvalue(),
                            Bucket=BUCKET_NAME,
                            Key=FileName)
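
As a quick sanity check, a minimal sketch of inspecting the upload response (put_object returns boto3's standard response dictionary, which includes ResponseMetadata with an HTTP status code):

# Hedged sketch: confirm the upload by checking the HTTP status code in the response
status = response['ResponseMetadata']['HTTPStatusCode']
if status == 200:
    print(f'Upload of {FileName} to {BUCKET_NAME} succeeded')
else:
    print(f'Upload returned unexpected status: {status}')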

As per the Boto3 documentation: "Boto is the Amazon Web Services (AWS) SDK for Python. It enables Python developers to create, configure, and manage AWS services, such as EC2 and S3."

StringIO is an in-memory, file-like object. It provides a convenient means of working with text in memory using the file API (read, write, etc.) and can be used as input or output to any function that expects a standard file object. When a StringIO object is created, it can be initialized by passing a string to the constructor; if no string is passed, it starts empty.

The getvalue() method returns the entire contents of the buffer as a single string.
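
A minimal sketch of this pattern on its own, independent of S3:

from io import StringIO

buffer = StringIO()
buffer.write('id,name\n1,alice\n')   # behaves like a text file opened for writing
print(buffer.getvalue())             # returns everything written so far as one string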

Let's check whether the file is available in the AWS S3 bucket "pysparkcsvs3".

The CSV file was successfully uploaded to the S3 bucket.
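
Instead of checking in the AWS console, a minimal sketch of a programmatic check, reusing the s3csv client created above, could look like this:

# Hedged sketch: list objects under the key prefix and look for the uploaded file
listing = s3csv.list_objects_v2(Bucket=BUCKET_NAME, Prefix='pysparks3/')
keys = [item['Key'] for item in listing.get('Contents', [])]
print('emp.csv uploaded:', FileName in keys)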

4. Read the AWS S3 file to Pandas DataFrame

REGION = 'us-east-2'
ACCESS_KEY_ID = 'xxxxxxxxx'
SECRET_ACCESS_KEY = 'xxxxxxxxx'
BUCKET_NAME = 'pysparkcsvs3'
KEY = 'pysparks3/emp.csv'  # file path in S3

s3c = boto3.client('s3',
                   region_name=REGION,
                   aws_access_key_id=ACCESS_KEY_ID,
                   aws_secret_access_key=SECRET_ACCESS_KEY)

# Download the object and load its bytes into a DataFrame
obj = s3c.get_object(Bucket=BUCKET_NAME, Key=KEY)
emp_df = pd.read_csv(io.BytesIO(obj['Body'].read()), encoding='utf8')
emp_df.head(5)

Here, obj is the HTTP response returned by get_object in dictionary format.

get_object() retrieves objects from Amazon S3 buckets. To get an object, the user must have read access to it.

io.BytesIO() keeps the data as bytes in an in-memory buffer, using the io module's byte I/O operations, so pd.read_csv can consume it like a regular file.
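
A minimal sketch of that conversion in isolation, using a small hard-coded byte string in place of the S3 response body:

import io
import pandas as pd

raw_bytes = b'id,name\n1,alice\n2,bob\n'   # stands in for obj['Body'].read()
df = pd.read_csv(io.BytesIO(raw_bytes), encoding='utf8')
print(df)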

Verify the data retrieved from S3.

The data retrieved from the CSV file in the AWS S3 bucket looks good, and the byte-to-string conversion was handled successfully.
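
For a stricter round-trip check, a hedged sketch (emp_df_local is a hypothetical name, assuming you kept the originally loaded DataFrame under a separate variable instead of overwriting emp_df):

# Hedged sketch: compare the DataFrame read back from S3 with the original one
# (emp_df_local is a hypothetical name for the DataFrame loaded from the local CSV)
print(emp_df_local.equals(emp_df))  # True if contents and dtypes match exactly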

Summary:

· Pandas API connectivity with AWS S3

· Reading and writing a Pandas DataFrame to and from S3 storage

· Boto3 for connectivity with S3 (a pandas-native alternative is sketched below)
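
As a closing note: recent versions of pandas (1.2 and later) can also read and write s3:// paths directly when the optional s3fs package is installed, so the explicit boto3 plumbing above can often be skipped. A minimal sketch, assuming s3fs is installed and the same bucket and credentials as before:

import pandas as pd

# Hedged sketch: pandas delegates s3:// URLs to s3fs; credentials are passed via storage_options
creds = {'key': ACCESS_KEY_ID, 'secret': SECRET_ACCESS_KEY}
emp_df.to_csv('s3://pysparkcsvs3/pysparks3/emp.csv', index=False, storage_options=creds)
df = pd.read_csv('s3://pysparkcsvs3/pysparks3/emp.csv', storage_options=creds)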

Thanks to all for reading my blog. Do share your views or feedback.

