AWS Redshift ETL using Pandas API

Last Updated on January 28, 2021 by Editorial Team

Author(s): Vivek Chaudhary

The objective of this blog is to perform a simple ETL exercise with an AWS Redshift database: Oracle Database tables are used as the source dataset, simple transformations are applied to the dataset with Pandas methods, and the result is written into an AWS Redshift table.

1. Import prerequisites and connect to the source Oracle database:
import pandas as pd
from sqlalchemy import create_engine
# URL format: oracle://<user>:<password>@<host or TNS alias>; scott/scott is the demo schema login
engine = create_engine('oracle://scott:scott@oracle', echo=False)

2. Extract Datasets from Oracle Database:

#Employee Dataset
emp_df = pd.read_sql_query('select * from emp', engine)
emp_df.head(10)
#Department Dataset
dept_df = pd.read_sql_query('select * from dept', engine)
dept_df.head(10)
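To try the extract step without an Oracle instance, here is a minimal sketch that substitutes an in-memory SQLite database for Oracle (an assumption for illustration, not the author's setup). The rows are hypothetical sample data shaped like the classic SCOTT schema; `pd.read_sql_query` accepts a `sqlite3` connection directly, so the read calls mirror the ones above.

```python
import sqlite3

import pandas as pd

# In-memory SQLite stands in for the Oracle source (assumption for illustration).
src = sqlite3.connect(":memory:")
src.executescript(
    """
    CREATE TABLE emp (empno INTEGER, ename TEXT, job TEXT, mgr INTEGER,
                      hiredate TEXT, sal INTEGER, comm REAL, deptno INTEGER);
    CREATE TABLE dept (deptno INTEGER, dname TEXT, loc TEXT);
    -- Hypothetical sample rows in the shape of the SCOTT schema.
    INSERT INTO emp VALUES
        (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, NULL, 20),
        (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30),
        (7839, 'KING', 'PRESIDENT', NULL, '1981-11-17', 5000, NULL, 10);
    INSERT INTO dept VALUES
        (10, 'ACCOUNTING', 'NEW YORK'),
        (20, 'RESEARCH', 'DALLAS'),
        (30, 'SALES', 'CHICAGO');
    """
)

# Same extraction calls as above, pointed at the stand-in connection.
emp_df = pd.read_sql_query("select * from emp", src)
dept_df = pd.read_sql_query("select * from dept", src)
print(emp_df.shape, dept_df.shape)  # (3, 8) (3, 3)
```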

3. Transform Dataset

Create the target table in AWS Redshift using the script below:

create table emp (
empno integer,
ename varchar(20),
sal integer,
comm float,
deptno integer,
dname varchar(20)
);

Join the EMP and DEPT datasets:

joined_df = pd.merge(emp_df, dept_df, on='deptno', how='inner')
joined_df.head(10)
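Because both frames name the join key `deptno`, `on='deptno'` is the simpler spelling of the same key for `left_on` and `right_on`. The small sketch below uses hypothetical rows to show that `how='inner'` keeps only employees whose department exists in DEPT; the outer join with `indicator=True` is used here only to make each row's match status visible.

```python
import pandas as pd

# Hypothetical rows: employee 7900 points at department 40, which DEPT lacks.
emp_df = pd.DataFrame(
    {"empno": [7369, 7499, 7900], "ename": ["SMITH", "ALLEN", "JAMES"], "deptno": [20, 30, 40]}
)
dept_df = pd.DataFrame({"deptno": [20, 30], "dname": ["RESEARCH", "SALES"]})

# how="outer" with indicator=True labels which rows would survive an inner join.
audit = pd.merge(emp_df, dept_df, on="deptno", how="outer", indicator=True)
print(audit[["empno", "deptno", "_merge"]])

# The inner join keeps only rows present on both sides ("both" in the audit).
joined_df = pd.merge(emp_df, dept_df, on="deptno", how="inner")
print(joined_df["empno"].tolist())  # [7369, 7499]
```

Rows that fall out of an inner join disappear silently, so an audit like this is a cheap way to spot orphaned foreign keys before loading.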

Drop the columns that are not present in the target table:

joined_df.drop(columns=['job','mgr','hiredate','loc'],inplace=True)
joined_df.head(10)
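Since `to_sql` with `if_exists='append'` inserts by column name, an extra DataFrame column makes the INSERT fail, while a column missing from the DataFrame is simply left NULL in the table. A minimal sketch of a selection-based alternative to the `drop` call: the hypothetical helper `align_to_target` keeps exactly the columns of the Redshift DDL above, in the same order, and raises if any are missing.

```python
import pandas as pd

# Column list copied from the Redshift CREATE TABLE script above.
TARGET_COLUMNS = ["empno", "ename", "sal", "comm", "deptno", "dname"]


def align_to_target(df: pd.DataFrame, target_columns: list[str]) -> pd.DataFrame:
    """Return df restricted to and ordered like target_columns (hypothetical helper)."""
    missing = [c for c in target_columns if c not in df.columns]
    if missing:
        raise ValueError(f"columns missing from DataFrame: {missing}")
    # Selecting by list drops extra columns (job, mgr, ...) and fixes the order.
    return df[target_columns]


# Hypothetical joined row still carrying the source-only columns.
joined_df = pd.DataFrame(
    {"empno": [7369], "ename": ["SMITH"], "job": ["CLERK"], "mgr": [7902],
     "hiredate": ["1980-12-17"], "sal": [800], "comm": [None], "deptno": [20],
     "dname": ["RESEARCH"], "loc": ["DALLAS"]}
)
aligned = align_to_target(joined_df, TARGET_COLUMNS)
print(list(aligned.columns))
```

Listing the columns you want, rather than the ones you don't, also keeps the load working if the source tables later gain new columns.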

4. Create Redshift connection and insert data

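#create connection object (replace the <...> placeholders; 5439 is the default Redshift port)
conn = create_engine('postgresql+psycopg2://<dbuser>:<dbpassword>@<cluster_endpoint_URL>:5439/<dbname>')
#append the transformed rows to the existing emp table without writing the DataFrame index
joined_df.to_sql('emp', conn, index=False, if_exists='append')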
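The connection string above embeds the password directly, so characters such as `@`, `:` or `/` in the password would break URL parsing. A minimal sketch, using a hypothetical helper `build_redshift_url` and placeholder credentials and endpoint, percent-encodes the user and password with the standard library's `urllib.parse.quote_plus` before assembling the same `postgresql+psycopg2` URL.

```python
from urllib.parse import quote_plus


def build_redshift_url(user: str, password: str, host: str, dbname: str, port: int = 5439) -> str:
    """Assemble a SQLAlchemy URL for Redshift via psycopg2 (hypothetical helper)."""
    # Percent-encode credentials so '@', ':' or '/' cannot be misread as URL delimiters.
    return (
        f"postgresql+psycopg2://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{dbname}"
    )


# Placeholder values for illustration only.
url = build_redshift_url(
    "awsuser", "p@ss:w/rd", "example-cluster.abc123.us-east-1.redshift.amazonaws.com", "dev"
)
print(url)
# postgresql+psycopg2://awsuser:p%40ss%3Aw%2Frd@example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/dev
```

The resulting string can be passed to `create_engine(url)` as in the step above; SQLAlchemy's `URL.create(...)` is an alternative that takes the parts separately and handles the escaping itself.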

Verify the data in the Redshift table.

AWS Redshift console

Query the “emp” table from the AWS console to verify the load; alternatively, SQL Workbench/J can be set up on a local system to query Redshift tables.

The DML operation was successful.
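Besides checking the console, the load can be verified programmatically by comparing the number of rows written with a `count(*)` on the target. Because `if_exists='append'` keeps any earlier rows, the sketch compares counts taken before and after the load. It runs against an in-memory SQLite database as a stand-in for Redshift (an assumption so it runs anywhere) with hypothetical rows; against Redshift, the same `pd.read_sql_query` call would go through the `conn` engine instead.

```python
import sqlite3

import pandas as pd

# SQLite stands in for Redshift here (assumption); the DDL mirrors the target table.
target = sqlite3.connect(":memory:")
target.execute(
    "create table emp (empno integer, ename varchar(20), sal integer,"
    " comm float, deptno integer, dname varchar(20))"
)

# Hypothetical transformed rows, already aligned to the target columns.
joined_df = pd.DataFrame(
    {"empno": [7369, 7499], "ename": ["SMITH", "ALLEN"], "sal": [800, 1600],
     "comm": [None, 300.0], "deptno": [20, 30], "dname": ["RESEARCH", "SALES"]}
)


def row_count(con) -> int:
    # read_sql_query returns a one-cell DataFrame; pull out the scalar.
    return int(pd.read_sql_query("select count(*) as n from emp", con)["n"].iloc[0])


before = row_count(target)
joined_df.to_sql("emp", target, index=False, if_exists="append")
after = row_count(target)
assert after - before == len(joined_df), "row count mismatch after load"
print(before, after)  # 0 2
```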

5. Connectivity issue I faced

OperationalError: (psycopg2.OperationalError) could not connect to server: Connection timed out (0x0000274C/10060) Is the server running on host “redshift_cluster_name.unique_here.region.redshift.amazonaws.com” (<IP address>) and accepting TCP/IP connections on port 5439?
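A timeout like this happens before any database login, so it can be reproduced without psycopg2 by testing whether a TCP connection to the cluster's port opens at all. A minimal sketch using only the standard library's `socket.create_connection`; the helper name `port_is_reachable` is hypothetical.

```python
import socket


def port_is_reachable(host: str, port: int = 5439, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False
```

For example, `port_is_reachable("<cluster_endpoint_URL>")` returning False from your machine while the cluster is available points at network rules rather than at credentials or the database itself.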

Issue Description

The cause was that the inbound rule in the cluster's VPC security group specified a security group as its source. Changing the source to a CIDR range that included my IP address fixed the issue.
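A CIDR source only helps if it actually covers the address your traffic comes from, which for a local machine is its public IP as seen by AWS. The standard library's `ipaddress` module can check this; the helper `cidr_allows` is hypothetical, and the addresses are examples from the reserved documentation range 203.0.113.0/24, not real values.

```python
import ipaddress


def cidr_allows(cidr: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside the CIDR block used as the rule source."""
    # strict=False accepts blocks written with host bits set, e.g. 203.0.113.42/24.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


# Documentation-range examples (RFC 5737), not real addresses.
print(cidr_allows("203.0.113.0/24", "203.0.113.42"))   # True
print(cidr_allows("203.0.113.42/32", "203.0.113.43"))  # False
```

A /32 block admits exactly one address, which is the narrowest rule that still lets a single workstation connect.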

How to Fix?

Go to Cluster Properties → Network Security

Go to VPC Security Group → Inbound rules → Edit inbound rules, add both of the rules below, and click Save rules.

And we are ready to go. Without the second rule, connections to the Redshift database can fail with the timeout shown above, so follow these steps to avoid or resolve the issue.
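The CIDR-based inbound rule from the fix above can also be added programmatically. The sketch below builds an `IpPermissions` entry for TCP on port 5439 and passes it to boto3's EC2 `authorize_security_group_ingress` call. The helper names are hypothetical, the security group ID, region and CIDR are placeholders, and actually calling `add_redshift_ingress` requires boto3 plus AWS credentials allowed to modify the group.

```python
def redshift_ingress_permission(cidr: str, port: int = 5439, description: str = "") -> dict:
    """Build one IpPermissions entry allowing TCP traffic on port from cidr."""
    ip_range = {"CidrIp": cidr}
    if description:
        ip_range["Description"] = description
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [ip_range],
    }


def add_redshift_ingress(group_id: str, cidr: str, region: str) -> None:
    """Attach the rule to the cluster's VPC security group (needs boto3 + credentials)."""
    import boto3  # imported lazily so the payload builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=[redshift_ingress_permission(cidr, description="local SQL client")],
    )


# Placeholder CIDR: print the payload instead of calling AWS.
print(redshift_ingress_permission("203.0.113.42/32", description="my workstation"))
```

Keeping the CIDR as narrow as possible, such as a single /32 address, avoids opening port 5439 more widely than needed.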

Thanks to all for reading my blog. Do share your views or feedback.


AWS Redshift ETL using Pandas API was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.
