Unlock the full potential of AI with Building LLMs for Production—our 470+ page guide to mastering LLMs with practical projects and expert insights!

Publication

Code Migration: SAS to Python
Latest   Machine Learning

Code Migration: SAS to Python

Last Updated on July 20, 2023 by Editorial Team

Author(s): Vivek Chaudhary

Originally published on Towards AI.

Programming

The objective of this article is to build a fundamental understanding of SAS programming and rewrite the feature in the Python programming language. As a part of the demo, I have to build a tiny ETL process in SAS and redesigned the same in Python.

  1. Create SAS library and Import Datasets
--SAS library creation
libname mylib '<path>';
--import emp dataset
DATA mylib.emp; --> Data step
infile '/folders/myfolders/emp.csv'
dlm=','
FIRSTOBS=2 DSD;
input EMPNO ENAME $ SAL DEPTNO COMM;
run;
emp dataset
DATA mylib.dept; --> Data step
infile '/folders/myfolders/dept.csv'
dlm=','
FIRSTOBS=2 DSD;
input DEPTNO DNAME $ LOC $;
run;
dept dataset

2. SAS Programming structure explanation

SAS programming involves three basic steps:- Data Step, Proc Step, and Output Step.

DATA Step involves loading dataset into SAS memory (RAM), variable identification (columns are referred to as variables in SAS). In short, this step is a combination of parsing and data loading. Data or records are referred to as observations in SAS.

PROC Step involves the data analysis part, basically to invoke SAS built-in procedure to analyze the data.

OUTPUT Step: prints or stores the result of the proc step.

3. SAS ETL

The next step is to perform join on the above emp and dept datasets loaded into SAS memory. But SAS demands data be sorted before joining happens.

Sort the EMP dataset as the DEPT dataset is already sorted on join key “DEPTNO.”

proc sort data=mylib.emp; --> Proc step
by DEPTNO;

The above step is the proc step.

Next is to perform merge operation on EMP & DEPT dataset.

DATA mylib.emp_dept; --> equi join
merge mylib.dept(in=X) mylib.emp(in=Y);
by DEPTNO;
if X and Y;
run;

The default join strategy of SAS is Full Outer Join, which doesn’t require mention of any condition. In our case, the join is Equi join.

Let’s break up the join and try to understand what is what?

Merge: in SAS, a merge is a keyword for join operation.

IN: in=val, represents that keep only matching values.

By: mentions join key.

Reorder the SAS variables (columns) as the business demands the output to be in a specific format.

#reorder SAS variables
DATA mylib.emp_dept;
retain EMPNO ENAME DEPTNO DNAME SAL COMM LOC;
set mylib.emp_dept;
run;
reordered output as below

Note: Nulls in SAS is denoted by ‘.’

The next step is to write the table data to a CSV file.

#write data to csv file
proc export data=mylib.emp_dept
outfile='/folders/myfolders/emp_dept.csv'
dbms= csv
replace; --> to overwrite existing file, truncate load
run;

We have successfully completed the ETL process using SAS programming. The next step is to replicate or redesign the same code using Python programming.

4. Python replica of SAS program

Import prerequisite library and dataset.

import pandas as pd
emp_df= pd.read_csv(r’D:<path>\emp.csv’)
emp_df.head(10)
emp dataset
dept_df= pd.read_csv('D:<path>\dept.csv’)
dept_df.head(5)
dept dataset
#merge the dataset to perform inner join
final_df=pd.merge(emp_df,dept_df[[‘DEPTNO’,’DNAME’,’LOC’]],on=’DEPTNO’,how=’inner’)
final_df.head(10)
emp-dept merged data

The next step is to write the merged dataset to a CSV file.

#write pandas df to csv file
final_df.to_csv(‘D:<path>\emp_dept.csv’,index=False)

We have successfully written output in CSV format and replicated the SAS application code in python as well.

To summarize, first, we have designed a simple ETL application using SAS programming and then redesigned the same application using Python programming.

Thank you for supporting the content.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓