SAS Python Interaction
Last Updated on April 14, 2021 by Editorial Team
Author(s): Vivek Chaudhary
The objective of this article is to show how Python 3.x can interact with SAS 9.4 University Edition: read SAS datasets with the Python pandas library, manipulate them, and write the result back to SAS.
SAS University Edition is free SAS software that can be used for teaching and learning statistics and quantitative methods. The scope of this article, however, is limited to ETL operations with SAS and Python.
Note: This article assumes SAS University Edition is already installed.
1. SAS Library and Datasets
A SAS library is a collection of one or more SAS files/datasets that are recognized by SAS and that are referenced and stored as a unit. Whenever a new session starts, SAS automatically creates two libraries: WORK, a temporary library, and SASUSER, a permanent library.
A SAS dataset is a table with columns and rows. In SAS, the table is called a data set, a column is called a variable, and a row is called an observation. In each observation, each variable has a specific value.
In database terms, a SAS library can be thought of as a schema, and a SAS dataset as a table made up of rows and columns.
2. Create Library and Dataset
SAS command to create a library:
libname mylib "<path>";
SAS command to create a dataset:
DATA mylib.emp;
    infile "<path>/emp.csv"
    dlm=","
    FIRSTOBS=2 DSD;
    input EMPNO ENAME $ SAL DEPTNO COMM;
run;
DATA mylib.dept;
    infile "<path>/dept.csv"
    dlm=","
    FIRSTOBS=2 DSD;
    input DEPTNO DNAME $ LOC $;
run;
SAS stores datasets on disk in its own data file format, with the extension .sas7bdat.
3. Read SAS dataset using Python
import pandas as pd
emp_df = pd.read_sas(r"D:\VirtualMs\SAS University Edition\myfolders\emp.sas7bdat", encoding="utf-8")
emp_df.head(10)
dept_df = pd.read_sas(r"D:\VirtualMs\SAS University Edition\myfolders\dept.sas7bdat", encoding="utf-8")
dept_df.head(10)
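One detail worth knowing about the read step: when encoding is omitted, pd.read_sas leaves character data as raw bytes, and SAS numeric variables generally arrive as floating-point columns. The sketch below is only an illustration and not part of the original workflow. It uses a small hand-built DataFrame with hypothetical values in place of a real .sas7bdat file, and the helper name tidy_sas_frame is an assumption introduced here.

```python
import pandas as pd


def tidy_sas_frame(df, int_columns=(), encoding="utf-8"):
    """Decode bytes values to str and cast the named float columns to int."""
    out = df.copy()
    for col in out.columns:
        # Character variables read without an encoding show up as bytes objects.
        if out[col].map(lambda v: isinstance(v, bytes)).any():
            out[col] = out[col].map(
                lambda v: v.decode(encoding) if isinstance(v, bytes) else v
            )
    for col in int_columns:
        # Only safe when the column contains no missing values.
        out[col] = out[col].astype("int64")
    return out


# Hypothetical stand-in for a frame read without encoding=...
raw = pd.DataFrame({"EMPNO": [7369.0, 7499.0], "ENAME": [b"SMITH", b"ALLEN"]})
clean = tidy_sas_frame(raw, int_columns=["EMPNO"])
print(clean.dtypes)
```

Casting key columns such as EMPNO or DEPTNO to integers is optional, but it keeps values like 20.0 from appearing in the CSV that is later handed back to SAS.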
Data manipulation step: apply an equijoin (an inner join on DEPTNO) to merge the emp and dept datasets.
final_df = pd.merge(emp_df, dept_df[["DEPTNO", "DNAME", "LOC"]], on="DEPTNO", how="inner")
final_df.head(10)
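An inner join silently drops employees whose DEPTNO has no match in dept, so it can help to check the join before writing the result out. The following is a minimal sketch on hypothetical sample rows (not the article's actual data). It uses the validate and indicator parameters of pd.merge to guard against duplicate department keys and to list unmatched rows.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only.
emp_df = pd.DataFrame(
    {
        "EMPNO": [7369, 7499, 7782, 7999],
        "ENAME": ["SMITH", "ALLEN", "CLARK", "DOE"],
        "DEPTNO": [20, 30, 10, 50],  # 50 has no matching department
    }
)
dept_df = pd.DataFrame(
    {
        "DEPTNO": [10, 20, 30, 40],
        "DNAME": ["ACCOUNTING", "RESEARCH", "SALES", "OPERATIONS"],
        "LOC": ["NEW YORK", "DALLAS", "CHICAGO", "BOSTON"],
    }
)

# validate="many_to_one" raises MergeError if DEPTNO is duplicated in dept_df.
final_df = pd.merge(emp_df, dept_df, on="DEPTNO", how="inner", validate="many_to_one")

# A left join with indicator=True exposes employees whose DEPTNO found no match.
check = pd.merge(emp_df, dept_df, on="DEPTNO", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "ENAME"].tolist()

print(final_df)
print("Dropped by the inner join:", unmatched)
```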
Write the pandas DataFrame to disk:
final_df.to_csv(r"D:\VirtualMs\SAS University Edition\myfolders\emppy.csv", index=False)
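Because the SAS program that reads this file skips the header and reads fields by position, the column order in the CSV has to match the order of the INPUT statement. The sketch below shows one way to enforce that before writing. It uses a hypothetical one-row frame and a temporary directory instead of the article's real path, and the constant SAS_INPUT_ORDER is a name introduced here for illustration.

```python
import csv
import tempfile
from pathlib import Path

import pandas as pd

# Column order expected by the SAS INPUT statement used in this article.
SAS_INPUT_ORDER = ["EMPNO", "ENAME", "SAL", "DEPTNO", "COMM", "DNAME", "LOC"]

# Hypothetical single-row frame with columns deliberately out of order.
final_df = pd.DataFrame(
    {
        "DNAME": ["RESEARCH"],
        "LOC": ["DALLAS"],
        "EMPNO": [7369],
        "ENAME": ["SMITH"],
        "SAL": [800],
        "DEPTNO": [20],
        "COMM": [None],  # missing commission
    }
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "emppy.csv"
    # Reorder columns explicitly; index=False keeps the row index out of the file.
    final_df[SAS_INPUT_ORDER].to_csv(out_path, index=False)
    with open(out_path, newline="") as fh:
        rows = list(csv.reader(fh))

header, first_row = rows[0], rows[1]
print(header)
print(first_row)
```

The missing COMM value is written as an empty field, which the DSD option on the SAS side reads as a missing value.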
4. Create a SAS Table from the CSV file
SAS program to create a table from the CSV file:
DATA mylib.emp_py;
    infile "/folders/myfolders/emppy.csv"
    dlm=","
    FIRSTOBS=2 DSD;
    input EMPNO ENAME $ SAL DEPTNO COMM DNAME $ LOC $;
run;
Explanation of the above program:
Data Step: This step loads the required dataset into SAS and identifies its variables (columns) and its observations (records).
Infile: Specifies the input file path and name, along with the delimiter (dlm), which is "," in our case because the file is a CSV.
FIRSTOBS: Specifies the line at which to start reading. A value of 2 skips the first line, which holds the header, and starts reading from the first actual observation.
DSD: Treats two consecutive delimiters as a missing value and strips quotation marks from quoted values.
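For readers more at home in Python, the effect of FIRSTOBS=2 and DSD can be mimicked with the standard csv module. This is only an analogy under stated assumptions, not how SAS implements these options. The file contents are hypothetical and the function name read_like_dsd is introduced here for illustration.

```python
import csv
import io

# Hypothetical file contents: a header line followed by two records.
# The second record has a quoted value containing a comma, two consecutive
# commas (missing SAL), and a trailing empty field (missing COMM).
text = (
    "EMPNO,ENAME,SAL,DEPTNO,COMM\n"
    "7499,ALLEN,1600,30,300\n"
    '7369,"SMITH, J",,20,\n'
)


def read_like_dsd(text, firstobs=2):
    """Start at line `firstobs`; treat empty fields as missing values."""
    reader = csv.reader(io.StringIO(text))
    records = []
    for line_no, fields in enumerate(reader, start=1):
        if line_no < firstobs:
            continue  # like FIRSTOBS=2, skip everything before line 2
        # Like DSD: quotes are stripped by the parser, empty fields become missing.
        records.append([f if f != "" else None for f in fields])
    return records


records = read_like_dsd(text)
for rec in records:
    print(rec)
```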
To summarize, we read datasets from SAS using the Python pandas library, wrote the merged result to disk as a CSV file, and then created a SAS table from that file. That's all for this blog.
Thank you for supporting the content.
SAS Python Interaction was originally published in Towards AI on Medium.