SAS Python Interaction
Last Updated on April 14, 2021 by Editorial Team
Author(s): Vivek Chaudhary
The objective of this article is to explore how Python 3.x interacts with SAS 9.4 University Edition: reading SAS datasets with the Python pandas library, manipulating them, and writing the result back to SAS.
SAS University Edition is free SAS software for teaching and learning statistics and quantitative methods. The scope of this article, however, is limited to ETL operations with SAS and Python.
Note: This article assumes SAS University Edition is already installed.
1. SAS Library and Datasets
A SAS library is a collection of one or more SAS files (datasets) that SAS recognizes and that are referenced and stored as a unit. Whenever a new session starts, SAS automatically provides two libraries: WORK, a temporary library whose contents are deleted when the session ends, and SASUSER, a permanent library.
A SAS dataset is a table with columns and rows. In SAS, the table is called a data set, a column is called a variable, and a row is called an observation. In each observation, each variable has a specific value.
In database terms, a SAS library can be thought of as a schema, and a SAS dataset as a table with rows and columns.
2. Create Library and Dataset
SAS command to create a library:
libname mylib '<path>';
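SAS DATA steps to create the emp and dept datasets; their INPUT statements declare the variables, and a $ after a name marks a character variable:
input EMPNO ENAME $ SAL DEPTNO COMM;   /* emp */
input DEPTNO DNAME $ LOC $;           /* dept */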
SAS stores each dataset on disk as a file in the SAS data format, with the extension .sas7bdat.
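Because a library assigned with LIBNAME points to a directory, and each dataset sits in it as a .sas7bdat file, the datasets of a library can also be listed from Python. A minimal sketch, assuming the default Base SAS engine (one file per dataset); the helper name list_sas_datasets is hypothetical:

```python
from pathlib import Path
import tempfile


def list_sas_datasets(library_dir):
    """Return the dataset names (file stems) of the .sas7bdat files in a directory.

    Hypothetical helper: assumes one dataset per .sas7bdat file, as with the
    default Base SAS engine.
    """
    return sorted(p.stem for p in Path(library_dir).glob("*.sas7bdat"))


# Usage with a throwaway directory holding empty placeholder files
with tempfile.TemporaryDirectory() as lib:
    for name in ["emp.sas7bdat", "dept.sas7bdat", "notes.txt"]:
        (Path(lib) / name).touch()
    print(list_sas_datasets(lib))  # ['dept', 'emp']
```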
3. Read SAS Datasets Using Python
import pandas as pd
# Read each SAS dataset into a DataFrame; the r prefix keeps the Windows backslashes literal
emp_df = pd.read_sas(r'D:\VirtualMs\SAS University Edition\myfolders\emp.sas7bdat', encoding='utf-8')
dept_df = pd.read_sas(r'D:\VirtualMs\SAS University Edition\myfolders\dept.sas7bdat', encoding='utf-8')
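The encoding argument matters: when it is left as None, pd.read_sas returns character variables (the $ variables such as ENAME) as raw bytes rather than str. If a file was already read without it, the columns can be decoded afterwards. A minimal sketch, using a small hand-built DataFrame that imitates such output; the values and the helper decode_bytes_columns are hypothetical:

```python
import pandas as pd


def decode_bytes_columns(df, encoding="utf-8"):
    """Return a copy of df with every bytes value in object columns decoded to str."""
    out = df.copy()
    for col in out.select_dtypes(include="object").columns:
        out[col] = out[col].map(
            lambda v: v.decode(encoding) if isinstance(v, bytes) else v
        )
    return out


# Hypothetical frame imitating pd.read_sas(..., encoding=None):
# numeric variables as float64, character variables as bytes
raw = pd.DataFrame({"DEPTNO": [10.0, 20.0], "DNAME": [b"ACCOUNTING", b"RESEARCH"]})
clean = decode_bytes_columns(raw)
print(clean["DNAME"].tolist())  # ['ACCOUNTING', 'RESEARCH']
```

Passing encoding='utf-8' directly to pd.read_sas, as above, avoids this extra step.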
Data manipulation step: apply an equijoin on DEPTNO to merge the emp and dept datasets into final_df.
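The join code itself is not reproduced here. A minimal sketch, assuming the join key is DEPTNO (the only variable the two datasets share) and using a few hypothetical rows in place of the real emp_df and dept_df:

```python
import pandas as pd

# Hypothetical rows standing in for the DataFrames returned by pd.read_sas
emp_df = pd.DataFrame({
    "EMPNO": [7369.0, 7499.0, 7839.0],
    "ENAME": ["SMITH", "ALLEN", "KING"],
    "SAL": [800.0, 1600.0, 5000.0],
    "DEPTNO": [20.0, 30.0, 10.0],
    "COMM": [float("nan"), 300.0, float("nan")],
})
dept_df = pd.DataFrame({
    "DEPTNO": [10.0, 20.0, 30.0, 40.0],
    "DNAME": ["ACCOUNTING", "RESEARCH", "SALES", "OPERATIONS"],
    "LOC": ["NEW YORK", "DALLAS", "CHICAGO", "BOSTON"],
})

# Equijoin: keep only rows whose DEPTNO appears in both frames (an SQL inner join)
final_df = pd.merge(emp_df, dept_df, on="DEPTNO", how="inner")
print(final_df.columns.tolist())
# ['EMPNO', 'ENAME', 'SAL', 'DEPTNO', 'COMM', 'DNAME', 'LOC']
```

With emp_df on the left, pd.merge keeps emp's columns first and appends DNAME and LOC, which is the same variable order as the INPUT statement in section 4. Department 40 has no employees, so the inner join drops it.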
Write the pandas DataFrame to disk as a CSV file:
# index=False leaves out the DataFrame index, so the CSV holds only the data columns
final_df.to_csv(r'D:\VirtualMs\SAS University Edition\myfolders\emppy.csv', index=False)
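With index=False, to_csv writes one header row followed by one line per observation, and a missing numeric value (NaN) becomes an empty field. A minimal sketch that writes a hypothetical two-row frame to an in-memory buffer instead of the emppy.csv path, to show the layout:

```python
import io

import pandas as pd

# Hypothetical rows in the same column order as final_df
final_df = pd.DataFrame({
    "EMPNO": [7369.0, 7499.0],
    "ENAME": ["SMITH", "ALLEN"],
    "SAL": [800.0, 1600.0],
    "DEPTNO": [20.0, 30.0],
    "COMM": [float("nan"), 300.0],
    "DNAME": ["RESEARCH", "SALES"],
    "LOC": ["DALLAS", "CHICAGO"],
})

buf = io.StringIO()
final_df.to_csv(buf, index=False)  # same call as above, but written to memory
lines = buf.getvalue().splitlines()
print(lines[0])  # EMPNO,ENAME,SAL,DEPTNO,COMM,DNAME,LOC
print(lines[1])  # 7369.0,SMITH,800.0,20.0,,RESEARCH,DALLAS
```

The header occupies line 1 of the file, which is why the SAS program in section 4 starts reading at line 2 (FIRSTOBS=2).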
4. Create a SAS Table from the CSV file
SAS program to create a table from the CSV file:
input EMPNO ENAME $ SAL DEPTNO COMM DNAME $ LOC $;
Explanation of the above program:
Data Step: reads the input data into SAS, identifies the variables (columns) of the dataset, and writes the records (observations) to the new dataset.
Infile: specifies the input file's path and name along with the delimiter option DLM, which is ',' here because the input is a CSV file.
FIRSTOBS: specifies the line number to start reading from; FIRSTOBS=2 skips the header row on line 1 and starts at the first actual observation.
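One caution when pairing DLM=',' with this CSV: unless the INFILE statement also specifies the DSD option, SAS treats consecutive delimiters as a single delimiter, so an empty field (such as a missing COMM written as ,,) would shift every later value one variable to the left. A minimal pre-load check in Python, assuming a plain CSV without quoted or multi-line fields; the helper rows_with_empty_fields and the sample text are hypothetical:

```python
import csv
import io


def rows_with_empty_fields(csv_text, firstobs=2):
    """Return (line_number, row) pairs that contain an empty field.

    Mirrors FIRSTOBS by skipping the lines before `firstobs` (the header).
    Line numbers match row numbers only when no field spans multiple lines.
    """
    flagged = []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=1):
        if line_no < firstobs:
            continue
        if any(field == "" for field in row):
            flagged.append((line_no, row))
    return flagged


# Hypothetical file contents in the layout written by to_csv(index=False)
sample = (
    "EMPNO,ENAME,SAL,DEPTNO,COMM,DNAME,LOC\n"
    "7369.0,SMITH,800.0,20.0,,RESEARCH,DALLAS\n"
    "7499.0,ALLEN,1600.0,30.0,300.0,SALES,CHICAGO\n"
)
print(rows_with_empty_fields(sample))
```

If rows are flagged, adding DSD to the INFILE statement, or writing an explicit placeholder through to_csv's na_rep parameter (for example na_rep='.', which SAS list input reads as a missing numeric value), avoids the shift.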
To summarize, we read SAS datasets with the Python pandas library, joined them, wrote the result to disk as a CSV file, and then created a SAS table from that file. That's all for this blog.
Thank you for supporting the content.
Published via Towards AI