How I Modified 67 Datasets At Once Using PROC SQL in SAS

A life saving tip for SAS users to manage your datasets in 3 steps.

Sijia Yue
3 min readJul 8, 2021
Photo by Caspar Camille Rubin on Unsplash

Introduction

SQL is programming language that widely used in data management. It is easy to catch up for beginners and extremely powerful with simple lines of code. PROC SQL is a procedure to execute SQL statements in SAS.

Today I will show you a small project I did at work, which I managed 67 dataset at once by using PROC SQL and Data Call Execute procedures in SAS. I believe it is really easy to learn and it is super efficient if you encounter the same challenge as me.

A Problem to Solve

The request was straightforward from my advisor. The client wants to design a survey so we were at the first step of generating the random selection list from all the household in Florida. We received 10 excel spreadsheets that in total contains the addresses for residents in all 67 counties in the Florida state. Each sheet represents one county and is named by the county name. Here is an example of the original dataset I have:

Example of the original dataset

To prepare the randomization list, I need to prepare each county as a separate dataset, adding record ID in this county and also a column representing the county name. Here is an example of one expected output dataset:

Example of the expected dataset

It is obviously a simple fix in any programming language if we only need to modify 3 or 6 datasets. However, no one really wants to code 67 times manually with millions of records in total. I then did some research online and got a easy and tidy solution for this request.

Solution:

I solved this question in 3 steps.

Step 1: Use the XLSX engine and PROC COPY to import data

To import all the counties all at once and name the datasets as the sheet name, XLSX engine and PROC Copy is a life saver. This is the sample code for loading one spreadsheet into SAS:

libname e1 xlsx 'C:\User\Florida_State_Address1.xlsx';
options validvarname=v7;
proc copy in=e1 out=work;
run; quit;

After step 1, I got 67 datasets in my Work library in seconds. All the datasets are names as their sheet name, i.e. the county names.

Step 2: Use PROC SQL to get the datasets names in a table

This is a preparation step to modifying the datasets. Use PROC SQL to generate table with all dataset names in a column and name it as memname:

proc sql;
create table work.control as
select memname
from dictionary.tables
where libname='WORK';
quit;

This procedure will create a table named “control” with only one column named “memname”. The column will store all the datasets’ name.

Step 3: Use Data Call Execute to modify all datasets at once

call execute is a powerful function to use in the data procedure in SAS. With the table of names that we generated from step 2, SAS could know which tables to work with. It will execute the same code to all the datasets from the table of step 2 in one data step. The code to add county column on each dataset:

data _null_;
set work.control end=lastone;
call execute('data counties.' ||memname||';');
call execute('set counties.'||memname||' indsname=name;');
call execute('County = scan(name, 2);');
call execute('run;');
if lastone then call execute('quit;');
run;

Code to add record ID on each dataset:

data _null_;
set work.control end=lastone;
call execute('data ' ||"'"||memname||"'"||'n;');
call execute('set ' ||"'"||memname||"'"||'n;');
call execute('Obs = _N_;');
call execute('run;');
if lastone then call execute('quit;');
run;

Conclusion

Even though SAS is a statistical programming language, there’s still lots of tricks for efficiency and reproducibility. Hope you guys liked this tip of PROC SQL and Data Call Execute! Thanks for reading.

--

--

Sijia Yue
0 Followers

Data Scientist in Healthcare Industry| Ex-Statistician