3. Getting Started with SAS Programming

Let’s get started! This lesson is an overview of what the SAS statistical software application is all about. The purpose here is to get you acquainted with how the application works and is organized so that you can begin writing programs to manage and analyze the data that you have collected.

Keep your eye out for important syntax rules for what the SAS application expects or needs from the user in order to perform the tasks that you want it to do.

3.1. A Basic SAS Program

Since SAS is a programming language, let’s start by looking at a simple SAS program, such as this one:

/****************
This program reads in a set of grades for six students, 
and prints out their student numbers and genders
******************/
 
DATA grade;
    InPuT subject gender $
        exam1 exam2 hwgrade $;
    LABEL exam1 = "Exam 1 grade";
    FORMAT exam2 4.2;
    DATALINES;
    10 M 80 84 A
     7 . 85 89 A
     4 F 90 .  B
    20 M 82 85 B
    25 F 94 94 A
    14 F 88 84 C
    ;
RUN;
 
PROC PRINT data=grade;
    var subject gender; * print student ID and gender;
run;
SAS Output

The SAS System

Obs subject gender
1 10 M
2 7  
3 4 F
4 20 M
5 25 F
6 14 F

Don’t worry just yet about understanding the “code” for this program. Just trust that this program reads in and prints out a set of grades for six students. The lines between the DATA statement and the first RUN statement tell SAS to read in the grades. And, the lines between the PROC PRINT statement and the second RUN statement tell SAS to print out the student number and gender of each of the six students.

As is true for any other programming language, a SAS program is a series of instructions written in the SAS language that are executed in order. That is, just as you read the words on this page, SAS reads and executes programs from top to bottom and from left to right. And, just as I must adhere to language and grammar rules that allow you to understand what I am saying, you must adhere to a certain set of rules known as “syntax” in order for SAS to be able to read and run your programs properly.

3.2. Basic SAS Program Requirements

Here is the basic set of requirements that every SAS program must follow. As you read through them, you might want to refer back to the program above to confirm that each rule is indeed followed.

Rules for SAS Statements. The basic requirements for SAS statements are:

  • All SAS statements (except those containing actual data) must end with a semicolon (;). (“DATA grade;” is an example of a SAS statement. “DATALINES;” is another.)

  • SAS statements typically begin with a SAS keyword. (Examples in the above program include DATA, INPUT, LABEL, FORMAT, DATALINES, RUN, PROC, and VAR.)

  • SAS programs can be freely formatted:

    • Any number of SAS statements can appear on a single line provided they are separated by a semicolon. (The second to last line is such an example.)

    • A SAS statement can be continued from one line to the next as long as no word is split. (The statement beginning with “InPuT …” is such an example.)

    • SAS statements can begin in any column.

  • SAS statements are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (The statement beginning with “InPuT …” is such an example.)

  • The words in SAS statements are separated by blanks or special characters (e.g. =, +, or *).

  • Comments may (and should!) be used to annotate your program. Two methods are:

    • A delimited comment begins with a forward slash-asterisk (/*) and ends with an asterisk-forward slash (*/). All text within the delimiters is ignored by SAS. (The first four lines of the program constitute one such comment.)

    • An alternative comment begins with an asterisk (*) and ends with a semicolon (;). All text between the asterisk (*) and the semicolon (;) is ignored by SAS. (The second statement in the second to last line constitutes such a comment.)
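
To see these formatting and comment rules in action, here is a small sketch that reads a piece of the grade data in a deliberately loose style: several statements share one line, keywords mix upper and lower case, and a statement starts in an arbitrary column. The data set name grade2 is hypothetical; SAS reads it exactly as it would a neatly indented version.

```sas
/* Delimited comment: several statements share one line,
   keywords use mixed case, and statements start in any column */
data grade2; input subject gender $ exam1; DataLines;
10 M 80
 7 . 85
;
run;

   * Asterisk comment: everything up to the semicolon is ignored;
PROC print data=grade2; Var subject exam1; RUN;
```

DATALINES is still the last statement of the DATA step, so the data lines can begin on the very next line even though the other statements are crowded onto one line.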

Rules for SAS Names. SAS names are used for SAS data set names, variable names, and other such items. An example of a data set name appearing in the above program is grade. Two examples of variable names are subject and exam2. Note that each of the names appearing in the program adheres to the following rules:

  • All names must contain between 1 and 32 characters.

  • The first character appearing in a name must be a letter (A, B, …Z, a, b, … z) or an underscore (_). Subsequent characters must be letters, numbers, or underscores. That is, no other characters, such as $, %, or & are permitted. Blanks also cannot appear in SAS names.

  • SAS names are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (SAS is only case sensitive within quotation marks.)
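
The naming rules can be sketched as follows. The data set _scores and its variables are hypothetical; the example shows names that follow the rules, names that break them (left in a comment so the program still runs), and the difference between case-insensitive names and case-sensitive quoted values.

```sas
/* Valid names: _scores, student_id, Exam_2
   (1 to 32 characters; letters, digits, underscores; no leading digit).
   Names such as 2exam, exam-2, or total$ break these rules. */
data _scores;
  input student_id Exam_2 gender $;
  datalines;
1 80 F
2 75 f
;
run;

/* Names are not case sensitive: STUDENT_ID and exam_2 refer to the
   variables created above. Quoted text IS case sensitive, so the
   WHERE statement keeps only the row whose gender is uppercase 'F'. */
proc print data=_scores;
  var STUDENT_ID exam_2;
  where gender = 'F';
run;
```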

PROC steps and DATA steps. The DATA step and PROC step — that’s PROC for “procedure” — are the basic building blocks of any SAS program:

  • Any portion of a SAS program that begins with a DATA statement and ends with a RUN statement, another DATA statement, or a PROC statement is called a DATA step.

  • Any portion of a SAS program that begins with a PROC statement and ends with a RUN statement, a DATA statement, or another PROC statement is called a PROC step.

In general, DATA steps are used to manage data. For example, DATA steps are used to read data into a SAS data set, to modify data values, to check for and correct data errors, and to subset or merge data sets. PROC steps, on the other hand, are pre-written routines that allow us to analyze the data contained in a SAS data set. For example, PROC steps are used to calculate descriptive statistics, to generate summary reports, and to create summary graphs and charts.

In the above program, all of the statements appearing between the “DATA grade;” statement and the first “RUN;” statement make up the one and only one DATA step appearing in the program. And, all of the statements appearing between the “PROC PRINT data = grade;” statement and the second “RUN;” statement make up the one and only one PROC step appearing in the program.

Note! When a RUN statement is missing, SAS executes a DATA step (and most PROC steps) as soon as it encounters the next DATA or PROC statement. It is good programming practice, however, to end every DATA step and PROC step with a RUN statement. Also note that DATA steps and PROC steps must be written as distinct units in your SAS code; that is, you cannot place a PROC step inside a DATA step, or vice versa.
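
The following sketch pairs one DATA step that manages data (it subsets rows) with one PROC step that analyzes them. It uses SASHELP.CLASS, a sample data set that ships with SAS, so it runs on its own. The DATA step deliberately omits its RUN statement to illustrate the step-boundary rule in the note; in your own programs, follow the recommended practice and include it.

```sas
/* DATA step (managing data): keep only students aged 13 or older.
   Its RUN statement is left out on purpose; SAS still executes the
   step when it reaches the PROC statement that starts the next step. */
data work.teens;
  set sashelp.class;
  if age >= 13;   /* subsetting IF: rows that fail the test are dropped */

/* PROC step (analyzing data): descriptive statistics for two variables */
proc means data=work.teens;
  var height weight;
run;
```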

3.3. SAS Data Sets

In order to be able to analyze our data, we need to be able to read it into a data set that our SAS software understands. A SAS data set is a file containing two parts: a descriptor portion and a data portion.

The descriptor portion of a SAS data set contains the vital statistics of the data set, such as the name of the data set, the date and time that the data set was created, the number of observations, and the number of variables. You can view the descriptor portion by running PROC CONTENTS on your data set. The following output shows one part of the descriptor portion of the data set work.grade:

ODS SELECT Attributes;
proc contents data = work.grade;
run;
ODS SELECT ALL;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.GRADE Observations 6
Member Type DATA Variables 5
Engine V9 Indexes 0
Created 09/15/2020 23:38:06 Observation Length 40
Last Modified 09/15/2020 23:38:06 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    

The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table which you can view by using PROC PRINT, such as this:

proc print data = work.grade;
run;
SAS Output

The SAS System

Obs subject gender exam1 exam2 hwgrade
1 10 M 80 84 A
2 7   85 89 A
3 4 F 90 . B
4 20 M 82 85 B
5 25 F 94 94 A
6 14 F 88 84 C

In this example, the subject ID 10 is a data value, the gender code M is a data value, and so on. Just as is true for data sets in other statistical packages, a SAS data set consists of variables and observations. The variables (or columns) are collections of data values that describe a particular characteristic of the thing being measured. A SAS data set can store thousands of variables. Our data set here contains just five variables: the subject, gender, exam1 (exam 1 grade), exam2 (exam 2 grade), and hwgrade (homework grade) of the person being measured. The observations (or rows) are collections of data values that typically relate to one particular object (such as a person named Susie). The values 10, M, 80, 84, and A constitute a single observation in the above data set. A SAS data set can store any number of observations. Our data set here contains just six observations.
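
Besides reading the PROC CONTENTS output, you can ask SAS for the number of observations directly. The sketch below uses the NOBS= option of the SET statement on the SASHELP.CLASS sample data set so that it runs on its own; substituting work.grade would report the six observations of our grade data.

```sas
/* NOBS= is filled in when the step is compiled, so IF 0 THEN SET
   lets us read the observation count without reading any rows. */
data _null_;
  if 0 then set sashelp.class nobs=n_obs;
  put 'Observations in SASHELP.CLASS: ' n_obs;
  stop;   /* end the step; no rows need to be processed */
run;
```

The message appears in the SAS log rather than in the output window.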

3.4. SAS Variables

One thing you might have noticed about our data set:

proc print data = work.grade;
run;
SAS Output

The SAS System

Obs subject gender exam1 exam2 hwgrade
1 10 M 80 84 A
2 7   85 89 A
3 4 F 90 . B
4 20 M 82 85 B
5 25 F 94 94 A
6 14 F 88 84 C

is that there are two different types of variables – there are three numeric variables (subject, exam1 and exam2) and two character variables (gender and hwgrade). The type of variable is just one of six variable attributes that SAS stores in the descriptor portion of every SAS data set. The six attributes that SAS stores are:

  • the variable’s name

  • the variable’s type

  • the variable’s length

  • the variable’s format (if any)

  • the variable’s informat (if any)

  • and the variable’s label (if any)

As suggested by the presence of the (“if any”) phrase in the above list, an informat, format, and label do not exist for every variable. A name, type, and length, on the other hand, do exist for every variable. The following is a partial listing of what might be the attribute information in the descriptor portion of our SAS data set:

ODS SELECT Variables;
proc contents data = work.grade;
run;
ODS SELECT ALL;
SAS Output

The SAS System

The CONTENTS Procedure

Alphabetic List of Variables and Attributes
# Variable Type Len Format Label
3 exam1 Num 8   Exam 1 grade
4 exam2 Num 8 4.2  
2 gender Char 8    
5 hwgrade Char 8    
1 subject Num 8    

Let’s investigate each of the six attributes briefly.

Variable names. There’s not much more to say about a variable’s name beyond what was already said in Section 3.2. That is, as long as your variable names conform to SAS naming conventions, you’re good to go. In case you haven’t memorized them yet, variable names must be between 1 and 32 characters long, must begin with an uppercase letter, a lowercase letter, or an underscore (_), and thereafter can contain any combination of numbers, letters, or underscores.

Variable types. As mentioned earlier, a variable is either character or numeric. Character variables, such as name, can contain any character that you can type on your keyboard (letters, numbers, !@#$%^&*()_+, … you get the idea). Numeric variables, on the other hand, such as id, height, and weight, can contain only numeric values, namely the digits 0 through 9, a plus sign (+), a minus sign (-), a decimal point (.), and the capital letter E for scientific notation.
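
A short sketch of the two types, using hypothetical data: a ZIP code is read as a character variable (note the $ in the INPUT statement) so that its leading zero is kept, while height is numeric and accepts a value written in scientific notation.

```sas
/* zip is character ($), so the leading zero in 02134 is kept;
   height is numeric and accepts scientific notation (6.8E1 = 68). */
data work.types;
  input id zip $ height;
  datalines;
1 02134 65.5
2 16801 70
3 00501 6.8E1
;
run;

/* The Variables table lists each variable's type (Num or Char) */
ods select Variables;
proc contents data=work.types;
run;
ods select all;
```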

Another thing you might have noticed about our data set above is that some information in the data set is missing — the gender of the person whose subject ID is 7 is missing, and the exam2 score of the person whose subject id is 4 is missing. That’s okay — SAS can handle missing data. What SAS displays when a value is missing depends on the variable’s type. As suggested by the above data set, SAS displays a blank space for a missing character value and a period (.) for a missing numeric value.
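
To check for missing values in a program, the MISSING function is convenient because it works for both character and numeric variables: it returns 1 when the value is missing and 0 otherwise. The sketch below reuses two of the grade records; the data set name and the flag variables are hypothetical.

```sas
data work.check;
  input subject gender $ exam2;
  gender_missing = missing(gender);   /* 1 if gender is blank, else 0 */
  exam2_missing  = missing(exam2);    /* 1 if exam2 is ., else 0 */
  datalines;
7 . 89
4 F .
;
run;

proc print data=work.check;
run;
```

As in the grade program, a single period in list input is read as a missing value for the character variable gender as well as for the numeric variable exam2.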

Variable length. A variable’s length tells us how many bytes are used to store the variable in your computer’s memory. Character variables can be up to 32,767 bytes long. In our data set, the variable gender has a length of 8, as the PROC CONTENTS output above shows. Because the program did not specify a length, SAS gave gender the default length of 8 bytes for character variables read with list input, even though each value is only one character long. All numeric variables have a default length of 8: numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless you specify a different length. It shouldn’t be surprising, therefore, that the length of each of the numeric variables in our data set is 8.
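
If the default length is more than you need, a LENGTH statement sets it explicitly. In the following sketch (the data set name is hypothetical), gender is stored in 1 byte instead of 8.

```sas
/* The LENGTH statement must appear before INPUT, because SAS fixes
   a variable's length where it first encounters the variable. */
data work.len_demo;
  length gender $ 1;    /* 1 byte instead of the default 8 */
  input subject gender $;
  datalines;
10 M
25 F
;
run;

ods select Variables;
proc contents data=work.len_demo;
run;
ods select all;
```

Choose lengths with care: a character value longer than the declared length is truncated when it is stored.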

Variable formats and informats. We’ll learn about these two attributes in greater detail later in the course. For now, just know that a variable’s format tells SAS how you’d like your variable’s values displayed in reports. For example, you might want to tell SAS to display the value 5391 as $5391.00 or maybe 5,391 instead. Whereas formats tell SAS how to write a variable’s values, informats tell SAS how to read data values having a special form into standard SAS values.
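
As a preview, here is a minimal sketch of both attributes using the value 5391 from the paragraph above. The COMMA5. informat reads the text 5,391 into an ordinary number, and two formats then display the same stored number in different ways: DOLLAR10.2 adds a dollar sign, a thousands comma, and two decimal places, while COMMA6. adds only the comma. The data set and variable names are hypothetical.

```sas
data work.fmt_demo;
  amount = 5391;
  /* Informat: read text containing a comma into a plain number */
  amount_read = input('5,391', comma5.);
  /* Formats: the stored number is the same; only its display differs */
  amount_dollar = amount;
  amount_comma  = amount;
  format amount_dollar dollar10.2 amount_comma comma6.;
run;

proc print data=work.fmt_demo;
run;
```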

Variable labels. If you want, you can give your variables descriptive labels up to 256 characters long. By default, many of the standard reports in SAS identify variables by their names. You can instead tell SAS to display more descriptive information about the variable by assigning a label to the variable. In our data set above, the variable exam1 has a label of “Exam 1 grade”.
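
Labels are assigned with a LABEL statement, as the grade program does for exam1. Note that PROC PRINT still uses variable names as column headings unless you add its LABEL option, as in this sketch (the data set name is hypothetical).

```sas
data work.lab_demo;
  input subject exam1;
  label subject = 'Student ID'
        exam1   = 'Exam 1 grade';
  datalines;
10 80
7 85
;
run;

/* By default PROC PRINT heads columns with variable names;
   the LABEL option tells it to use the labels instead. */
proc print data=work.lab_demo label;
run;
```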

That’s about all you need to know about SAS variables for now! Let’s close this lesson with some guidelines for writing SAS programs that are easy to read.

3.5. Guidelines for Formatting and Commenting SAS Programs

Regardless of the programming language used, there is a basic set of good programming practices to which any good programmer will adhere. Two good programming practices concern the formatting and commenting of your programs. Therefore, let’s close this lesson by reviewing some guidelines for formatting and commenting your SAS programs. Throughout this course (and beyond!), you should plan on adhering to the following guidelines:

As mentioned earlier, comment statements allow you to document your program without affecting processing. A delimited comment, which begins with a forward slash-asterisk (/*) and ends with an asterisk-forward slash (*/), is useful for creating large blocks of comments. All text within the delimiters is ignored. An alternative type of comment begins with an asterisk (*) and ends with a semicolon (;). Examples:

/********************
This SAS program blah, blah, blah, ..........
*********************/
 
*Convert Fahrenheit to Celsius;

Although SAS allows for free-formatted code, a good SAS program will be well organized.

Every SAS program should start with a main block of comments, emphasized by asterisks. The block of comments should include the filename, by whom the program was written, the date on which the program was written, and text that clearly describes the main purpose, input, and output of the program. Example:

/******************
Filename: /home/lsimon/stat597c/sas/temp.sas
Written by: Laura J. Simon
Date: January 9, 1996
 
This program calculates the average number of days 
that the temperature falls below freezing in State College, PA
 
Input: C:\data\temps.dat
Output: average number of days below freezing by month 
        stored in C:\data\temps.ssd
******************/

Every critical DATA step or PROC step should be preceded by a block of comments, emphasized by asterisks, which describes the primary purpose of the step. The block of comments should also include any critical information, such as variable names and the input and output of the block of code.

Comments that pertain to a single line of code are useful, e.g., for describing what an expression is calculating, describing a new variable and how it is calculated, explaining why the data set is subsetted on a particular set of values, and so on. In addition, at least one blank line should separate any PROC or DATA steps within your SAS program. The following example illustrates both guidelines:

data grade;
  set grade;
  exam_avg = (exam1 + exam2) / 2; *Calculate the average exam grade;
run;

proc print data = grade;
run;

To help offset blocks of code, it is useful (but not necessary) to capitalize PROC PROCNAME, DATA, and RUN. Examples:

DATA grade;
  set grade;
  exam_avg = (exam1 + exam2) / 2; *Calculate the average exam grade;
RUN;

PROC PRINT data = grade;
RUN;
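
Putting these guidelines together, a complete program might look like the following sketch. The file name, the placeholders in the header, and the temperature readings are all hypothetical; what matters is the layout: a main comment block, a comment block before each step, a comment on the single line that needs one, blank lines between steps, and capitalized DATA, PROC, and RUN keywords.

```sas
/******************
Filename: temp_convert.sas   (hypothetical)
Written by: (your name)
Date: (date written)

This program converts a few Fahrenheit temperatures
to Celsius and prints the results.

Input: in-stream data (DATALINES)
Output: printed listing of work.temps
******************/

/******************
Read the daily Fahrenheit readings (day, temp_f)
and compute the Celsius equivalent (temp_c)
******************/
DATA work.temps;
  INPUT day $ temp_f;
  temp_c = (temp_f - 32) * 5 / 9; *Convert Fahrenheit to Celsius;
  DATALINES;
Mon 32
Tue 50
Wed 14
;
RUN;

/******************
List the converted temperatures
******************/
PROC PRINT DATA = work.temps;
RUN;
```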

3.6. Exercises

  1. Submit the following simple SAS program. What does it do?

    PROC PRINT DATA = sashelp.heart (OBS=10);
    RUN;
    
    PROC MEANS DATA = sashelp.heart;
      VAR height weight;
    RUN;
    
  2. Submit the following small program. How many DATA steps are there? How many PROC steps? Which steps are executed?

    DATA work.quant;
      SET sashelp.heart; /* Read in from the heart data set */
      KEEP height weight diastolic systolic; *Only keep these variables 
                                          in the new data set;
    RUN;
    
    PROC PRINT DATA = work.quant (OBS=10);
    RUN;
    
    /*
    PROC MEANS DATA = work.quant MIN Q1 Median Q3 MAX MEAN STD;
    RUN;
    */
    
  3. Run PROC CONTENTS on the data set sashelp.heart. How many observations are in this data set?

  4. Submit the following program to SAS. What is the value of CurrentDate? (This value represents the number of days since January 1, 1960, the reference date for the SAS system.)

    DATA work.date;
       CurrentDate = Today();
    RUN;
    
    PROC PRINT DATA = work.date;
    RUN;