14. SAS MACRO Programming

This lesson introduces the most commonly used features of the SAS macro language. When you write a program that will be run over and over again, you might want seriously to consider using “macros” in your code, because:

  • macros allow you to make a change in one location of your program, so that SAS can cascade the change throughout your program

  • macros allow you to write a section of code once and use it over and over again, in the same program or even different programs

  • macros allow you to make programs data driven, letting SAS decide what to do based on actual data values.

For a good introduction with some more examples, see

  • SAS Macro Programming for Beginners

  • Chapter 7: Writing Flexible Code with the SAS Macro Facility in the Little SAS Book: A Primer, 5th ed.

  • Chapter 25: Introducing the SAS Macro Language from Learning SAS by Example: A Programmer’s Guide, 2nd ed

For our examples in this chapter, we will use the Framingham dataset which was a large longitudinal, epidemiologic study of the risks of heart disease. Be sure to download and the SAS dataset file fghm113.sas7bdat and place it in a convenient folder. Then change the LIBNAME path to point to this folder in the following code.

LIBNAME myData "/folders/myfolders/SAS_Notes/data";

*Create a temporary data set, so that we don't save changes to the original data set.;
DATA fghmTemp;
SET myData.fghm113;
RUN;

/*Now let's code some variables with some more descriptive values.
  SEX (Gender): 1=Men 
				2=Women
  Period (Examination cycle): 1=Period1 
							  2=Period2
							  3=Period3
  BPMEDS (Use of anti-hypertensive meds): 0=Not currently
										  1=Currently use
  CURSMOKE (Currently smoke?): 0=No
							   1=Yes
  DIABETES: 0=Not diabetic
			1=Diabetic
  PREVAP (Have angina pectoric?): 0=No
								  1=Yes
  PREVCHD (Coronary heart disease?): 0=No
									 1=Yes
  PREVMI (Myocardial infarction?): 0=No
								   1=Yes
  PREVSTRK (Had a stroke?): 0=No
				   			1=Yes
  PREVHYP (Hypertensive? sys bp >=140 or dyas bp >= 90): 0=no
														 1=yes
*/

PROC FORMAT;
VALUE YNfmt 0="No"
			1="Yes";
VALUE perfmt 1="Period 1"
			 2="Period 2"
			 3="Period 3";
VALUE gndrfmt 1="Men"
			  2="Women";
RUN;

DATA fghmtemp;
SET fghmtemp;
FORMAT prevap ynfmt.
	   diabetes ynfmt.
	   cursmoke ynfmt.
	   bpmeds ynfmt.
	   prevchd ynfmt.
	   prevmi ynfmt.
	   prevstrk ynfmt.
	   prevhyp ynfmt.
	   sex gndrfmt.;
RUN;

*Check to see if the formatting was done correctly;
PROC CONTENTS DATA = fghmtemp;
RUN;

PROC PRINT DATA = fghmtemp (obs=5);
RUN;
SAS Connection established. Subprocess id is 2406
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.FGHMTEMP Observations 500
Member Type DATA Variables 21
Engine V9 Indexes 0
Created 10/10/2020 19:58:41 Observation Length 168
Last Modified 10/10/2020 19:58:41 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 359
Number of Data Set Repairs 0
Filename /tmp/SAS_workA8FB00000966_localhost.localdomain/fghmtemp.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671645
Access Permission rw-r--r--
Owner Name sasdemo
File Size 192KB
File Size (bytes) 196608
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
4 AGE Num 8 BEST12. F12. Age (years) at examination
9 BMI Num 8 BEST12. F12. Body Mass Index (kr/(M*M)
11 BPMEDS Num 8 YNFMT. F12. Anti-hypertensive meds Y/N
8 CIGPDAY Num 8 BEST12. F12. Cigarettes per day
7 CURSMOKE Num 8 YNFMT. F12. Current Cig Smoker Y/N
10 DIABETES Num 8 YNFMT. F12. Diabetic Y/N
6 DIABP Num 8 BEST12. F12. Diastolic BP mmHg
13 GLUCOSE Num 8 BEST12. F12. Casual Glucose mg/dL
20 HDLC Num 8 BEST12. F12. HDL Cholesterol mg/dL
12 HEARTRTE Num 8 BEST12. F12. Ventricular Rate (beats/min)
21 LDLC Num 8 BEST12. F12. LDL Cholesterol mg/dL
19 PERIOD Num 8 BEST12. F12. Examination cycle
15 PREVAP Num 8 YNFMT. F12. Prevalent Angina
14 PREVCHD Num 8 YNFMT. F12. Prevalent CHD (MI,AP,CI)
18 PREVHYP Num 8 YNFMT. F12. Prevalent Hypertension
16 PREVMI Num 8 YNFMT. F12. Prevalent MI (Hosp,Silent)
17 PREVSTRK Num 8 YNFMT. F12. Prevalent Stroke (Infarct,Hem)
2 RANDID Num 8 BEST12. F12. Random ID
1 SEX Num 5 GNDRFMT. F12. SEX
5 SYSBP Num 8 BEST12. F12. Systolic BP mmHg
3 TOTCHOL Num 8 BEST12. F12. Serum Cholesterol mg/dL

The SAS System

Obs SEX RANDID TOTCHOL AGE SYSBP DIABP CURSMOKE CIGPDAY BMI DIABETES BPMEDS HEARTRTE GLUCOSE PREVCHD PREVAP PREVMI PREVSTRK PREVHYP PERIOD HDLC LDLC
1 Women 11263 220 55 180 106 No 0 31.17 Yes Yes 86 81 No No No No Yes 3 46 135
2 Men 16365 211 55 173 123 No 0 29.11 No Yes 75 85 No No No No Yes 3 48 163
3 Women 43770 283 64 159 69 No 0 32.93 Yes No 70 230 No No No No Yes 3 45 238
4 Men 47561 290 56 132 70 Yes 40 22.73 No No 100 90 No No No No Yes 3 54 236
5 Women 55965 298 72 109 60 No 0 26.2 No No 80 100 No No No No No 3 38 260

14.1. MACRO Variables and %LET

The general form of a %LET statement is:

%LET macro-variable-name = value;

where macro-variable-name is a name you make up following the standard rules for SAS names (32 characters or fewer in length, starting with a letter or underscore, and containing only letters, numerals or underscores). Value is the text to be substituted for the macro variable name, and can be up to 64,000 characters long. The following statements each create a macro variable.

%LET iterations = 5;

%LET winner = Lance Armstrong;

Notice that there are no quotation marks around value even when it contains characters. Blanks at the beginning and end will be trimmed, and everything else between the equal sign and semicolon will become part of the value for the MACRO variable.

To use the MACRO variable, you simply add the ampersand (&) prefix and stick the MACRO variable name where you want its value to be substituted. For example,

DO i = 1 TO &iterations;

TITLE "First: &winner";

When we run a program with the previous lines of code, it will first go through the MACRO processor to substitute the values of the MACRO variable to generate the SAS code:

DO i = 1 TO 5;

TITLE "First: Lance Armstrong";

and then it runs the resulting code as normal.

Example

The following SAS program uses three MACRO variables to specify the dataset name and the two variable names from this dataset to use to create a contingency table and chi-square test output. In this case, we produce a chi-square test and contingency table between diabetes and previous cardiovascular heart disease.

%LET response = prevchd;
%LET predictor = diabetes;
%LET dataset = fghmtemp;

PROC FREQ DATA = &dataset;
   TITLE "&predictor vs &response";
   TABLE &predictor * &response / CHISQ;
RUN;

TITLE;
SAS Output

diabetes vs prevchd

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  
Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500

Note that the MACRO variables response and predictor are used to update both the TABLE statement and the TITLE statement. After running through the MACRO processor, the following SAS code is generated and run:


PROC FREQ DATA = fghmtemp;
   TITLE "diabetes vs prevchd";
   TABLE diabetes * prevchd / CHISQ;
RUN;

This was a short program, with only a few occurrences of the MACRO variables. But imagine if you had a long program with dozens of occurrences of MACRO variables. You could save a lot of time and trouble by changing the MACRO variable only once at the beginning.

14.2. MACRO Functions

Anytime you find yourself repeated the same program statements over and over, you might want to consider creating a MACRO instead. MACROS are simply a group of SAS statememts that have a name. And, anytime you want to run that group of statements in your program, you use the name instead of re-typng all of the statements.

The general form of a MACRO is


%MACRO macro-name(parameter-1=, parameter-2=,..., parameter-n= );
   macro-text
%MEND macro-name;

The %MACRO statement tells SAS that this is the beginning of the MACRO and %MEND statement signals the end of the MACRO. Macro-name is a name you make up for your MACRO, and parameter-1, parameter-2, …,parameter-n are inputs to your MACRO function. These parameters are MACRO variables defined within your MACRO and are used just like %LET MACRO variables by prefixing with the ampersand (&).

For example, if you have a MACRO names %MONTHLYREPORT that has parameters for the month and region, it might start like this:

%MACRO monthlyreport(month=, region= );

Then, when you invoke the macro, specify the values for the MACRO variables after the equal signs:

%monthlyreport(month=May, region=West);

Example

The following SAS program turns our previous SAS code using %LET MACRO variables into a MACRO function to produce a chi-square test with contingency table bewteen two specified variables in a given dataset.

%MACRO twobytwo(dataset, predictor, response);
PROC FREQ DATA = &dataset;
    TITLE "&predictor vs &response";
    TABLE &predictor*&response / CHISQ;
RUN;
TITLE;
%MEND twobytwo;

%twobytwo(fghmtemp, diabetes, prevchd);
SAS Output

diabetes vs prevchd

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  
Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500

Note that we get the same output as before, but now we have used a MACRO function. Also note that we did not use the parameter names. When not using the names to specify the parameter values, you must be careful to be sure that you give them in the correct order. In our example, the MACRO twobytwo expects the dataset first, then the predictor (row variable) and finally the response (column variable).

14.3. MPRINT System Option

We have shown you what SAS sees after the MACRO processor has resolved the program, but normally you won’t see these statements. However, if you specify the MPRINT system option in your program, then SAS will print the resolved statements from MACROS in the SAS log. This can be useful for debugging purposes. To turn on the MPRINT option submit an OPTIONS statement like this:

OPTIONS MPRINT;

Here is what the SAS log looks like. (Note we have changed the MACRO by adding the NOPRINT option to PROC FREQ to suppress the output for this example.)

%MACRO twobytwo(dataset, predictor, response);
PROC FREQ DATA = &dataset NOPRINT;
    TITLE "&predictor vs &response";
    TABLE &predictor*&response / CHISQ;
RUN;
TITLE;
%MEND twobytwo;

OPTIONS MPRINT;

%twobytwo(fghmtemp, diabetes, prevchd);

OPTIONS NOMPRINT;

131  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
131! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
132
133 %MACRO twobytwo(dataset, predictor, response);
134 PROC FREQ DATA = &dataset NOPRINT;
135 TITLE "&predictor vs &response";
136 TABLE &predictor*&response / CHISQ;
137 RUN;
138 TITLE;
139 %MEND twobytwo;
140
141 OPTIONS MPRINT;
142
143 %twobytwo(fghmtemp, diabetes, prevchd);
MPRINT(TWOBYTWO): PROC FREQ DATA = fghmtemp NOPRINT;
MPRINT(TWOBYTWO): TITLE "diabetes vs prevchd";
MPRINT(TWOBYTWO): TABLE diabetes*prevchd / CHISQ;
MPRINT(TWOBYTWO): RUN;
NOTE: Processing will terminate because there are no valid requests for displayed output or output data sets.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

MPRINT(TWOBYTWO): TITLE;
144
145 OPTIONS NOMPRINT;
146
147 ods html5 (id=saspy_internal) close;ods listing;

148

You can see that SAS has inserted into the regular SAS log the MPRINT lines. The statements generated by the MACRO processor are labeled with the word MPRINT followed by the name of the MACRO that generated the statements, in this case twobytwo. By using the MPRINT system option it is easy to see the standard SAS statements your MACRO is generating.

14.4. MACRO %DO Loop

DO loops are useful for simplifying repetitive code, but a regular DO loop can only be used inside a DATA step. What if we want to repeat the same procedure with minor changes such as different variables. In this case, a MACRO %DO loop is useful. The general form of this statement is:


%DO index-variable = start TO end;
   SAS code;
%END;

The index-variable serves as the counter variable for the loop and is a MACRO variable that runs from start to end. This type of loop can be used outside of PROC and DATA steps to loop over multiple of these steps.

Example

Suppose we want to run our two by two table code but for many different variables such as in the following SAS code.

PROC FREQ DATA = fghmtemp;
   TITLE "prevap vs prevchd";
   TABLE prevap * prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
   TITLE "diabetes vs prevchd";
   TABLE diabetes * prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
   TITLE "prevmi vs prevchd";
   TABLE prevmi * prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
   TITLE "prevstrk vs prevchd";
   TABLE prevstrk * prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
   TITLE "prevhyp vs prevchd";
   TABLE prevhyp * prevchd / CHISQ;
RUN;

The code is all the same with just the row variables changing between each PROC FREQ call. Instead of writing this code over and over, we can instead use a %DO loop. To keep the output shorter, we will only use the first two PROC FREQ calls.

%MACRO twobytwov2(dataset, predictor, response);
   %DO i=1 %TO %SYSFUNC(countw(&predictor, ' '));
        %LET dep = %SCAN(&predictor, &i);
        PROC FREQ DATA = &dataset;
            TITLE "&dep vs &response";
            TABLE &dep * &response / CHISQ;
        RUN;
        TITLE;
    %END;
%MEND;

*%twobytwov2(fghmtemp, prevap diabetes prevmi prevstrk prevhyp, prevchd);
%twobytwov2(fghmtemp, prevap diabetes, prevchd);
SAS Output

prevap vs prevchd

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of PREVAP by PREVCHD
PREVAP(Prevalent Angina) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
88.60
96.30
100.00
17
3.40
3.70
29.82
460
92.00
 
 
Yes
0
0.00
0.00
0.00
40
8.00
100.00
70.18
40
8.00
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVAP by PREVCHD

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 337.9100 <.0001
Likelihood Ratio Chi-Square 1 209.3011 <.0001
Continuity Adj. Chi-Square 1 328.4425 <.0001
Mantel-Haenszel Chi-Square 1 337.2342 <.0001
Phi Coefficient   0.8221  
Contingency Coefficient   0.6350  
Cramer's V   0.8221  
Fisher's Exact Test
Cell (1,1) Frequency (F) 443
Left-sided Pr <= F 1.0000
Right-sided Pr >= F <.0001
   
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 500


diabetes vs prevchd

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  
Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500

In this MACRO, we instead pass in a list of row variables to the MACRO parameter predictor separated by a space. Previously, we learned about the COUNTW and SCAN functions, but now we need to use them outside of a DATA step. To use COUNTW, we need to use the MACRO function %SYSFUNC to evaluate this function when it is outside the DATA step. Recall that COUNTW counts the number of words, in this case separated by a space, in a list. SAS already has a MACRO version of SCAN, %SCAN, that we can use to go through the list of row variables. The rest of the code is the same as before, but now we need to walk through a list of row variables in predictor using the %DO loop.

14.5. Conditional Logic in MACROs

With MACROs and MACRO variables, you have a lot of flexibility. You can increase that flexibility still more by using conditional MACRO statements such as %IF. Here are the general forms of statements used for conditional logic in MACROS:


%IF condition %THEN action;
   %ELSE %IF condition %THEN action;
   %ELSE action;
   
%IF condition %THEN %DO;
   action;
%END;

These statements probably look familiar because there are parallel statements in standard SAS code, but don’t confuse these with their standard counterparts. As with the DO loops, the regular IF/THEN/ELSE statements we learned before can only be used inside a DATA step. The %IF/%THEN/%ELSE statements can be used outside of DATA and PROC steps and inside other MACROs.

Example

In the following SAS program, we create a MACRO with a parameter that will take the values "Yes" or "No" to request risk difference output from PROC FREQ in our twobytwo MACRO.

%MACRO twobytwov2(dataset, predictor, response, rd=No);
   %DO i=1 %TO %SYSFUNC(countw(&predictor, ' '));
        %LET dep = %SCAN(&predictor, &i);
        %IF &rd = No %THEN %DO;
            PROC FREQ DATA = &dataset;
                TITLE "&dep vs &response";
                TABLE &dep * &response / CHISQ;
            RUN;
            TITLE;
        %END;
        %ELSE %IF &rd = Yes %THEN %DO;
            PROC FREQ DATA = &dataset;
                TITLE "&dep vs &response";
                TABLE &dep * &response / CHISQ riskdiff;
            RUN;
            TITLE;
        %END;
    %END;
%MEND;

%twobytwov2(dataset=fghmtemp, predictor=diabetes, response=prevchd, rd=Yes);
*%twobytwov2(dataset=fghmtemp, predictor=diabetes, response=prevchd);
SAS Output

diabetes vs prevchd

The FREQ Procedure

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  
Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119
Column 1 Risk Estimates
  Risk ASE 95%
Confidence Limits
Exact 95%
Confidence Limits
Difference is (Row 1 - Row 2)
Row 1 0.8987 0.0142 0.8709 0.9264 0.8672 0.9249
Row 2 0.7609 0.0629 0.6376 0.8841 0.6123 0.8741
Total 0.8860 0.0142 0.8581 0.9139 0.8548 0.9125
Difference 0.1378 0.0645 0.0115 0.2642    
Column 2 Risk Estimates
  Risk ASE 95%
Confidence Limits
Exact 95%
Confidence Limits
Difference is (Row 1 - Row 2)
Row 1 0.1013 0.0142 0.0736 0.1291 0.0751 0.1328
Row 2 0.2391 0.0629 0.1159 0.3624 0.1259 0.3877
Total 0.1140 0.0142 0.0861 0.1419 0.0875 0.1452
Difference -0.1378 0.0645 -0.2642 -0.0115    

Sample Size = 500

The rd parameter expects the values either Yes or No with the default value of No. If No is specified (or if it is left blank), then we will only get the chi-squared output. If rd is specified to be Yes, then we get the chi-squared output and the risk difference output.

Note that, we have specified a default value for rd in our MACRO definition. When we use a defualt value, we create positiional parameters, and in this case, when calling the MACRO, we must specify all the parameters by parameter-name=parameter-value in the MACRO call.

14.6. Data Driven Programs - CALL SYMPUT

Using the CALL SYMPUT macro routing you can let a MACRO program look at the data and then decide for itself what to do. CALL Symput takes a value from a DATA step and assigns it to a MACRO variable which you can then use later in your program.

CALL SYMPUT can take many forms, but to assign a single value to a single MACRO variable, use CALL SYMPUT with this general form:

CALL SYMPUT("marcro-variable", value);

where macro-variable is the name of a macro variable, either new or old, and is enclosed in quotes. Value is the name of a variable from a DATA step whose current value you want to assigne to that MACRO variable.

CALL SYMPUT is often used in IF/THEN/ELSE statements, for example


IF Place = 1 THEN
   CALL SYMPUT("WinningTime", Time);

This statement tells SAS to creat a MACRO variable named WinningTime and set it equal to the current value of the variable Time when the value of Place is 1.

Be careful. You cannot create a MACRO variable with CALL SYMPUT and use it in the same DATA step. Here’s why. When you submit MACRO code, it is resolved by the MACRO processor, and then compiled and executed. Not untile the final stage, execution, does SAS see your DATA. CALL SYMPUT takes a data value from the execution phase, and passes it back to the MACRO processor for use in a later step. That’s why you must put CALL SYMPUT in one DATA step, but not use it until a later step.

Example

The following SAS program use the expected cell counts, calculated by PROC FREQ, to determine whether to return the chi-squared test p-value or Fisher's exact test p-value. Recall that one standard rule of thumb is that to use the chi-square test, all expected cell counts should be at least 5.

The following code is some prelimanary code that we run to figure out how to define how MACRO.

/*First, we will need to write and test our code for selecting only the output we want before
  we make our updated macro. We want to run a continuity adjusted chi-square test or 
  Fisher's exact test if the expected counts are too small. We only want the table with row percentages
  and the result of the correct test in our output.*/

*ODS TRACE ON;
PROC FREQ DATA = fghmtemp;
   TABLE prevap*prevchd / EXPECTED chisq;
   ODS OUTPUT CrossTabFreqs = ct ChiSq = chi FishersExact = fet;
RUN;
*ODS TRACE OFF;

*Lot's of missing data. We only want to use non missing values;
PROC PRINT DATA = ct;
RUN;

/* Test how to clean the resulting datasets */
/* Remove the missing values and get the cells with epxected countes less than 5 */
title '/* Test how to clean the resulting datasets */';
PROC PRINT DATA=ct;
   WHERE expected < 5 and prevap ne . and prevchd ne .;
RUN;
title;

*Create an indicator variable that is 1 if there is at least one cell with expected count less than 5;
%LET low_count = 0;
DATA _NULL_; *Run through the data set without creating a new one;
   SET ct;
   IF EXPECTED < 5 and prevap ne . and prevchd ne . THEN CALL SYMPUT('low_count', 1);
RUN;

%PUT &=low_count; *Should show value of 1 in the log file.;

/* Now, that we can check to see if the expected cell counts are less than 5,
 * we need to see how we can clean the chi-square or Fisher tables to print only 
 * the test name and p-value.
 */

PROC PRINT DATA=chi;
RUN;

*Select the row for the continuity adjusted chi-square test;
DATA chi2;
   SET chi;
   WHERE statistic="Continuity Adj. Chi-Square";
   DROP TABLE DF VALUE;
RUN;

PROC PRINT DATA = chi2;
RUN;

*Do the same thing for Fisher's exact test;
PROC PRINT DATA = fet;
RUN;

DATA fet2 (RENAME = (cValue1 = Prob));
   SET fet;
   Statistic = "Fisher's Exact Test";
   WHERE NAME1 = "XP2_FISH";
   KEEP statistic cValue1;
RUN;

DATA fet2;
   RETAIN Statistic Prob;
   SET fet2;
RUN;

PROC PRINT DATA = fet2;
RUN;

The previous code is exploratory to find the ODS table names and figure out how to extract the needed data from these tables. The following code provides the final workig MACRO.

%MACRO twobytwov3(dataset, predictor, response);
	%LET n = %sysfun(countw(predictor));
	%LET i = 1;
	%LET dep = %SCAN(&predictor, &i);
	%DO %WHILE(&dep ne);
		PROC FREQ DATA = &dataset;
            TITLE;
            ODS SELECT NONE;
			TABLE &dep*&response / CHISQ EXPECTED;
			ODS OUTPUT crosstabfreqs = ct ChiSq = chi FishersExact = fet;
		RUN;

		PROC FREQ DATA = &dataset;
			TITLE "&dep vs &response";
			TABLE &dep*&response / NOCOL NOPERCENT;
			ODS SELECT CrossTabFreqs;
		RUN;
		TITLE;

		%LET low_count = 0;
		DATA _NULL_; *Run through the data set without creating a new one;
			SET ct;
			IF EXPECTED < 5 and &dep ne . and &response ne . THEN CALL SYMPUT('low_count', 1);
		RUN;

		%IF &low_count = 0 %THEN %DO;
		DATA chi;
			SET chi;
			WHERE statistic="Continuity Adj. Chi-Square";
			DROP TABLE DF VALUE;
		RUN;

		PROC PRINT DATA = chi;
		RUN;
		%END;
		%ELSE %DO;
		DATA fet (RENAME = (cValue1 = Prob));
			SET fet;
			Statistic = "Fisher's Exact Test";
			WHERE NAME1 = "XP2_FISH";
			KEEP statistic cValue1;
		RUN;

		DATA fet;
			RETAIN Statistic Prob;
			SET fet;
		RUN;

		PROC PRINT DATA = fet;
		RUN;
		%END;
        
		%LET i = %EVAL(&i + 1);
		%LET dep = %SCAN(&predictor, &i);
	%END;
%MEND;

%twobytwov3(fghmtemp, prevap diabetes prevmi prevstrk prevhyp, prevchd);
SAS Output

prevap vs prevchd

The FREQ Procedure

Frequency
Row Pct
Table of PREVAP by PREVCHD
PREVAP(Prevalent Angina) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
96.30
17
3.70
460
 
Yes
0
0.00
40
100.00
40
 
Total
443
57
500

Obs Statistic Prob
1 Fisher's Exact Test <.0001

diabetes vs prevchd

The FREQ Procedure

Frequency
Row Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
89.87
46
10.13
454
 
Yes
35
76.09
11
23.91
46
 
Total
443
57
500

Obs Statistic Prob
1 Continuity Adj. Chi-Square 0.0105

prevmi vs prevchd

The FREQ Procedure

Frequency
Row Pct
Table of PREVMI by PREVCHD
PREVMI(Prevalent MI (Hosp,Silent)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
94.46
26
5.54
469
 
Yes
0
0.00
31
100.00
31
 
Total
443
57
500

Obs Statistic Prob
1 Fisher's Exact Test <.0001

prevstrk vs prevchd

The FREQ Procedure

Frequency
Row Pct
Table of PREVSTRK by PREVCHD
PREVSTRK(Prevalent Stroke (Infarct,Hem)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
438
89.02
54
10.98
492
 
Yes
5
62.50
3
37.50
8
 
Total
443
57
500

Obs Statistic Prob
1 Fisher's Exact Test 0.0519

prevhyp vs prevchd

The FREQ Procedure

Frequency
Row Pct
Table of PREVHYP by PREVCHD
PREVHYP(Prevalent Hypertension) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
191
93.63
13
6.37
204
 
Yes
252
85.14
44
14.86
296
 
Total
443
57
500

Obs Statistic Prob
1 Continuity Adj. Chi-Square 0.0052

This program uses PROC FREQ to calculate the expected cell counts and extracts this table using ODS OUTPUT. If any of the cells have an expected cell count less than 5, then CALL SYMPUT sets the value of the MACRO variable lowcount to 1. Otherwise, it retains the starting value of 0. This MACRO variable is then used in a %IF statement to determine whether to save and print the p-value from a chi-square test or Fisher's exact test.

14.7. Exercises

  1. Write a macro that discretizes a quantitative variable into four categories based on quantiles. That is:

    • if X < Q1 then group = 1

    • if Q1 < X < M then group = 2

    • if M < X < Q3 then group = 3

    • if Q3 < X then group = 4

The macro should have the following defnition - %quartilesmacro(mydata, qvar, round, out);

  • mydata: dataset containing the quantitative variable

  • qvar: name of quantitative variable

  • round: integer representing number of decimal places to round to

  • out: name of output dataset which contains the categorized variable which will have name qvar_cat, e.g. if the qvar is bmi then the output variable is bmi_cat.