14. SAS MACRO Programming¶
This lesson introduces the most commonly used features of the SAS macro language. When you write a program that will be run over and over again, you might want seriously to consider using “macros” in your code, because:
macros allow you to make a change in one location of your program, so that SAS can cascade the change throughout your program
macros allow you to write a section of code once and use it over and over again, in the same program or even different programs
macros allow you to make programs data driven, letting SAS decide what to do based on actual data values.
For a good introduction with some more examples, see
Chapter 7: Writing Flexible Code with the SAS Macro Facility in the Little SAS Book: A Primer, 5th ed.
Chapter 25: Introducing the SAS Macro Language from Learning SAS by Example: A Programmer’s Guide, 2nd ed
For our examples in this chapter, we will use the Framingham dataset which was a large longitudinal, epidemiologic study of the risks of heart disease. Be sure to download and the SAS dataset file fghm113.sas7bdat and place it in a convenient folder. Then change the LIBNAME path to point to this folder in the following code.
LIBNAME myData "/folders/myfolders/SAS_Notes/data";
*Create a temporary data set, so that we don't save changes to the original data set.;
DATA fghmTemp;
SET myData.fghm113;
RUN;
/*Now let's code some variables with some more descriptive values.
SEX (Gender): 1=Men
2=Women
Period (Examination cycle): 1=Period1
2=Period2
3=Period3
BPMEDS (Use of anti-hypertensive meds): 0=Not currently
1=Currently use
CURSMOKE (Currently smoke?): 0=No
1=Yes
DIABETES: 0=Not diabetic
1=Diabetic
PREVAP (Have angina pectoric?): 0=No
1=Yes
PREVCHD (Coronary heart disease?): 0=No
1=Yes
PREVMI (Myocardial infarction?): 0=No
1=Yes
PREVSTRK (Had a stroke?): 0=No
1=Yes
PREVHYP (Hypertensive? sys bp >=140 or dyas bp >= 90): 0=no
1=yes
*/
PROC FORMAT;
VALUE YNfmt 0="No"
1="Yes";
VALUE perfmt 1="Period 1"
2="Period 2"
3="Period 3";
VALUE gndrfmt 1="Men"
2="Women";
RUN;
DATA fghmtemp;
SET fghmtemp;
FORMAT prevap ynfmt.
diabetes ynfmt.
cursmoke ynfmt.
bpmeds ynfmt.
prevchd ynfmt.
prevmi ynfmt.
prevstrk ynfmt.
prevhyp ynfmt.
sex gndrfmt.;
RUN;
*Check to see if the formatting was done correctly;
PROC CONTENTS DATA = fghmtemp;
RUN;
PROC PRINT DATA = fghmtemp (obs=5);
RUN;
SAS Connection established. Subprocess id is 2406
The CONTENTS Procedure
Data Set Name | WORK.FGHMTEMP | Observations | 500 |
---|---|---|---|
Member Type | DATA | Variables | 21 |
Engine | V9 | Indexes | 0 |
Created | 10/10/2020 19:58:41 | Observation Length | 168 |
Last Modified | 10/10/2020 19:58:41 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 2 |
First Data Page | 1 |
Max Obs per Page | 389 |
Obs in First Data Page | 359 |
Number of Data Set Repairs | 0 |
Filename | /tmp/SAS_workA8FB00000966_localhost.localdomain/fghmtemp.sas7bdat |
Release Created | 9.0401M6 |
Host Created | Linux |
Inode Number | 671645 |
Access Permission | rw-r--r-- |
Owner Name | sasdemo |
File Size | 192KB |
File Size (bytes) | 196608 |
Alphabetic List of Variables and Attributes | ||||||
---|---|---|---|---|---|---|
# | Variable | Type | Len | Format | Informat | Label |
4 | AGE | Num | 8 | BEST12. | F12. | Age (years) at examination |
9 | BMI | Num | 8 | BEST12. | F12. | Body Mass Index (kr/(M*M) |
11 | BPMEDS | Num | 8 | YNFMT. | F12. | Anti-hypertensive meds Y/N |
8 | CIGPDAY | Num | 8 | BEST12. | F12. | Cigarettes per day |
7 | CURSMOKE | Num | 8 | YNFMT. | F12. | Current Cig Smoker Y/N |
10 | DIABETES | Num | 8 | YNFMT. | F12. | Diabetic Y/N |
6 | DIABP | Num | 8 | BEST12. | F12. | Diastolic BP mmHg |
13 | GLUCOSE | Num | 8 | BEST12. | F12. | Casual Glucose mg/dL |
20 | HDLC | Num | 8 | BEST12. | F12. | HDL Cholesterol mg/dL |
12 | HEARTRTE | Num | 8 | BEST12. | F12. | Ventricular Rate (beats/min) |
21 | LDLC | Num | 8 | BEST12. | F12. | LDL Cholesterol mg/dL |
19 | PERIOD | Num | 8 | BEST12. | F12. | Examination cycle |
15 | PREVAP | Num | 8 | YNFMT. | F12. | Prevalent Angina |
14 | PREVCHD | Num | 8 | YNFMT. | F12. | Prevalent CHD (MI,AP,CI) |
18 | PREVHYP | Num | 8 | YNFMT. | F12. | Prevalent Hypertension |
16 | PREVMI | Num | 8 | YNFMT. | F12. | Prevalent MI (Hosp,Silent) |
17 | PREVSTRK | Num | 8 | YNFMT. | F12. | Prevalent Stroke (Infarct,Hem) |
2 | RANDID | Num | 8 | BEST12. | F12. | Random ID |
1 | SEX | Num | 5 | GNDRFMT. | F12. | SEX |
5 | SYSBP | Num | 8 | BEST12. | F12. | Systolic BP mmHg |
3 | TOTCHOL | Num | 8 | BEST12. | F12. | Serum Cholesterol mg/dL |
Obs | SEX | RANDID | TOTCHOL | AGE | SYSBP | DIABP | CURSMOKE | CIGPDAY | BMI | DIABETES | BPMEDS | HEARTRTE | GLUCOSE | PREVCHD | PREVAP | PREVMI | PREVSTRK | PREVHYP | PERIOD | HDLC | LDLC |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Women | 11263 | 220 | 55 | 180 | 106 | No | 0 | 31.17 | Yes | Yes | 86 | 81 | No | No | No | No | Yes | 3 | 46 | 135 |
2 | Men | 16365 | 211 | 55 | 173 | 123 | No | 0 | 29.11 | No | Yes | 75 | 85 | No | No | No | No | Yes | 3 | 48 | 163 |
3 | Women | 43770 | 283 | 64 | 159 | 69 | No | 0 | 32.93 | Yes | No | 70 | 230 | No | No | No | No | Yes | 3 | 45 | 238 |
4 | Men | 47561 | 290 | 56 | 132 | 70 | Yes | 40 | 22.73 | No | No | 100 | 90 | No | No | No | No | Yes | 3 | 54 | 236 |
5 | Women | 55965 | 298 | 72 | 109 | 60 | No | 0 | 26.2 | No | No | 80 | 100 | No | No | No | No | No | 3 | 38 | 260 |
14.1. MACRO Variables and %LET¶
The general form of a %LET statement is:
%LET macro-variable-name = value;
where macro-variable-name is a name you make up following the standard rules for SAS names (32 characters or fewer in length, starting with a letter or underscore, and containing only letters, numerals or underscores). Value is the text to be substituted for the macro variable name, and can be up to 64,000 characters long. The following statements each create a macro variable.
%LET iterations = 5;
%LET winner = Lance Armstrong;
Notice that there are no quotation marks around value even when it contains characters. Blanks at the beginning and end will be trimmed, and everything else between the equal sign and semicolon will become part of the value for the MACRO variable.
To use the MACRO variable, you simply add the ampersand (&) prefix and stick the MACRO variable name where you want its value to be substituted. For example,
DO i = 1 TO &iterations;
TITLE "First: &winner";
When we run a program with the previous lines of code, it will first go through the MACRO processor to substitute the values of the MACRO variable to generate the SAS code:
DO i = 1 TO 5;
TITLE "First: Lance Armstrong";
and then it runs the resulting code as normal.
Example
The following SAS program uses three MACRO variables to specify the dataset name and the two variable names from this dataset to use to create a contingency table and chi-square test output. In this case, we produce a chi-square test and contingency table between diabetes and previous cardiovascular heart disease.
%LET response = prevchd;
%LET predictor = diabetes;
%LET dataset = fghmtemp;
PROC FREQ DATA = &dataset;
TITLE "&predictor vs &response";
TABLE &predictor * &response / CHISQ;
RUN;
TITLE;
The FREQ Procedure
|
|
Statistics for Table of DIABETES by PREVCHD
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 7.8534 | 0.0051 |
Likelihood Ratio Chi-Square | 1 | 6.3873 | 0.0115 |
Continuity Adj. Chi-Square | 1 | 6.5483 | 0.0105 |
Mantel-Haenszel Chi-Square | 1 | 7.8377 | 0.0051 |
Phi Coefficient | 0.1253 | ||
Contingency Coefficient | 0.1244 | ||
Cramer's V | 0.1253 |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 408 |
Left-sided Pr <= F | 0.9972 |
Right-sided Pr >= F | 0.0090 |
Table Probability (P) | 0.0062 |
Two-sided Pr <= P | 0.0119 |
Sample Size = 500
Note that the MACRO variables response and predictor are used to update both the TABLE statement and the TITLE statement. After running through the MACRO processor, the following SAS code is generated and run:
PROC FREQ DATA = fghmtemp;
TITLE "diabetes vs prevchd";
TABLE diabetes * prevchd / CHISQ;
RUN;
This was a short program, with only a few occurrences of the MACRO variables. But imagine if you had a long program with dozens of occurrences of MACRO variables. You could save a lot of time and trouble by changing the MACRO variable only once at the beginning.
14.2. MACRO Functions¶
Anytime you find yourself repeated the same program statements over and over, you might want to consider creating a MACRO instead. MACROS are simply a group of SAS statememts that have a name. And, anytime you want to run that group of statements in your program, you use the name instead of re-typng all of the statements.
The general form of a MACRO is
%MACRO macro-name(parameter-1=, parameter-2=,..., parameter-n= );
macro-text
%MEND macro-name;
The %MACRO statement tells SAS that this is the beginning of the MACRO and %MEND statement signals the end of the MACRO. Macro-name is a name you make up for your MACRO, and parameter-1, parameter-2, …,parameter-n are inputs to your MACRO function. These parameters are MACRO variables defined within your MACRO and are used just like %LET MACRO variables by prefixing with the ampersand (&).
For example, if you have a MACRO names %MONTHLYREPORT that has parameters for the month and region, it might start like this:
%MACRO monthlyreport(month=, region= );
Then, when you invoke the macro, specify the values for the MACRO variables after the equal signs:
%monthlyreport(month=May, region=West);
Example
The following SAS program turns our previous SAS code using %LET MACRO variables into a MACRO function to produce a chi-square test with contingency table bewteen two specified variables in a given dataset.
%MACRO twobytwo(dataset, predictor, response);
PROC FREQ DATA = &dataset;
TITLE "&predictor vs &response";
TABLE &predictor*&response / CHISQ;
RUN;
TITLE;
%MEND twobytwo;
%twobytwo(fghmtemp, diabetes, prevchd);
The FREQ Procedure
|
|
Statistics for Table of DIABETES by PREVCHD
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 7.8534 | 0.0051 |
Likelihood Ratio Chi-Square | 1 | 6.3873 | 0.0115 |
Continuity Adj. Chi-Square | 1 | 6.5483 | 0.0105 |
Mantel-Haenszel Chi-Square | 1 | 7.8377 | 0.0051 |
Phi Coefficient | 0.1253 | ||
Contingency Coefficient | 0.1244 | ||
Cramer's V | 0.1253 |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 408 |
Left-sided Pr <= F | 0.9972 |
Right-sided Pr >= F | 0.0090 |
Table Probability (P) | 0.0062 |
Two-sided Pr <= P | 0.0119 |
Sample Size = 500
Note that we get the same output as before, but now we have used a MACRO function. Also note that we did not use the parameter names. When not using the names to specify the parameter values, you must be careful to be sure that you give them in the correct order. In our example, the MACRO twobytwo expects the dataset first, then the predictor (row variable) and finally the response (column variable).
14.3. MPRINT System Option¶
We have shown you what SAS sees after the MACRO processor has resolved the program, but normally you won’t see these statements. However, if you specify the MPRINT system option in your program, then SAS will print the resolved statements from MACROS in the SAS log. This can be useful for debugging purposes. To turn on the MPRINT option submit an OPTIONS statement like this:
OPTIONS MPRINT;
Here is what the SAS log looks like. (Note we have changed the MACRO by adding the NOPRINT option to PROC FREQ to suppress the output for this example.)
%MACRO twobytwo(dataset, predictor, response);
PROC FREQ DATA = &dataset NOPRINT;
TITLE "&predictor vs &response";
TABLE &predictor*&response / CHISQ;
RUN;
TITLE;
%MEND twobytwo;
OPTIONS MPRINT;
%twobytwo(fghmtemp, diabetes, prevchd);
OPTIONS NOMPRINT;
131 ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
131! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
132
133 %MACRO twobytwo(dataset, predictor, response);
134 PROC FREQ DATA = &dataset NOPRINT;
135 TITLE "&predictor vs &response";
136 TABLE &predictor*&response / CHISQ;
137 RUN;
138 TITLE;
139 %MEND twobytwo;
140
141 OPTIONS MPRINT;
142
143 %twobytwo(fghmtemp, diabetes, prevchd);
MPRINT(TWOBYTWO): PROC FREQ DATA = fghmtemp NOPRINT;
MPRINT(TWOBYTWO): TITLE "diabetes vs prevchd";
MPRINT(TWOBYTWO): TABLE diabetes*prevchd / CHISQ;
MPRINT(TWOBYTWO): RUN;
NOTE: Processing will terminate because there are no valid requests for displayed output or output data sets.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
MPRINT(TWOBYTWO): TITLE;
144
145 OPTIONS NOMPRINT;
146
147 ods html5 (id=saspy_internal) close;ods listing;
148
You can see that SAS has inserted into the regular SAS log the MPRINT lines. The statements generated by the MACRO processor are labeled with the word MPRINT followed by the name of the MACRO that generated the statements, in this case twobytwo. By using the MPRINT system option it is easy to see the standard SAS statements your MACRO is generating.
14.4. MACRO %DO Loop¶
DO loops are useful for simplifying repetitive code, but a regular DO loop can only be used inside a DATA step. What if we want to repeat the same procedure with minor changes such as different variables. In this case, a MACRO %DO loop is useful. The general form of this statement is:
%DO index-variable = start TO end;
SAS code;
%END;
The index-variable serves as the counter variable for the loop and is a MACRO variable that runs from start to end. This type of loop can be used outside of PROC and DATA steps to loop over multiple of these steps.
Example
Suppose we want to run our two by two table code but for many different variables such as in the following SAS code.
PROC FREQ DATA = fghmtemp;
TITLE "prevap vs prevchd";
TABLE prevap * prevchd / CHISQ;
RUN;
PROC FREQ DATA = fghmtemp;
TITLE "diabetes vs prevchd";
TABLE diabetes * prevchd / CHISQ;
RUN;
PROC FREQ DATA = fghmtemp;
TITLE "prevmi vs prevchd";
TABLE prevmi * prevchd / CHISQ;
RUN;
PROC FREQ DATA = fghmtemp;
TITLE "prevstrk vs prevchd";
TABLE prevstrk * prevchd / CHISQ;
RUN;
PROC FREQ DATA = fghmtemp;
TITLE "prevhyp vs prevchd";
TABLE prevhyp * prevchd / CHISQ;
RUN;
The code is all the same with just the row variables changing between each PROC FREQ call. Instead of writing this code over and over, we can instead use a %DO loop. To keep the output shorter, we will only use the first two PROC FREQ calls.
%MACRO twobytwov2(dataset, predictor, response);
%DO i=1 %TO %SYSFUNC(countw(&predictor, ' '));
%LET dep = %SCAN(&predictor, &i);
PROC FREQ DATA = &dataset;
TITLE "&dep vs &response";
TABLE &dep * &response / CHISQ;
RUN;
TITLE;
%END;
%MEND;
*%twobytwov2(fghmtemp, prevap diabetes prevmi prevstrk prevhyp, prevchd);
%twobytwov2(fghmtemp, prevap diabetes, prevchd);
The FREQ Procedure
|
|
Statistics for Table of PREVAP by PREVCHD
Statistic | DF | Value | Prob |
---|---|---|---|
WARNING: 25% of the cells have expected counts less than 5. Chi-Square may not be a valid test. |
|||
Chi-Square | 1 | 337.9100 | <.0001 |
Likelihood Ratio Chi-Square | 1 | 209.3011 | <.0001 |
Continuity Adj. Chi-Square | 1 | 328.4425 | <.0001 |
Mantel-Haenszel Chi-Square | 1 | 337.2342 | <.0001 |
Phi Coefficient | 0.8221 | ||
Contingency Coefficient | 0.6350 | ||
Cramer's V | 0.8221 |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 443 |
Left-sided Pr <= F | 1.0000 |
Right-sided Pr >= F | <.0001 |
Table Probability (P) | <.0001 |
Two-sided Pr <= P | <.0001 |
Sample Size = 500
The FREQ Procedure
|
|
Statistics for Table of DIABETES by PREVCHD
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 7.8534 | 0.0051 |
Likelihood Ratio Chi-Square | 1 | 6.3873 | 0.0115 |
Continuity Adj. Chi-Square | 1 | 6.5483 | 0.0105 |
Mantel-Haenszel Chi-Square | 1 | 7.8377 | 0.0051 |
Phi Coefficient | 0.1253 | ||
Contingency Coefficient | 0.1244 | ||
Cramer's V | 0.1253 |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 408 |
Left-sided Pr <= F | 0.9972 |
Right-sided Pr >= F | 0.0090 |
Table Probability (P) | 0.0062 |
Two-sided Pr <= P | 0.0119 |
Sample Size = 500
In this MACRO, we instead pass in a list of row variables to the MACRO parameter predictor separated by a space. Previously, we learned about the COUNTW and SCAN functions, but now we need to use them outside of a DATA step. To use COUNTW, we need to use the MACRO function %SYSFUNC to evaluate this function when it is outside the DATA step. Recall that COUNTW counts the number of words, in this case separated by a space, in a list. SAS already has a MACRO version of SCAN, %SCAN, that we can use to go through the list of row variables. The rest of the code is the same as before, but now we need to walk through a list of row variables in predictor using the %DO loop.
14.5. Conditional Logic in MACROs¶
With MACROs and MACRO variables, you have a lot of flexibility. You can increase that flexibility still more by using conditional MACRO statements such as %IF. Here are the general forms of statements used for conditional logic in MACROS:
%IF condition %THEN action;
%ELSE %IF condition %THEN action;
%ELSE action;
%IF condition %THEN %DO;
action;
%END;
These statements probably look familiar because there are parallel statements in standard SAS code, but don’t confuse these with their standard counterparts. As with the DO loops, the regular IF/THEN/ELSE statements we learned before can only be used inside a DATA step. The %IF/%THEN/%ELSE statements can be used outside of DATA and PROC steps and inside other MACROs.
Example
In the following SAS program, we create a MACRO with a parameter that will take the values "Yes" or "No" to request risk difference output from PROC FREQ in our twobytwo MACRO.
%MACRO twobytwov2(dataset, predictor, response, rd=No);
%DO i=1 %TO %SYSFUNC(countw(&predictor, ' '));
%LET dep = %SCAN(&predictor, &i);
%IF &rd = No %THEN %DO;
PROC FREQ DATA = &dataset;
TITLE "&dep vs &response";
TABLE &dep * &response / CHISQ;
RUN;
TITLE;
%END;
%ELSE %IF &rd = Yes %THEN %DO;
PROC FREQ DATA = &dataset;
TITLE "&dep vs &response";
TABLE &dep * &response / CHISQ riskdiff;
RUN;
TITLE;
%END;
%END;
%MEND;
%twobytwov2(dataset=fghmtemp, predictor=diabetes, response=prevchd, rd=Yes);
*%twobytwov2(dataset=fghmtemp, predictor=diabetes, response=prevchd);
The FREQ Procedure
|
|
Statistics for Table of DIABETES by PREVCHD
Statistic | DF | Value | Prob |
---|---|---|---|
Chi-Square | 1 | 7.8534 | 0.0051 |
Likelihood Ratio Chi-Square | 1 | 6.3873 | 0.0115 |
Continuity Adj. Chi-Square | 1 | 6.5483 | 0.0105 |
Mantel-Haenszel Chi-Square | 1 | 7.8377 | 0.0051 |
Phi Coefficient | 0.1253 | ||
Contingency Coefficient | 0.1244 | ||
Cramer's V | 0.1253 |
Fisher's Exact Test | |
---|---|
Cell (1,1) Frequency (F) | 408 |
Left-sided Pr <= F | 0.9972 |
Right-sided Pr >= F | 0.0090 |
Table Probability (P) | 0.0062 |
Two-sided Pr <= P | 0.0119 |
Column 1 Risk Estimates | ||||||
---|---|---|---|---|---|---|
Risk | ASE | 95% Confidence Limits |
Exact 95% Confidence Limits |
|||
Difference is (Row 1 - Row 2) | ||||||
Row 1 | 0.8987 | 0.0142 | 0.8709 | 0.9264 | 0.8672 | 0.9249 |
Row 2 | 0.7609 | 0.0629 | 0.6376 | 0.8841 | 0.6123 | 0.8741 |
Total | 0.8860 | 0.0142 | 0.8581 | 0.9139 | 0.8548 | 0.9125 |
Difference | 0.1378 | 0.0645 | 0.0115 | 0.2642 |
Column 2 Risk Estimates | ||||||
---|---|---|---|---|---|---|
Risk | ASE | 95% Confidence Limits |
Exact 95% Confidence Limits |
|||
Difference is (Row 1 - Row 2) | ||||||
Row 1 | 0.1013 | 0.0142 | 0.0736 | 0.1291 | 0.0751 | 0.1328 |
Row 2 | 0.2391 | 0.0629 | 0.1159 | 0.3624 | 0.1259 | 0.3877 |
Total | 0.1140 | 0.0142 | 0.0861 | 0.1419 | 0.0875 | 0.1452 |
Difference | -0.1378 | 0.0645 | -0.2642 | -0.0115 |
Sample Size = 500
The rd parameter expects the values either Yes or No with the default value of No. If No is specified (or if it is left blank), then we will only get the chi-squared output. If rd is specified to be Yes, then we get the chi-squared output and the risk difference output.
Note that, we have specified a default value for rd in our MACRO definition. When we use a defualt value, we create positiional parameters, and in this case, when calling the MACRO, we must specify all the parameters by parameter-name=parameter-value in the MACRO call.
14.6. Data Driven Programs - CALL SYMPUT¶
Using the CALL SYMPUT macro routing you can let a MACRO program look at the data and then decide for itself what to do. CALL Symput takes a value from a DATA step and assigns it to a MACRO variable which you can then use later in your program.
CALL SYMPUT can take many forms, but to assign a single value to a single MACRO variable, use CALL SYMPUT with this general form:
CALL SYMPUT("marcro-variable", value);
where macro-variable is the name of a macro variable, either new or old, and is enclosed in quotes. Value is the name of a variable from a DATA step whose current value you want to assigne to that MACRO variable.
CALL SYMPUT is often used in IF/THEN/ELSE statements, for example
IF Place = 1 THEN
CALL SYMPUT("WinningTime", Time);
This statement tells SAS to creat a MACRO variable named WinningTime and set it equal to the current value of the variable Time when the value of Place is 1.
Be careful. You cannot create a MACRO variable with CALL SYMPUT and use it in the same DATA step. Here’s why. When you submit MACRO code, it is resolved by the MACRO processor, and then compiled and executed. Not untile the final stage, execution, does SAS see your DATA. CALL SYMPUT takes a data value from the execution phase, and passes it back to the MACRO processor for use in a later step. That’s why you must put CALL SYMPUT in one DATA step, but not use it until a later step.
Example
The following SAS program use the expected cell counts, calculated by PROC FREQ, to determine whether to return the chi-squared test p-value or Fisher's exact test p-value. Recall that one standard rule of thumb is that to use the chi-square test, all expected cell counts should be at least 5.
The following code is some prelimanary code that we run to figure out how to define how MACRO.
/*First, we will need to write and test our code for selecting only the output we want before
we make our updated macro. We want to run a continuity adjusted chi-square test or
Fisher's exact test if the expected counts are too small. We only want the table with row percentages
and the result of the correct test in our output.*/
*ODS TRACE ON;
PROC FREQ DATA = fghmtemp;
TABLE prevap*prevchd / EXPECTED chisq;
ODS OUTPUT CrossTabFreqs = ct ChiSq = chi FishersExact = fet;
RUN;
*ODS TRACE OFF;
*Lot's of missing data. We only want to use non missing values;
PROC PRINT DATA = ct;
RUN;
/* Test how to clean the resulting datasets */
/* Remove the missing values and get the cells with epxected countes less than 5 */
title '/* Test how to clean the resulting datasets */';
PROC PRINT DATA=ct;
WHERE expected < 5 and prevap ne . and prevchd ne .;
RUN;
title;
*Create an indicator variable that is 1 if there is at least one cell with expected count less than 5;
%LET low_count = 0;
DATA _NULL_; *Run through the data set without creating a new one;
SET ct;
IF EXPECTED < 5 and prevap ne . and prevchd ne . THEN CALL SYMPUT('low_count', 1);
RUN;
%PUT &=low_count; *Should show value of 1 in the log file.;
/* Now, that we can check to see if the expected cell counts are less than 5,
* we need to see how we can clean the chi-square or Fisher tables to print only
* the test name and p-value.
*/
PROC PRINT DATA=chi;
RUN;
*Select the row for the continuity adjusted chi-square test;
DATA chi2;
SET chi;
WHERE statistic="Continuity Adj. Chi-Square";
DROP TABLE DF VALUE;
RUN;
PROC PRINT DATA = chi2;
RUN;
*Do the same thing for Fisher's exact test;
PROC PRINT DATA = fet;
RUN;
DATA fet2 (RENAME = (cValue1 = Prob));
SET fet;
Statistic = "Fisher's Exact Test";
WHERE NAME1 = "XP2_FISH";
KEEP statistic cValue1;
RUN;
DATA fet2;
RETAIN Statistic Prob;
SET fet2;
RUN;
PROC PRINT DATA = fet2;
RUN;
The previous code is exploratory to find the ODS table names and figure out how to extract the needed data from these tables. The following code provides the final workig MACRO.
%MACRO twobytwov3(dataset, predictor, response);
%LET n = %sysfun(countw(predictor));
%LET i = 1;
%LET dep = %SCAN(&predictor, &i);
%DO %WHILE(&dep ne);
PROC FREQ DATA = &dataset;
TITLE;
ODS SELECT NONE;
TABLE &dep*&response / CHISQ EXPECTED;
ODS OUTPUT crosstabfreqs = ct ChiSq = chi FishersExact = fet;
RUN;
PROC FREQ DATA = &dataset;
TITLE "&dep vs &response";
TABLE &dep*&response / NOCOL NOPERCENT;
ODS SELECT CrossTabFreqs;
RUN;
TITLE;
%LET low_count = 0;
DATA _NULL_; *Run through the data set without creating a new one;
SET ct;
IF EXPECTED < 5 and &dep ne . and &response ne . THEN CALL SYMPUT('low_count', 1);
RUN;
%IF &low_count = 0 %THEN %DO;
DATA chi;
SET chi;
WHERE statistic="Continuity Adj. Chi-Square";
DROP TABLE DF VALUE;
RUN;
PROC PRINT DATA = chi;
RUN;
%END;
%ELSE %DO;
DATA fet (RENAME = (cValue1 = Prob));
SET fet;
Statistic = "Fisher's Exact Test";
WHERE NAME1 = "XP2_FISH";
KEEP statistic cValue1;
RUN;
DATA fet;
RETAIN Statistic Prob;
SET fet;
RUN;
PROC PRINT DATA = fet;
RUN;
%END;
%LET i = %EVAL(&i + 1);
%LET dep = %SCAN(&predictor, &i);
%END;
%MEND;
%twobytwov3(fghmtemp, prevap diabetes prevmi prevstrk prevhyp, prevchd);
The FREQ Procedure
|
|
Obs | Statistic | Prob |
---|---|---|
1 | Fisher's Exact Test | <.0001 |
The FREQ Procedure
|
|
Obs | Statistic | Prob |
---|---|---|
1 | Continuity Adj. Chi-Square | 0.0105 |
The FREQ Procedure
|
|
Obs | Statistic | Prob |
---|---|---|
1 | Fisher's Exact Test | <.0001 |
The FREQ Procedure
|
|
Obs | Statistic | Prob |
---|---|---|
1 | Fisher's Exact Test | 0.0519 |
The FREQ Procedure
|
|
Obs | Statistic | Prob |
---|---|---|
1 | Continuity Adj. Chi-Square | 0.0052 |
This program uses PROC FREQ to calculate the expected cell counts and extracts this table using ODS OUTPUT. If any of the cells have an expected cell count less than 5, then CALL SYMPUT sets the value of the MACRO variable lowcount to 1. Otherwise, it retains the starting value of 0. This MACRO variable is then used in a %IF statement to determine whether to save and print the p-value from a chi-square test or Fisher's exact test.
14.7. Exercises¶
Write a macro that discretizes a quantitative variable into four categories based on quantiles. That is:
if X < Q1 then group = 1
if Q1 < X < M then group = 2
if M < X < Q3 then group = 3
if Q3 < X then group = 4
The macro should have the following defnition - %quartilesmacro(mydata, qvar, round, out);
mydata: dataset containing the quantitative variable
qvar: name of quantitative variable
round: integer representing number of decimal places to round to
out: name of output dataset which contains the categorized variable which will have name qvar_cat, e.g. if the qvar is bmi then the output variable is bmi_cat.