SAS MACROS

This lecture will cover how to write SAS MACROS. SAS MACROS are how users can define their own functions to automate repetitive code. We will cover

  • MACRO variables with %LET
  • %MACRO/%MEND
  • MACRO Loops
  • Conditional logice in MACROS with %IF/%THEN/%ELSE
  • Data driven code with SYMPUT and SYMGET

For this lecture, we will use the Framingham dataset which was a large longitudinal, epidemiologice study of the risks of heart disease. Before we get started with MACROS, let's load the dataset and explore it a little.

In [3]:
LIBNAME myData "H:\BiostatCourses\PublicHealthComputing\Lectures\Week4Macros_Functions\SAS";

PROC CONTENTS DATA = myData.fghm113;
RUN;
Out[3]:
SAS Output

SAS Output

The CONTENTS Procedure

The CONTENTS Procedure

MYDATA.FGHM113

Attributes

Data Set Name MYDATA.FGHM113 Observations 500
Member Type DATA Variables 21
Engine V9 Indexes 0
Created 01/04/2014 11:31:52 Observation Length 168
Last Modified 01/04/2014 11:31:52 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    

Engine/Host Information

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 368
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename H:\BiostatCourses\PublicHealthComputing\Lectures\Week4Macros_Functions\SAS\fghm113.sas7bdat
Release Created 9.0401M0
Host Created X64_S08R2

Variables

Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
4 AGE Num 8 BEST12. F12. Age (years) at examination
9 BMI Num 8 BEST12. F12. Body Mass Index (kr/(M*M)
11 BPMEDS Num 8 BEST12. F12. Anti-hypertensive meds Y/N
8 CIGPDAY Num 8 BEST12. F12. Cigarettes per day
7 CURSMOKE Num 8 BEST12. F12. Current Cig Smoker Y/N
10 DIABETES Num 8 BEST12. F12. Diabetic Y/N
6 DIABP Num 8 BEST12. F12. Diastolic BP mmHg
13 GLUCOSE Num 8 BEST12. F12. Casual Glucose mg/dL
20 HDLC Num 8 BEST12. F12. HDL Cholesterol mg/dL
12 HEARTRTE Num 8 BEST12. F12. Ventricular Rate (beats/min)
21 LDLC Num 8 BEST12. F12. LDL Cholesterol mg/dL
19 PERIOD Num 8 BEST12. F12. Examination cycle
15 PREVAP Num 8 BEST12. F12. Prevalent Angina
14 PREVCHD Num 8 BEST12. F12. Prevalent CHD (MI,AP,CI)
18 PREVHYP Num 8 BEST12. F12. Prevalent Hypertension
16 PREVMI Num 8 BEST12. F12. Prevalent MI (Hosp,Silent)
17 PREVSTRK Num 8 BEST12. F12. Prevalent Stroke (Infarct,Hem)
2 RANDID Num 8 BEST12. F12. Random ID
1 SEX Num 5 BEST12. F12. SEX
5 SYSBP Num 8 BEST12. F12. Systolic BP mmHg
3 TOTCHOL Num 8 BEST12. F12. Serum Cholesterol mg/dL

We will add the following translations to the dataset.

 SEX (Gender): 1=Men 
                2=Women
  Period (Examination cycle): 1=Period1 
                              2=Period2
                              3=Period3
  BPMEDS (Use of anti-hypertensive meds): 0=Not currently
                                          1=Currently use
  CURSMOKE (Currently smoke?): 0=No
                               1=Yes
  DIABETES: 0=Not diabetic
            1=Diabetic
  PREVAP (Have angina pectoric?): 0=No
                                  1=Yes
  PREVCHD (Coronary heart disease?): 0=No
                                     1=Yes
  PREVMI (Myocardial infarction?): 0=No
                                   1=Yes
  PREVSTRK (Had a stroke?): 0=No
                            1=Yes
  PREVHYP (Hypertensive? sys bp >=140 or dyas bp >= 90): 0=no
                                                         1=yes
In [4]:
DATA fghmTemp;
SET myData.fghm113;
RUN;

PROC FORMAT;
VALUE YNfmt 0="No"
			1="Yes";
VALUE perfmt 1="Period 1"
			 2="Period 2"
			 3="Period 3";
VALUE gndrfmt 1="Men"
			  2="Women";
RUN;

DATA fghmtemp;
SET fghmtemp;
FORMAT prevap ynfmt.
	   diabetes ynfmt.
	   cursmoke ynfmt.
	   bpmeds ynfmt.
	   prevchd ynfmt.
	   prevmi ynfmt.
	   prevstrk ynfmt.
	   prevhyp ynfmt.
	   sex gndrfmt.;
RUN;

*Check to see if the formatting was done correctly;
PROC CONTENTS DATA = fghmtemp;
RUN;

PROC PRINT DATA = fghmtemp (obs=5);
RUN;
Out[4]:
SAS Output

SAS Output

The CONTENTS Procedure

The CONTENTS Procedure

WORK.FGHMTEMP

Attributes

Data Set Name WORK.FGHMTEMP Observations 500
Member Type DATA Variables 21
Engine V9 Indexes 0
Created 09/13/2017 16:10:21 Observation Length 168
Last Modified 09/13/2017 16:10:21 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_64    
Encoding wlatin1 Western (Windows)    

Engine/Host Information

Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 367
Number of Data Set Repairs 0
ExtendObsCounter YES
Filename C:\Users\rlp176\AppData\Local\Temp\SAS Temporary Files\_TD5268_PHHP-DDH4LH2_\Prc2\fghmtemp.sas7bdat
Release Created 9.0401M3
Host Created X64_7PRO

Variables

Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat Label
4 AGE Num 8 BEST12. F12. Age (years) at examination
9 BMI Num 8 BEST12. F12. Body Mass Index (kr/(M*M)
11 BPMEDS Num 8 YNFMT. F12. Anti-hypertensive meds Y/N
8 CIGPDAY Num 8 BEST12. F12. Cigarettes per day
7 CURSMOKE Num 8 YNFMT. F12. Current Cig Smoker Y/N
10 DIABETES Num 8 YNFMT. F12. Diabetic Y/N
6 DIABP Num 8 BEST12. F12. Diastolic BP mmHg
13 GLUCOSE Num 8 BEST12. F12. Casual Glucose mg/dL
20 HDLC Num 8 BEST12. F12. HDL Cholesterol mg/dL
12 HEARTRTE Num 8 BEST12. F12. Ventricular Rate (beats/min)
21 LDLC Num 8 BEST12. F12. LDL Cholesterol mg/dL
19 PERIOD Num 8 BEST12. F12. Examination cycle
15 PREVAP Num 8 YNFMT. F12. Prevalent Angina
14 PREVCHD Num 8 YNFMT. F12. Prevalent CHD (MI,AP,CI)
18 PREVHYP Num 8 YNFMT. F12. Prevalent Hypertension
16 PREVMI Num 8 YNFMT. F12. Prevalent MI (Hosp,Silent)
17 PREVSTRK Num 8 YNFMT. F12. Prevalent Stroke (Infarct,Hem)
2 RANDID Num 8 BEST12. F12. Random ID
1 SEX Num 5 GNDRFMT. F12. SEX
5 SYSBP Num 8 BEST12. F12. Systolic BP mmHg
3 TOTCHOL Num 8 BEST12. F12. Serum Cholesterol mg/dL

The PRINT Procedure

Data Set WORK.FGHMTEMP

Obs SEX RANDID TOTCHOL AGE SYSBP DIABP CURSMOKE CIGPDAY BMI DIABETES BPMEDS HEARTRTE GLUCOSE PREVCHD PREVAP PREVMI PREVSTRK PREVHYP PERIOD HDLC LDLC
1 Women 11263 220 55 180 106 No 0 31.17 Yes Yes 86 81 No No No No Yes 3 46 135
2 Men 16365 211 55 173 123 No 0 29.11 No Yes 75 85 No No No No Yes 3 48 163
3 Women 43770 283 64 159 69 No 0 32.93 Yes No 70 230 No No No No Yes 3 45 238
4 Men 47561 290 56 132 70 Yes 40 22.73 No No 100 90 No No No No Yes 3 54 236
5 Women 55965 298 72 109 60 No 0 26.2 No No 80 100 No No No No No 3 38 260

MACRO Variables with %LET

%LET creates a MACRO variable. MACRO variables allow us to store text such as the varaible name in our dataset of the p-value from a table in a variable for use later. We can reference the value of the macro variable by placing & before the macro variable name.

Let's use MACRO variables to explore the relationship between begin diabetic and having coronary hearth disease.

In [5]:
PROC FREQ DATA = fghmtemp;
TABLE diabetes*prevchd / CHISQ;
RUN;

%LET response = prevchd;
%LET predictor = diabetes;
%LET dataset = fghmtemp;

*&predictor will put diabetes into the SAS code;
PROC FREQ DATA = &dataset;
TITLE "&predictor vs &response";
TABLE &predictor*&response / CHISQ;
RUN;

TITLE;

%SYMDEL dataset predictor response;
Out[5]:
SAS Output

SAS Output

The FREQ Procedure

The FREQ Procedure

Table DIABETES * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500


diabetes vs prevchd

The FREQ Procedure

The FREQ Procedure

Table DIABETES * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500

A Simple MACRO

All macros start with %MACRO and end with %MEND. First, let's make a simple macro to create a two by two contingency table with chi-square test results.

In general, when writing a MACRO, we need to write a non macro version to test the code. Then, we can make it into a MACRO by subbing in the macro variabes. In this case, we have already written the code above.

In [6]:
%MACRO twobytwo(dataset, predictor, response);
PROC FREQ DATA = &dataset;
	TITLE "&predictor vs &response";
	TABLE &predictor*&response / CHISQ;
RUN;
TITLE;
%MEND;

*Call the MACRO function;
%twobytwo(fghmtemp, diabetes, prevchd);
Out[6]:
SAS Output

SAS Output

diabetes vs prevchd

The FREQ Procedure

The FREQ Procedure

Table DIABETES * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500

When creating macros, it is not uncommon to have to debug them. SAS provides some useful debugging options for macros.

  • SYMBOLGEN - prints the value of the macro variables in your log file
  • MPRINT - prints the SAS code generated when the value of your macro variables are substituted in to your log file.
  • MLOGIC - prints messages about the execution of your macro in the log file.
In [7]:
OPTIONS SYMBOLGEN;
OPTIONS MPRINT;
OPTIONS MLOGIC;

%twobytwo(fghmtemp, prevstrk, prevchd);

OPTIONS NOSYMBOLGEN;
OPTIONS NOMPRINT;
OPTIONS NOMLOGIC;
Out[7]:
SAS Output

SAS Output

prevstrk vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVSTRK * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVSTRK by PREVCHD
PREVSTRK(Prevalent Stroke (Infarct,Hem)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
438
87.60
89.02
98.87
54
10.80
10.98
94.74
492
98.40
 
 
Yes
5
1.00
62.50
1.13
3
0.60
37.50
5.26
8
1.60
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVSTRK by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 5.4832 0.0192
Likelihood Ratio Chi-Square 1 3.7434 0.0530
Continuity Adj. Chi-Square 1 3.1716 0.0749
Mantel-Haenszel Chi-Square 1 5.4723 0.0193
Phi Coefficient   0.1047  
Contingency Coefficient   0.1042  
Cramer's V   0.1047  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 438
Left-sided Pr <= F 0.9925
Right-sided Pr >= F 0.0519
   
Table Probability (P) 0.0444
Two-sided Pr <= P 0.0519

Sample Size = 500

The code above results in the following output in the log file.

MLOGIC(TWOBYTWO):  Beginning execution.
MLOGIC(TWOBYTWO):  Parameter DATASET has value fghmtemp
MLOGIC(TWOBYTWO):  Parameter PREDICTOR has value prevstrk
MLOGIC(TWOBYTWO):  Parameter RESPONSE has value prevchd
SYMBOLGEN:  Macro variable DATASET resolves to fghmtemp
MPRINT(TWOBYTWO):   PROC FREQ DATA = fghmtemp;
SYMBOLGEN:  Macro variable PREDICTOR resolves to prevstrk
SYMBOLGEN:  Macro variable RESPONSE resolves to prevchd
MPRINT(TWOBYTWO):   TITLE "prevstrk vs prevchd";
SYMBOLGEN:  Macro variable PREDICTOR resolves to prevstrk
SYMBOLGEN:  Macro variable RESPONSE resolves to prevchd
MPRINT(TWOBYTWO):   TABLE prevstrk*prevchd / CHISQ;
MPRINT(TWOBYTWO):   RUN;

MACRO Loops

The macro %DO can be used outside of a PROC or DATA step (unlike a regulat DO loop), which will allow us to loop through different procedures or data steps inside a macro.

For the next example, we will examine if there is an association between being diabetic and the other adverse health outcomes: coronary heart disease (prevchd), myocardial infarction (prevmi), stroke (precstrk), angine pectoris (prevap), or hypertension (prevhyp).

To do this, we will make a contingency table for each of these pairs of variables and get a chi-squared test and Fisher's exact test results.

We could write out the PROC FREQ statement for each of these 5 pairs.

In [8]:
PROC FREQ DATA = fghmtemp;
TABLE prevap*prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
TABLE diabetes*prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
TABLE prevmi*prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
TABLE prevstrk*prevchd / CHISQ;
RUN;

PROC FREQ DATA = fghmtemp;
TABLE prevhyp*prevchd / CHISQ;
RUN;
Out[8]:
SAS Output

SAS Output

The FREQ Procedure

The FREQ Procedure

Table PREVAP * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVAP by PREVCHD
PREVAP(Prevalent Angina) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
88.60
96.30
100.00
17
3.40
3.70
29.82
460
92.00
 
 
Yes
0
0.00
0.00
0.00
40
8.00
100.00
70.18
40
8.00
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVAP by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 337.9100 <.0001
Likelihood Ratio Chi-Square 1 209.3011 <.0001
Continuity Adj. Chi-Square 1 328.4425 <.0001
Mantel-Haenszel Chi-Square 1 337.2342 <.0001
Phi Coefficient   0.8221  
Contingency Coefficient   0.6350  
Cramer's V   0.8221  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 443
Left-sided Pr <= F 1.0000
Right-sided Pr >= F <.0001
   
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 500


The FREQ Procedure

The FREQ Procedure

Table DIABETES * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500


The FREQ Procedure

The FREQ Procedure

Table PREVMI * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVMI by PREVCHD
PREVMI(Prevalent MI (Hosp,Silent)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
88.60
94.46
100.00
26
5.20
5.54
45.61
469
93.80
 
 
Yes
0
0.00
0.00
0.00
31
6.20
100.00
54.39
31
6.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVMI by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 256.8548 <.0001
Likelihood Ratio Chi-Square 1 153.8559 <.0001
Continuity Adj. Chi-Square 1 247.5882 <.0001
Mantel-Haenszel Chi-Square 1 256.3411 <.0001
Phi Coefficient   0.7167  
Contingency Coefficient   0.5826  
Cramer's V   0.7167  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 443
Left-sided Pr <= F 1.0000
Right-sided Pr >= F <.0001
   
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 500


The FREQ Procedure

The FREQ Procedure

Table PREVSTRK * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVSTRK by PREVCHD
PREVSTRK(Prevalent Stroke (Infarct,Hem)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
438
87.60
89.02
98.87
54
10.80
10.98
94.74
492
98.40
 
 
Yes
5
1.00
62.50
1.13
3
0.60
37.50
5.26
8
1.60
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVSTRK by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 5.4832 0.0192
Likelihood Ratio Chi-Square 1 3.7434 0.0530
Continuity Adj. Chi-Square 1 3.1716 0.0749
Mantel-Haenszel Chi-Square 1 5.4723 0.0193
Phi Coefficient   0.1047  
Contingency Coefficient   0.1042  
Cramer's V   0.1047  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 438
Left-sided Pr <= F 0.9925
Right-sided Pr >= F 0.0519
   
Table Probability (P) 0.0444
Two-sided Pr <= P 0.0519

Sample Size = 500


The FREQ Procedure

The FREQ Procedure

Table PREVHYP * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVHYP by PREVCHD
PREVHYP(Prevalent Hypertension) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
191
38.20
93.63
43.12
13
2.60
6.37
22.81
204
40.80
 
 
Yes
252
50.40
85.14
56.88
44
8.80
14.86
77.19
296
59.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVHYP by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 8.6231 0.0033
Likelihood Ratio Chi-Square 1 9.2098 0.0024
Continuity Adj. Chi-Square 1 7.8028 0.0052
Mantel-Haenszel Chi-Square 1 8.6059 0.0034
Phi Coefficient   0.1313  
Contingency Coefficient   0.1302  
Cramer's V   0.1313  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 191
Left-sided Pr <= F 0.9993
Right-sided Pr >= F 0.0021
   
Table Probability (P) 0.0013
Two-sided Pr <= P 0.0039

Sample Size = 500

In this case, we only had a few pairs of variables to compare, but what if you had to do this for 10, 20, or 100 pairs of variables. Writing that code repeatedly would take a while. We can simplify our code by using a MACRO with a loop.

This MACRO has four paramters:

  • dataset - the name of the SAS dataset to be used
  • predictor - the single predictor
  • response - a list of variables names to compare with predictor
  • n - number of response variables given

Output: Create a table and output chi-square and Fisher's exact test for each pair of variables.

In [9]:
%MACRO twobytwov2(dataset, predictor, response, n);
	%DO i=1 %TO &n;
		%LET dep = %SCAN(&predictor, &i);
		PROC FREQ DATA = &dataset;
			TITLE "&dep vs &response";
			TABLE &dep*&response / CHISQ;
		RUN;
		TITLE;
	%END;
%MEND;

%twobytwov2(fghmtemp, prevap diabetes prevmi prevstrk prevhyp, prevchd, 5);
Out[9]:
SAS Output

SAS Output

prevap vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVAP * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVAP by PREVCHD
PREVAP(Prevalent Angina) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
88.60
96.30
100.00
17
3.40
3.70
29.82
460
92.00
 
 
Yes
0
0.00
0.00
0.00
40
8.00
100.00
70.18
40
8.00
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVAP by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 337.9100 <.0001
Likelihood Ratio Chi-Square 1 209.3011 <.0001
Continuity Adj. Chi-Square 1 328.4425 <.0001
Mantel-Haenszel Chi-Square 1 337.2342 <.0001
Phi Coefficient   0.8221  
Contingency Coefficient   0.6350  
Cramer's V   0.8221  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 443
Left-sided Pr <= F 1.0000
Right-sided Pr >= F <.0001
   
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 500


diabetes vs prevchd

The FREQ Procedure

The FREQ Procedure

Table DIABETES * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
81.60
89.87
92.10
46
9.20
10.13
80.70
454
90.80
 
 
Yes
35
7.00
76.09
7.90
11
2.20
23.91
19.30
46
9.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of DIABETES by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 7.8534 0.0051
Likelihood Ratio Chi-Square 1 6.3873 0.0115
Continuity Adj. Chi-Square 1 6.5483 0.0105
Mantel-Haenszel Chi-Square 1 7.8377 0.0051
Phi Coefficient   0.1253  
Contingency Coefficient   0.1244  
Cramer's V   0.1253  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 408
Left-sided Pr <= F 0.9972
Right-sided Pr >= F 0.0090
   
Table Probability (P) 0.0062
Two-sided Pr <= P 0.0119

Sample Size = 500


prevmi vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVMI * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVMI by PREVCHD
PREVMI(Prevalent MI (Hosp,Silent)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
88.60
94.46
100.00
26
5.20
5.54
45.61
469
93.80
 
 
Yes
0
0.00
0.00
0.00
31
6.20
100.00
54.39
31
6.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVMI by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 256.8548 <.0001
Likelihood Ratio Chi-Square 1 153.8559 <.0001
Continuity Adj. Chi-Square 1 247.5882 <.0001
Mantel-Haenszel Chi-Square 1 256.3411 <.0001
Phi Coefficient   0.7167  
Contingency Coefficient   0.5826  
Cramer's V   0.7167  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 443
Left-sided Pr <= F 1.0000
Right-sided Pr >= F <.0001
   
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 500


prevstrk vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVSTRK * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVSTRK by PREVCHD
PREVSTRK(Prevalent Stroke (Infarct,Hem)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
438
87.60
89.02
98.87
54
10.80
10.98
94.74
492
98.40
 
 
Yes
5
1.00
62.50
1.13
3
0.60
37.50
5.26
8
1.60
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVSTRK by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 5.4832 0.0192
Likelihood Ratio Chi-Square 1 3.7434 0.0530
Continuity Adj. Chi-Square 1 3.1716 0.0749
Mantel-Haenszel Chi-Square 1 5.4723 0.0193
Phi Coefficient   0.1047  
Contingency Coefficient   0.1042  
Cramer's V   0.1047  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 438
Left-sided Pr <= F 0.9925
Right-sided Pr >= F 0.0519
   
Table Probability (P) 0.0444
Two-sided Pr <= P 0.0519

Sample Size = 500


prevhyp vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVHYP * PREVCHD

Cross-Tabular Freq Table

Frequency
Percent
Row Pct
Col Pct
Table of PREVHYP by PREVCHD
PREVHYP(Prevalent Hypertension) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
191
38.20
93.63
43.12
13
2.60
6.37
22.81
204
40.80
 
 
Yes
252
50.40
85.14
56.88
44
8.80
14.86
77.19
296
59.20
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVHYP by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
Chi-Square 1 8.6231 0.0033
Likelihood Ratio Chi-Square 1 9.2098 0.0024
Continuity Adj. Chi-Square 1 7.8028 0.0052
Mantel-Haenszel Chi-Square 1 8.6059 0.0034
Phi Coefficient   0.1313  
Contingency Coefficient   0.1302  
Cramer's V   0.1313  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 191
Left-sided Pr <= F 0.9993
Right-sided Pr >= F 0.0021
   
Table Probability (P) 0.0013
Two-sided Pr <= P 0.0039

Sample Size = 500

Now that is a much simpler program! The output contains a lot more tests than we actually want though. What if we want just the tables and p-value for the chi-square test from each test, or just a single table with the p-values for each test. In general SAS is verbose and gives us a lot more output than we want. So we will need to select just the parts of the output that we want to make our reports.

Data Driven MACROS

Often, we will want to make our MACROS data driven, that means we will want our MACRO to perform different actions depending what data we have. To do this, we will need to use SYMPUT and SYMGET and to use conditional logic %IF/%THEN/%ELSE inside a MACRO.

For the next example, we will update our MACRO to only return the two by two table and either the adjusted chi-square p-value or Fisher's exact test p-value if the expected cell counts are less than 5.

First, we will need to write and test our code for selecting only the output we want before we make our updated macro. We want to run a continuity adjusted chi-square test or Fisher's exact test if the expected counts are too small. We only want the table with row percentages and the result of the correct test in our output.

In [10]:
/*ODS TRACE ON;*/
PROC FREQ DATA = fghmtemp;
TABLE prevap*prevchd / EXPECTED chisq;
ODS OUTPUT CrossTabFreqs = ct ChiSq = chi FishersExact = fet;
RUN;
/*ODS TRACE OFF;*/

/*Lot's of missing data. We only want to use non missing values*/
PROC PRINT DATA = ct;
RUN;
Out[10]:
SAS Output

SAS Output

The FREQ Procedure

The FREQ Procedure

Table PREVAP * PREVCHD

Cross-Tabular Freq Table

Frequency
Expected
Percent
Row Pct
Col Pct
Table of PREVAP by PREVCHD
PREVAP(Prevalent Angina) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
407.56
88.60
96.30
100.00
17
52.44
3.40
3.70
29.82
460
 
92.00
 
 
Yes
0
35.44
0.00
0.00
0.00
40
4.56
8.00
100.00
70.18
40
 
8.00
 
 
Total
443
88.60
57
11.40
500
100.00

Statistics for Table of PREVAP by PREVCHD

Chi-Square Tests

Statistic DF Value Prob
WARNING: 25% of the cells have expected counts less
than 5. Chi-Square may not be a valid test.
Chi-Square 1 337.9100 <.0001
Likelihood Ratio Chi-Square 1 209.3011 <.0001
Continuity Adj. Chi-Square 1 328.4425 <.0001
Mantel-Haenszel Chi-Square 1 337.2342 <.0001
Phi Coefficient   0.8221  
Contingency Coefficient   0.6350  
Cramer's V   0.8221  

Fisher's Exact Test

Fisher's Exact Test
Cell (1,1) Frequency (F) 443
Left-sided Pr <= F 1.0000
Right-sided Pr >= F <.0001
   
Table Probability (P) <.0001
Two-sided Pr <= P <.0001

Sample Size = 500


The PRINT Procedure

Data Set WORK.CT

Obs Table PREVAP PREVCHD _TYPE_ _TABLE_ Frequency Expected Percent RowPercent ColPercent Missing
1 Table PREVAP * PREVCHD No No 11 1 443 407.56 88.6 96.304 100.000 .
2 Table PREVAP * PREVCHD No Yes 11 1 17 52.44 3.4 3.696 29.825 .
3 Table PREVAP * PREVCHD No . 10 1 460 . 92.0 . . .
4 Table PREVAP * PREVCHD Yes No 11 1 0 35.44 0.0 0.000 0.000 .
5 Table PREVAP * PREVCHD Yes Yes 11 1 40 4.56 8.0 100.000 70.175 .
6 Table PREVAP * PREVCHD Yes . 10 1 40 . 8.0 . . .
7 Table PREVAP * PREVCHD . No 01 1 443 . 88.6 . . .
8 Table PREVAP * PREVCHD . Yes 01 1 57 . 11.4 . . .
9 Table PREVAP * PREVCHD . . 00 1 500 . 100.0 . . 0
In [11]:
title '/* Test how to clean the resulting datasets */';
PROC PRINT DATA=ct;
WHERE expected < 5 and prevap ne . and prevchd ne .;
RUN;
title;
Out[11]:
SAS Output

SAS Output

/* Test how to clean the resulting datasets */

The PRINT Procedure

Data Set WORK.CT

Obs Table PREVAP PREVCHD _TYPE_ _TABLE_ Frequency Expected Percent RowPercent ColPercent Missing
5 Table PREVAP * PREVCHD Yes Yes 11 1 40 4.56 8 100 70.1754 .
In [12]:
*Create an indicator variable that is 1 if there is at least one cell with expected count less than 5;
%LET low_count = 0;
DATA _NULL_; *Run through the data set without creating a new one;
SET ct;
IF EXPECTED < 5 and prevap ne . and prevchd ne . THEN CALL SYMPUT('low_count', 1);
RUN;

%PUT &=low_count; *Should show value of 1 in the log file.;
Out[12]:

17                                                         The SAS System                        16:09 Wednesday, September 13, 2017

294 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg; ods graphics on /
294 ! outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
295
296 *Create an indicator variable that is 1 if there is at least one cell with expected count less than 5;
297 %LET low_count = 0;
298 DATA _NULL_; *Run through the data set without creating a new one;
299 SET ct;
300 IF EXPECTED < 5 and prevap ne . and prevchd ne . THEN CALL SYMPUT('low_count', 1);
301 RUN;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
300:80
NOTE: There were 9 observations read from the data set WORK.CT.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


302
303 %PUT &=low_count; *Should show value of 1 in the log file.;
LOW_COUNT= 1
304
305 ods html5 (id=saspy_internal) close;ods listing;
306
In [13]:
/* Now, that we can check to see if the expected cell counts are less than 5,
 * we need to see how we can clean the chi-square or Fisher tables to print only 
 * the test name and p-value.
 */

PROC PRINT DATA=chi;
RUN;
Out[13]:
SAS Output

SAS Output

The PRINT Procedure

Data Set WORK.CHI

Obs Table Statistic DF Value Prob
1 Table PREVAP * PREVCHD Chi-Square 1 337.9100 <.0001
2 Table PREVAP * PREVCHD Likelihood Ratio Chi-Square 1 209.3011 <.0001
3 Table PREVAP * PREVCHD Continuity Adj. Chi-Square 1 328.4425 <.0001
4 Table PREVAP * PREVCHD Mantel-Haenszel Chi-Square 1 337.2342 <.0001
5 Table PREVAP * PREVCHD Phi Coefficient _ 0.8221 _
6 Table PREVAP * PREVCHD Contingency Coefficient _ 0.6350 _
7 Table PREVAP * PREVCHD Cramer's V _ 0.8221 _
In [14]:
/*Select the row for the continuity adjusted chi-square test*/
DATA chi2;
SET chi;
WHERE statistic="Continuity Adj. Chi-Square";
DROP TABLE DF VALUE;
RUN;

PROC PRINT DATA = chi2;
RUN;
Out[14]:
SAS Output

SAS Output

The PRINT Procedure

Data Set WORK.CHI2

Obs Statistic Prob
1 Continuity Adj. Chi-Square <.0001
In [15]:
/*Do the same thing for Fisher's exact test*/
PROC PRINT DATA = fet;
RUN;
Out[15]:
SAS Output

SAS Output

The PRINT Procedure

Data Set WORK.FET

Obs Table Name1 Label1 cValue1 nValue1
1 Table PREVAP * PREVCHD Cell1_FREQ Cell (1,1) Frequency (F) 443 443.000000
2 Table PREVAP * PREVCHD XPL_FISH Left-sided Pr <= F 1.0000 1.000000
3 Table PREVAP * PREVCHD XPR_FISH Right-sided Pr >= F <.0001 6.222358E-46
4 Table PREVAP * PREVCHD       .
5 Table PREVAP * PREVCHD P_TABLE Table Probability (P) <.0001 6.125565E-43
6 Table PREVAP * PREVCHD XP2_FISH Two-sided Pr <= P <.0001 6.222358E-46
In [16]:
DATA fet2 (RENAME = (cValue1 = Prob));
SET fet;
Statistic = "Fisher's Exact Test";
WHERE NAME1 = "XP2_FISH";
KEEP statistic cValue1;
RUN;

DATA fet2;
RETAIN Statistic Prob;
SET fet2;
RUN;

PROC PRINT DATA = fet2;
RUN;
Out[16]:
SAS Output

SAS Output

The PRINT Procedure

Data Set WORK.FET2

Obs Statistic Prob
1 Fisher's Exact Test <.0001

Now, let's make this into a MACRO.

In [17]:
%MACRO twobytwov3(dataset, predictor, response);
	%LET n = %LENGTH(predictor);
	%LET i = 1;
	%LET dep = %SCAN(&predictor, &i);
	%DO %WHILE(&dep ne);
		PROC FREQ DATA = &dataset;
			TABLE &dep*&response / CHISQ EXPECTED;
			ODS SELECT NONE;
			ODS OUTPUT crosstabfreqs = ct ChiSq = chi FishersExact = fet;
		RUN;
        
        PROC FREQ DATA = &dataset;
			TITLE "&dep vs &response";
			TABLE &dep*&response / NOCOL NOPERCENT;
			ODS SELECT CrossTabFreqs;
		RUN;
        TITLE;

		%LET low_count = 0;
		DATA _NULL_; *Run through the data set without creating a new one;
			SET ct;
			IF EXPECTED < 5 and &dep ne . and &response ne . THEN CALL SYMPUT('low_count', 1);
		RUN;
        
        %IF &low_count = 0 %THEN %DO;
		DATA chi;
			SET chi;
			WHERE statistic="Continuity Adj. Chi-Square";
			DROP TABLE DF VALUE;
		RUN;

		PROC PRINT DATA = chi;
		RUN;
		%END;
		%ELSE %DO;
		DATA fet (RENAME = (cValue1 = Prob));
			SET fet;
			Statistic = "Fisher's Exact Test";
			WHERE NAME1 = "XP2_FISH";
			KEEP statistic cValue1;
		RUN;

		DATA fet;
			RETAIN Statistic Prob;
			SET fet;
		RUN;

		PROC PRINT DATA = fet;
		RUN;
		%END;

		ODS STARTPAGE = NOW;
		%LET i = %EVAL(&i + 1);
		%LET dep = %SCAN(&predictor, &i);
	%END;
%MEND;
Out[17]:

22                                                         The SAS System                        16:09 Wednesday, September 13, 2017

370 ods listing close;ods html5 (id=saspy_internal) file=_tomods1 options(bitmap_mode='inline') device=svg; ods graphics on /
370 ! outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: _TOMODS1
371
372 %MACRO twobytwov3(dataset, predictor, response);
373 %LET n = %LENGTH(predictor);
374 %LET i = 1;
375 %LET dep = %SCAN(&predictor, &i);
376 %DO %WHILE(&dep ne);
377 PROC FREQ DATA = &dataset;
378 TABLE &dep*&response / CHISQ EXPECTED;
379 ODS SELECT NONE;
380 ODS OUTPUT crosstabfreqs = ct ChiSq = chi FishersExact = fet;
381 RUN;
382
383 PROC FREQ DATA = &dataset;
384 TITLE "&dep vs &response";
385 TABLE &dep*&response / NOCOL NOPERCENT;
386 ODS SELECT CrossTabFreqs;
387 RUN;
388 TITLE;
389
390 %LET low_count = 0;
391 DATA _NULL_; *Run through the data set without creating a new one;
392 SET ct;
393 IF EXPECTED < 5 and &dep ne . and &response ne . THEN CALL SYMPUT('low_count', 1);
394 RUN;
395
396 %IF &low_count = 0 %THEN %DO;
397 DATA chi;
398 SET chi;
399 WHERE statistic="Continuity Adj. Chi-Square";
400 DROP TABLE DF VALUE;
401 RUN;
402
403 PROC PRINT DATA = chi;
404 RUN;
405 %END;
406 %ELSE %DO;
407 DATA fet (RENAME = (cValue1 = Prob));
408 SET fet;
409 Statistic = "Fisher's Exact Test";
410 WHERE NAME1 = "XP2_FISH";
411 KEEP statistic cValue1;
412 RUN;
413
414 DATA fet;
415 RETAIN Statistic Prob;
416 SET fet;
417 RUN;
418
419 PROC PRINT DATA = fet;
420 RUN;
421 %END;
422
423 ODS STARTPAGE = NOW;
424 %LET i = %EVAL(&i + 1);
425 %LET dep = %SCAN(&predictor, &i);
426 %END;
427 %MEND;
428
429 ods html5 (id=saspy_internal) close;ods listing;
430
In [18]:
%twobytwov3(fghmtemp, prevap diabetes prevmi prevstrk prevhyp, prevchd);
Out[18]:
SAS Output

SAS Output

prevap vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVAP * PREVCHD

Cross-Tabular Freq Table

Frequency
Row Pct
Table of PREVAP by PREVCHD
PREVAP(Prevalent Angina) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
96.30
17
3.70
460
 
Yes
0
0.00
40
100.00
40
 
Total
443
57
500

The PRINT Procedure

Data Set WORK.FET

Obs Statistic Prob
1 Fisher's Exact Test <.0001

diabetes vs prevchd

The FREQ Procedure

The FREQ Procedure

Table DIABETES * PREVCHD

Cross-Tabular Freq Table

Frequency
Row Pct
Table of DIABETES by PREVCHD
DIABETES(Diabetic Y/N) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
408
89.87
46
10.13
454
 
Yes
35
76.09
11
23.91
46
 
Total
443
57
500

The PRINT Procedure

Data Set WORK.CHI

Obs Statistic Prob
1 Continuity Adj. Chi-Square 0.0105

prevmi vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVMI * PREVCHD

Cross-Tabular Freq Table

Frequency
Row Pct
Table of PREVMI by PREVCHD
PREVMI(Prevalent MI (Hosp,Silent)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
443
94.46
26
5.54
469
 
Yes
0
0.00
31
100.00
31
 
Total
443
57
500

The PRINT Procedure

Data Set WORK.FET

Obs Statistic Prob
1 Fisher's Exact Test <.0001

prevstrk vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVSTRK * PREVCHD

Cross-Tabular Freq Table

Frequency
Row Pct
Table of PREVSTRK by PREVCHD
PREVSTRK(Prevalent Stroke (Infarct,Hem)) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
438
89.02
54
10.98
492
 
Yes
5
62.50
3
37.50
8
 
Total
443
57
500

The PRINT Procedure

Data Set WORK.FET

Obs Statistic Prob
1 Fisher's Exact Test 0.0519

prevhyp vs prevchd

The FREQ Procedure

The FREQ Procedure

Table PREVHYP * PREVCHD

Cross-Tabular Freq Table

Frequency
Row Pct
Table of PREVHYP by PREVCHD
PREVHYP(Prevalent Hypertension) PREVCHD(Prevalent CHD (MI,AP,CI))
No Yes Total
No
191
93.63
13
6.37
204
 
Yes
252
85.14
44
14.86
296
 
Total
443
57
500

The PRINT Procedure

Data Set WORK.CHI

Obs Statistic Prob
1 Continuity Adj. Chi-Square 0.0052
In [ ]: