17. Solutions to Exercises

17.1. SAS Windowing Environment

Use the sample SAS program code provided above and the Charm City Circulator Ridership dataset to experiment with using SAS by making the following changes to the provided code.

  1. Go through and change the colors using the attribute map dataset palette to something other than what they originally were. See https://support.sas.com/content/dam/SAS/support/en/books/pro-template-made-easy-a-guide-for-sas-users/62007_Appendix.pdf for a large list of colors in SAS.

  2. Change the days you are keeping to show "Sunday" instead of “Saturday".

  3. Change the plot from a line plot to a scatterplot by changing series to scatter in proc sgpanel.

%let working_directory = data; *Replace data with your path to your 
                                directory containing the dataset;

proc import datafile = "&working_directory/Charm_City_Circulator_Ridership.csv" 
    out = circ dbms = csv replace;
    getnames = yes;
    guessingrows = max;
run;

proc sort data = circ;
  by day date daily;
run;

proc transpose data = circ out = long (rename=(_NAME_ = var col1 = number));
  var orangeboardings -- banneraverage;
  by day date daily;
run;

data long2;
  set long;
  var = tranwrd(var, 'Board', '_Board');
  var = tranwrd(var, 'Alight', '_Alight');
  var = tranwrd(var, 'Average', '_Average');
  line = scan(var, 1, '_');
  type = scan(var, 2, '_');
  drop var;
run;

data avg;
  set long2;
  where (type = "Average");
run;

data palette;
  input id $ value $ linecolor $;
  markercolor = linecolor;
  datalines;
lc banner red
lc green yellow
lc orange darkblue
lc purple brown
;
run;

proc sgpanel data = avg dattrmap=palette; 
  where day in ("Monday" "Tuesday" "Friday" "Sunday");
  panelby day;
  scatter Y = number X = date / group = line attrid = lc;
run;
SAS Output
The SGPanel Procedure

17.2. Basic SAS Programming

  1. Submit the following simple SAS program. What does it do?

    PROC PRINT DATA = sashelp.heart (OBS=10);
    RUN;
    
    PROC MEANS DATA = sashelp.heart;
      VAR height weight;
    RUN;
    
PROC PRINT DATA = sashelp.heart (OBS=10);
RUN;

PROC MEANS DATA = sashelp.heart;
  VAR height weight;
RUN;
SAS Output

The SAS System

Obs Status DeathCause AgeCHDdiag Sex AgeAtStart Height Weight Diastolic Systolic MRW Smoking AgeAtDeath Cholesterol Chol_Status BP_Status Weight_Status Smoking_Status
1 Dead Other . Female 29 62.50 140 78 124 121 0 55 .   Normal Overweight Non-smoker
2 Dead Cancer . Female 41 59.75 194 92 144 183 0 57 181 Desirable High Overweight Non-smoker
3 Alive   . Female 57 62.25 132 90 170 114 10 . 250 High High Overweight Moderate (6-15)
4 Alive   . Female 39 65.75 158 80 128 123 0 . 242 High Normal Overweight Non-smoker
5 Alive   . Male 42 66.00 156 76 110 116 20 . 281 High Optimal Overweight Heavy (16-25)
6 Alive   . Female 58 61.75 131 92 176 117 0 . 196 Desirable High Overweight Non-smoker
7 Alive   . Female 36 64.75 136 80 112 110 15 . 196 Desirable Normal Overweight Moderate (6-15)
8 Dead Other . Male 53 65.50 130 80 114 99 0 77 276 High Normal Normal Non-smoker
9 Alive   . Male 35 71.00 194 68 132 124 0 . 211 Borderline Normal Overweight Non-smoker
10 Dead Cerebral Vascular Disease . Male 52 62.50 129 78 124 106 5 82 284 High Normal Normal Light (1-5)

The SAS System

The MEANS Procedure

Variable N Mean Std Dev Minimum Maximum
Height
Weight
5203
5203
64.8131847
153.0866808
3.5827074
28.9154261
51.5000000
67.0000000
76.5000000
300.0000000

This program prints out the first 10 rows of the sashelp.heart dataset and the summary statistics N (sample size), mean, standard deviation, minimum and maxium for the Height and Weight variables.

  1. Submit the following small program. How many DATA steps are there? How man PROC steps? Which steps are executed?

    DATA work.quant;
      SET sashelp.heart; /* Read in from the heart datset */
      KEEP height weight diastolic systolic; *Only keep these variable 
                                          in the new dataset;
    RUN;
    
    PROC PRINT DATA = work.quant (OBS=10);
    RUN;
    
    /*
    PROC MEANS DATA = work.quant MIN Q1 Median Q3 MAX MEAN STD;
    RUN;
    */
    
DATA work.quant;
  SET sashelp.heart; /* Read in from the heart datset */
  KEEP height weight diastolic systolic; *Only keep these variable 
                                       in the new dataset;
RUN;

PROC PRINT DATA = work.quant (OBS=10);
RUN;

/*
PROC MEANS DATA = work.quant MIN Q1 Median Q3 MAX MEAN STD;
RUN;
*/
SAS Output

The SAS System

Obs Height Weight Diastolic Systolic
1 62.50 140 78 124
2 59.75 194 92 144
3 62.25 132 90 170
4 65.75 158 80 128
5 66.00 156 76 110
6 61.75 131 92 176
7 64.75 136 80 112
8 65.50 130 80 114
9 71.00 194 68 132
10 62.50 129 78 124

There is one DATA step and there are two PROC steps; however, the last PROC MEANS is not run since it is within a paragraph comment.

  1. Run PROC CONTENTS on the dataset sashelp.heart. How many observations are in this datset?

PROC CONTENTS data = sashelp.heart;
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name SASHELP.HEART Observations 5209
Member Type DATA Variables 17
Engine V9 Indexes 0
Created 10/25/2018 02:21:26 Observation Length 168
Last Modified 10/25/2018 02:21:26 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label Framingham Heart Study    
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding us-ascii ASCII (ANSI)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 14
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 365
Number of Data Set Repairs 0
Filename /opt/sasinside/SASHome/SASFoundation/9.4/sashelp/heart.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 6235
Access Permission rw-r--r--
Owner Name sas
File Size 960KB
File Size (bytes) 983040
Alphabetic List of Variables and Attributes
# Variable Type Len Label
12 AgeAtDeath Num 8 Age at Death
5 AgeAtStart Num 8 Age at Start
3 AgeCHDdiag Num 8 Age CHD Diagnosed
15 BP_Status Char 7 Blood Pressure Status
14 Chol_Status Char 10 Cholesterol Status
13 Cholesterol Num 8  
2 DeathCause Char 26 Cause of Death
8 Diastolic Num 8  
6 Height Num 8  
10 MRW Num 8 Metropolitan Relative Weight
4 Sex Char 6  
11 Smoking Num 8  
17 Smoking_Status Char 17 Smoking Status
1 Status Char 5  
9 Systolic Num 8  
7 Weight Num 8  
16 Weight_Status Char 11 Weight Status

There are 5209 observations (rows) in the dataset.

  1. Submit the following program to SAS. What is the value of CurrentDate? (This value represents the number of days since Januar 1, 1960, the reference date for the SAS system).

    DATA work.date;
       CurrentDate = Today();
    RUN;
    
    PROC PRINT DATA = work.date;
    RUN;
    
DATA work.date;
  CurrentDate = Today();
RUN;
   
PROC PRINT DATA = work.date;
RUN;

PROC PRINT DATA = work.date;
   format currentdate mmddyy10.;
RUN;
SAS Output

The SAS System

Obs CurrentDate
1 22205

The SAS System

Obs CurrentDate
1 10/17/2020

The value of current date will change depending on when you submit this program. I submitted this program in 10/17/2020 and received the value 22205 or 22205 days since January 1, 1960.

17.3. Data I/O

  1. Use PROC IMPORT to read in the Youth Tobacco study, Youth_Tobacco_Survey_YTS_Data.csv and name it youth.

PROC IMPORT datafile = "/folders/myfolders/SAS_Notes/data/Youth_Tobacco_Survey_YTS_Data.csv"
            out = youth dbms = csv replace;
    getnames = yes;
    guessingrows = 500;
RUN;

PROC PRINT data = youth (OBS=5);
RUN;
SAS Output

The SAS System

Obs YEAR LocationAbbr LocationDesc TopicType TopicDesc MeasureDesc DataSource Response Data_Value_Unit Data_Value_Type Data_Value Data_Value_Footnote_Symbol Data_Value_Footnote Data_Value_Std_Err Low_Confidence_Limit High_Confidence_Limit Sample_Size Gender Race Age Education GeoLocation TopicTypeId TopicId MeasureId StratificationID1 StratificationID2 StratificationID3 StratificationID4 SubMeasureID DisplayOrder
1 2015 AZ Arizona Tobacco Use - Survey Data Cessation (Youth) Percent of Current Smokers Who Want to Quit YTS   % Percentage . * Data in these cells have been suppressed because of a small sample size . . . . Overall All Races All Ages Middle School (34.865970280000454, -111.76381127699972) BEH 105BEH 170CES 1GEN 8AGE 6RAC 1EDU YTS01 1
2 2015 AZ Arizona Tobacco Use - Survey Data Cessation (Youth) Percent of Current Smokers Who Want to Quit YTS   % Percentage . * Data in these cells have been suppressed because of a small sample size . . . . Male All Races All Ages Middle School (34.865970280000454, -111.76381127699972) BEH 105BEH 170CES 2GEN 8AGE 6RAC 1EDU YTS02 2
3 2015 AZ Arizona Tobacco Use - Survey Data Cessation (Youth) Percent of Current Smokers Who Want to Quit YTS   % Percentage . * Data in these cells have been suppressed because of a small sample size . . . . Female All Races All Ages Middle School (34.865970280000454, -111.76381127699972) BEH 105BEH 170CES 3GEN 8AGE 6RAC 1EDU YTS03 3
4 2015 AZ Arizona Tobacco Use - Survey Data Cessation (Youth) Quit Attempt in Past Year Among Current Cigarette Smokers YTS   % Percentage . * Data in these cells have been suppressed because of a small sample size . . . . Overall All Races All Ages Middle School (34.865970280000454, -111.76381127699972) BEH 105BEH 169QUA 1GEN 8AGE 6RAC 1EDU YTS04 4
5 2015 AZ Arizona Tobacco Use - Survey Data Cessation (Youth) Quit Attempt in Past Year Among Current Cigarette Smokers YTS   % Percentage . * Data in these cells have been suppressed because of a small sample size . . . . Male All Races All Ages Middle School (34.865970280000454, -111.76381127699972) BEH 105BEH 169QUA 2GEN 8AGE 6RAC 1EDU YTS05 5
  1. Download an Excel version of the Monuments dataset, Monuments.xlsx, from CANVAS. Use PROC IMPORT to read in the dataset and call the output mon.

PROC IMPORT datafile = "/folders/myfolders/SAS_Notes/data/Monuments.xlsx"
            out = mon dbms = xlsx replace;
    getnames = yes;
RUN;

PROC PRINT data = mon (OBS=5);
RUN;
SAS Output

The SAS System

Obs name zipCode neighborhood councilDistrict policeDistrict Location 1
1 James Cardinal Gibbons 21201 Downtown 11 CENTRAL 408 CHARLES ST Baltimore, MD
2 The Battle Monument 21202 Downtown 11 CENTRAL  
3 Negro Heroes of the U.S Monument 21202 Downtown 11 CENTRAL  
4 Star Bangled Banner 21202 Downtown 11 CENTRAL 100 HOLLIDAY ST Baltimore, MD
5 Flame at the Holocaust Monument 21202 Downtown 11 CENTRAL 50 MARKET PL Baltimore, MD
  1. Write a DATA step that will read in the following column formatted data using DATALINES and print the results using PROC PRINT. This dataset contains information on esophageal cancer in Ille-et-Vilaine, France. The variables are

    • Age group

    • Alcohol consumption

    • Tobacco consumption

    • Number of Cases

    • Number of Controls

Write a DATA step to read in the data and print the results. Note: the first line is to help you count the columns. Each number corresponds to a multiple of 5.

----1----2----3----4----5----6----7----8
75+ 0-39g/day 0-9g/day      1        18
75+ 0-39g/day    10-19      2         6
75+ 0-39g/day      30+      1         3
75+     40-79 0-9g/day      2         5
75+     40-79    10-19      1         3
75+     40-79    20-29      0         3
75+     40-79      30+      1         1
75+    80-119 0-9g/day      1         1
75+    80-119    10-19      1         1
75+      120+ 0-9g/day      2         2
75+      120+    10-19      1         1
DATA eso;
   INPUT aggrp $ 1-3 alc $ 5-13 smoke $ 15-22 ncases 29 ncontrl 38-39;
   DATALINES;
75+ 0-39g/day 0-9g/day      1        18
75+ 0-39g/day    10-19      2         6
75+ 0-39g/day      30+      1         3
75+     40-79 0-9g/day      2         5
75+     40-79    10-19      1         3
75+     40-79    20-29      0         3
75+     40-79      30+      1         1
75+    80-119 0-9g/day      1         1
75+    80-119    10-19      1         1
75+      120+ 0-9g/day      2         2
75+      120+    10-19      1         1
;
RUN;

PROC PRINT DATA=eso;
RUN;
SAS Connection established. Subprocess id is 2198
SAS Output

The SAS System

Obs aggrp alc smoke ncases ncontrl
1 75+ 0-39g/day 0-9g/day 1 18
2 75+ 0-39g/day 10-19 2 6
3 75+ 0-39g/day 30+ 1 3
4 75+ 40-79 0-9g/day 2 5
5 75+ 40-79 10-19 1 3
6 75+ 40-79 20-29 0 3
7 75+ 40-79 30+ 1 1
8 75+ 80-119 0-9g/day 1 1
9 75+ 80-119 10-19 1 1
10 75+ 120+ 0-9g/day 2 2
11 75+ 120+ 10-19 1 1

17.4. SAS Variables and Assignment Statements

  1. Use the following SAS data step to create the new variables Grade and Course defined below:

data school;
   input Age Quiz : $1. Midterm Final;
   /* Add you statements here */
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;

Using If-Then-Else statements, compute two new variables as follows:

  • Grade (numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.

  • The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65. Using this information, compute a course grade (Course) as a weighted average of the Quiz (20%), Midterm (30%) and Final (50%).

data school;
   input Age Quiz : $1. Midterm Final;
   IF Age = 12 THEN Grade = 6;
   ELSE IF AGE = 13 THEN Grade = 8;
   IF quiz = "A" THEN Course = 0.2*95 + 0.3*midterm + 0.5*final;
   ELSE IF quiz = "B" THEN Course = 0.2*85 + 0.3*midterm + 0.5*final;
   ELSE IF quiz = "C" THEN Course = 0.2*75 + 0.3*midterm + 0.5*final;
   ELSE IF quiz = "D" THEN Course = 0.2*70 + 0.3*midterm + 0.5*final;
   ELSE IF quiz = "F" THEN Course = 0.2*65 + 0.3*midterm + 0.5*final;
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;
RUN;

PROC PRINT DATA = school;
RUN;
SAS Output

The SAS System

Obs Age Quiz Midterm Final Grade Course
1 12 A 92 95 6 94.1
2 12 B 88 88 6 87.4
3 13 C 78 75 8 75.9
4 13 A 92 93 8 93.1
5 12 F 55 62 6 60.5
6 13 B 88 82 8 84.4

17.5. Working with Your Data

  1. Create a temporary SAS dataset called heart from the sashelp.heart dataset. The library sashelp if a permanent library that comes with SAS.

DATA heart;
  SET sashelp.heart;
RUN;

1348  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
1348! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
1349
1350 DATA heart;
1351 SET sashelp.heart;
1352 RUN;
NOTE: There were 5209 observations read from the data set SASHELP.HEART.
NOTE: The data set WORK.HEART has 5209 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

1353
1354 ods html5 (id=saspy_internal) close;ods listing;

1355
  1. View the description of the heart dataset by running PROC CONTENTS. How many observations (rows) and variables (columns) are in the dataset?

PROC CONTENTS DATA = heart;
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.HEART Observations 5209
Member Type DATA Variables 17
Engine V9 Indexes 0
Created 10/17/2020 21:35:04 Observation Length 168
Last Modified 10/17/2020 21:35:04 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 14
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 365
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/heart.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671720
Access Permission rw-r--r--
Owner Name sasdemo
File Size 960KB
File Size (bytes) 983040
Alphabetic List of Variables and Attributes
# Variable Type Len Label
12 AgeAtDeath Num 8 Age at Death
5 AgeAtStart Num 8 Age at Start
3 AgeCHDdiag Num 8 Age CHD Diagnosed
15 BP_Status Char 7 Blood Pressure Status
14 Chol_Status Char 10 Cholesterol Status
13 Cholesterol Num 8  
2 DeathCause Char 26 Cause of Death
8 Diastolic Num 8  
6 Height Num 8  
10 MRW Num 8 Metropolitan Relative Weight
4 Sex Char 6  
11 Smoking Num 8  
17 Smoking_Status Char 17 Smoking Status
9 Systolic Num 8  
7 Weight Num 8  
16 Weight_Status Char 11 Weight Status
1 condition Char 5  

There are 5209 observations (rows) and 17 variables (columns).

  1. Use the RENAME= option to rename Status to condition. Show the variables table from PROC CONTENTS to see that the name has changed.

DATA heart2 (RENAME = (status = condition));
   SET heart;
RUN;

PROC CONTENTS data = heart2;
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.HEART2 Observations 5209
Member Type DATA Variables 17
Engine V9 Indexes 0
Created 10/17/2020 21:39:47 Observation Length 168
Last Modified 10/17/2020 21:39:47 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 14
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 365
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/heart2.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671720
Access Permission rw-r--r--
Owner Name sasdemo
File Size 960KB
File Size (bytes) 983040
Alphabetic List of Variables and Attributes
# Variable Type Len Label
12 AgeAtDeath Num 8 Age at Death
5 AgeAtStart Num 8 Age at Start
3 AgeCHDdiag Num 8 Age CHD Diagnosed
15 BP_Status Char 7 Blood Pressure Status
14 Chol_Status Char 10 Cholesterol Status
13 Cholesterol Num 8  
2 DeathCause Char 26 Cause of Death
8 Diastolic Num 8  
6 Height Num 8  
10 MRW Num 8 Metropolitan Relative Weight
4 Sex Char 6  
11 Smoking Num 8  
17 Smoking_Status Char 17 Smoking Status
9 Systolic Num 8  
7 Weight Num 8  
16 Weight_Status Char 11 Weight Status
1 condition Char 5  
  1. Create a subset of heart called heart_sub that contains only the columns height, weight, and diastolic. What are the dimensions of heart_sub? Print the first five rows of heart_sub.

DATA heart_sub;
   SET heart;
   KEEP height weight diastolic;
RUN;

PROC CONTENTS data = heart_sub;
RUN;

PROC PRINT data = heart_sub(OBS=5);
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.HEART_SUB Observations 5209
Member Type DATA Variables 3
Engine V9 Indexes 0
Created 10/17/2020 21:39:50 Observation Length 24
Last Modified 10/17/2020 21:39:50 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 2714
Obs in First Data Page 2625
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671721
Access Permission rw-r--r--
Owner Name sasdemo
File Size 192KB
File Size (bytes) 196608
Alphabetic List of Variables and Attributes
# Variable Type Len
3 Diastolic Num 8
1 Height Num 8
2 Weight Num 8

The SAS System

Obs Height Weight Diastolic
1 62.50 140 78
2 59.75 194 92
3 62.25 132 90
4 65.75 158 80
5 66.00 156 76

The new dataset heart_sub contains 5209 rows and 3 columns.

  1. Subset the patients (rows) of heart that have a diastolic blood pressure (diastolic) more than 90 into a dataset called heart_sub2. How many are there? Print the first five rows of heart_sub2.

DATA heart_sub2;
   SET heart;
   WHERE diastolic > 90;
RUN;

PROC CONTENTS data = heart_sub2;
RUN;

PROC PRINT data = heart_sub2(OBS=5);
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.HEART_SUB2 Observations 1366
Member Type DATA Variables 17
Engine V9 Indexes 0
Created 10/17/2020 21:39:55 Observation Length 168
Last Modified 10/17/2020 21:39:55 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 4
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 365
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub2.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671723
Access Permission rw-r--r--
Owner Name sasdemo
File Size 320KB
File Size (bytes) 327680
Alphabetic List of Variables and Attributes
# Variable Type Len Label
12 AgeAtDeath Num 8 Age at Death
5 AgeAtStart Num 8 Age at Start
3 AgeCHDdiag Num 8 Age CHD Diagnosed
15 BP_Status Char 7 Blood Pressure Status
14 Chol_Status Char 10 Cholesterol Status
13 Cholesterol Num 8  
2 DeathCause Char 26 Cause of Death
8 Diastolic Num 8  
6 Height Num 8  
10 MRW Num 8 Metropolitan Relative Weight
4 Sex Char 6  
11 Smoking Num 8  
17 Smoking_Status Char 17 Smoking Status
1 Status Char 5  
9 Systolic Num 8  
7 Weight Num 8  
16 Weight_Status Char 11 Weight Status

The SAS System

Obs Status DeathCause AgeCHDdiag Sex AgeAtStart Height Weight Diastolic Systolic MRW Smoking AgeAtDeath Cholesterol Chol_Status BP_Status Weight_Status Smoking_Status
1 Dead Cancer . Female 41 59.75 194 92 144 183 0 57 181 Desirable High Overweight Non-smoker
2 Alive   . Female 58 61.75 131 92 176 117 0 . 196 Desirable High Overweight Non-smoker
3 Alive   . Female 42 67.75 162 96 138 119 1 . 200 Borderline High Overweight Light (1-5)
4 Dead Coronary Heart Disease 71 Female 49 60.50 153 110 196 140 5 73 221 Borderline High Overweight Light (1-5)
5 Alive   43 Male 33 66.50 172 106 146 127 0 . 247 High High Overweight Non-smoker

There are 1366 patients with a diastolic blood pressure above 90.

  1. Subset the patients (rows) of heart that have a diastolic blood pressure less than or equal to 90 and have a weight more than 180 pounds into a dataset called heart_sub3. How many are there? Print the first five rows of heart_sub3.

DATA heart_sub3;
   SET heart;
   WHERE diastolic <= 90 AND weight > 180;
RUN;

PROC CONTENTS data = heart_sub3;
RUN;

PROC PRINT data = heart_sub3(OBS=5);
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.HEART_SUB3 Observations 481
Member Type DATA Variables 17
Engine V9 Indexes 0
Created 10/17/2020 21:43:36 Observation Length 168
Last Modified 10/17/2020 21:43:36 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 2
First Data Page 1
Max Obs per Page 389
Obs in First Data Page 365
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub3.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671722
Access Permission rw-r--r--
Owner Name sasdemo
File Size 192KB
File Size (bytes) 196608
Alphabetic List of Variables and Attributes
# Variable Type Len Label
12 AgeAtDeath Num 8 Age at Death
5 AgeAtStart Num 8 Age at Start
3 AgeCHDdiag Num 8 Age CHD Diagnosed
15 BP_Status Char 7 Blood Pressure Status
14 Chol_Status Char 10 Cholesterol Status
13 Cholesterol Num 8  
2 DeathCause Char 26 Cause of Death
8 Diastolic Num 8  
6 Height Num 8  
10 MRW Num 8 Metropolitan Relative Weight
4 Sex Char 6  
11 Smoking Num 8  
17 Smoking_Status Char 17 Smoking Status
1 Status Char 5  
9 Systolic Num 8  
7 Weight Num 8  
16 Weight_Status Char 11 Weight Status

The SAS System

Obs Status DeathCause AgeCHDdiag Sex AgeAtStart Height Weight Diastolic Systolic MRW Smoking AgeAtDeath Cholesterol Chol_Status BP_Status Weight_Status Smoking_Status
1 Alive   . Male 35 71.00 194 68 132 124 0 . 211 Borderline Normal Overweight Non-smoker
2 Alive   . Female 50 67.50 185 88 150 136 15 . 228 Borderline High Overweight Moderate (6-15)
3 Alive   . Male 42 72.25 182 78 136 113 0 . 221 Borderline Normal Overweight Non-smoker
4 Alive   68 Male 40 70.00 189 78 124 124 0 . 319 High Normal Overweight Non-smoker
5 Alive   68 Male 40 70.00 195 76 132 128 20 . 205 Borderline Normal Overweight Heavy (16-25)

There are 481 patients with a diastolic blood pressure less than or equal to 90 and with a weight above 180 pounds.

  1. Create a subset of heart that only contains the columns weight, systolic, and cholesterol for those that are heavy smokers smoker (Smoking_Status = “Heavy (16-25)”). Save this to heart_sub. What are the dimensions of the dataset? Print the first five rows of heart_sub.

DATA heart_sub;
   SET heart;
   WHERE Smoking_Status = "Heavy (16-25)";
   KEEP weight systolic cholesterol;
RUN;

PROC CONTENTS data = heart_sub;
RUN;

PROC PRINT data = heart_sub(OBS=5);
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.HEART_SUB Observations 1046
Member Type DATA Variables 3
Engine V9 Indexes 0
Created 10/17/2020 21:47:19 Observation Length 24
Last Modified 10/17/2020 21:47:19 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 2714
Obs in First Data Page 1046
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671725
Access Permission rw-r--r--
Owner Name sasdemo
File Size 128KB
File Size (bytes) 131072
Alphabetic List of Variables and Attributes
# Variable Type Len
3 Cholesterol Num 8
2 Systolic Num 8
1 Weight Num 8

The SAS System

Obs Weight Systolic Cholesterol
1 156 110 281
2 195 132 205
3 181 144 223
4 152 144 263
5 196 128 .

The new heart_sub dataset contains 1046 rows and three columns.

  1. Reorder the rows of heart_sub in decreasing order by cholesterol. Print the first 5 rows of the sorted dataset.

PROC SORT data = heart_sub out = heart_sub_srt;
  BY descending cholesterol;
RUN;

PROC PRINT data = heart_sub_srt(OBS=5);
RUN;
SAS Output

The SAS System

Obs Weight Systolic Cholesterol
1 169 155 568
2 205 154 534
3 194 150 429
4 104 122 392
5 137 185 368

17.6. Data Summarization

  1. How many bike lanes are currently in Baltimore? You can assume that each observation/row is a different bike lane.

PROC IMPORT datafile="/folders/myfolders/SAS_Notes/data/Bike_Lanes.csv"
            out = bike dbms = csv replace;
    getnames = yes;
    guessingrows = max;
RUN;

PROC CONTENTS data = bike;
RUN;
SAS Output

The SAS System

The CONTENTS Procedure

Data Set Name WORK.BIKE Observations 1631
Member Type DATA Variables 9
Engine V9 Indexes 0
Created 10/17/2020 22:01:39 Observation Length 136
Last Modified 10/17/2020 22:01:39 Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64    
Encoding utf-8 Unicode (UTF-8)    
Engine/Host Dependent Information
Data Set Page Size 65536
Number of Data Set Pages 4
First Data Page 1
Max Obs per Page 481
Obs in First Data Page 459
Number of Data Set Repairs 0
Filename /tmp/SAS_workA5B9000010B1_localhost.localdomain/bike.sas7bdat
Release Created 9.0401M6
Host Created Linux
Inode Number 671721
Access Permission rw-r--r--
Owner Name sasdemo
File Size 320KB
File Size (bytes) 327680
Alphabetic List of Variables and Attributes
# Variable Type Len Format Informat
3 block Char 29 $29. $29.
9 dateInstalled Num 8 BEST12. BEST32.
8 length Num 8 BEST12. BEST32.
2 name Char 20 $20. $20.
5 numLanes Num 8 BEST12. BEST32.
6 project Char 25 $25. $25.
7 route Char 13 $13. $13.
1 subType Char 6 $6. $6.
4 type Char 15 $15. $15.

There are 1631 rows.

  1. How many (a) feet and (b) miles of total bike lanes are currently in Baltimore? (The length variable provides the length in feet.)

PROC MEANS data = bike sum;
   VAR length;
RUN;
SAS Output

The SAS System

The MEANS Procedure

Analysis Variable : length
Sum
439447.59
DATA miles;
   length_mi = 439447.59 / 5280;
RUN;

PROC PRINT data = miles;
RUN;
SAS Output

The SAS System

Obs length_mi
1 83.2287
  1. How many types (type) bike lanes are there? Which type (a) occurs the most and (b) has the longest average bike lane length?

PROC FREQ data = bike;
  TABLE type;
RUN;

PROC MEANS data = bike noprint;
   CLASS type;
   VAR length;
   OUTPUT out = type_mean mean = length_mean;
RUN;

PROC SORT data = type_mean;
  BY descending length_mean;
RUN;

PROC PRINT DATA = type_mean;
RUN;
SAS Output

The SAS System

The FREQ Procedure

type Frequency Percent Cumulative
Frequency
Cumulative
Percent
Frequency Missing = 9
BIKE BOULEVARD 49 3.02 49 3.02
BIKE LANE 621 38.29 670 41.31
CONTRAFLOW 13 0.80 683 42.11
SHARED BUS BIKE 39 2.40 722 44.51
SHARROW 589 36.31 1311 80.83
SIDEPATH 7 0.43 1318 81.26
SIGNED ROUTE 304 18.74 1622 100.00

The SAS System

Obs type _TYPE_ _FREQ_ length_mean
1 SIDEPATH 1 7 665.87751019
2 BIKE LANE 1 621 300.24183502
3 SHARED BUS BIKE 1 39 276.66575695
4   0 1622 269.48876975
5 SIGNED ROUTE 1 304 263.57986593
6 SHARROW 1 589 243.8718592
7 BIKE BOULEVARD 1 49 197.27815495
8 CONTRAFLOW 1 13 136.46646088

There are 7 bike lane types (8 if you include missing as a category). The lane type that occurrs the most often is bike lane with a count of 621. The lane type sidepath has the longest average length at 665.88 ft.

  1. How many different projects (project) do the bike lanes fall into? Which project category has the longest average bike lane length?

PROC FREQ data = bike;
  TABLE project;
RUN;

PROC MEANS data = bike noprint;
   CLASS project;
   VAR length;
   OUTPUT out = project_mean mean = length_mean;
RUN;

PROC SORT data = project_mean;
  BY descending length_mean;
RUN;

PROC PRINT DATA = project_mean;
RUN;
SAS Output

The SAS System

The FREQ Procedure

project Frequency Percent Cumulative
Frequency
Cumulative
Percent
Frequency Missing = 74
CHARM CITY CIRCULATOR 39 2.50 39 2.50
COLLEGETOWN 339 21.77 378 24.28
COLLEGETOWN NETWORK 13 0.83 391 25.11
ENGINEERING CONSTRUCTION 12 0.77 403 25.88
GUILFORD AVE BIKE BLVD 49 3.15 452 29.03
MAINTENANCE 4 0.26 456 29.29
OPERATION ORANGE CONE 458 29.42 914 58.70
PARK HEIGHTS BIKE NETWORK 172 11.05 1086 69.75
PLANNING TRAFFIC 18 1.16 1104 70.91
SOUTHEAST BIKE NETWORK 323 20.75 1427 91.65
TRAFFIC 51 3.28 1478 94.93
TRAFFIC CALMING 79 5.07 1557 100.00

The SAS System

Obs project _TYPE_ _FREQ_ length_mean
1 MAINTENANCE 1 4 1942.1522856
2 ENGINEERING CONSTRUCTION 1 12 512.09762483
3 TRAFFIC 1 51 419.52878809
4 COLLEGETOWN 1 339 320.68363126
5 PARK HEIGHTS BIKE NETWORK 1 172 283.22519367
6 CHARM CITY CIRCULATOR 1 39 276.66575695
7   0 1557 272.0534698
8 TRAFFIC CALMING 1 79 268.53137627
9 OPERATION ORANGE CONE 1 458 250.07840273
10 COLLEGETOWN NETWORK 1 13 213.63728443
11 SOUTHEAST BIKE NETWORK 1 323 210.82827388
12 PLANNING TRAFFIC 1 18 209.42893827
13 GUILFORD AVE BIKE BLVD 1 49 197.27815495

There are 12 project types (13 if you include missing as a project type). The maintenance project has the longest average bike lane length at 1942.15 ft.

  1. What was the average bike lane length per year that they were installed? (Be sure to first set dateInstalled to . if it is equal to zero.)

DATA bike;
   SET bike;
   IF dateinstalled = 0 THEN dateinstalled = .;
RUN;

PROC MEANS data = bike mean;
  CLASS dateinstalled;
  VAR length;
RUN;
SAS Output

The SAS System

The MEANS Procedure

Analysis Variable : length
dateInstalled N Obs Mean
2006 2 1469.35
2007 368 309.8903287
2008 206 249.4810027
2009 86 407.2947463
2010 625 245.6015777
2011 101 232.8127593
2012 107 271.4321825
2013 10 290.4263702
  1. (a) Numerically and (b) graphically describe the distribution of bike lane lengths (length).

PROC UNIVARIATE data = bike plots;
  VAR length;
RUN;
SAS Output

The SAS System

The UNIVARIATE Procedure

Variable: length

Moments
N 1631 Sum Weights 1631
Mean 269.434449 Sum Observations 439447.587
Std Deviation 277.706646 Variance 77120.9813
Skewness 5.1297861 Kurtosis 43.3888947
Uncorrected SS 244109518 Corrected SS 125707200
Coeff Variation 103.070208 Std Error Mean 6.87637089
Basic Statistical Measures
Location Variability
Mean 269.4344 Std Deviation 277.70665
Median 200.3027 Variance 77121
Mode . Range 3749
    Interquartile Range 217.16479
Tests for Location: Mu0=0
Test Statistic p Value
Student's t t 39.18265 Pr > |t| <.0001
Sign M 815 Pr >= |M| <.0001
Signed Rank S 664632.5 Pr >= |S| <.0001
Quantiles (Definition 5)
Level Quantile
100% Max 3749.3226
99% 1341.8291
95% 664.2305
90% 502.2232
75% Q3 341.1493
50% Median 200.3027
25% Q1 123.9845
10% 67.8206
5% 45.3655
1% 14.8383
0% Min 0.0000
Extreme Observations
Lowest Highest
Value Obs Value Obs
0.00000 4 2536.73 1284
5.51372 523 2715.20 1344
6.85712 724 2916.53 903
8.16243 490 3467.68 904
9.71914 595 3749.32 3
Plots for length
  1. Describe the distribution of bike lane lengths numerically and graphically after stratifying them by (a) type and then by (b) number of lanes (numLanes).

PROC MEANS data=bike mean std;
   CLASS type;
   VAR length;
RUN;

PROC SGPLOT data = bike;
   VBOX length / category = type;
RUN;

PROC MEANS data=bike mean std;
   CLASS numlanes;
   VAR length;
RUN;

PROC SGPLOT data = bike;
   VBOX length / category = numlanes;
RUN;
SAS Output

The SAS System

The MEANS Procedure

Analysis Variable : length
type N Obs Mean Std Dev
BIKE BOULEVARD 49 197.2781549 113.1318019
BIKE LANE 621 300.2418350 314.6605599
CONTRAFLOW 13 136.4664609 48.5165725
SHARED BUS BIKE 39 276.6657570 140.1811711
SHARROW 589 243.8718592 188.4222254
SIDEPATH 7 665.8775102 619.3894227
SIGNED ROUTE 304 263.5798659 347.8501541

The SGPlot Procedure

The SAS System

The MEANS Procedure

Analysis Variable : length
numLanes N Obs Mean Std Dev
0 21 308.3767969 201.5548096
1 900 277.9016282 317.8281529
2 710 257.5495894 218.6179448