{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Solutions to Exercises\n", "\n", "## SAS Windowing Environment\n", "\n", "Use the sample SAS program code provided above and the Charm City Circulator Ridership dataset to experiment with using SAS by making the following changes to the provided code.\n", "\n", "1. Go through and change the colors using the attribute map dataset `palette` to something other than what they originally were. See https://support.sas.com/content/dam/SAS/support/en/books/pro-template-made-easy-a-guide-for-sas-users/62007_Appendix.pdf for a large list of colors in SAS.\n", "2. Change the days you are keeping to show `\"Sunday\"` instead of \"`Saturday\"`.\n", "3. Change the plot from a line plot to a scatterplot by changing `series` to `scatter` in `proc sgpanel`." ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 76, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%let working_directory = data; *Replace data with your path to your \n", " directory containing the dataset;\n", "\n", "proc import datafile = \"&working_directory/Charm_City_Circulator_Ridership.csv\" \n", " out = circ dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "run;\n", "\n", "proc sort data = circ;\n", " by day date daily;\n", "run;\n", "\n", "proc transpose data = circ out = long (rename=(_NAME_ = var col1 = number));\n", " var orangeboardings -- banneraverage;\n", " by day date daily;\n", "run;\n", "\n", "data long2;\n", " set long;\n", " var = tranwrd(var, 'Board', '_Board');\n", " var = tranwrd(var, 'Alight', '_Alight');\n", " var = tranwrd(var, 'Average', '_Average');\n", " line = scan(var, 1, '_');\n", " type = scan(var, 2, '_');\n", " drop var;\n", "run;\n", "\n", "data avg;\n", " set long2;\n", " where (type = \"Average\");\n", "run;\n", "\n", "data palette;\n", " input id $ value $ linecolor $;\n", " markercolor = linecolor;\n", " datalines;\n", "lc banner red\n", "lc green yellow\n", "lc orange darkblue\n", "lc purple brown\n", ";\n", "run;\n", "\n", "proc sgpanel data = avg dattrmap=palette; \n", " where day in (\"Monday\" \"Tuesday\" \"Friday\" \"Sunday\");\n", " panelby day;\n", " scatter Y = number X = date / group = line attrid = lc;\n", "run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic SAS Programming\n", "\n", "1. Submit the following simple SAS program. What does it do?\n", "\n", " ```\n", " PROC PRINT DATA = sashelp.heart (OBS=10);\n", " RUN;\n", "\n", " PROC MEANS DATA = sashelp.heart;\n", " VAR height weight;\n", " RUN;\n", " ```" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsStatusDeathCauseAgeCHDdiagSexAgeAtStartHeightWeightDiastolicSystolicMRWSmokingAgeAtDeathCholesterolChol_StatusBP_StatusWeight_StatusSmoking_Status
1DeadOther.Female2962.5014078124121055.  NormalOverweightNon-smoker
2DeadCancer.Female4159.7519492144183057181DesirableHighOverweightNon-smoker
3Alive  .Female5762.251329017011410.250HighHighOverweightModerate (6-15)
4Alive  .Female3965.75158801281230.242HighNormalOverweightNon-smoker
5Alive  .Male4266.001567611011620.281HighOptimalOverweightHeavy (16-25)
6Alive  .Female5861.75131921761170.196DesirableHighOverweightNon-smoker
7Alive  .Female3664.751368011211015.196DesirableNormalOverweightModerate (6-15)
8DeadOther.Male5365.501308011499077276HighNormalNormalNon-smoker
9Alive  .Male3571.00194681321240.211BorderlineNormalOverweightNon-smoker
10DeadCerebral Vascular Disease.Male5262.5012978124106582284HighNormalNormalLight (1-5)
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
VariableNMeanStd DevMinimumMaximum
\n", "
\n", "
Height
\n", "
Weight
\n", "
\n", "
\n", "
\n", "
5203
\n", "
5203
\n", "
\n", "
\n", "
\n", "
64.8131847
\n", "
153.0866808
\n", "
\n", "
\n", "
\n", "
3.5827074
\n", "
28.9154261
\n", "
\n", "
\n", "
\n", "
51.5000000
\n", "
67.0000000
\n", "
\n", "
\n", "
\n", "
76.5000000
\n", "
300.0000000
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC PRINT DATA = sashelp.heart (OBS=10);\n", "RUN;\n", "\n", "PROC MEANS DATA = sashelp.heart;\n", " VAR height weight;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This program prints out the first 10 rows of the sashelp.heart dataset and the summary statistics N (sample size), mean, standard deviation, minimum and maxium for the Height and Weight variables.\n", "\n", "2. Submit the following small program. How many DATA steps are there? How man PROC steps? Which steps are executed?\n", "\n", " ```\n", " DATA work.quant;\n", " SET sashelp.heart; /* Read in from the heart datset */\n", " KEEP height weight diastolic systolic; *Only keep these variable \n", " in the new dataset;\n", " RUN;\n", "\n", " PROC PRINT DATA = work.quant (OBS=10);\n", " RUN;\n", "\n", " /*\n", " PROC MEANS DATA = work.quant MIN Q1 Median Q3 MAX MEAN STD;\n", " RUN;\n", " */\n", " ```" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsHeightWeightDiastolicSystolic
162.5014078124
259.7519492144
362.2513290170
465.7515880128
566.0015676110
661.7513192176
764.7513680112
865.5013080114
971.0019468132
1062.5012978124
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA work.quant;\n", " SET sashelp.heart; /* Read in from the heart datset */\n", " KEEP height weight diastolic systolic; *Only keep these variable \n", " in the new dataset;\n", "RUN;\n", "\n", "PROC PRINT DATA = work.quant (OBS=10);\n", "RUN;\n", "\n", "/*\n", "PROC MEANS DATA = work.quant MIN Q1 Median Q3 MAX MEAN STD;\n", "RUN;\n", "*/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There is one DATA step and there are two PROC steps; however, the last PROC MEANS is not run since it is within a paragraph comment.\n", "\n", "3. Run PROC CONTENTS on the dataset sashelp.heart. How many observations are in this datset?" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameSASHELP.HEARTObservations5209
Member TypeDATAVariables17
EngineV9Indexes0
Created10/25/2018 02:21:26Observation Length168
Last Modified10/25/2018 02:21:26Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
LabelFramingham Heart Study  
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingus-ascii ASCII (ANSI)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages14
First Data Page1
Max Obs per Page389
Obs in First Data Page365
Number of Data Set Repairs0
Filename/opt/sasinside/SASHome/SASFoundation/9.4/sashelp/heart.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number6235
Access Permissionrw-r--r--
Owner Namesas
File Size960KB
File Size (bytes)983040
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenLabel
12AgeAtDeathNum8Age at Death
5AgeAtStartNum8Age at Start
3AgeCHDdiagNum8Age CHD Diagnosed
15BP_StatusChar7Blood Pressure Status
14Chol_StatusChar10Cholesterol Status
13CholesterolNum8 
2DeathCauseChar26Cause of Death
8DiastolicNum8 
6HeightNum8 
10MRWNum8Metropolitan Relative Weight
4SexChar6 
11SmokingNum8 
17Smoking_StatusChar17Smoking Status
1StatusChar5 
9SystolicNum8 
7WeightNum8 
16Weight_StatusChar11Weight Status
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC CONTENTS data = sashelp.heart;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 5209 observations (rows) in the dataset.\n", "\n", "4. Submit the following program to SAS. What is the value of CurrentDate? (This value represents the number of days since Januar 1, 1960, the reference date for the SAS system).\n", "\n", " ```\n", " DATA work.date;\n", " CurrentDate = Today();\n", " RUN;\n", " \n", " PROC PRINT DATA = work.date;\n", " RUN;\n", " ```" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsCurrentDate
122205
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsCurrentDate
110/17/2020
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA work.date;\n", " CurrentDate = Today();\n", "RUN;\n", " \n", "PROC PRINT DATA = work.date;\n", "RUN;\n", "\n", "PROC PRINT DATA = work.date;\n", " format currentdate mmddyy10.;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The value of current date will change depending on when you submit this program. I submitted this program in 10/17/2020 and received the value 22205 or 22205 days since January 1, 1960.\n", "\n", "## Data I/O\n", "\n", "1. Use PROC IMPORT to read in the Youth Tobacco study, Youth_Tobacco_Survey_YTS_Data.csv and name it youth." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsYEARLocationAbbrLocationDescTopicTypeTopicDescMeasureDescDataSourceResponseData_Value_UnitData_Value_TypeData_ValueData_Value_Footnote_SymbolData_Value_FootnoteData_Value_Std_ErrLow_Confidence_LimitHigh_Confidence_LimitSample_SizeGenderRaceAgeEducationGeoLocationTopicTypeIdTopicIdMeasureIdStratificationID1StratificationID2StratificationID3StratificationID4SubMeasureIDDisplayOrder
12015AZArizonaTobacco Use - Survey DataCessation (Youth)Percent of Current Smokers Who Want to QuitYTS  %Percentage.*Data in these cells have been suppressed because of a small sample size....OverallAll RacesAll AgesMiddle School(34.865970280000454, -111.76381127699972)BEH105BEH170CES1GEN8AGE6RAC1EDUYTS011
22015AZArizonaTobacco Use - Survey DataCessation (Youth)Percent of Current Smokers Who Want to QuitYTS  %Percentage.*Data in these cells have been suppressed because of a small sample size....MaleAll RacesAll AgesMiddle School(34.865970280000454, -111.76381127699972)BEH105BEH170CES2GEN8AGE6RAC1EDUYTS022
32015AZArizonaTobacco Use - Survey DataCessation (Youth)Percent of Current Smokers Who Want to QuitYTS  %Percentage.*Data in these cells have been suppressed because of a small sample size....FemaleAll RacesAll AgesMiddle School(34.865970280000454, -111.76381127699972)BEH105BEH170CES3GEN8AGE6RAC1EDUYTS033
42015AZArizonaTobacco Use - Survey DataCessation (Youth)Quit Attempt in Past Year Among Current Cigarette SmokersYTS  %Percentage.*Data in these cells have been suppressed because of a small sample size....OverallAll RacesAll AgesMiddle School(34.865970280000454, -111.76381127699972)BEH105BEH169QUA1GEN8AGE6RAC1EDUYTS044
52015AZArizonaTobacco Use - Survey DataCessation (Youth)Quit Attempt in Past Year Among Current Cigarette SmokersYTS  %Percentage.*Data in these cells have been suppressed because of a small sample size....MaleAll RacesAll AgesMiddle School(34.865970280000454, -111.76381127699972)BEH105BEH169QUA2GEN8AGE6RAC1EDUYTS055
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile = \"/folders/myfolders/SAS_Notes/data/Youth_Tobacco_Survey_YTS_Data.csv\"\n", " out = youth dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = 500;\n", "RUN;\n", "\n", "PROC PRINT data = youth (OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Download an Excel version of the Monuments dataset, Monuments.xlsx, from CANVAS. Use PROC IMPORT to read in the dataset and call the output mon." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsnamezipCodeneighborhoodcouncilDistrictpoliceDistrictLocation 1
1James Cardinal Gibbons21201Downtown11CENTRAL408 CHARLES ST\r", "\r\n", "Baltimore, MD\r", "\r\n", "
2The Battle Monument21202Downtown11CENTRAL  
3Negro Heroes of the U.S Monument21202Downtown11CENTRAL  
4Star Bangled Banner21202Downtown11CENTRAL100 HOLLIDAY ST\r", "\r\n", "Baltimore, MD\r", "\r\n", "
5Flame at the Holocaust Monument21202Downtown11CENTRAL50 MARKET PL\r", "\r\n", "Baltimore, MD\r", "\r\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile = \"/folders/myfolders/SAS_Notes/data/Monuments.xlsx\"\n", " out = mon dbms = xlsx replace;\n", " getnames = yes;\n", "RUN;\n", "\n", "PROC PRINT data = mon (OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Write a DATA step that will read in the following column formatted data using DATALINES and print the results using PROC PRINT. This dataset contains information on esophageal cancer in Ille-et-Vilaine, France. The variables are\n", "\n", " * Age group\n", " * Alcohol consumption\n", " * Tobacco consumption\n", " * Number of Cases\n", " * Number of Controls\n", "\n", "Write a DATA step to read in the data and print the results. Note: the first line is to help you count the columns. Each number corresponds to a multiple of 5.\n", "\n", "
\n",
    "----1----2----3----4----5----6----7----8\n",
    "75+ 0-39g/day 0-9g/day      1        18\n",
    "75+ 0-39g/day    10-19      2         6\n",
    "75+ 0-39g/day      30+      1         3\n",
    "75+     40-79 0-9g/day      2         5\n",
    "75+     40-79    10-19      1         3\n",
    "75+     40-79    20-29      0         3\n",
    "75+     40-79      30+      1         1\n",
    "75+    80-119 0-9g/day      1         1\n",
    "75+    80-119    10-19      1         1\n",
    "75+      120+ 0-9g/day      2         2\n",
    "75+      120+    10-19      1         1\n",
    "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SAS Connection established. Subprocess id is 2198\n", "\n" ] }, { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsaggrpalcsmokencasesncontrl
175+0-39g/day0-9g/day118
275+0-39g/day10-1926
375+0-39g/day30+13
475+40-790-9g/day25
575+40-7910-1913
675+40-7920-2903
775+40-7930+11
875+80-1190-9g/day11
975+80-11910-1911
1075+120+0-9g/day22
1175+120+10-1911
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA eso;\n", " INPUT aggrp $ 1-3 alc $ 5-13 smoke $ 15-22 ncases 29 ncontrl 38-39;\n", " DATALINES;\n", "75+ 0-39g/day 0-9g/day 1 18\n", "75+ 0-39g/day 10-19 2 6\n", "75+ 0-39g/day 30+ 1 3\n", "75+ 40-79 0-9g/day 2 5\n", "75+ 40-79 10-19 1 3\n", "75+ 40-79 20-29 0 3\n", "75+ 40-79 30+ 1 1\n", "75+ 80-119 0-9g/day 1 1\n", "75+ 80-119 10-19 1 1\n", "75+ 120+ 0-9g/day 2 2\n", "75+ 120+ 10-19 1 1\n", ";\n", "RUN;\n", "\n", "PROC PRINT DATA=eso;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SAS Variables and Assignment Statements\n", "\n", "1. Use the following SAS data step to create the new variables `Grade` and `Course` defined below:\n", "\n", "
\n",
    "data school;\n",
    "   input Age Quiz : $1. Midterm Final;\n",
    "   /* Add you statements here */\n",
    "datalines;\n",
    "12 A 92 95\n",
    "12 B 88 88\n",
    "13 C 78 75\n",
    "13 A 92 93\n",
    "12 F 55 62\n",
    "13 B 88 82\n",
    ";\n",
    "
\n", "\n", "Using If-Then-Else statements, compute two new variables as follows: \n", "\n", "* Grade (numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.\n", "* The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65. Using this information, compute a course grade (Course) as a weighted average of the Quiz (20\\%), Midterm (30\\%) and Final (50\\%)." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsAgeQuizMidtermFinalGradeCourse
112A9295694.1
212B8888687.4
313C7875875.9
413A9293893.1
512F5562660.5
613B8882884.4
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data school;\n", " input Age Quiz : $1. Midterm Final;\n", " IF Age = 12 THEN Grade = 6;\n", " ELSE IF AGE = 13 THEN Grade = 8;\n", " IF quiz = \"A\" THEN Course = 0.2*95 + 0.3*midterm + 0.5*final;\n", " ELSE IF quiz = \"B\" THEN Course = 0.2*85 + 0.3*midterm + 0.5*final;\n", " ELSE IF quiz = \"C\" THEN Course = 0.2*75 + 0.3*midterm + 0.5*final;\n", " ELSE IF quiz = \"D\" THEN Course = 0.2*70 + 0.3*midterm + 0.5*final;\n", " ELSE IF quiz = \"F\" THEN Course = 0.2*65 + 0.3*midterm + 0.5*final;\n", "datalines;\n", "12 A 92 95\n", "12 B 88 88\n", "13 C 78 75\n", "13 A 92 93\n", "12 F 55 62\n", "13 B 88 82\n", ";\n", "RUN;\n", "\n", "PROC PRINT DATA = school;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Working with Your Data\n", "\n", "1. Create a temporary SAS dataset called `heart` from the sashelp.heart dataset. The library sashelp if a permanent library that comes with SAS." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
1348  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
1348! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
1349
1350 DATA heart;
1351 SET sashelp.heart;
1352 RUN;
NOTE: There were 5209 observations read from the data set SASHELP.HEART.
NOTE: The data set WORK.HEART has 5209 observations and 17 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

1353
1354 ods html5 (id=saspy_internal) close;ods listing;

1355
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA heart;\n", " SET sashelp.heart;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. View the description of the `heart` dataset by running PROC CONTENTS. How many observations (rows) and variables (columns) are in the dataset?" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.HEARTObservations5209
Member TypeDATAVariables17
EngineV9Indexes0
Created10/17/2020 21:35:04Observation Length168
Last Modified10/17/2020 21:35:04Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages14
First Data Page1
Max Obs per Page389
Obs in First Data Page365
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/heart.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671720
Access Permissionrw-r--r--
Owner Namesasdemo
File Size960KB
File Size (bytes)983040
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenLabel
12AgeAtDeathNum8Age at Death
5AgeAtStartNum8Age at Start
3AgeCHDdiagNum8Age CHD Diagnosed
15BP_StatusChar7Blood Pressure Status
14Chol_StatusChar10Cholesterol Status
13CholesterolNum8 
2DeathCauseChar26Cause of Death
8DiastolicNum8 
6HeightNum8 
10MRWNum8Metropolitan Relative Weight
4SexChar6 
11SmokingNum8 
17Smoking_StatusChar17Smoking Status
9SystolicNum8 
7WeightNum8 
16Weight_StatusChar11Weight Status
1conditionChar5 
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC CONTENTS DATA = heart;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 5209 observations (rows) and 17 variables (columns).\n", "\n", "3. Use the RENAME= option to rename `Status` to `condition`. Show the variables table from PROC CONTENTS to see that the name has changed." ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.HEART2Observations5209
Member TypeDATAVariables17
EngineV9Indexes0
Created10/17/2020 21:39:47Observation Length168
Last Modified10/17/2020 21:39:47Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages14
First Data Page1
Max Obs per Page389
Obs in First Data Page365
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/heart2.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671720
Access Permissionrw-r--r--
Owner Namesasdemo
File Size960KB
File Size (bytes)983040
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenLabel
12AgeAtDeathNum8Age at Death
5AgeAtStartNum8Age at Start
3AgeCHDdiagNum8Age CHD Diagnosed
15BP_StatusChar7Blood Pressure Status
14Chol_StatusChar10Cholesterol Status
13CholesterolNum8 
2DeathCauseChar26Cause of Death
8DiastolicNum8 
6HeightNum8 
10MRWNum8Metropolitan Relative Weight
4SexChar6 
11SmokingNum8 
17Smoking_StatusChar17Smoking Status
9SystolicNum8 
7WeightNum8 
16Weight_StatusChar11Weight Status
1conditionChar5 
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA heart2 (RENAME = (status = condition));\n", " SET heart;\n", "RUN;\n", "\n", "PROC CONTENTS data = heart2;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Create a subset of `heart` called `heart_sub` that contains only the columns `height`, `weight`, and `diastolic`. What are the dimensions of `heart_sub`? Print the first five rows of `heart_sub`." ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.HEART_SUBObservations5209
Member TypeDATAVariables3
EngineV9Indexes0
Created10/17/2020 21:39:50Observation Length24
Last Modified10/17/2020 21:39:50Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages2
First Data Page1
Max Obs per Page2714
Obs in First Data Page2625
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671721
Access Permissionrw-r--r--
Owner Namesasdemo
File Size192KB
File Size (bytes)196608
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLen
3DiastolicNum8
1HeightNum8
2WeightNum8
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsHeightWeightDiastolic
162.5014078
259.7519492
362.2513290
465.7515880
566.0015676
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA heart_sub;\n", " SET heart;\n", " KEEP height weight diastolic;\n", "RUN;\n", "\n", "PROC CONTENTS data = heart_sub;\n", "RUN;\n", "\n", "PROC PRINT data = heart_sub(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new dataset `heart_sub` contains 5209 rows and 3 columns.\n", "\n", "5. Subset the patients (rows) of `heart` that have a diastolic blood pressure (diastolic) more than 90 into a dataset called `heart_sub2`. How many are there? Print the first five rows of `heart_sub2`." ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.HEART_SUB2Observations1366
Member TypeDATAVariables17
EngineV9Indexes0
Created10/17/2020 21:39:55Observation Length168
Last Modified10/17/2020 21:39:55Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages4
First Data Page1
Max Obs per Page389
Obs in First Data Page365
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub2.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671723
Access Permissionrw-r--r--
Owner Namesasdemo
File Size320KB
File Size (bytes)327680
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenLabel
12AgeAtDeathNum8Age at Death
5AgeAtStartNum8Age at Start
3AgeCHDdiagNum8Age CHD Diagnosed
15BP_StatusChar7Blood Pressure Status
14Chol_StatusChar10Cholesterol Status
13CholesterolNum8 
2DeathCauseChar26Cause of Death
8DiastolicNum8 
6HeightNum8 
10MRWNum8Metropolitan Relative Weight
4SexChar6 
11SmokingNum8 
17Smoking_StatusChar17Smoking Status
1StatusChar5 
9SystolicNum8 
7WeightNum8 
16Weight_StatusChar11Weight Status
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsStatusDeathCauseAgeCHDdiagSexAgeAtStartHeightWeightDiastolicSystolicMRWSmokingAgeAtDeathCholesterolChol_StatusBP_StatusWeight_StatusSmoking_Status
1DeadCancer.Female4159.7519492144183057181DesirableHighOverweightNon-smoker
2Alive  .Female5861.75131921761170.196DesirableHighOverweightNon-smoker
3Alive  .Female4267.75162961381191.200BorderlineHighOverweightLight (1-5)
4DeadCoronary Heart Disease71Female4960.50153110196140573221BorderlineHighOverweightLight (1-5)
5Alive  43Male3366.501721061461270.247HighHighOverweightNon-smoker
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA heart_sub2;\n", " SET heart;\n", " WHERE diastolic > 90;\n", "RUN;\n", "\n", "PROC CONTENTS data = heart_sub2;\n", "RUN;\n", "\n", "PROC PRINT data = heart_sub2(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 1366 patients with a diastolic blood pressure above 90.\n", "\n", "6. Subset the patients (rows) of `heart` that have a diastolic blood pressure less than or equal to 90 and have a weight more than 180 pounds into a dataset called `heart_sub3`. How many are there? Print the first five rows of `heart_sub3`." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.HEART_SUB3Observations481
Member TypeDATAVariables17
EngineV9Indexes0
Created10/17/2020 21:43:36Observation Length168
Last Modified10/17/2020 21:43:36Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages2
First Data Page1
Max Obs per Page389
Obs in First Data Page365
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub3.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671722
Access Permissionrw-r--r--
Owner Namesasdemo
File Size192KB
File Size (bytes)196608
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenLabel
12AgeAtDeathNum8Age at Death
5AgeAtStartNum8Age at Start
3AgeCHDdiagNum8Age CHD Diagnosed
15BP_StatusChar7Blood Pressure Status
14Chol_StatusChar10Cholesterol Status
13CholesterolNum8 
2DeathCauseChar26Cause of Death
8DiastolicNum8 
6HeightNum8 
10MRWNum8Metropolitan Relative Weight
4SexChar6 
11SmokingNum8 
17Smoking_StatusChar17Smoking Status
1StatusChar5 
9SystolicNum8 
7WeightNum8 
16Weight_StatusChar11Weight Status
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsStatusDeathCauseAgeCHDdiagSexAgeAtStartHeightWeightDiastolicSystolicMRWSmokingAgeAtDeathCholesterolChol_StatusBP_StatusWeight_StatusSmoking_Status
1Alive  .Male3571.00194681321240.211BorderlineNormalOverweightNon-smoker
2Alive  .Female5067.501858815013615.228BorderlineHighOverweightModerate (6-15)
3Alive  .Male4272.25182781361130.221BorderlineNormalOverweightNon-smoker
4Alive  68Male4070.00189781241240.319HighNormalOverweightNon-smoker
5Alive  68Male4070.001957613212820.205BorderlineNormalOverweightHeavy (16-25)
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA heart_sub3;\n", " SET heart;\n", " WHERE diastolic <= 90 AND weight > 180;\n", "RUN;\n", "\n", "PROC CONTENTS data = heart_sub3;\n", "RUN;\n", "\n", "PROC PRINT data = heart_sub3(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 481 patients with a diastolic blood pressure less than or equal to 90 and with a weight above 180 pounds.\n", "\n", "7. Create a subset of `heart` that only contains the columns `weight`, `systolic`, and `cholesterol` for those that are heavy smokers smoker (Smoking_Status = \"Heavy (16-25)\"). Save this to `heart_sub`. What are the dimensions of the dataset? Print the first five rows of `heart_sub`." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.HEART_SUBObservations1046
Member TypeDATAVariables3
EngineV9Indexes0
Created10/17/2020 21:47:19Observation Length24
Last Modified10/17/2020 21:47:19Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page2714
Obs in First Data Page1046
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/heart_sub.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671725
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLen
3CholesterolNum8
2SystolicNum8
1WeightNum8
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsWeightSystolicCholesterol
1156110281
2195132205
3181144223
4152144263
5196128.
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA heart_sub;\n", " SET heart;\n", " WHERE Smoking_Status = \"Heavy (16-25)\";\n", " KEEP weight systolic cholesterol;\n", "RUN;\n", "\n", "PROC CONTENTS data = heart_sub;\n", "RUN;\n", "\n", "PROC PRINT data = heart_sub(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The new heart_sub dataset contains 1046 rows and three columns.\n", "\n", "8. Reorder the rows of `heart_sub` in decreasing order by cholesterol. Print the first 5 rows of the sorted dataset." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsWeightSystolicCholesterol
1169155568
2205154534
3194150429
4104122392
5137185368
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SORT data = heart_sub out = heart_sub_srt;\n", " BY descending cholesterol;\n", "RUN;\n", "\n", "PROC PRINT data = heart_sub_srt(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Summarization\n", "\n", "1. How many bike lanes are currently in Baltimore? You can assume that each observation/row is a different bike lane." ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.BIKEObservations1631
Member TypeDATAVariables9
EngineV9Indexes0
Created10/17/2020 22:01:39Observation Length136
Last Modified10/17/2020 22:01:39Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages4
First Data Page1
Max Obs per Page481
Obs in First Data Page459
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/bike.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671721
Access Permissionrw-r--r--
Owner Namesasdemo
File Size320KB
File Size (bytes)327680
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
3blockChar29$29.$29.
9dateInstalledNum8BEST12.BEST32.
8lengthNum8BEST12.BEST32.
2nameChar20$20.$20.
5numLanesNum8BEST12.BEST32.
6projectChar25$25.$25.
7routeChar13$13.$13.
1subTypeChar6$6.$6.
4typeChar15$15.$15.
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/Bike_Lanes.csv\"\n", " out = bike dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC CONTENTS data = bike;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 1631 rows.\n", "\n", "2. How many (a) feet and (b) miles of total bike lanes are currently in Baltimore? (The length variable provides the length in feet.)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : length
Sum
439447.59
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC MEANS data = bike sum;\n", " VAR length;\n", "RUN;" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obslength_mi
183.2287
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA miles;\n", " length_mi = 439447.59 / 5280;\n", "RUN;\n", "\n", "PROC PRINT data = miles;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. How many types (type) bike lanes are there? Which type (a) occurs the most and (b) has the longest average bike lane length?" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
typeFrequencyPercentCumulative
Frequency
Cumulative
Percent
Frequency Missing = 9
BIKE BOULEVARD493.02493.02
BIKE LANE62138.2967041.31
CONTRAFLOW130.8068342.11
SHARED BUS BIKE392.4072244.51
SHARROW58936.31131180.83
SIDEPATH70.43131881.26
SIGNED ROUTE30418.741622100.00
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obstype_TYPE__FREQ_length_mean
1SIDEPATH17665.87751019
2BIKE LANE1621300.24183502
3SHARED BUS BIKE139276.66575695
4  01622269.48876975
5SIGNED ROUTE1304263.57986593
6SHARROW1589243.8718592
7BIKE BOULEVARD149197.27815495
8CONTRAFLOW113136.46646088
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC FREQ data = bike;\n", " TABLE type;\n", "RUN;\n", "\n", "PROC MEANS data = bike noprint;\n", " CLASS type;\n", " VAR length;\n", " OUTPUT out = type_mean mean = length_mean;\n", "RUN;\n", "\n", "PROC SORT data = type_mean;\n", " BY descending length_mean;\n", "RUN;\n", "\n", "PROC PRINT DATA = type_mean;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 7 bike lane types (8 if you include missing as a category). The lane type that occurrs the most often is bike lane with a count of 621. The lane type sidepath has the longest average length at 665.88 ft. \n", "\n", "4. How many different projects (project) do the bike lanes fall into? Which project category has the longest average bike lane length?" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
projectFrequencyPercentCumulative
Frequency
Cumulative
Percent
Frequency Missing = 74
CHARM CITY CIRCULATOR392.50392.50
COLLEGETOWN33921.7737824.28
COLLEGETOWN NETWORK130.8339125.11
ENGINEERING CONSTRUCTION120.7740325.88
GUILFORD AVE BIKE BLVD493.1545229.03
MAINTENANCE40.2645629.29
OPERATION ORANGE CONE45829.4291458.70
PARK HEIGHTS BIKE NETWORK17211.05108669.75
PLANNING TRAFFIC181.16110470.91
SOUTHEAST BIKE NETWORK32320.75142791.65
TRAFFIC513.28147894.93
TRAFFIC CALMING795.071557100.00
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsproject_TYPE__FREQ_length_mean
1MAINTENANCE141942.1522856
2ENGINEERING CONSTRUCTION112512.09762483
3TRAFFIC151419.52878809
4COLLEGETOWN1339320.68363126
5PARK HEIGHTS BIKE NETWORK1172283.22519367
6CHARM CITY CIRCULATOR139276.66575695
7  01557272.0534698
8TRAFFIC CALMING179268.53137627
9OPERATION ORANGE CONE1458250.07840273
10COLLEGETOWN NETWORK113213.63728443
11SOUTHEAST BIKE NETWORK1323210.82827388
12PLANNING TRAFFIC118209.42893827
13GUILFORD AVE BIKE BLVD149197.27815495
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC FREQ data = bike;\n", " TABLE project;\n", "RUN;\n", "\n", "PROC MEANS data = bike noprint;\n", " CLASS project;\n", " VAR length;\n", " OUTPUT out = project_mean mean = length_mean;\n", "RUN;\n", "\n", "PROC SORT data = project_mean;\n", " BY descending length_mean;\n", "RUN;\n", "\n", "PROC PRINT DATA = project_mean;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 12 project types (13 if you include missing as a project type). The maintenance project has the longest average bike lane length at 1942.15 ft.\n", "\n", "5. What was the average bike lane length per year that they were installed? (Be sure to first set dateInstalled to . if it is equal to zero.)" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : length
dateInstalledN ObsMean
200621469.35
2007368309.8903287
2008206249.4810027
200986407.2947463
2010625245.6015777
2011101232.8127593
2012107271.4321825
201310290.4263702
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA bike;\n", " SET bike;\n", " IF dateinstalled = 0 THEN dateinstalled = .;\n", "RUN;\n", "\n", "PROC MEANS data = bike mean;\n", " CLASS dateinstalled;\n", " VAR length;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. (a) Numerically and (b) graphically describe the distribution of bike lane lengths (length)." ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The UNIVARIATE Procedure

\n", "

Variable: length

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Moments
N1631Sum Weights1631
Mean269.434449Sum Observations439447.587
Std Deviation277.706646Variance77120.9813
Skewness5.1297861Kurtosis43.3888947
Uncorrected SS244109518Corrected SS125707200
Coeff Variation103.070208Std Error Mean6.87637089
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Basic Statistical Measures
LocationVariability
Mean269.4344Std Deviation277.70665
Median200.3027Variance77121
Mode.Range3749
  Interquartile Range217.16479
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt39.18265Pr > |t|<.0001
SignM815Pr >= |M|<.0001
Signed RankS664632.5Pr >= |S|<.0001
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Quantiles (Definition 5)
LevelQuantile
100% Max3749.3226
99%1341.8291
95%664.2305
90%502.2232
75% Q3341.1493
50% Median200.3027
25% Q1123.9845
10%67.8206
5%45.3655
1%14.8383
0% Min0.0000
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Extreme Observations
LowestHighest
ValueObsValueObs
0.0000042536.731284
5.513725232715.201344
6.857127242916.53903
8.162434903467.68904
9.719145953749.323
\n", "
\n", "
\n", "
\n", "\"Plots\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC UNIVARIATE data = bike plots;\n", " VAR length;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Describe the distribution of bike lane lengths numerically and graphically after stratifying them by (a) type and then by (b) number of lanes (numLanes)." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : length
typeN ObsMeanStd Dev
BIKE BOULEVARD49197.2781549113.1318019
BIKE LANE621300.2418350314.6605599
CONTRAFLOW13136.466460948.5165725
SHARED BUS BIKE39276.6657570140.1811711
SHARROW589243.8718592188.4222254
SIDEPATH7665.8775102619.3894227
SIGNED ROUTE304263.5798659347.8501541
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : length
numLanesN ObsMeanStd Dev
021308.3767969201.5548096
1900277.9016282317.8281529
2710257.5495894218.6179448
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC MEANS data=bike mean std;\n", " CLASS type;\n", " VAR length;\n", "RUN;\n", "\n", "PROC SGPLOT data = bike;\n", " VBOX length / category = type;\n", "RUN;\n", "\n", "PROC MEANS data=bike mean std;\n", " CLASS numlanes;\n", " VAR length;\n", "RUN;\n", "\n", "PROC SGPLOT data = bike;\n", " VBOX length / category = numlanes;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SAS Formats and Dates\n", "\n", "1. Use proc freq to make a table of the different bike lane types." ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
typeFrequencyPercentCumulative
Frequency
Cumulative
Percent
Frequency Missing = 9
BIKE BOULEVARD493.02493.02
BIKE LANE62138.2967041.31
CONTRAFLOW130.8068342.11
SHARED BUS BIKE392.4072244.51
SHARROW58936.31131180.83
SIDEPATH70.43131881.26
SIGNED ROUTE30418.741622100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/Bike_Lanes.csv\"\n", " out = bike dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC FREQ data = bike;\n", " TABLE type;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Create a format that changes changes the bike lane types by mapping \"SIDEPATH\", \"BIKE BOULEVARD\", and \"BIKE LANE\" to itself but does not provide a value for the other types. Apply this format to type and make a table of type using PROC FREQ. What happens to the other bike lane types?" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
typeFrequencyPercentCumulative
Frequency
Cumulative
Percent
Frequency Missing = 954
BIKE BOULEVARD497.24497.24
BIKE LANE62191.7367098.97
SIDEPATH71.03677100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC FORMAT;\n", " VALUE $typefmt \"SIDEPATH\" = \"SIDEPATH\"\n", " \"BIKE BOULEVARD\" = \"BIKE BOULEVARD\"\n", " \"BIKE LANE\" = \"BIKE LANE\"\n", " OTHER = \" \";\n", "RUN;\n", "\n", "DATA bike2;\n", " SET bike;\n", " FORMAT type $typefmt.;\n", "RUN;\n", "\n", "PROC FREQ data = bike2;\n", " TABLE type;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have translated all but BIKE BOULEVARD, BIKE LANE, and SIDEPATE to missing, so we only get these three listed as categories in the table.\n", "\n", "3. Create a format that changes the bike lane types to be \"CONTRAFLOW\", \"SHARED BUS BIKE\", \"SHARROW\", \"SIGNED ROUTE\", or \"OTHER\" if the lane type is anything else." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
typeFrequencyPercentCumulative
Frequency
Cumulative
Percent
Frequency Missing = 9
OTHER67741.7467741.74
CONTRAFLOW130.8069042.54
SHARED BUS BIKE392.4072944.94
SHARROW58936.31131881.26
SIGNED ROUTE30418.741622100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC FORMAT;\n", " VALUE $type2fmt \"CONTRAFLOW\" = \"CONTRAFLOW\"\n", " \"SHARED BUS BIKE\" = \"SHARED BUS BIKE\"\n", " \"SHARROW\" = \"SHARROW\"\n", " \"SIGNED ROUTE\" = \"SIGNED ROUTE\"\n", " \"SIDEPATH\", \"BIKE BOULEVARD\", \"BIKE LANE\" = \"OTHER\";\n", "RUN;\n", "\n", "DATA bike3;\n", " SET bike;\n", " FORMAT type $type2fmt.;\n", "RUN;\n", "\n", "PROC FREQ data = bike3;\n", " TABLE type;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Write a datatep that can be used to read in the following small dataset with three dates and one time. Then print the dataset using readable date formats.\n", "
\n",
    "\n",
    "DATA temp;\n",
    "  INPUT date1 /*informat1*/ +1 date2 /*informat2*/ +1\n",
    "        date3 /*informat3*/ @32 time /*informat4*/;\n",
    "  DATALINES;\n",
    "2014/02/14 06Jan2018 4/5/2016   03:2:22\n",
    ";\n",
    "RUN;\n",
    "\n",
    "
" ] }, { "cell_type": "code", "execution_count": 62, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsdate1date2date3time
102/14/201401/06/201804/05/20163:02:02
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 62, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA temp;\n", "INPUT date1 yymmdd10. +1 date2 DATE9. +1\n", " date3 mmddyy8. @32 time time7.;\n", "DATALINES;\n", "2014/02/14 06Jan2018 4/5/2016 03:2:22\n", ";\n", "RUN;\n", "\n", "PROC PRINT data = temp;\n", " FORMAT date1 date2 date3 mmddyy10. time time7.;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Character Functions\n", "\n", "1. Read in the Baltimore city employee salaries dataset, Baltimore_City_Employee_Salaries_FY2015.csv, using PROC IMPORT." ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsnameJobTitleAgencyIDAgencyHireDateAnnualSalaryGrossPay
1Aaron,Patricia GFacilities/Office Services IIA03031OED-Employment Dev (031)10/24/197955,31453,626
2Aaron,Petra LASSISTANT STATE'S ATTORNEYA29045States Attorneys Office (045)09/25/200674,00073,000
3Abaineh,Yohannes TEPIDEMIOLOGISTA65026HLTH-Health Department (026)07/23/200964,50064,404
4Abbene,Anthony MPOLICE OFFICERA99005Police Department (005)07/24/201346,30959,620
5Abbey,EmmanuelCONTRACT SERV SPEC IIA40001M-R Info Technology (001)05/01/201360,06054,060
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile = \"/folders/myfolders/SAS_Notes/data/Baltimore_City_Employee_Salaries_FY2015.csv\"\n", " out = sal dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC PRINT data = sal(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Make a temporary SAS dataset called `health_sal` using the salaries dataset, with only the agencies (`JobTitle`) of those with \"fire\" (anywhere in the job title), if any, in the name. Be sure to ignore case in the string matching for 'fire'." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsJobTitle
1EMT Firefighter Suppression
2Fire Pump Operator Supp ALS
3Firefighter/Paramedic Suppress
4Fire Pump Operator Supp ALS
5Firefighter Suppression
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA health_sal;\n", " SET sal;\n", " WHERE index(lowcase(JobTitle), 'fire') > 0;\n", "RUN;\n", "\n", "PROC PRINT data=health_sal(OBS=5);\n", " VAR JobTitle;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Make a temporary data set called `trans` which contains only agencies that contain “TRANS” (case sensitive)." ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsJobTitle
1TRANSPORTATION ASSOC II
2TAX TRANSFER CLERK I
3GENL SUPT TRANSPORTATION MAINT
4TAX TRANSFER CLERK I
5TAX TRANSFER CLERK I
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA trans;\n", " SET sal;\n", " WHERE index(JobTitle, 'TRANS') > 0;\n", "RUN;\n", "\n", "PROC PRINT data=trans(OBS=5);\n", " VAR JobTitle;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. What is/are the profession(s) of people who have “abra” in their name for Baltimore’s Salaries? Case should be ignored." ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsnameJobTitle
26Abraham,Donta DLABORER (Hourly)
27Abraham,SanthoshACCOUNTANT I
28Abraham,Sharon MHOUSING INSPECTOR
29Abrahams,Brandon APOLICE OFFICER TRAINEE
30Abrams,MariaOFFICE SERVICES ASSISTANT II S
31Abrams,MaxineCOLLECTIONS REPRESENTATIVE I
32Abrams,TerryRECREATION ARTS INSTRUCTOR
958Bey,AbrahamPROCUREMENT SPECIALIST II
3761Elgamil,Abraham DAUDITOR SUPV
4545Gatto,Abraham MPOLICE OFFICER
11107Schwartz,Abraham MGENERAL COUNSEL
12789Velez,Abraham LPOLICE OFFICER (EID)
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC PRINT data=sal;\n", " VAR name JobTitle;\n", " WHERE index(lowcase(name), 'abra') > 0;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Create a subset of the salaries dataset that only includes the Police Department, Fire Department and Sheriff’s Office. Use the Agency variable with (case sensitive) string matching. Call this `emer`. How many employees are in this new dataset?" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsAgency
1Police Department (005)
2Fire Department (120)
3Police Department (127)
4Sheriff's Office (410)
5Fire Department (465)
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA emer;\n", " SET sal;\n", " WHERE index(agency, \"Sheriff's Office\") > 0 OR\n", " index(agency, \"Police Department\") > 0 OR\n", " index(agency, \"Fire Department\") > 0;\n", "RUN;\n", "\n", "PROC PRINT data=emer(OBS=5);\n", " VAR agency;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Create a variable called `dept` in the `emer` data set by extracting the the part of the `Agency` variable up until 'ment' or 'ice'. E.g. we want to extract all characters up until ment or ice (we can group in regex using parentheses) and then discard the rest. Use:\n", "
\n",
    "re = prxparse('/.*(ment|ice)/');\n",
    "call prxsubstr(re, Agency, pos, len);\n",
    "dept = substr(Agency, pos, len);\n",
    "
" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
deptFrequencyPercentCumulative
Frequency
Cumulative
Percent
Fire Department161532.81161532.81
Police Department309762.92471295.73
Sheriff's Office2104.274922100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA emer;\n", " SET emer;\n", " re = prxparse('/.*(ment|ice)/');\n", " call prxsubstr(re, Agency, pos, len);\n", " dept = substr(Agency, pos, len);\n", "RUN;\n", "\n", "PROC FREQ data = emer;\n", " TABLE dept;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Cleaning\n", "\n", "1. Read in the bike lanes dataset Bike_Lanes.csv using PROC IMPORT and call it bike." ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObssubTypenameblocktypenumLanesprojectroutelengthdateInstalled
1      BIKE BOULEVARD1GUILFORD AVE BIKE BLVD  435.737877020
2      SIDEPATH1  NORTHERN1024.67462492010
3      SIGNED ROUTE1SOUTHEAST BIKE NETWORK  3749.32263772010
4  HUNTINGDON PATH  SIDEPATH1    00
5STCLNEDMONDSON AVE5300 BLK EDMONDSON AVEBIKE LANE1OPERATION ORANGE CONE  180.925464892011
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/Bike_Lanes.csv\"\n", " out = bike dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC PRINT DATA = bike(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. How many rows are are in the bike dataset? How many are complete cases? Hint: Use cmiss(of _ALL_) in a dataset to create a 0/1 variable to indicate if it complete or not and then use PROC MEANS to sum this 0/1 variable." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.BIKEObservations1631
Member TypeDATAVariables9
EngineV9Indexes0
Created10/18/2020 11:55:08Observation Length136
Last Modified10/18/2020 11:55:08Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages4
First Data Page1
Max Obs per Page481
Obs in First Data Page459
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/bike.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671716
Access Permissionrw-r--r--
Owner Namesasdemo
File Size320KB
File Size (bytes)327680
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
3blockChar29$29.$29.
9dateInstalledNum8BEST12.BEST32.
8lengthNum8BEST12.BEST32.
2nameChar20$20.$20.
5numLanesNum8BEST12.BEST32.
6projectChar25$25.$25.
7routeChar13$13.$13.
1subTypeChar6$6.$6.
4typeChar15$15.$15.
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC CONTENTS data = bike;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 1631 rows in the bike data set." ] }, { "cell_type": "code", "execution_count": 87, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObssubTypenameblocktypenumLanesprojectroutelengthdateInstalledcomplete
1      BIKE BOULEVARD1GUILFORD AVE BIKE BLVD  435.7378770200
2      SIDEPATH1  NORTHERN1024.674624920100
3      SIGNED ROUTE1SOUTHEAST BIKE NETWORK  3749.322637720100
4  HUNTINGDON PATH  SIDEPATH1    000
5STCLNEDMONDSON AVE5300 BLK EDMONDSON AVEBIKE LANE1OPERATION ORANGE CONE  180.9254648920110
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : complete
Sum
257.0000000
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 87, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA bike;\n", " SET bike;\n", " IF cmiss(of _ALL_) > 0 THEN complete = 0;\n", " ELSE complete = 1;\n", "RUN;\n", "\n", "PROC PRINT data = bike(OBS=5);\n", "RUN;\n", "\n", "PROC MEANS data = bike sum;\n", " VAR complete;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Filter rows of bike that are NOT missing the `route` variable, assign this to the object `have_route`. Create a frequency table of the `subType` variable using PROC FREQ, including the missing `subType`s." ] }, { "cell_type": "code", "execution_count": 90, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
subTypeFrequencyPercentCumulative
Frequency
Cumulative
Percent
 10.2810.28
STRALY30.8341.10
STRPRD35898.90362100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 90, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA have_route;\n", " SET bike;\n", " WHERE NOT MISSING(route);\n", "RUN;\n", "\n", "PROC FREQ data = have_route;\n", " TABLE subType / missing;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data Manipulation\n", "\n", "1. Read in the Bike_Lanes_Wide.csv dataset and call is `wide` using PROC IMPORT. Print the first few rows of the dataset." ] }, { "cell_type": "code", "execution_count": 97, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.WIDEObservations134
Member TypeDATAVariables9
EngineV9Indexes0
Created10/18/2020 16:40:52Observation Length141
Last Modified10/18/2020 16:40:52Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page464
Obs in First Data Page134
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/wide.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671707
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
2BIKE BOULEVARDChar11$11.$11.
3BIKE LANEChar11$11.$11.
4CONTRAFLOWChar11$11.$11.
9NAChar11$11.$11.
5SHARED BUS BIKEChar11$11.$11.
6SHARROWChar11$11.$11.
7SIDEPATHChar11$11.$11.
8SIGNED ROUTEChar11$11.$11.
1nameChar53$53.$53.
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The SAS System

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsnameBIKE BOULEVARDBIKE LANECONTRAFLOWSHARED BUS BIKESHARROWSIDEPATHSIGNED ROUTENA
1ALBEMARLE STNANANANA109.9141727NA234.9190031NA
2ALICEANNA STNANANANA250.7103035NANANA
3ARGONNE DRNA1312.608373NANANANANANA
4ART MUSEUM DRNANANANANANA277.6579871NA
5AUCHENTOROLY TERRNA1341.82908NANANANANANA
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 97, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/Bike_Lanes_Wide.csv\"\n", " out = wide dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC CONTENTS data = wide;\n", "RUN;\n", "\n", "PROC PRINT DATA = wide(OBS=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Reshape `wide` using either PROC TRANSPOSE or a DATA step. You will need to gather all columns except the name column. Transform into a long dataset with two new columns `lanetype` (the former column names) and `the_length` the data values. In the variable `the_length`, replace 'NA' values with . and convert it to a numeric column." ] }, { "cell_type": "code", "execution_count": 107, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Using a DATA step

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnamelanetypethe_length
1ALBEMARLE STBIKE BOULEVARD.
2ALBEMARLE STBIKE LANE.
3ALBEMARLE STCONTRAFLOW.
4ALBEMARLE STSHARED BUS BIKE.
5ALBEMARLE STSHARROW109.914
6ALBEMARLE STSIDEPATH.
7ALBEMARLE STSIGNED ROUTE234.919
8ALBEMARLE STNA.
9ALICEANNA STBIKE BOULEVARD.
10ALICEANNA STBIKE LANE.
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

Using PROC TRANSPOSE

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnamelanetypethe_length
1ALBEMARLE STBIKE BOULEVARD.
2ALBEMARLE STBIKE LANE.
3ALBEMARLE STCONTRAFLOW.
4ALBEMARLE STSHARED BUS BIKE.
5ALBEMARLE STSHARROW109.914
6ALBEMARLE STSIDEPATH.
7ALBEMARLE STSIGNED ROUTE234.919
8ALBEMARLE STNA.
9ALICEANNA STBIKE BOULEVARD.
10ALICEANNA STBIKE LANE.
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 107, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA long;\n", " SET wide;\n", " array col{*} 'BIKE BOULEVARD'n -- NA;\n", " DO i = 1 TO dim(col);\n", " lanetype = vname(col(i));\n", " IF col(i) = 'NA' THEN the_length = .;\n", " ELSE the_length = INPUT(col(i), 20.7);\n", " OUTPUT;\n", " END;\n", " DROP 'BIKE BOULEVARD'n -- NA i;\n", "RUN;\n", "\n", "PROC PRINT data = long(obs=10);\n", " TITLE 'Using a DATA step';\n", "RUN;\n", "\n", "PROC SORT data=wide;\n", " BY name;\n", "RUN;\n", "\n", "PROC TRANSPOSE data = wide\n", " out = long2 (RENAME = (COL1 = the_length \n", " _NAME_ = lanetype));\n", " BY name;\n", " VAR 'BIKE BOULEVARD'n -- NA;\n", "RUN;\n", "\n", "DATA long2;\n", " SET long2(RENAME = (the_length = length));\n", " IF length = 'NA' THEN the_length = .;\n", " ELSE the_length = INPUT(length, 20.7);\n", " DROP length;\n", "RUN;\n", "\n", "PROC PRINT data = long2(OBS=10);\n", " TITLE 'Using PROC TRANSPOSE';\n", "RUN;\n", "\n", "TITLE;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Read in the roads and crashes .csv files and call them `road` and `crash`." ] }, { "cell_type": "code", "execution_count": 108, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsRoadDistrictLength
1Interstate 65Greenfield262
2Interstate 70Vincennes156
3US-36Crawfordsville139
4US-40Greenfield150
5US-52Crawfordsville172
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsYearRoadN_CrashesVolume
11991Interstate 652540000
21992Interstate 653741000
31993Interstate 654545000
41994Interstate 654645600
51995Interstate 654649000
61996Interstate 655951000
71997Interstate 657652000
81998Interstate 659058000
91999Interstate 659565000
102000Interstate 659574000
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 108, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/roads.csv\"\n", " out = road dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/crashes.csv\"\n", " out = crash dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC PRINT data = road;\n", "RUN;\n", "\n", "PROC PRINT data = crash(OBS=10);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Replace (using `tranwrd`) any hyphens (`-`) with a space in `Road` variable of `crash`. Call this data `crash2`. Table the `Road` variable with PROC FREQ." ] }, { "cell_type": "code", "execution_count": 109, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
RoadFrequencyPercentCumulative
Frequency
Cumulative
Percent
Interstate 2752220.002220.00
Interstate 652220.004440.00
Interstate 702220.006660.00
US 362220.008880.00
US 402220.00110100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 109, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA crash2;\n", " SET crash;\n", " road = tranwrd(road, '-', ' ');\n", "RUN;\n", "\n", "PROC FREQ data = crash2;\n", " TABLE road;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. How many observations are in each of the `crash` and `road` datasets?" ] }, { "cell_type": "code", "execution_count": 110, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.CRASHObservations110
Member TypeDATAVariables4
EngineV9Indexes0
Created10/18/2020 17:00:48Observation Length40
Last Modified10/18/2020 17:00:48Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page1632
Obs in First Data Page110
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/crash.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671741
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
3N_CrashesNum8BEST12.BEST32.
2RoadChar14$14.$14.
4VolumeNum8BEST12.BEST32.
1YearNum8BEST12.BEST32.
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.ROADObservations5
Member TypeDATAVariables3
EngineV9Indexes0
Created10/18/2020 17:00:48Observation Length40
Last Modified10/18/2020 17:00:48Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page1632
Obs in First Data Page5
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/road.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671709
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
2DistrictChar14$14.$14.
3LengthNum8BEST12.BEST32.
1RoadChar13$13.$13.
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 110, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC CONTENTS data = crash;\n", "RUN;\n", "\n", "PROC CONTENTS data = road;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The crash dataset has 110 observations and the road datset has 5.\n", "\n", "6. Separate the `Road` column (using `scan`) into (`type` and `number`) in `crash2`. Reassign this to `crash2`. Table `type` from `crash2` using PROC FREQ. Then create a new variable calling it `road_hyphen` using one of the concatenate functions (such as CAT). Unite the `type` and `number` columns using a hyphen (`-`) and then table `road_hyphen` using PROC FREQ." ] }, { "cell_type": "code", "execution_count": 111, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
typeFrequencyPercentCumulative
Frequency
Cumulative
Percent
Interstate6660.006660.00
US4440.00110100.00
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
road_hyphenFrequencyPercentCumulative
Frequency
Cumulative
Percent
Interstate-2752220.002220.00
Interstate-652220.004440.00
Interstate-702220.006660.00
US-362220.008880.00
US-402220.00110100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 111, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA crash2;\n", " SET crash2;\n", " type = scan(road, 1);\n", " number = scan(road, 2);\n", "RUN;\n", "\n", "PROC FREQ data=crash2;\n", " TABLE type;\n", "RUN;\n", "\n", "DATA crash2;\n", " SET crash2;\n", " road_hyphen = CATX('-', type, number);\n", "RUN;\n", "\n", "PROC FREQ data=crash2;\n", " TABLE road_hyphen;\n", "RUN;\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Which and how many years were data collected in the `crash` dataset?" ] }, { "cell_type": "code", "execution_count": 112, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
YearFrequencyPercentCumulative
Frequency
Cumulative
Percent
199154.5554.55
199254.55109.09
199354.551513.64
199454.552018.18
199554.552522.73
199654.553027.27
199754.553531.82
199854.554036.36
199954.554540.91
200054.555045.45
200154.555550.00
200254.556054.55
200354.556559.09
200454.557063.64
200554.557568.18
200654.558072.73
200754.558577.27
200854.559081.82
200954.559586.36
201054.5510090.91
201154.5510595.45
201254.55110100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 112, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC FREQ data=crash;\n", " TABLE year;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The crash dataset contains data on the years from 1991 to 2012 for a total of 22 years.\n", "\n", "8. Read in the dataset Bike_Lanes.csv and call it `bike`." ] }, { "cell_type": "code", "execution_count": 113, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObssubTypenameblocktypenumLanesprojectroutelengthdateInstalled
1      BIKE BOULEVARD1GUILFORD AVE BIKE BLVD  435.737877020
2      SIDEPATH1  NORTHERN1024.67462492010
3      SIGNED ROUTE1SOUTHEAST BIKE NETWORK  3749.32263772010
4  HUNTINGDON PATH  SIDEPATH1    00
5STCLNEDMONDSON AVE5300 BLK EDMONDSON AVEBIKE LANE1OPERATION ORANGE CONE  180.925464892011
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 113, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile=\"/folders/myfolders/SAS_Notes/data/Bike_Lanes.csv\"\n", " out = bike dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "PROC PRINT data=bike(obs=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. Keep rows where the record is not missing `type` and not missing `name` and re-assign the output to `bike`." ] }, { "cell_type": "code", "execution_count": 114, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObssubTypenameblocktypenumLanesprojectroutelengthdateInstalled
1      BIKE BOULEVARD1GUILFORD AVE BIKE BLVD  435.737877020
2      SIDEPATH1  NORTHERN1024.67462492010
3      SIGNED ROUTE1SOUTHEAST BIKE NETWORK  3749.32263772010
4  HUNTINGDON PATH  SIDEPATH1    00
5STCLNEDMONDSON AVE5300 BLK EDMONDSON AVEBIKE LANE1OPERATION ORANGE CONE  180.925464892011
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 114, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA bike;\n", " SET bike;\n", " WHERE NOT MISSING(type);\n", "RUN;\n", "\n", "PROC PRINT data=bike(obs=5);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. Using PROC MEANS with a BY statement grouping `name` and `type` (i.e for each type within each name), find the `sum` of the `length`. Use an OUTPUT statement to get this summary dataset and only keep the `name`, `type` and sum of the `length` column (after renaming this column `length`). Call this data set `sub`." ] }, { "cell_type": "code", "execution_count": 117, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsnametypelength
1  BIKE BOULEVARD1
2  BIKE LANE1
3  SHARROW3
4  SIDEPATH1
5  SIGNED ROUTE8
6ALBEMARLE STSHARROW8
7ALBEMARLE STSIGNED ROUTE8
8ALICEANNA STSHARROW38
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 117, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SORT data = bike;\n", " BY name type;\n", "RUN;\n", "\n", "PROC MEANS data = bike noprint;\n", " BY name type;\n", " OUTPUT out=sub sum=length;\n", "RUN;\n", "\n", "DATA sub;\n", " SET sub;\n", " keep name type length;\n", "RUN;\n", "\n", "PROC PRINT data=sub(obs=8);\n", "RUN;" ] }, { "cell_type": "code", "execution_count": 118, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
typeFrequencyPercentCumulative
Frequency
Cumulative
Percent
BIKE BOULEVARD31.8031.80
BIKE LANE5432.345734.13
CONTRAFLOW42.406136.53
SHARED BUS BIKE42.406538.92
SHARROW5029.9411568.86
SIDEPATH52.9912071.86
SIGNED ROUTE4728.14167100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 118, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC FREQ data=sub;\n", " TABLE type;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. Reshape `sub` from long to wide by taking the `type` to be the new columns and `length` to be the value in those columns. (NOTE: the `names` have spaces in them. Do we need to replace the spaces with a character before changing them to column names?)" ] }, { "cell_type": "code", "execution_count": 139, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Wide Dataset Using DATA Step

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsnameBIKE_BOULEVARDBIKE_LANECONTRAFLOWSHARED_BUS_BIKESHARROWSIDEPATHSIGNED_ROUTE
1  11..318
2ALBEMARLE ST....8.8
3ALICEANNA ST....38..
4ARGONNE DR.2.....
5ART MUSEUM DR......6
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

Wide Dataset Using PROC TRANSPOSE

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsnameBIKE BOULEVARDBIKE LANESHARROWSIDEPATHSIGNED ROUTECONTRAFLOWSHARED BUS BIKE
1  11318..
2ALBEMARLE ST..8.8..
3ALICEANNA ST..38....
4ARGONNE DR.2.....
5ART MUSEUM DR....6..
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 139, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA new_wide;\n", " SET sub;\n", " BY name;\n", " RETAIN BIKE_BOULEVARD BIKE_LANE CONTRAFLOW SHARED_BUS_BIKE \n", " SHARROW SIDEPATH SIGNED_ROUTE;\n", " array atype (*) BIKE_BOULEVARD BIKE_LANE CONTRAFLOW SHARED_BUS_BIKE \n", " SHARROW SIDEPATH SIGNED_ROUTE;\n", " \n", " IF first.name THEN call missing(of atype(*));\n", " \n", " IF type = 'BIKE BOULEVARD' THEN atype(1) = length;\n", " ELSE IF type = 'BIKE LANE' THEN atype(2) = length;\n", " ELSE IF type = 'CONTRAFLOW' THEN atype(3) = length;\n", " ELSE IF type = 'SHARED BUS BIKE' THEN atype(4) = length;\n", " ELSE IF type = 'SHARROW' THEN atype(5) = length;\n", " ELSE IF type = 'SIDEPATH' THEN atype(6) = length;\n", " ELSE IF type = 'SIGNED ROUTE' THEN atype(7) = length;\n", " \n", " IF last.name THEN OUTPUT;\n", " DROP type length;\n", "RUN;\n", "\n", "PROC PRINT data=new_wide(OBS=5);\n", " TITLE 'Wide Dataset Using DATA Step';\n", "RUN;\n", "\n", "PROC TRANSPOSE data = sub out = new_wide2(drop=_NAME_);\n", " BY name;\n", " VAR length;\n", " ID type;\n", "RUN;\n", "\n", "PROC PRINT data = new_wide2(OBS=5);\n", " TITLE 'Wide Dataset Using PROC TRANSPOSE';\n", "RUN;\n", "\n", "TITLE ;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "12. Join data in the crash and road datasets to retain only complete data, (using an inner join on road) Merge by the variable `Road`. Call the output merged. How many observations are there?" ] }, { "cell_type": "code", "execution_count": 141, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.MERGEDObservations88
Member TypeDATAVariables6
EngineV9Indexes0
Created10/18/2020 20:34:03Observation Length64
Last Modified10/18/2020 20:34:03Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page1021
Obs in First Data Page88
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/merged.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671747
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
2DistrictChar14$14.$14.
3LengthNum8BEST12.BEST32.
5N_CrashesNum8BEST12.BEST32.
1RoadChar13$13.$13.
6VolumeNum8BEST12.BEST32.
4YearNum8BEST12.BEST32.
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 141, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SORT data = road;\n", " BY road;\n", "RUN;\n", "\n", "PROC SORT data = crash;\n", " BY road;\n", "RUN;\n", "\n", "DATA merged;\n", " MERGE road (IN = in1) \n", " crash (IN = in2);\n", " BY road;\n", " IF in1 = 1 and in2 = 1;\n", "RUN;\n", "\n", "PROC CONTENTS data = merged;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 88 rows.\n", "\n", "13. Join data using a `full_join`. Call the output `full`. How many observations are there?" ] }, { "cell_type": "code", "execution_count": 143, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsRoadDistrictLengthYearN_CrashesVolume
1Interstate 27  .19912720350
2Interstate 27  .19922621200
3Interstate 27  .19932223200
4Interstate 27  .19942121200
5Interstate 27  .19952823200
6Interstate 27  .19962220000
7Interstate 27  .19972718000
8Interstate 27  .19982119500
9Interstate 27  .19992221000
10Interstate 27  .20002920700
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 143, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA full_join;\n", " MERGE road crash;\n", " BY road;\n", "RUN;\n", "\n", "PROC PRINT data = full_join(OBS=10);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "14. Do a left join of the `road` and `crash`. ORDER matters here! How many observations are there?" ] }, { "cell_type": "code", "execution_count": 144, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.LEFT_JOINObservations89
Member TypeDATAVariables6
EngineV9Indexes0
Created10/18/2020 20:38:16Observation Length64
Last Modified10/18/2020 20:38:16Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page1021
Obs in First Data Page89
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/left_join.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671749
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
2DistrictChar14$14.$14.
3LengthNum8BEST12.BEST32.
5N_CrashesNum8BEST12.BEST32.
1RoadChar13$13.$13.
6VolumeNum8BEST12.BEST32.
4YearNum8BEST12.BEST32.
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 144, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA left_join;\n", " MERGE road (in = in1) crash;\n", " BY road;\n", " IF in1 = 1;\n", "RUN;\n", "\n", "PROC CONTENTS data = left_join;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 89 rows.\n", "\n", "15. Repeat above with a `right_join` with the same order of the arguments. How many observations are there?" ] }, { "cell_type": "code", "execution_count": 145, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The CONTENTS Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Data Set NameWORK.RIGHT_JOINObservations110
Member TypeDATAVariables6
EngineV9Indexes0
Created10/18/2020 20:39:48Observation Length64
Last Modified10/18/2020 20:39:48Deleted Observations0
Protection CompressedNO
Data Set Type SortedNO
Label   
Data RepresentationSOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64  
Encodingutf-8 Unicode (UTF-8)  
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Engine/Host Dependent Information
Data Set Page Size65536
Number of Data Set Pages1
First Data Page1
Max Obs per Page1021
Obs in First Data Page110
Number of Data Set Repairs0
Filename/tmp/SAS_workA5B9000010B1_localhost.localdomain/right_join.sas7bdat
Release Created9.0401M6
Host CreatedLinux
Inode Number671750
Access Permissionrw-r--r--
Owner Namesasdemo
File Size128KB
File Size (bytes)131072
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Alphabetic List of Variables and Attributes
#VariableTypeLenFormatInformat
2DistrictChar14$14.$14.
3LengthNum8BEST12.BEST32.
5N_CrashesNum8BEST12.BEST32.
1RoadChar13$13.$13.
6VolumeNum8BEST12.BEST32.
4YearNum8BEST12.BEST32.
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 145, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA right_join;\n", " MERGE road crash (in = in1);\n", " BY road;\n", " IF in1 = 1;\n", "RUN;\n", "\n", "PROC CONTENTS data = right_join;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 110 observations in the right joined dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The Output Delivery System and Graphics" ] }, { "cell_type": "code", "execution_count": 146, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsdaydatedailynumberroutetype
1Monday01/11/2010952877orangeBoardings
2Monday01/11/20109521027orangeAlightings
3Monday01/11/2010952952orangeAverage
4Monday01/11/2010952.purpleBoardings
5Monday01/11/2010952.purpleAlightings
6Monday01/11/2010952.purpleAverage
7Monday01/11/2010952.greenBoardings
8Monday01/11/2010952.greenAlightings
9Monday01/11/2010952.greenAverage
10Monday01/11/2010952.bannerBoardings
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Obsdaydatedailynumberroutetype
1Monday01/11/2010952952.0orangeAverage
2Tuesday01/12/2010796796.0orangeAverage
3Wednesday01/13/20101211.51211.5orangeAverage
4Thursday01/14/20101213.51213.5orangeAverage
5Friday01/15/201016441644.0orangeAverage
6Saturday01/16/20101490.51490.5orangeAverage
7Sunday01/17/2010888.5888.5orangeAverage
8Monday01/18/2010999.5999.5orangeAverage
9Tuesday01/19/201010351035.0orangeAverage
10Wednesday01/20/20101395.51395.5orangeAverage
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 146, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC IMPORT datafile = \"/folders/myfolders/SAS_Notes/data/Charm_City_Circulator_Ridership.csv\"\n", " out = circ dbms = csv replace;\n", " getnames = yes;\n", " guessingrows = max;\n", "RUN;\n", "\n", "DATA long;\n", " SET circ;\n", " ARRAY larray(*) orangeBoardings -- bannerAverage;\n", " DO i = 1 TO dim(larray);\n", " var = vname(larray(i));\n", " number = larray(i);\n", " var = tranwrd(var, 'Board', ' Board');\n", " var = tranwrd(var, 'Alight', ' Alight');\n", " var = tranwrd(var, 'Average', ' Average');\n", " route = scan(var, 1);\n", " type = scan(var, 2);\n", " OUTPUT;\n", " END;\n", " \n", " DROP i var orangeBoardings -- bannerAverage;\n", "RUN;\n", "\n", "DATA avg;\n", " SET long;\n", " WHERE type = 'Average' and number ne .;\n", "RUN;\n", "\n", "PROC PRINT data = long(obs = 10);\n", "RUN;\n", "\n", "PROC PRINT data = avg(obs=10);\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Plot average ridership (avg data set) by date using a scatterplot.\n", " a. Color the points by route (orange, purple, green, banner)\n", " b. Add black smoothed curves for each route\n", " c. Color the points by day of the week" ] }, { "cell_type": "code", "execution_count": 147, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 147, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SGPLOT data = avg;\n", " TITLE '1a';\n", " SCATTER Y = number X = date / group = route;\n", "RUN;\n", "\n", "PROC SGPLOT data = avg;\n", " TITLE '1b';\n", " SCATTER Y = number X = date / group = route;\n", " LOESS Y = number X = date / group = route lineattrs=(color = black);\n", "RUN;\n", "\n", "PROC SGPLOT data = avg;\n", " TITLE '1c';\n", " SCATTER Y = number X = date / group = day;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Replot 1a where the colors of the points are the name of the route (with banner –> blue). Note: you will need to make a data attribute map." ] }, { "cell_type": "code", "execution_count": 151, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 151, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA marker_map;\n", " INPUT id $2. +1 VALUE $6. +1 MARKERCOLOR $6.;\n", "DATALINES;\n", "mc banner blue\n", "mc green green\n", "mc orange orange\n", "mc purple purple\n", ";\n", "RUN;\n", "\n", "PROC SGPLOT data = avg dattrmap = marker_map;\n", " TITLE '2';\n", " SCATTER Y = number X = date / group = route attrid = mc;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Plot a scatterplot of average ridership by date with one panel per route." ] }, { "cell_type": "code", "execution_count": 162, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 162, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SGPANEL data = avg;\n", " TITLE '3';\n", " PANELBY route;\n", " SCATTER Y = number X = date;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Plot a scatterplpot of average ridership by date with separate panels by day of the week, colored by route." ] }, { "cell_type": "code", "execution_count": 165, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 165, "metadata": {}, "output_type": "execute_result" } ], "source": [ "*Default colors are fine here to, but we can also use a data attribute map like we did before;\n", "PROC SGPANEL data = avg dattrmap = marker_map;\n", " TITLE '4';\n", " PANELBY day;\n", " SCATTER Y = number X = date / group = route attrid = mc;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Plot a scatterplot of average ridership (avg) by date, colored by route (same as 1a). (do not take an average, use the average column for each route). Make the x-label \"Year\". Make the y-label \"Number of People\"." ] }, { "cell_type": "code", "execution_count": 166, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 166, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SGPLOT data = avg dattrmap = marker_map;\n", " TITLE '5';\n", " SCATTER Y = number X = date / group = route attrid = mc;\n", " XAXIS LABEL = \"Year\";\n", " YAXIS LABEL = \"Number of People\";\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Plot average ridership on the orange route versus date as a solid line, and add dashed “error” lines based on the boardings and alightings. The line colors should be orange." ] }, { "cell_type": "code", "execution_count": 179, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 179, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA orange;\n", " SET long;\n", " WHERE route = \"orange\";\n", "RUN;\n", "\n", "DATA line_map;\n", " INPUT id $2. +1 VALUE $10. +1 linepattern $5.;\n", "DATALINES;\n", "lp Alightings dash\n", "lp Average solid\n", "lp Boardings dash\n", ";\n", "RUN;\n", "\n", "PROC SGPLOT data = orange dattrmap = line_map;\n", " TITLE '6';\n", " SERIES Y = number X = date / group = type attrid = lp lineattrs=(color = orange);\n", "RUN;\n", "\n", "TITLE;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Statistical Analysis in SAS" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "FILENAME cardata '/folders/myfolders/SAS_Notes/data/kaggleCarAuction.csv';\n", "\n", "PROC IMPORT datafile = cardata out = cars dbms = CSV replace;\n", " getnames = yes;\n", " guessingrows = 1000;\n", "RUN;\n", "\n", "FILENAME mortdat '/folders/myfolders/SAS_Notes/data/indicatordeadkids35.csv';\n", "\n", "PROC IMPORT datafile = mortdat out = mort dbms = CSV replace;\n", " getnames = yes;\n", " guessingrows = 500;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Compute the correlation between the `1980`, `1990`, `2000`, and `2010` mortality data. Just display the result to the screen. Then compute using the NOMMISS option. (Note: The column names are numbers, which are invalid standard SAS names, so to refer to the variable 1980 in your code use '1980'n.)" ] }, { "cell_type": "code", "execution_count": 180, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The CORR Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
4 Variables:1980 1990 2000 2010
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
19801970.947310.88694186.620130.070493.49035
19901970.709110.77929139.694280.049223.28983
20001970.550040.68922108.357860.032322.81186
20101930.396010.5002176.429460.031732.02857
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Pearson Correlation Coefficients
Prob > |r| under H0: Rho=0
Number of Observations
 1980199020002010
1980\n", "
\n", "
1.00000
\n", "
 
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.96015
\n", "
<.0001
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.88884
\n", "
<.0001
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.84683
\n", "
<.0001
\n", "
193
\n", "
\n", "
1990\n", "
\n", "
0.96015
\n", "
<.0001
\n", "
197
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.96138
\n", "
<.0001
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.92472
\n", "
<.0001
\n", "
193
\n", "
\n", "
2000\n", "
\n", "
0.88884
\n", "
<.0001
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.96138
\n", "
<.0001
\n", "
197
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
197
\n", "
\n", "
\n", "
\n", "
0.98623
\n", "
<.0001
\n", "
193
\n", "
\n", "
2010\n", "
\n", "
0.84683
\n", "
<.0001
\n", "
193
\n", "
\n", "
\n", "
\n", "
0.92472
\n", "
<.0001
\n", "
193
\n", "
\n", "
\n", "
\n", "
0.98623
\n", "
<.0001
\n", "
193
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
193
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The CORR Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
4 Variables:1980 1990 2000 2010
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
19801930.962500.88966185.761750.070493.49035
19901930.721320.78263139.215500.055123.28983
20001930.559760.69297108.034590.032712.81186
20101930.396010.5002176.429460.031732.02857
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Pearson Correlation Coefficients, N = 193
Prob > |r| under H0: Rho=0
 1980199020002010
1980\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
0.95968
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.88774
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.84683
\n", "
<.0001
\n", "
\n", "
1990\n", "
\n", "
0.95968
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
0.96103
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.92472
\n", "
<.0001
\n", "
\n", "
2000\n", "
\n", "
0.88774
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.96103
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
0.98623
\n", "
<.0001
\n", "
\n", "
2010\n", "
\n", "
0.84683
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.92472
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.98623
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 180, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC CORR data = mort;\n", " VAR '1980'n '1990'n '2000'n '2010'n;\n", "RUN;\n", "\n", "PROC CORR data = mort nomiss;\n", " VAR '1980'n '1990'n '2000'n '2010'n;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. \n", " a. Compute the correlation between the `Myanmar`, `China`, and `United States` mortality data. Store this correlation matrix in an object called `country_cor` using ODS OUTPUT.\n", " b. Extract the Myanmar-US correlation from the correlation matrix." ] }, { "cell_type": "code", "execution_count": 182, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The CORR Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
3 Variables:China Myanmar United States
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Simple Statistics
VariableNMeanStd DevSumMinimumMaximum
China2142.376081.15555508.481530.043253.61453
Myanmar2142.822151.12767603.940310.051503.76418
United States2141.360801.14407291.211320.028673.17018
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Pearson Correlation Coefficients, N = 214
Prob > |r| under H0: Rho=0
 ChinaMyanmarUnited States
China\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
0.97436
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.69093
\n", "
<.0001
\n", "
\n", "
Myanmar\n", "
\n", "
0.97436
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
0.62926
\n", "
<.0001
\n", "
\n", "
United States\n", "
\n", "
0.69093
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
0.62926
\n", "
<.0001
\n", "
\n", "
\n", "
\n", "
1.00000
\n", "
 
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsMyanmar
30.62926
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 182, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA mort_sub;\n", " SET mort(RENAME=(Var1 = Country));\n", " WHERE country in ('Myanmar' 'China' 'United States');\n", "RUN;\n", "\n", "PROC TRANSPOSE data = mort_sub out = mort_sub_transposed;\n", " VAR '1760'n -- '2099'n;\n", " ID country;\n", "RUN;\n", "\n", "PROC CORR DATA = mort_sub_transposed;\n", " VAR China Myanmar 'United States'n;\n", " ODS OUTPUT PearsonCorr = country_cor;\n", "RUN;\n", "\n", "PROC PRINT data = country_cor;\n", " Var Myanmar;\n", " WHERE Variable = 'United States';\n", "RUN;\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Is there a difference between mortality information from `1990` and `2000`? Run a paired t-test and a Wilcoxon signed rank test to assess this. Hint: to extract the column of information for `1990`, use '1990'n." ] }, { "cell_type": "code", "execution_count": 185, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "

The TTEST Procedure

\n", "

 

\n", "

Difference: 1990 - 2000

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NMeanStd DevStd ErrMinimumMaximum
1970.15910.22270.0159-0.54560.8726
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Mean95% CL MeanStd Dev95% CL Std Dev
0.15910.12780.19040.22270.20270.2472
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
DFt ValuePr > |t|
19610.03<.0001
\n", "
\n", "
\n", "
\n", "\"Summary\n", "
\n", "
\n", "
\n", "
\n", "\"Profiles\n", "
\n", "
\n", "
\n", "
\n", "\"Agreement\n", "
\n", "
\n", "
\n", "
\n", "\"Q-Q\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The UNIVARIATE Procedure

\n", "

Variable: diff

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Moments
N197Sum Weights197
Mean-0.1590681Sum Observations-31.336415
Std Deviation0.22269839Variance0.04959457
Skewness-0.9205453Kurtosis2.01791651
Uncorrected SS14.7051599Corrected SS9.72053615
Coeff Variation-140.00192Std Error Mean0.0158666
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Basic Statistical Measures
LocationVariability
Mean-0.15907Std Deviation0.22270
Median-0.09007Variance0.04959
Mode.Range1.41823
  Interquartile Range0.21805
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Tests for Location: Mu0=0
TestStatisticp Value
Student's tt-10.0253Pr > |t|<.0001
SignM-84.5Pr >= |M|<.0001
Signed RankS-8245.5Pr >= |S|<.0001
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Quantiles (Definition 5)
LevelQuantile
100% Max0.5456100
99%0.3990751
95%0.0830986
90%-0.0104786
75% Q3-0.0286619
50% Median-0.0900725
25% Q1-0.2467130
10%-0.4798594
5%-0.6188205
1%-0.8710244
0% Min-0.8726242
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Extreme Observations
LowestHighest
ValueObsValueObs
-0.872624190.27262132
-0.8710241310.321500196
-0.808735960.32995622
-0.804626540.39907599
-0.7942681060.545610197
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 185, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC TTEST data = mort;\n", " PAIRED '1990'n*'2000'n;\n", "RUN;\n", "\n", "DATA diff;\n", " SET mort;\n", " diff = '2000'n - '1990'n;\n", "RUN;\n", "\n", "PROC UNIVARIATE data = diff;\n", " VAR diff;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. Using the cars dataset, fit a linear regression model with vehicle cost (`VehBCost`) as the outcome and vehicle age (`VehicleAge`) and whether it's an online sale (`IsOnlineSale`) as predictors as well as their interaction." ] }, { "cell_type": "code", "execution_count": 186, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The GLM Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Number of Observations Read72983
Number of Observations Used72983
\n", "
\n", "
\n", "
\n", "
\n", "

The GLM Procedure

\n", "

 

\n", "

Dependent Variable: VehBCost

\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
SourceDFSum of SquaresMean SquareF ValuePr > F
Model3224220712937474023764.42652.08<.0001
Error729792056671880032818169.4461  
Corrected Total72982228089259296   
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
R-SquareCoeff VarRoot MSEVehBCost Mean
0.09830424.940671678.7406730.934
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
SourceDFType I SSMean SquareF ValuePr > F
VehicleAge122249980049222499800497895.19<.0001
IsOnlineSale115886562115886562156.37<.0001
VehicleAg*IsOnlineSa113225623132256234.690.0303
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
SourceDFType III SSMean SquareF ValuePr > F
VehicleAge121533779839215337798397641.05<.0001
IsOnlineSale1642525356425253522.80<.0001
VehicleAg*IsOnlineSa113225623132256234.690.0303
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ParameterEstimateStandard
Error
t ValuePr > |t|
Intercept8062.70199916.5870226486.08<.0001
VehicleAge-320.6621083.6683525-87.41<.0001
IsOnlineSale514.308268107.71146314.77<.0001
VehicleAg*IsOnlineSa-55.37292625.5607123-2.170.0303
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 186, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC GLM data = cars;\n", " MODEL VehBCost = VehicleAge IsOnlineSale VehicleAge*IsOnlineSale;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Create a variable called `expensive` in the `cars` data that indicates if the \n", "vehicle cost is over `$10,000`. Use a chi-squared test to assess if there is a\n", "relationship between a car being expensive and it being labeled as a \"bad buy\" (`IsBadBuy`)." ] }, { "cell_type": "code", "execution_count": 187, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "
\n", "
\n", "
Frequency
\n", "
Percent
\n", "
Row Pct
\n", "
Col Pct
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Table of expensive by IsBadBuy
expensiveIsBadBuy
01Total
0\n", "
\n", "
61505
\n", "
84.27
\n", "
87.67
\n", "
96.09
\n", "
\n", "
\n", "
\n", "
8652
\n", "
11.85
\n", "
12.33
\n", "
96.39
\n", "
\n", "
\n", "
\n", "
70157
\n", "
96.13
\n", "
 
\n", "
 
\n", "
\n", "
1\n", "
\n", "
2502
\n", "
3.43
\n", "
88.54
\n", "
3.91
\n", "
\n", "
\n", "
\n", "
324
\n", "
0.44
\n", "
11.46
\n", "
3.61
\n", "
\n", "
\n", "
\n", "
2826
\n", "
3.87
\n", "
 
\n", "
 
\n", "
\n", "
Total\n", "
\n", "
64007
\n", "
87.70
\n", "
\n", "
\n", "
\n", "
8976
\n", "
12.30
\n", "
\n", "
\n", "
\n", "
72983
\n", "
100.00
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

Statistics for Table of expensive by IsBadBuy

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
StatisticDFValueProb
Chi-Square11.89480.1687
Likelihood Ratio Chi-Square11.93140.1646
Continuity Adj. Chi-Square11.81520.1779
Mantel-Haenszel Chi-Square11.89480.1687
Phi Coefficient -0.0051 
Contingency Coefficient 0.0051 
Cramer's V -0.0051 
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Fisher's Exact Test
Cell (1,1) Frequency (F)61505
Left-sided Pr <= F0.0880
Right-sided Pr >= F0.9211
  
Table Probability (P)0.0091
Two-sided Pr <= P0.1790
\n", "
\n", "
\n", "

Sample Size = 72983

\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 187, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA cars;\n", " SET cars;\n", " IF VehBCost > 10000 THEN expensive = 1;\n", " ELSE expensive = 0;\n", "RUN;\n", "\n", "PROC FREQ data = cars;\n", " TABLES expensive*IsBadBuy / chisq;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. Fit a logistic regression model where the outcome is \"bad buy\" status and predictors are the `expensive` status and vehicle age (`VehicleAge`). Request confidence intervals for the odds ratios." ] }, { "cell_type": "code", "execution_count": 188, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The LOGISTIC Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Model Information
Data SetWORK.CARS
Response VariableIsBadBuy
Number of Response Levels2
Modelbinary logit
Optimization TechniqueFisher's scoring
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Number of Observations Read72983
Number of Observations Used72983
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Response Profile
Ordered
Value
IsBadBuyTotal
Frequency
1064007
218976
\n", "
\n", "

Probability modeled is IsBadBuy='1'.

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Model Fit Statistics
CriterionIntercept OnlyIntercept and Covariates
AIC54423.30752447.788
SC54432.50552475.381
-2 Log L54421.30752441.788
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Testing Global Null Hypothesis: BETA=0
TestChi-SquareDFPr > ChiSq
Likelihood Ratio1979.51982<.0001
Score2042.57602<.0001
Wald1961.86622<.0001
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis of Maximum Likelihood Estimates
ParameterDFEstimateStandard
Error
Wald
Chi-Square
Pr > ChiSq
Intercept1-3.24950.03329554.1782<.0001
expensive1-0.08040.06071.75420.1853
VehicleAge10.28660.006481958.7086<.0001
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Association of Predicted Probabilities and Observed Responses
Percent Concordant56.9Somers' D0.279
Percent Discordant29.1Gamma0.324
Percent Tied14.0Tau-a0.060
Pairs574526832c0.639
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Odds Ratio Estimates and Wald Confidence Intervals
EffectUnitEstimate95% Confidence Limits
expensive1.00000.9230.8191.039
VehicleAge1.00001.3321.3151.349
\n", "
\n", "
\n", "
\n", "\"Plot\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 188, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC LOGISTIC data = cars;\n", " MODEL IsBadBuy(event='1') = expensive VehicleAge / CLODDS = WALD;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Macros\n", "\n", "1. Write a macro that discretizes a quantitative variable into four categories based on quantiles. That is:\n", "\n", " * if X < Q1 then group = 1\n", " * if Q1 < X < M then group = 2\n", " * if M < X < Q3 then group = 3\n", " * if Q3 < X then group = 4\n", "\n", "The macro should have the following defnition - %quartilesmacro(mydata, qvar, round, out);\n", "\n", "* mydata: dataset containing the quantitative variable\n", "* qvar: name of quantitative variable\n", "* round: integer representing number of decimal places to round to\n", "* out: name of output dataset which contains the categorized variable\n", "\n", "We will explore this by using the Framingham dataset as an example to apply the macro to." ] }, { "cell_type": "code", "execution_count": 189, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
5880  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
5880! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
5881
5882 LIBNAME phc6089 "/folders/myfolders/SAS_Notes/data/";
NOTE: Libref PHC6089 was successfully assigned as follows:
Engine: V9
Physical Name: /folders/myfolders/SAS_Notes/data
5883
5884 *Create a temporary data set, so that we don't save changes to the original data set.;
5885 DATA fghmTemp;
NOTE: Data file PHC6089.FGHM113.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
5886 SET phc6089.fghm113;
5887 RUN;
NOTE: There were 500 observations read from the data set PHC6089.FGHM113.
NOTE: The data set WORK.FGHMTEMP has 500 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.00 seconds

5888
5889 /*Now let's code some variables with some more descriptive values.
5890 SEX (Gender): 1=Men
5891 \t\t\t\t2=Women
5892 Period (Examination cycle): 1=Period1
5893 \t\t\t\t\t\t\t 2=Period2
5894 \t\t\t\t\t\t\t 3=Period3
5895 BPMEDS (Use of anti-hypertensive meds): 0=Not currently
5896 \t\t\t\t\t\t\t\t\t\t 1=Currently use
5897 CURSMOKE (Currently smoke?): 0=No
5898 \t\t\t\t\t\t\t 1=Yes
5899 DIABETES: 0=Not diabetic
5900 \t\t\t1=Diabetic
5901 PREVAP (Have angina pectoric?): 0=No
5902 \t\t\t\t\t\t\t\t 1=Yes
5903 PREVCHD (Coronary heart disease?): 0=No
5904 \t\t\t\t\t\t\t\t\t 1=Yes
5905 PREVMI (Myocardial infarction?): 0=No
5906 \t\t\t\t\t\t\t\t 1=Yes
5907 PREVSTRK (Had a stroke?): 0=No
5908 \t\t\t\t \t\t\t1=Yes
5909 PREVHYP (Hypertensive? sys bp >=140 or dyas bp >= 90): 0=no
5910 \t\t\t\t\t\t\t\t\t\t\t\t\t\t 1=yes
5911 */
5912
5913 PROC FORMAT;
5914 VALUE YNfmt 0="No"
NOTE: Format YNFMT has been output.
5915 \t\t\t1="Yes";
5916 VALUE perfmt 1="Period 1"
5917 \t\t\t 2="Period 2"
NOTE: Format PERFMT has been output.
5918 \t\t\t 3="Period 3";
5919 VALUE gndrfmt 1="Men"
NOTE: Format GNDRFMT has been output.
5920 \t\t\t 2="Women";
5921 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

5922
5923 DATA fghmtemp;
5924 SET fghmtemp;
5925 FORMAT prevap ynfmt.
5926 \t diabetes ynfmt.
5927 \t cursmoke ynfmt.
5928 \t bpmeds ynfmt.
5929 \t prevchd ynfmt.
5930 \t prevmi ynfmt.
5931 \t prevstrk ynfmt.
5932 \t prevhyp ynfmt.
5933 \t sex gndrfmt.;
5934 RUN;
NOTE: There were 500 observations read from the data set WORK.FGHMTEMP.
NOTE: The data set WORK.FGHMTEMP has 500 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

5935
5936 ods html5 (id=saspy_internal) close;ods listing;

5937
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 189, "metadata": {}, "output_type": "execute_result" } ], "source": [ "LIBNAME phc6089 \"/folders/myfolders/SAS_Notes/data/\";\n", "\n", "*Create a temporary data set, so that we don't save changes to the original data set.;\n", "DATA fghmTemp;\n", " SET phc6089.fghm113;\n", "RUN;\n", "\n", "/*Now let's code some variables with some more descriptive values.\n", " SEX (Gender): 1=Men \n", "\t\t\t\t2=Women\n", " Period (Examination cycle): 1=Period1 \n", "\t\t\t\t\t\t\t 2=Period2\n", "\t\t\t\t\t\t\t 3=Period3\n", " BPMEDS (Use of anti-hypertensive meds): 0=Not currently\n", "\t\t\t\t\t\t\t\t\t\t 1=Currently use\n", " CURSMOKE (Currently smoke?): 0=No\n", "\t\t\t\t\t\t\t 1=Yes\n", " DIABETES: 0=Not diabetic\n", "\t\t\t1=Diabetic\n", " PREVAP (Have angina pectoric?): 0=No\n", "\t\t\t\t\t\t\t\t 1=Yes\n", " PREVCHD (Coronary heart disease?): 0=No\n", "\t\t\t\t\t\t\t\t\t 1=Yes\n", " PREVMI (Myocardial infarction?): 0=No\n", "\t\t\t\t\t\t\t\t 1=Yes\n", " PREVSTRK (Had a stroke?): 0=No\n", "\t\t\t\t \t\t\t1=Yes\n", " PREVHYP (Hypertensive? sys bp >=140 or dyas bp >= 90): 0=no\n", "\t\t\t\t\t\t\t\t\t\t\t\t\t\t 1=yes\n", "*/\n", "\n", "PROC FORMAT;\n", "VALUE YNfmt 0=\"No\"\n", "\t\t\t1=\"Yes\";\n", "VALUE perfmt 1=\"Period 1\"\n", "\t\t\t 2=\"Period 2\"\n", "\t\t\t 3=\"Period 3\";\n", "VALUE gndrfmt 1=\"Men\"\n", "\t\t\t 2=\"Women\";\n", "RUN;\n", "\n", "DATA fghmtemp;\n", "SET fghmtemp;\n", "FORMAT prevap ynfmt.\n", "\t diabetes ynfmt.\n", "\t cursmoke ynfmt.\n", "\t bpmeds ynfmt.\n", "\t prevchd ynfmt.\n", "\t prevmi ynfmt.\n", "\t prevstrk ynfmt.\n", "\t prevhyp ynfmt.\n", "\t sex gndrfmt.;\n", "RUN;" ] }, { "cell_type": "code", "execution_count": 190, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : BMI Body Mass Index (kr/(M*M)
Lower QuartileMedianUpper Quartile
23.100000025.280000027.6300000
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : BMI Body Mass Index (kr/(M*M)
Lower QuartileMedianUpper Quartile
23.100000025.280000027.6300000
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
ObsBMI_Q1BMI_MedianBMI_Q3
123.125.2827.63
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
bmi_catFrequencyPercentCumulative
Frequency
Cumulative
Percent
112024.0012024.00
213026.0025050.00
312525.0037575.00
412525.00500100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 190, "metadata": {}, "output_type": "execute_result" } ], "source": [ "* Step 1 - write code with only macros variables - Let's categorize BMI by quartiles;\n", "\n", "%let qvar = bmi;\n", "\n", "* Find table name;\n", "ODS trace on;\n", "proc means data = fghmtemp Q1 Median Q3;\n", " var &qvar;\n", "run;\n", "ODS trace off;\n", "\n", "proc means data = fghmtemp Q1 Median Q3;\n", " var &qvar;\n", " ODS OUTPUT summary = quartiles;\n", "run;\n", "\n", "* See the contents of the ODS table;\n", "proc print data = quartiles;\n", "run;\n", "\n", "data _null_;\n", " set quartiles;\n", " call symput(\"Q1\", strip(&qvar._Q1));\n", " call symput(\"M\", strip(&qvar._Median));\n", " call symput(\"Q3\", strip(&qvar._Q3));\n", "run;\n", "\n", "%put _user_; * see the value of our macro variables in the log file;\n", "\n", "data fghmtemp;\n", " set fghmtemp;\n", " if not missing(&qvar) and &qvar < &Q1 then &qvar._cat = 1;\n", " else if &qvar < &M then &qvar._cat = 2;\n", " else if &qvar < &Q3 then &qvar._cat = 3;\n", " else &qvar._cat = 4;\n", "run;\n", "\n", "proc freq data = fghmtemp;\n", " tables &qvar._cat;\n", "run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have the basic code working, let's turn this into a MACRO." ] }, { "cell_type": "code", "execution_count": 191, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "
\n", "

The FREQ Procedure

\n", "
\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
bmi_catFrequencyPercentCumulative
Frequency
Cumulative
Percent
112024.0012024.00
213026.0025050.00
312525.0037575.00
412525.00500100.00
\n", "
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 191, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%macro quartilesmacro(mydata, qvar, round, out = out_dat);\n", "\n", "ODS select none;\n", "proc means data = &mydata Q1 Median Q3 maxdec = &round;\n", " var &qvar;\n", " ODS OUTPUT summary = quartiles;\n", "run;\n", "ods select all;\n", "\n", "data _null_;\n", " set quartiles;\n", " call symput(\"Q1\", strip(&qvar._Q1));\n", " call symput(\"M\", strip(&qvar._Median));\n", " call symput(\"Q3\", strip(&qvar._Q3));\n", "run;\n", "\n", "data &out;\n", " set &mydata;\n", " if not missing(&qvar) and &qvar < &Q1 then &qvar._cat = 1;\n", " else if &qvar < &M then &qvar._cat = 2;\n", " else if &qvar < &Q3 then &qvar._cat = 3;\n", " else &qvar._cat = 4;\n", "run;\n", "\n", "%mend;\n", "\n", "%quartilesmacro(fghmtemp, bmi, round = 2, out = out_bmi);\n", "\n", "proc freq data = out_bmi;\n", " tables bmi_cat;\n", "run;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulations\n", "\n", "1. In this example, we will explore the sampling distribution of a sample proportion.\n", " a) Generate 500 sample of size 50 from a Binomial distribution with n = 1 and p = 0.3. Do this by generating a dataset with 500 rows and 50 columns all filled with random variates from this binomial distribution. Hint: Use an array and a nested DO loop.\n", " b) Calculate the mean of each of these 500 rows using mean(of ) in a DATA step. Save this mean as a new column.\n", " c) Plot a histogram of the 500 means calculated in part b and use PROC MEANS to calculate the MEAN and standard deviation of the 500 means calculated in part b. This describes the sampling distribution of p-hat when the population distiribution has probability of success p = 0.3 when the sample size is 30." ] }, { "cell_type": "code", "execution_count": 195, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "
\n", "\"The\n", "
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

The MEANS Procedure

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Analysis Variable : p_hat
MeanStd Dev
0.29804000.0635376
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 195, "metadata": {}, "output_type": "execute_result" } ], "source": [ "DATA sim;\n", " ARRAY data{*} x1-x50;\n", " DO numsim = 1 TO 500;\n", " DO n = 1 to 50;\n", " data(n) = ranbin(1234, 1, 0.3);\n", " END;\n", " p_hat = mean(of x1-x50);\n", " OUTPUT;\n", " END;\n", "RUN;\n", "\n", "PROC SGPLOT data = sim;\n", " HISTOGRAM p_hat;\n", "RUN;\n", "\n", "PROC MEANS data = sim mean std;\n", " VAR p_hat;\n", "RUN;" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "SAS", "language": "sas", "name": "sas" }, "language_info": { "codemirror_mode": "sas", "file_extension": ".sas", "mimetype": "text/x-sas", "name": "sas" } }, "nbformat": 4, "nbformat_minor": 2 }