8. SAS Formats and Dates
We previously learned how to use a FORMAT statement to tell SAS to display certain variable values in a particular way. For example, we might tell SAS to display a date variable saledate, say, using the SAS mmddyy10. format, so that August 19, 2008 is displayed as 08/19/2008. There are a whole slew of informats and formats that SAS provides that you can find in the SAS Help and Documentation. Our focus in this lesson will be on creating informats and formats to fill in for those that SAS doesn’t provide.
That is, in this lesson, we will extend our formatting capabilities by investigating how to create user-defined informats and formats using the FORMAT procedure. In particular, we will take a look at the following techniques:
how to translate values of a character variable when they are read in a SAS data set into more meaningful values using the INVALUE statement
how to create customized formats for character and numeric variables, using the VALUE statement, so variables can be printed in a meaningful format
how to create templates, using the PICTURE statement, for printing numbers with a special format, such as leading zeros, decimal and comma punctuation, fill characters, and prefixes.
We will also investigate various aspects of processing dates and times within the SAS System. Specifically, we will learn:
how SAS defines numeric date and time values
how to use informats to read dates and times into a SAS data set
how to use formats to display SAS dates and times
how to use dates and times in calculations
how to compare a SAS date to some date constant, and how to compare a SAS time to some time constant
how to use several of the available SAS date and time functions
how to change the system options that pertain to processing date and times
As always, you’ll probably want to follow along in the lesson by downloading and running the provided SAS programs yourself.
8.1. The Format Procedure
Throughout this section, we will investigate a number of examples that illustrate how to create different informats and formats for several different variables. To do so, we will use a subset of the demographic (or “background”) data collected on 638 subjects once enrolled in the National Institutes of Health’s Interstitial Cystitis Data Base (ICDB) Study. Not surprisingly, the ICDB Study collected data on people who were diagnosed as having interstitial cystitis! The primary reason for conducting the study was that interstitial cystitis is a poorly understood condition that causes severe bladder and pelvic pain, urinary frequency, and painful urination in the absence of any identifiable cause. Although the disease is more prevalent in women, it affects both men and women of all ages. For the ICDB Study, each subject was enrolled at one of seven clinical centers and was evaluated four times a year for as many as four years.
It will probably be helpful for you to take a peek at the background data form on which the data were collected. In order to run the SAS programs in this lesson, you’ll need to save the background data set, back.sas7bdat, to a directory on your computer. See the course website for the dataset.
Because there are 638 observations and 16 variables in the permanent background data set phc6089.back, the data on just ten subjects and nine variables are selected when creating the temporary working background data set back. The following SAS program creates the subset:
LIBNAME phc6089 '/folders/myfolders/SAS_Notes/data/';
DATA back;
set phc6089.back;
age = (v_date - b_date)/365.25;
if subj in (110051, 110088, 210012, 220004, 230006,
310083, 410012, 420037, 510027, 520017);
keep subj v_date b_date age sex state country race relig;
format age 4.1;
RUN;
PROC PRINT data=back;
title 'Output Dataset: BACK';
RUN;
Obs | subj | v_date | b_date | sex | state | country | race | relig | age |
---|---|---|---|---|---|---|---|---|---|
1 | 110051 | 01/25/94 | 12/02/42 | 2 | 42 | 1 | 4 | 3 | 51.1 |
2 | 110088 | 02/28/95 | 10/03/27 | 2 | 23 | 1 | 4 | 2 | 67.4 |
3 | 210012 | 07/16/93 | 06/27/24 | 2 | . | 6 | 4 | 1 | 69.1 |
4 | 220004 | 07/27/93 | 08/07/72 | 2 | 38 | 1 | 4 | 1 | 21.0 |
5 | 230006 | 01/06/94 | 04/24/49 | 2 | 21 | 1 | 4 | 3 | 44.7 |
6 | 310083 | 01/20/95 | 05/13/54 | 1 | . | 17 | 2 | 1 | 40.7 |
7 | 410012 | 09/16/93 | 11/01/47 | 2 | 22 | 1 | 4 | 3 | 45.9 |
8 | 420037 | 02/02/94 | 07/25/41 | 2 | 22 | 1 | 4 | 1 | 52.5 |
9 | 510027 | 02/15/94 | 08/14/63 | 2 | 49 | 1 | 4 | 1 | 30.5 |
10 | 520017 | 11/17/93 | 09/24/54 | 2 | 14 | 1 | 4 | 1 | 39.1 |
We’ll also need to work with a raw data file version of the subset data set. The following SAS code creates the ASCII raw data file, in column format, from the temporary back data set:
DATA _NULL_;
set back;
file '/folders/myfolders/SAS_Notes/data/back.dat';
put subj 1-6 @8 b_date mmddyy8. sex 17 race 19
relig 21 state 23-24 country 26-27
@29 age 4.1 @34 v_date mmddyy8.;
RUN;
The SAS data set name _NULL_ tells SAS to execute the DATA step as if it were creating a new SAS data set, but no observations and no variables are written to an output data set. The PUT statement tells SAS to write the variables, in the format specified, to the file (back.dat) named in the FILE statement. The specifications used in the PUT statement are similar to the specifications used in the INPUT statement.
Launch the SAS program. Then, edit the FILE statement so it reflects the location where you would like the raw data file saved. Then, run the program. Open the newly created back.dat file in an ascii editor, such as NotePad, to convince yourself that its structure and contents are similar to the back data set.
8.1.1. The INVALUE Statement
The INVALUE statement in the FORMAT procedure allows you to create your own customized informats for character variables. That is, it allows you to tell SAS how you’d like the program to read in special character values. In doing so, SAS effectively translates the values of a character variable into different, typically more meaningful character or numeric values. For example, the following INVALUE statement:
INVALUE $french 'OUI'= 'YES'
'NON'= 'NO';
prepares SAS to translate a character variable in French to a character variable in English.
Restrictions on the INVALUE statement include:
You can only translate a character variable to another variable. You cannot translate a numeric variable using the INVALUE statement.
The name of the informat must begin with a $ sign, since it refers to a character variable.
The name of the informat (for example, french) must be a valid SAS name with no more than 30 additional characters following the required $ sign. The name cannot end in a number, nor can it be a standard SAS informat name.
When you refer to the informat later, you must follow the name with a period.
The INVALUE statement in the FORMAT procedure merely defines an informat so that it is available for use. In order for the informat to take effect, you must associate the character variable with the informat either explicitly in the INPUT statement:
INPUT resp $french.;
or in an INFORMAT statement:
INFORMAT resp $french.;
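To see the pieces working together, here is a minimal sketch that defines the $french informat and then applies it in an INPUT statement. The data set name survey, the variable id, and the three in-stream records are made up for illustration; only the informat itself comes from the text above.

```sas
/* Define the informat (stored temporarily in work.formats) */
PROC FORMAT;
   invalue $french 'OUI' = 'YES'
                   'NON' = 'NO';
RUN;

/* Hypothetical data: id in column 1, French response in columns 3-5 */
DATA survey;
   length resp $ 3;               /* length of resp after translation */
   input id 1 @3 resp $french3.;  /* read 3 characters, translate on input */
   datalines;
1 OUI
2 NON
3 OUI
;
RUN;

PROC PRINT data=survey;
RUN;
```

If the informat behaves as described, the printed values of resp should be YES and NO rather than OUI and NON.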
Let’s take a look at an example!
Example
The following SAS code illustrates the use of the FORMAT procedure to define how SAS should translate the two character variables sex and race during input:
PROC FORMAT;
invalue $insex '1' = 'M'
'2' = 'F';
invalue $inrace '1' = 'Indian'
'2' = 'Asian'
'3' = 'Black'
'4' = 'White';
RUN;
56 PROC FORMAT;
57 invalue $insex '1' = 'M'
NOTE: Informat $INSEX has been output.
58 '2' = 'F';
59
60 invalue $inrace '1' = 'Indian'
61 '2' = 'Asian'
62 '3' = 'Black'
NOTE: Informat $INRACE has been output.
63 '4' = 'White';
64 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Because the INVALUE statement is used, the translation is restricted to taking place on input. As a result of this code, providing the character variable sex is later associated with the informat $insex, whenever SAS encounters the character value '1' for the variable sex it will instead store the character value 'M'. Similarly, whenever SAS encounters the character value '2' for the variable sex it will instead store the character value 'F'.
Launch and run the SAS program. The only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like what is shown above.
As we'll learn later in this lesson, in order to make the definitions for reading in sex and race permanently stored beyond our current work session, we'd need to attach a "LIBRARY =" option to the PROC FORMAT statement. Since one doesn't exist here, the definitions defined in this format procedure are temporary only. That is, they are not stored beyond your current SAS session.
All we've done so far is define the informats so that they are available for use. Now let's use them!
Example
The following data step uses the informats that we defined in the previous example to read in a subset of the data from the input raw data file back.dat:
DATA temp1;
infile '/folders/myfolders/SAS_Notes/data/back.dat';
length sex $ 1 race $ 6;
input subj 1-6 @17 sex $insex1. @19 race $inrace1.;
RUN;
PROC CONTENTS data=temp1;
title 'Output Dataset: TEMP1';
RUN;
PROC PRINT data=temp1;
var subj sex race;
RUN;
The CONTENTS Procedure
Data Set Name | WORK.TEMP1 | Observations | 10 |
---|---|---|---|
Member Type | DATA | Variables | 3 |
Engine | V9 | Indexes | 0 |
Created | 09/24/2020 22:34:56 | Observation Length | 16 |
Last Modified | 09/24/2020 22:34:56 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 4061 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
Filename | /tmp/SAS_work6AF500006468_localhost.localdomain/temp1.sas7bdat |
Release Created | 9.0401M6 |
Host Created | Linux |
Inode Number | 671542 |
Access Permission | rw-r--r-- |
Owner Name | sasdemo |
File Size | 128KB |
File Size (bytes) | 131072 |
Alphabetic List of Variables and Attributes | |||
---|---|---|---|
# | Variable | Type | Len |
2 | race | Char | 6 |
1 | sex | Char | 1 |
3 | subj | Num | 8 |
Obs | subj | sex | race |
---|---|---|---|
1 | 110051 | F | White |
2 | 110088 | F | White |
3 | 210012 | F | White |
4 | 220004 | F | White |
5 | 230006 | F | White |
6 | 310083 | M | Asian |
7 | 410012 | F | White |
8 | 420037 | F | White |
9 | 510027 | F | White |
10 | 520017 | F | White |
Only a subset of the variables in the back.dat data file is read. Column numbers ("1-6") are used to read the variable subj, and absolute pointer controls are used to read the variables sex ("@17") and race ("@19") from the file. Note that:
- Because we want to translate the variables, we must read sex and race as character variables, even though they are numbers.
- On input, we have the option of specifying the length of the variables being read in. The length of the variables is specified in the informat name between the name and the period. For example, the length of the variable race being read in is defined as 1 in the informat $inrace1.
- The LENGTH statement defines the length of sex and race after translation.
Launch the SAS program. Then, edit the INFILE statement so that it reflects the location of your stored back.dat file. Then, run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, note that the variables sex and race are both character variables, as indicated by "Char" appearing under the Type column in the output from the CONTENTS procedure. Also, note that the contents procedure gives no indication that the variables sex and race are formatted in any particular way for output. We'd have to take care of that by using a VALUE statement (as opposed to an INVALUE statement)!
Finally, as a little sidebar, recall that the TITLE statement is a toggle statement. That is, its value remains in effect until it is changed with another TITLE statement. Therefore, the title in the PRINT procedure is the same that is used in the CONTENTS procedure.
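A small sketch of that behavior, assuming the temp1 data set created above is still available: the title set in the first step carries into the second step, and a TITLE statement with no text cancels it. The title text used here is only illustrative.

```sas
PROC PRINT data=temp1;
   title 'Subset of BACK.DAT';   /* stays in effect after this step ends */
   var subj sex race;
RUN;

PROC CONTENTS data=temp1;        /* still printed under the same title */
RUN;

title;                           /* a null TITLE statement cancels all current titles */

PROC PRINT data=temp1;           /* printed with no title line */
   var subj sex race;
RUN;
```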
8.1.2. The VALUE Statement
The INVALUE statement in the FORMAT procedure allows you to create your own customized informats, so that variables can be read in meaningful ways, whereas the VALUE statement allows you to create your own customized formats, so that variables can be displayed in meaningful ways. Customized formats do not alter variable types; they merely tell SAS to print variables according to your customized definitions. For example, providing the numeric variable sex is associated with the format sexfmt that is defined in the following VALUE statement:
VALUE sexfmt 1 = 'Male'
2 = 'Female';
SAS will print “Male” when the variable sex = 1 and “Female” when sex = 2. The variable type of sex remains numeric. Restrictions on the VALUE statement include:
The name of the format for numeric variables (for example, sexfmt) must be a valid SAS name up to 32 characters, not ending in a number.
The name of the format for a character variable must begin with a $ sign, and have no more than 31 additional characters.
When you define the format in the VALUE statement, the format name cannot end in a period.
But when you use the format later, you must follow the name with a period. (Is this confusing or what?)
The maximum length for a format label is 32,767 characters (ehhhhh…?)
Just as is true for the INVALUE statement, the VALUE statement in the FORMAT procedure merely defines a format. In order for the format to take effect, you must associate the variable with the format you’ve defined by using a FORMAT statement in either a DATA step or a PROC step.
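The restrictions above mention formats for character variables, whose names begin with a $ sign. As a minimal sketch, assuming the temp1 data set from the previous section (where sex holds the character values 'M' and 'F') is still available, a character format could be defined and applied as follows. The format name $gender is hypothetical.

```sas
/* A character format: note the $ prefix and the quoted values on the left */
PROC FORMAT;
   value $gender 'M' = 'Male'
                 'F' = 'Female';
RUN;

/* Associate the format with the character variable for this step only */
PROC PRINT data=temp1;
   format sex $gender.;
   var subj sex;
RUN;
```

The stored values of sex remain 'M' and 'F'; only the way they are displayed changes.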
Example
The following FORMAT procedure defines how SAS should display numeric variables associated with the two formats sexfmt and racefmt during output:
PROC FORMAT;
value sexfmt 1 = 'Male'
2 = 'Female';
value racefmt 1 = 'Indian'
2 = 'Asian'
3 = 'Black'
4 = 'White';
RUN;
71 PROC FORMAT;
72 value sexfmt 1 = 'Male'
NOTE: Format SEXFMT has been output.
73 2 = 'Female';
74
75 value racefmt 1 = 'Indian'
76 2 = 'Asian'
77 3 = 'Black'
NOTE: Format RACEFMT has been output.
78 4 = 'White';
79 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
The translation is restricted to taking place on output, since the VALUE statement is used. As a result of this code, providing the numeric variable sex is later associated with the format sexfmt, whenever SAS goes to print the numeric value 1 for the variable sex, it will instead print the character value 'Male'. Similarly, whenever SAS goes to print the numeric value 2 for the variable sex, it will instead print the character value 'Female'.
Launch and run the SAS program. Again, the only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like the one shown above.
Again, in order to make the definitions for printing sex and race permanently stored beyond your current work session, you'd need to put a "LIBRARY =" option on the PROC FORMAT statement. Since one doesn't exist here, the definitions defined in this FORMAT procedure are temporary only.
All we've done so far is define the formats so that they are available for use. Now let's use them!
Example
The following SAS code uses the formats to print in a meaningful way the sex and race variables contained in the back data set:
DATA temp2;
set back;
f_race=race;
f_sex=sex;
format f_race racefmt. f_sex sexfmt.;
RUN;
PROC PRINT data=temp2;
title 'Output Dataset: TEMP2';
var subj sex f_sex race f_race;
RUN;
PROC CONTENTS data=temp2;
RUN;
Obs | subj | sex | f_sex | race | f_race |
---|---|---|---|---|---|
1 | 110051 | 2 | Female | 4 | White |
2 | 110088 | 2 | Female | 4 | White |
3 | 210012 | 2 | Female | 4 | White |
4 | 220004 | 2 | Female | 4 | White |
5 | 230006 | 2 | Female | 4 | White |
6 | 310083 | 1 | Male | 2 | Asian |
7 | 410012 | 2 | Female | 4 | White |
8 | 420037 | 2 | Female | 4 | White |
9 | 510027 | 2 | Female | 4 | White |
10 | 520017 | 2 | Female | 4 | White |
The CONTENTS Procedure
Data Set Name | WORK.TEMP2 | Observations | 10 |
---|---|---|---|
Member Type | DATA | Variables | 11 |
Engine | V9 | Indexes | 0 |
Created | 09/25/2020 16:32:50 | Observation Length | 88 |
Last Modified | 09/25/2020 16:32:50 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 743 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
Filename | /tmp/SAS_work0D27000013D7_localhost.localdomain/temp2.sas7bdat |
Release Created | 9.0401M6 |
Host Created | Linux |
Inode Number | 671547 |
Access Permission | rw-r--r-- |
Owner Name | sasdemo |
File Size | 128KB |
File Size (bytes) | 131072 |
Alphabetic List of Variables and Attributes | ||||
---|---|---|---|---|
# | Variable | Type | Len | Format |
9 | age | Num | 8 | 4.1 |
3 | b_date | Num | 8 | MMDDYY8. |
6 | country | Num | 8 | |
10 | f_race | Num | 8 | RACEFMT. |
11 | f_sex | Num | 8 | SEXFMT. |
7 | race | Num | 8 | |
8 | relig | Num | 8 | |
4 | sex | Num | 8 | |
5 | state | Num | 8 | |
1 | subj | Num | 8 | |
2 | v_date | Num | 8 | MMDDYY8. |
Well, that's not precisely true! First, in creating the new data set temp2 from the back data set, two additional (numeric) variables are created, f_sex and f_race. They are equated, respectively, to the variables sex and race. Then, the FORMAT statement:
format f_race racefmt. f_sex sexfmt.;
associates the f_race variable with the racefmt. format and the f_sex variable with the sexfmt. format. Again, just as is true for SAS formats, you can place the FORMAT statement in either a DATA step or a PROC step. If you place the FORMAT statement in a PROC step, the format is associated with the variable only for the procedure in which the association is made. If you instead place the FORMAT statement in a DATA step, the association is stored with the data set and so applies in all subsequent procedures.
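The difference between the two placements can be sketched as follows, assuming the sexfmt and racefmt formats defined earlier are still available in the current session. Here the FORMAT statement sits inside the PRINT procedure, so the association should last only for that one step.

```sas
/* Format association made in a PROC step: applies to this procedure only */
PROC PRINT data=back;
   format sex sexfmt. race racefmt.;
   var subj sex race;
RUN;

/* The back data set itself is unchanged, so CONTENTS should list */
/* no format for sex and race                                     */
PROC CONTENTS data=back;
RUN;
```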
Incidentally, note that it is not necessary to create both a formatted and an unformatted version of the same variables. We did so in this example merely for educational purposes, since having two versions of each variable helps us see the effect the formatting has on the sex and race variables.
Launch and run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, observe the difference in the printed output between the formatted and unformatted versions of the variables f_sex and sex (and f_race and race). Also, note that the CONTENTS procedure indicates that the variables sex and race are unformatted, numeric variables (since there is no special format specified), while f_sex and f_race are formatted, numeric variables (a special format is specified).
Example
The FORMAT procedure is useful in defining meaningful categories once you've converted one or more (perhaps continuous) variables into one categorical variable. The following SAS code illustrates the technique:
PROC FORMAT;
value age2fmt 1 = 'LT 20'
2 = '20-44'
3 = '45-54'
4 = 'GE 54'
OTHER = 'Missing';
RUN;
DATA temp3;
set back;
if age = . then age2 = .;
else if age lt 20 then age2 = 1;
else if age ge 20 and age lt 45 then age2 = 2;
else if age ge 45 and age lt 54 then age2 = 3;
else if age ge 54 then age2 = 4;
format age2 age2fmt.;
RUN;
PROC FREQ data=temp3;
title 'Age Frequency in TEMP3';
table age2;
RUN;
The FREQ Procedure
age2 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
20-44 | 5 | 50.00 | 5 | 50.00 |
45-54 | 3 | 30.00 | 8 | 80.00 |
GE 54 | 2 | 20.00 | 10 | 100.00 |
The FORMAT procedure defines the AGE2FMT format, so that ages are grouped into five categories: less than 20, 20-44, 45-54, 54 or greater, and missing. The special range keyword OTHER groups all other values into one single group. Since, here, a missing value is the only other possible value, values falling in the OTHER category are labeled "Missing".
The data set temp3 is derived from the back dataset. The only difference between these two datasets is that the new variable age2 is created in temp3 using an if-then-else statement to group values of age. The format statement associates the variable age2 with the format age2fmt defined in the FORMAT procedure. Note that the if statement codes for missing ages. If this were not done, missing ages would be incorrectly coded as 1 and output as "LT 20".
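The reason the missing check must come first is that SAS treats a missing numeric value as smaller than any number in a comparison. A minimal sketch that demonstrates this in the log (the message text is illustrative):

```sas
DATA _NULL_;
   age = .;                         /* a missing numeric value */
   /* Missing sorts below every number, so this comparison is true */
   if age lt 20 then put 'Missing age satisfies: age lt 20';
   else put 'Missing age does not satisfy: age lt 20';
RUN;
```

Running this should write the first message to the log, which is why an unguarded "age lt 20" test would place missing ages in category 1.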
The FREQ procedure determines the frequencies of the various levels of the categorical variable defined in the TABLE statement. Since only one variable (age2) is identified in the TABLE statement, SAS outputs one table which contains the number of subjects with age < 20, between 20 and 44, between 45 and 54, greater than or equal to 54, and missing. Note that any categories that have a 0 count (LT 20 and Missing) are not shown in the table.
Launch and run the program and review the original data set as well as the output from the FREQ procedure to convince yourself that the age categories have been appropriately labeled.
Example
Now, as long as we are interested in grouping values of only one variable, rather than doing it as we did in the previous program, we can actually accomplish it a bit more efficiently directly within the FORMAT procedure. For example, the following SAS code uses the FORMAT procedure to define the format agefmt based on the possible values of the variable age:
PROC FORMAT;
value agefmt LOW-<20 = 'LT 20'
20-<45 = '20-44'
45-<54 = '45-54'
54-HIGH = 'GE 54'
OTHER = 'Missing';
RUN;
PROC FREQ data=back;
title 'Age Frequency in BACK';
format age agefmt.;
table age;
RUN;
The FREQ Procedure
age | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
20-44 | 5 | 50.00 | 5 | 50.00 |
45-54 | 3 | 30.00 | 8 | 80.00 |
GE 54 | 2 | 20.00 | 10 | 100.00 |
In defining groups of values right within the FORMAT procedure, note that as illustrated in this program:
- The potential ranges are defined using a dash (-). You can also list a range of values by separating the values with commas:
1,2,3 = 'Low'
- The < symbol means "not including." Therefore, here for example, 20-<45 means all ages between 20 and 45, including 20, but not including 45.
- The special LOW and HIGH ranges allow you to group values without knowing the smallest and largest values, respectively. (The keyword LOW does not include missing numeric values, but if applied to a character format, it does include missing character values.)
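One way to check how boundary values are assigned is to apply the format with the PUT function and write the results to the log. The sketch below repeats the agefmt definition so that it stands alone; the test values and the variable name group are chosen purely for illustration.

```sas
PROC FORMAT;
   value agefmt LOW-<20 = 'LT 20'
                20-<45  = '20-44'
                45-<54  = '45-54'
                54-HIGH = 'GE 54'
                OTHER   = 'Missing';
RUN;

DATA _NULL_;
   /* Values on either side of each cut point, plus a missing value */
   do age = 19.9, 20, 44.9, 45, 53.9, 54, .;
      group = put(age, agefmt.);   /* label that the format assigns */
      put age= group=;
   end;
RUN;
```

Given the range rules above, 20 should fall in '20-44', 45 in '45-54', 54 in 'GE 54', and the missing value in 'Missing', since LOW does not capture missing numeric values.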
The FREQ procedure tallies the number of subjects falling within each of the age groups as defined in the FORMAT procedure. Here, the variable age is associated with the format agefmt using a FORMAT statement right within the FREQ procedure.
Now, launch and run the program and review the original data set as well as the output from the frequency procedure to convince yourself that the age categories have again been appropriately labeled.
8.1.3. Permanent Formats
All of the customized informat and format definitions in this lesson thus far have been stored only temporarily. That is, the informats and formats are valid only for the duration of the SAS session in which they are defined. If you wanted to use the informats or formats again in a different SAS program, you would have to create them again using another FORMAT procedure. If you plan to use a customized informat or format repeatedly, you can store it permanently in a “formats catalog” by using the LIBRARY= option in the PROC FORMAT statement. Basically, the LIBRARY= option tells SAS where the formats catalog is (to be) stored. You tell SAS the library (which again you can think of as a directory or location) by using a LIBNAME statement:
LIBNAME libref 'c:\directory\where\formats\stored';
where libref is technically a name of your choosing. Note though that when a user-defined informat or format is called by a DATA or PROC step, SAS first looks in a temporary catalog named work.formats. (Recall that “work” is what SAS always treats as your temporary working library that goes away at the end of your SAS session.) If SAS does not find the format or informat in the temporary catalog, it then by default looks in a permanent catalog called library.formats. So, while, yes, libref is technically a name of your choosing, it behooves you to call it library, since that is where SAS looks next by default. That’s why SAS recommends, but does not require, that you use the word library as the libref when creating permanent formats.
To make this blather a bit more concrete, suppose we have the following LIBNAME statement in our SAS program:
LIBNAME library 'C:\Simon\Stat480WCDEV\08format\sasndata\';
and have a format procedure that starts with:
PROC FORMAT library=library;
Then, upon running the program, SAS creates a permanent catalog containing all of the formats and informats that are defined in the FORMAT procedure and stores it in the folder referenced above, as a file named formats.sas7bcat.
A formats catalog, regardless of whether it is temporary (work.formats) or permanent (library.formats), contains one entry for each format or informat defined in a FORMAT procedure. Because library.formats is the reserved name for permanent formats catalogs, you can create only one catalog called formats per SAS library (directory). There are ways around this restriction, but let’s not get into that now. Let’s jump to an example instead.
Example
The following SAS program illustrates a FORMAT procedure that creates a permanent formats catalog in the directory referenced by library, that is, in /folders/myfolders/SAS_Notes/data:
LIBNAME library '/folders/myfolders/SAS_Notes/data';
PROC FORMAT library=library;
value sex2fmt 1 = 'Male'
2 = 'Female';
value race2fmt 3 = 'Black'
4 = 'White'
OTHER = 'Other';
RUN;
DATA temp4;
infile '/folders/myfolders/SAS_Notes/data/back.dat';
input subj 1-6 sex 17 race 19;
format sex sex2fmt. race race2fmt.;
RUN;
PROC CONTENTS data=temp4;
title 'Output Dataset: TEMP4';
RUN;
PROC PRINT data=temp4;
RUN;
The CONTENTS Procedure
Data Set Name | WORK.TEMP4 | Observations | 10 |
---|---|---|---|
Member Type | DATA | Variables | 3 |
Engine | V9 | Indexes | 0 |
Created | 09/25/2020 20:57:10 | Observation Length | 24 |
Last Modified | 09/25/2020 20:57:10 | Deleted Observations | 0 |
Protection | Compressed | NO | |
Data Set Type | Sorted | NO | |
Label | |||
Data Representation | SOLARIS_X86_64, LINUX_X86_64, ALPHA_TRU64, LINUX_IA64 | ||
Encoding | utf-8 Unicode (UTF-8) |
Engine/Host Dependent Information | |
---|---|
Data Set Page Size | 65536 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 2714 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
Filename | /tmp/SAS_work0D27000013D7_localhost.localdomain/temp4.sas7bdat |
Release Created | 9.0401M6 |
Host Created | Linux |
Inode Number | 671549 |
Access Permission | rw-r--r-- |
Owner Name | sasdemo |
File Size | 128KB |
File Size (bytes) | 131072 |
Alphabetic List of Variables and Attributes | ||||
---|---|---|---|---|
# | Variable | Type | Len | Format |
3 | race | Num | 8 | RACE2FMT. |
2 | sex | Num | 8 | SEX2FMT. |
1 | subj | Num | 8 |
Obs | subj | sex | race |
---|---|---|---|
1 | 110051 | Female | White |
2 | 110088 | Female | White |
3 | 210012 | Female | White |
4 | 220004 | Female | White |
5 | 230006 | Female | White |
6 | 310083 | Male | Other |
7 | 410012 | Female | White |
8 | 420037 | Female | White |
9 | 510027 | Female | White |
10 | 520017 | Female | White |
The DATA step creates a temporary data set called temp4 by reading in the variables subj, sex, and race from the raw data file back.dat, and associates the variables sex and race, respectively, with the formats sex2fmt and race2fmt that are defined in the FORMAT procedure. SAS first looks for the occurrence of these two formats in the temporary catalog work.formats and then when it doesn't find them there, it looks for them in the catalog of the permanent format in the /folders/myfolders/SAS_Notes/data directory.
Launch the SAS program, and edit the INFILE statement so it reflects the location of your back.dat file. And, edit the LIBNAME statement so it reflects your desired location for the catalog of the permanent format. Then, run the program and review the output from the CONTENTS and PRINT procedures to convince yourself that the variables sex and race are associated with the permanent formats sex2fmt and race2fmt, not the temporary formats sexfmt and racefmt previously associated with f_sex and f_race. Also, view the directory referenced in your LIBNAME statement to convince yourself that SAS created and stored a permanent formats catalog there.
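Besides looking in the directory, you can ask SAS to list what a formats catalog contains by using the FMTLIB option of the FORMAT procedure. The following is a minimal sketch, assuming the library libref and the two formats from the example above; the title text is illustrative.

```sas
LIBNAME library '/folders/myfolders/SAS_Notes/data';

/* FMTLIB prints the definition of each selected entry in library.formats */
PROC FORMAT library=library fmtlib;
   select sex2fmt race2fmt;      /* omit SELECT to list every entry */
   title 'Contents of LIBRARY.FORMATS';
RUN;
```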
Just a few more comments on this permanent formats stuff. One of the problems with permanent informats and formats is that once a variable has been associated permanently with an informat or format, SAS must be able to refer to the library to access the formats catalog. As long as the formats catalog exists, and you have permission to the file, you just have to specify the appropriate LIBNAME statement:
LIBNAME library '/folders/myfolders/SAS_Notes/data';
to access the catalog. If, for some reason, you do not have access to the formats catalog, SAS will write an error to the log stating that the format could not be found or loaded.
If you specify the NOFMTERR option in an OPTIONS statement:
OPTIONS NOFMTERR;
you can use the SAS data sets without getting errors. SAS will just display a note (not a program-halting error!) in the log file.
You will be able to run SAS programs that use the data sets containing the permanent formats. You will just not have access to the formats.
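As a rough sketch of how this might look in practice, suppose a colleague sends you a data set whose variables carry permanent formats, but not the formats catalog. The libref mylib, the path, and the data set name shared are all hypothetical.

```sas
/* Hypothetical location of a data set received without its formats catalog */
LIBNAME mylib '/path/to/shared/data';

OPTIONS NOFMTERR;                /* missing formats produce a note, not an error */

PROC PRINT data=mylib.shared;    /* values print unformatted */
RUN;

OPTIONS FMTERR;                  /* restore the default behavior */
```

Resetting FMTERR afterward is a design choice: it keeps later steps from silently ignoring a format you did expect to be available.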
Example
Rather than creating a permanent formats catalog, you can create a SAS program file which contains only a FORMAT procedure with the desired value and invalue statements. Then you need merely include this secondary program file in your main SAS program using the %INCLUDE statement, as illustrated here:
%INCLUDE '/folders/myfolders/SAS_Notes/data/backfmt.sas';
PROC FREQ data=back;
title 'Frequency Count of STATE (statefmt)';
format state statefmt.;
table state / missing;
RUN;
The FREQ Procedure
state | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
Missing | 2 | 20.00 | 2 | 20.00 |
Ind | 1 | 10.00 | 3 | 30.00 |
Mass | 1 | 10.00 | 4 | 40.00 |
Mich | 2 | 20.00 | 6 | 60.00 |
Minn | 1 | 10.00 | 7 | 70.00 |
Other | 1 | 10.00 | 8 | 80.00 |
Tenn | 1 | 10.00 | 9 | 90.00 |
Wisc | 1 | 10.00 | 10 | 100.00 |
To make it clear, here's the only thing contained in the backfmt.sas file:
PROC FORMAT;
value statefmt 14 = 'Ind'
21 = 'Mass'
22 = 'Mich'
23 = 'Minn'
42 = 'Tenn'
49 = 'Wisc'
. = 'Missing'
Other = 'Other';
RUN;
Since the FORMAT procedure in the backfmt.sas file does not refer to a permanent library, the format statefmt is stored in the temporary work.formats catalog.
To run this program, first download and save the backfmt.sas file (see the data folder in CANVAS) to a convenient location on your computer. Then, launch the SAS program and edit the %INCLUDE statement so it reflects the location of your backfmt.sas file. Finally, run the program and review the output from the FREQ procedure. Convince yourself that the FORMAT statement in the FREQ procedure appropriately associates the state variable with the statefmt format created by the FORMAT procedure in backfmt.sas. You may as well also take note of the effect of the MISSING option on the TABLE statement. Basically, it tells SAS to include missing values as a countable category.
The technique illustrated in this example is particularly useful when you work in an open environment, in which data sets are shared. Different users may not have access to the format file, or different users may prefer different formats.
8.1.4. Using Codebooks to Help Define Formats¶
It is very common for discrete (categorical) variables to have many, many (hundreds, perhaps even thousands of) possible values. Examples include:
diseases may be coded by an integer
surgical treatments may be coded by an integer
medications may be coded by an integer
An electronic “codebook” is typically used to keep track of the meaning of each of the integer codes. That is, codebooks contain two variables, the code and a text description of the code. For example, disease 1124 could be defined in text as “Rheumatoid Arthritis.”
One would find it extremely tedious and time-consuming to have to type a FORMAT procedure that re-defines the codes and text contained in these codebooks. Instead, one can take advantage of the fact that the codebook is already in an electronic format, such as a database table, an ASCII file, or even a SAS data set.
When the codebook is contained in a SAS data set with three required variables:
start: the variable that contains the starting range values (that is, the codes)
label: the variable that contains the text definition
fmtname: the format name
SAS can create the appropriate format using the CNTLIN = option in the PROC FORMAT statement. Let’s take a look at an example.
Example
The following SAS program creates a SAS data set called states from state_cd, which is the codebook for the variable state that is collected on the ICDB background form. Here's what the first ten observations of the state_cd data set look like:
PROC PRINT data=phc6089.state_cd(obs=10);
title 'Codebook for States';
RUN;
Obs | NAME | CODE |
---|---|---|
1 | alabama | 1 |
2 | alaska | 2 |
3 | arizona | 3 |
4 | arkansas | 4 |
5 | california | 5 |
6 | colorado | 6 |
7 | connecticut | 7 |
8 | delaware | 8 |
9 | florida | 9 |
10 | georgia | 10 |
The data set states is then used in the FORMAT procedure to define the format for the variable state in the back data set:
DATA states;
set phc6089.state_cd (rename = (code = start name=label));
fmtname = 'stat2fmt';
RUN;
PROC FORMAT cntlin=states;
RUN;
PROC FREQ data=back;
title 'Frequency Count of STATE (stat2fmt)';
format state stat2fmt.;
table state;
RUN;
The FREQ Procedure
state | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
indiana | 1 | 12.50 | 1 | 12.50 |
massachusetts | 1 | 12.50 | 2 | 25.00 |
michigan | 2 | 25.00 | 4 | 50.00 |
minnesota | 1 | 12.50 | 5 | 62.50 |
pennsylvania | 1 | 12.50 | 6 | 75.00 |
tennessee | 1 | 12.50 | 7 | 87.50 |
wisconsin | 1 | 12.50 | 8 | 100.00 |
Frequency Missing = 2
Before running this program, you'll have to download the codebook state_cd (see the data folder on the course website). Save it to the location on your computer that you referenced in the earlier LIBNAME statement by the libref phc6089. Then, go ahead and launch and run the program.
As you can see from the output, the PRINT procedure merely prints the (first 10 rows of the) phc6089.state_cd codebook for your review. You should notice that the variable names in state_cd do not meet SAS requirements for codebooks. Therefore, the DATA step that creates the data set states merely renames the code and name variables in phc6089.state_cd so that they meet SAS requirements. The RENAME= option on the SET statement is what is used to change the code variable to start and the name variable to label. The general syntax of the RENAME= option on the SET statement is:
set dsname (rename = (oldvr1 = newvr1 oldvr2 = newvr2 ...));
An assignment statement is then used to assign the value stat2fmt to the variable fmtname for each observation (that is, code) that appears in the phc6089.state_cd data set.
Then, the FORMAT procedure with the CNTLIN = states option tells SAS to create the format stat2fmt based on the contents of the data set states. Finally, the FREQ procedure illustrates the use of the stat2fmt after it was created in this manner. SAS merely counts and reports the number of subjects coming from each of the states. Note that since we didn't include the MISSING option on the TABLE statement, SAS reports the number of missing values after the table, rather than as a row of the table.
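To see the three required variables in isolation, here is a minimal, self-contained sketch that does not depend on the course data. The codebook codes, the format name trtfmt, and the treatment codes are all made up for illustration. The optional type variable ('N' for a numeric format) is included only to make the intent explicit.

```sas
/* A tiny, hypothetical codebook built directly as a control data set */
DATA codes;
   length label $ 20;
   retain fmtname 'trtfmt' type 'N';   /* same format name on every row */
   input start label $;                /* start = code, label = text    */
   DATALINES;
1 placebo
2 low_dose
3 high_dose
;
RUN;

/* Build the format trtfmt from the control data set */
PROC FORMAT cntlin=codes;
RUN;

/* Apply the new format with the PUT function and write the result to the log */
DATA demo;
   input trt @@;
   trt_txt = put(trt, trtfmt.);
   put trt= trt_txt=;
   DATALINES;
1 3 2
;
RUN;
```

The log should show trt=1 trt_txt=placebo, then trt=3 trt_txt=high_dose, then trt=2 trt_txt=low_dose.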
8.1.5. The FMTLIB Option¶
You might have taken note that the FORMAT procedure by itself does not generate any output. Indeed, the FORMAT procedure prints output only when you specify the FMTLIB option in the PROC FORMAT statement. The FMTLIB option of the FORMAT procedure tells SAS to display a list of all the formats and/or informats in your catalog, along with descriptions of their values. The FMTLIB option can be particularly helpful when you are working with a large catalog of formats, and have forgotten the exact spelling of a specific format name or its range of values.
Example
The following code uses the FORMAT procedure's FMTLIB option to request that SAS display information about a selected format (racefmt) stored in the work.formats catalog:
PROC FORMAT FMTLIB;
title 'Selected Formats from WORK.FORMATS Catalog';
select racefmt;
RUN;
----------------------------------------------------------------------------
|       FORMAT NAME: RACEFMT  LENGTH:    6   NUMBER OF VALUES:    4        |
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:   6  FUZZ: STD       |
|--------------------------------------------------------------------------|
|START           |END             |LABEL  (VER. V7|V8   25SEP2020:16:30:29)|
|----------------+----------------+----------------------------------------|
|               1|               1|Indian                                  |
|               2|               2|Asian                                   |
|               3|               3|Black                                   |
|               4|               4|White                                   |
----------------------------------------------------------------------------
Launch and run the SAS program and review the output. Since the FORMAT procedure here does not refer to a permanent library, the contents of the temporary work.formats catalog are printed. The SELECT statement tells SAS to print information only on a select few formats rather than on the entire catalog. (See SAS Help for more details on the SELECT statement and its sister EXCLUDE statement.)
Although not used in this example, the PAGE option may be used additionally to tell SAS to print the information about each format and informat in the catalog on a separate page.
PROC FORMAT FMTLIB PAGE;
RUN;
The FORMAT procedure's PAGE option is meaningless unless the FMTLIB option is also invoked.
8.2. Date and Time Processing¶
SAS stores dates as single, unique numbers, so that they can be used in your programs like any other numeric value. Specifically, SAS stores dates as numeric values equal to the number of days from January 1, 1960. That is, dates prior to January 1, 1960 are stored as unique negative integers, and dates after January 1, 1960 are stored as unique positive integers. So, for example, SAS stores:
a 0 for January 1, 1960
a 1 for January 2, 1960
a 2 for January 3, 1960
and so on …
And, SAS stores:
a -1 for December 31, 1959
a -2 for December 30, 1959
a -3 for December 29, 1959
and so on …
No matter what method is used in creating a SAS date, SAS always converts the date to an integer as just defined.
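As a quick check of the numbering just described, the following sketch writes three dates to the log as raw numbers. It relies on the date-constant notation ('ddMONyyyy'd), which is covered later in this lesson. The data set name epoch and the variable names are made up for illustration.

```sas
DATA epoch;
   day_zero  = '01JAN1960'd;   /* the reference date */
   day_after = '02JAN1960'd;   /* one day later      */
   day_prior = '31DEC1959'd;   /* one day earlier    */
   /* no format is attached, so the raw numeric values are written */
   put day_zero= day_after= day_prior=;
RUN;
```

The log should show day_zero=0 day_after=1 day_prior=-1.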
8.2.1. Date Informats and Formats¶
In order to read variables that are dates, we need to tell SAS what form the date takes. For example, is the date in the form Dec 1, 2005? Or is it 12/01/05? Or 01 December 2005? The form that a date takes on input is known as a date informat. There seem to be as many SAS date informats as there are ways that you could imagine writing a date. Well, okay, maybe not quite that many. We’ll take a look at several of the informats that are available in SAS later in this lesson. For now, we’ll just refresh our memory of how to write the formatted style input statement that is necessary to read in dates.
Example
The following SAS program reads five observations into a SAS data set called diet. Two of the variables — weight date (wt_date) and birth date (b_date) — are in mm/dd/yy format, and therefore SAS is told to read the dates using the mmddyy8. informat:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
DATALINES;
1024 Alice       Smith  1 65 125 12/1/05  01/01/60
1167 Maryann     White  1 68 140 12/01/05 01/01/59
1168 Thomas      Jones  2    190 12/2/05  06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia     Ho     1 63 115 1/1/06   06/15/58
;
RUN;
PROC PRINT data=diet;
TITLE 'The unformatted diet data set';
RUN;
Obs | subj | l_name | weight | wt_date | b_date |
---|---|---|---|---|---|
1 | 1024 | Smith | 125 | 16771 | 0 |
2 | 1167 | White | 140 | 16771 | -365 |
3 | 1168 | Jones | 190 | 16772 | 166 |
4 | 1201 | Arnold | 190 | 16770 | 365 |
5 | 1302 | Ho | 115 | 16802 | -565 |
First, note that the mmddyy8. informat must immediately follow the date's variable name. Here, it immediately follows wt_date, and then again follows b_date. Incidentally, the 8 in mmddyy8. defines, in general, the width of the informat. It tells SAS that the dates to be read into SAS contain as many as 8 positions. Here, two of the positions are taken up by forward slashes (/). You could alternatively use hyphens (-) or blank spaces between the mm and dd and yy. Also note that the period is a very important part of the informat name. Without it, SAS may attempt to interpret the informat as a variable name instead.
Then, launch and run the SAS program, and review the resulting output to familiarize yourself with the contents of the diet data set. Note, in particular, the numeric values that are stored for the wt_date and b_date variables. As expected, the 01/01/60 birth date is stored as a 0, the 01/01/59 birthdate is stored as -365, and the 12/31/60 birthdate is stored as +365. Well, geez, I guess the other thing that the output illustrates is that it is not enough just to tell SAS what informat to use to read in a date's value, but you also have to tell SAS what format to use to display a date's value. If you don't, as you see here, the dates that are displayed are not particularly user-friendly!
As the preceding example illustrates, we have to tell SAS in what form we would like our dates displayed. The form that a date takes in output is known as a date format. Do we want the date displayed in the form Dec 1, 2005? Or 12/01/05? Or 01 December 2005? Again, there seem to be as many SAS date formats as there are ways that you could imagine writing a date. To tell SAS in which form we want our dates displayed, we use a FORMAT statement.
Example
The following SAS program is identical to the previous program, except a FORMAT statement has been added to tell SAS to display the wt_date and b_date variables in date7. format:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
format wt_date b_date date7.;
DATALINES;
1024 Alice       Smith  1 65 125 12/1/05  01/01/60
1167 Maryann     White  1 68 140 12/01/05 01/01/59
1168 Thomas      Jones  2    190 12/2/05  06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia     Ho     1 63 115 1/1/06   06/15/58
;
RUN;
PROC PRINT data=diet;
title 'The formatted diet data set';
RUN;
Obs | subj | l_name | weight | wt_date | b_date |
---|---|---|---|---|---|
1 | 1024 | Smith | 125 | 01DEC05 | 01JAN60 |
2 | 1167 | White | 140 | 01DEC05 | 01JAN59 |
3 | 1168 | Jones | 190 | 02DEC05 | 15JUN60 |
4 | 1201 | Arnold | 190 | 30NOV05 | 31DEC60 |
5 | 1302 | Ho | 115 | 01JAN06 | 15JUN58 |
First, take note of the FORMAT statement in which the selected format date7. follows the two variables — wt_date and b_date — whose values we want displayed as ddMonyy. Then, launch and run the SAS program, and review the resulting output to convince yourself of the effect of the FORMAT statement.
The best thing about SAS dates is that, because SAS date values are numeric values, you can easily sort them, subtract them, and add them. You can also compare dates. Or, you can use them in many of the available numeric functions.
Example
The following SAS program illustrates how you can treat date variables as any other numeric variable, and therefore can use the dates in numeric calculations. Assuming that individuals in the diet data set need to be weighed every 14 days, a new variable nxt_date, the anticipated date of the individual's next visit, is determined by merely adding 14 to the individual's current weight date (wt_date). Then, a crude estimate of each individual's age is also calculated by subtracting b_date from wt_date and dividing the resulting number of days by 365.25 to get an approximate age in years. And, the MEAN function is used to calculate avg_date, the average of each individual's birth and weight dates:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
nxt_date = wt_date + 14;
age_wt = (wt_date - b_date)/365.25;
avg_date = MEAN(wt_date, b_date);
format wt_date b_date nxt_date avg_date date7.
age_wt 4.1;
DATALINES;
1024 Alice       Smith  1 65 125 12/1/05  01/01/60
1167 Maryann     White  1 68 140 12/01/05 01/01/59
1168 Thomas      Jones  2    190 12/2/05  06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia     Ho     1 63 115 1/1/06   06/15/58
;
RUN;
PROC PRINT data=diet;
title 'The diet data set with three new variables';
RUN;
Obs | subj | l_name | weight | wt_date | b_date | nxt_date | age_wt | avg_date |
---|---|---|---|---|---|---|---|---|
1 | 1024 | Smith | 125 | 01DEC05 | 01JAN60 | 15DEC05 | 45.9 | 16DEC82 |
2 | 1167 | White | 140 | 01DEC05 | 01JAN59 | 15DEC05 | 46.9 | 17JUN82 |
3 | 1168 | Jones | 190 | 02DEC05 | 15JUN60 | 16DEC05 | 45.5 | 10MAR83 |
4 | 1201 | Arnold | 190 | 30NOV05 | 31DEC60 | 14DEC05 | 44.9 | 16JUN83 |
5 | 1302 | Ho | 115 | 01JAN06 | 15JUN58 | 15JAN06 | 47.5 | 24MAR82 |
First, review the code to see how the three new variables — nxt_date, age_wt and avg_date — are calculated using standard numeric expressions. You should also acknowledge that the calculation of avg_date is just a desperate attempt by a desperate instructor to illustrate the use of dates in a standard numeric function, and is otherwise probably fairly useless. Then, launch and run the SAS program, and review the resulting output to convince yourself that the results of the calculations seem reasonable.
Example
The following SAS program illustrates again how you can treat date variables as any other numeric variable, and therefore can sort dates. The diet data set is sorted by nxt_date in ascending order, so that the individuals whose next weigh-in date is closest in time appear first:
PROC SORT data = diet out = sorteddiet;
by nxt_date;
RUN;
PROC PRINT data = sorteddiet;
TITLE 'The diet data set sorted by nxt_date';
RUN;
Obs | subj | l_name | weight | wt_date | b_date | nxt_date | age_wt | avg_date |
---|---|---|---|---|---|---|---|---|
1 | 1201 | Arnold | 190 | 30NOV05 | 31DEC60 | 14DEC05 | 44.9 | 16JUN83 |
2 | 1024 | Smith | 125 | 01DEC05 | 01JAN60 | 15DEC05 | 45.9 | 16DEC82 |
3 | 1167 | White | 140 | 01DEC05 | 01JAN59 | 15DEC05 | 46.9 | 17JUN82 |
4 | 1168 | Jones | 190 | 02DEC05 | 15JUN60 | 16DEC05 | 45.5 | 10MAR83 |
5 | 1302 | Ho | 115 | 01JAN06 | 15JUN58 | 15JAN06 | 47.5 | 24MAR82 |
First review the code, and then launch and run the SAS program. Then, review the resulting output to convince yourself that the variable nxt_date is sorted as indeed claimed.
Again, because SAS date values are numeric values, you can easily compare two or more dates. The comparisons are made just as the comparisons between any two numbers would take place. For example, because the date 01/03/60 is stored as a 2 in SAS, it is considered smaller than the date 01/10/60, which is stored as a 9 in SAS.
Example
The following SAS program illustrates how to compare the values of a date variable, not to the values of some other date variable, but rather to a date constant. Specifically, the WHERE= option that appears on the DATA statement tells SAS to output to the diet data set only those individuals whose b_date is before January 1, 1960:
DATA diet (where = (b_date < '01jan1960'd));
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
format wt_date b_date date9.;
DATALINES;
1024 Alice       Smith  1 65 125 12/1/05  01/01/60
1167 Maryann     White  1 68 140 12/01/05 01/01/59
1168 Thomas      Jones  2    190 12/2/05  06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia     Ho     1 63 115 1/1/06   06/15/58
;
RUN;
PROC PRINT data=diet;
title 'Birthdays in the diet data set before 01/01/1960';
RUN;
Obs | subj | l_name | weight | wt_date | b_date |
---|---|---|---|---|---|
1 | 1167 | White | 140 | 01DEC2005 | 01JAN1959 |
2 | 1302 | Ho | 115 | 01JAN2006 | 15JUN1958 |
First, note the form of the SAS date constant:
'01jan1960'd
used in the WHERE= option. In general, a SAS date constant takes the form 'ddMONyyyy'd where dd denotes the day of the month (01, ..., 31), MON denotes the first three letters of the month, and yyyy denotes the four-digit year. The letter d that follows the date in single quotes tells SAS to treat the quoted string as a date constant, that is, to convert it to the corresponding SAS date value. Note that regardless of how you have informatted or formatted your SAS dates, the SAS date constant always takes the above form.
Now, launch and run the SAS program. Then, review the resulting output to convince yourself that only those individuals whose birth date is before January 1, 1960 are included in the output diet data set. You might also want to note the difference between the date7. and date9. format. Previously, we saw that when you use the date7. format, your dates are displayed in ddMonyy format. Here, you can see that when you use the date9. format, your dates are displayed in ddMonyyyy format. (Incidentally, I think it is a good practice to use four-digit years wherever possible to avoid any ambiguity.) We'll take a look at some of the other informats and formats available later in this lesson. Now, we'll go take a look at some of the available functions that work specifically with SAS dates.
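The difference between these formats is easy to see side by side. The following sketch applies three formats to the same date value using the PUT function, which converts a value to text using a format. The PUT function is standard SAS but is not otherwise used in this section, and the data set and variable names are made up for illustration.

```sas
DATA fmtdemo;
   d = '01DEC2005'd;
   as_date7 = put(d, date7.);      /* 01DEC05    */
   as_date9 = put(d, date9.);      /* 01DEC2005  */
   as_slash = put(d, mmddyy10.);   /* 12/01/2005 */
   put d= as_date7= as_date9= as_slash=;
RUN;
```

The stored value d stays the same number (16771) throughout. Only its displayed form changes.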
Throughout this lesson so far, we have used the mmddyy8. informat to read in SAS dates. And, we have used the date7. and date9. formats to display SAS dates. In this section, we’ll just take a look at a few quick examples to illustrate some of the other informats and formats available in SAS.
Example
The following SAS program reads in three dates (date1, date2, and date3) using an mmddyy informat. Then, the dates are printed using a ddmmyy format:
DATA inputdates1;
INPUT @6 date1 mmddyy6. @13 date2 mmddyy8. @22 date3 mmddyy10.;
FORMAT date1 ddmmyy10. date2 ddmmyyb10. date3 ddmmyyc10.;
DATALINES;
     041008 04-10-08 04 10 2008
;
RUN;
PROC PRINT data = inputdates1;
TITLE 'The mmddyy informat and the ddmmyy format';
RUN;
Obs | date1 | date2 | date3 |
---|---|---|---|
1 | 10/04/2008 | 10 04 2008 | 10:04:2008 |
First, review the INPUT statement and the corresponding forms of the April 10, 2008 date in the DATALINES statement. Note, in particular, that the width of the mmddyy informat (6, 8, or 10) tells SAS what form of the date it should expect. Don't worry: SAS will let you know if you misspecify the width of the informat! Also, note that the way that we format dates can be completely independent of the way that they are informatted. Here, the dates are read in using the mmddyy informat and are displayed in the rearranged ddmmyy format. Well, let's be a little more specific here about that ddmmyy format. The "b" that appears in the format for the date2 variable tells SAS to display blank spaces between the day, month and year. The "c" that appears in the format for the date3 variable tells SAS to display colons between the day, month and year. If nothing appears (or alternatively an "s") in a ddmmyy format, as it does here for the date1 variable, SAS will display forward slashes between the day, month and year.
When you are satisfied you understand the use of the mmddyy informat and the ddmmyy format, launch and run the SAS program. Review the output to convince yourself that the program does as claimed.
Example
The following SAS program reads in three dates (date1, date2, and date3) using a ddmmyy informat. Then, the dates are printed using a mmddyy format:
DATA inputdates2;
INPUT @6 date1 ddmmyy6. @13 date2 ddmmyy8. @22 date3 ddmmyy10.;
FORMAT date1 mmddyyd10. date2 mmddyyn8. date3 mmddyyp10.;
DATALINES;
     100408 10-04-08 10 04 2008
;
RUN;
PROC PRINT data = inputdates2;
TITLE 'The ddmmyy informat and the mmddyy format';
RUN;
Obs | date1 | date2 | date3 |
---|---|---|---|
1 | 04-10-2008 | 04102008 | 04.10.2008 |
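First, review the INPUT statement and the corresponding forms of the April 10, 2008 date in the DATALINES statement. Again, the width of the ddmmyy informat (6, 8, or 10) tells SAS what form of the date it should expect. The "d" that appears in the format for the date1 variable tells SAS to display dashes between the month, day and year. The "n" that appears in the format for the date2 variable tells SAS to display nothing between the month, day and year. The "p" that appears in the format for the date3 variable tells SAS to display periods between the month, day and year.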
When you are satisfied you understand the use of the ddmmyy informat and the mmddyy format, launch and run the SAS program. Review the output to convince yourself that the program does as claimed.
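The separator letters seen in the last two examples can be collected in one place. The following sketch applies each variant of the ddmmyy format to April 10, 2008 using the PUT function. The data set name sepdemo and the variable names are made up for illustration.

```sas
DATA sepdemo;
   d = '10APR2008'd;
   with_none = put(d, ddmmyy10.);    /* 10/04/2008  no letter: slashes */
   with_s    = put(d, ddmmyys10.);   /* 10/04/2008  s: slashes         */
   with_b    = put(d, ddmmyyb10.);   /* 10 04 2008  b: blanks          */
   with_c    = put(d, ddmmyyc10.);   /* 10:04:2008  c: colons          */
   with_d    = put(d, ddmmyyd10.);   /* 10-04-2008  d: dashes          */
   with_p    = put(d, ddmmyyp10.);   /* 10.04.2008  p: periods         */
   with_n    = put(d, ddmmyyn8.);    /* 10042008    n: no separator    */
   put with_none= with_s= with_b= with_c= with_d= with_p= with_n=;
RUN;
```

Note the width of 8 for the "n" variant: with no separators, a four-digit year needs only eight positions. The same letters work with the mmddyy format.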
Example
The following SAS program reads in three dates (date1, date2, and date3) using a date informat. Then, the dates are printed using weekdate, worddate, and worddatx formats, respectively:
DATA inputdates3;
INPUT @6 date1 date7. @14 date2 date9. @24 date3 date11.;
FORMAT date1 weekdate25.
date2 worddate19.
date3 worddatx19.;
DATALINES;
     10Apr08 10Apr2008 10-Apr-2008
;
RUN;
PROC PRINT data = inputdates3;
TITLE 'The date7 informat and the weekdate and worddate formats';
RUN;
Obs | date1 | date2 | date3 |
---|---|---|---|
1 | Thursday, Apr 10, 2008 | April 10, 2008 | 10 April 2008 |
First, review the INPUT statement and the corresponding forms of the April 10, 2008 date in the DATALINES statement. Note, in particular, that the width of the date informat (7, 9, or 11) tells SAS what form of the date it should expect. Again, SAS will let you know if you misspecify the width of the informat!
Then, launch and run the SAS program, and review the output so you can appreciate how dates formatted using the weekdate, worddate and worddatx formats are displayed. If the widths that you specify for these formats are too small, SAS will attempt to abbreviate the date for you. You might want to change the width of the weekdate format to, say, 20 to see this for yourself.
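If you would rather see the abbreviation behavior without editing the program above, the following sketch writes the same date with several weekdate widths. The particular widths, the data set name widths, and the variable names are arbitrary choices for illustration. The expected output is deliberately not listed here, since the point is to run it and compare the lines in the log.

```sas
DATA widths;
   d = '10APR2008'd;
   length w29 w17 w9 w3 $ 29;
   w29 = put(d, weekdate29.);   /* widest of the four: room for the full form */
   w17 = put(d, weekdate17.);   /* narrower widths force SAS to abbreviate    */
   w9  = put(d, weekdate9.);
   w3  = put(d, weekdate3.);
   put w29= / w17= / w9= / w3=;
RUN;
```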
8.2.2. SAS Date Functions¶
The date functions that are available in SAS can be used to:
create date values
take apart date values
massage date values (what??!)
calculate intervals
For no particular reason, we’ll look at them in that order.
8.2.2.1. Using functions to create date values¶
The functions that can be used to create date values include:
date( ) returns today’s date as a SAS date value
today( ) returns today’s date as a SAS date value
mdy(m,d,y) returns a SAS date value from the given month (m), day (d), and year (y) values
datejul(juldate) converts a Julian date (juldate) to a SAS date value
yyq(y, q) returns a SAS date value from the given year (y) and quarter (q) 1, 2, 3, or 4
The date( ) and today( ) functions are equivalent. That is, they both return the current date as defined as the date on which the SAS program is executed. You don’t need to put anything in between the parentheses for those two functions.
A Julian date is defined in SAS as a date in the form yyddd or yyyyddd, where yy or yyyy is a two-digit or four-digit integer that represents the year and ddd is the number of the day of the year. The value of ddd must be between 001 and 365 (or 366 for a leap year). So, for example, the SAS Julian date for January 21, 2008 is 2008021.
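As a small illustration of the Julian date definition, the following sketch converts two seven-digit Julian dates with the datejul( ) function. The data set name juldemo and the variable names are made up for illustration.

```sas
DATA juldemo;
   jan21 = datejul(2008021);   /* day 21 of 2008 */
   dec31 = datejul(2008366);   /* day 366 is valid because 2008 is a leap year */
   format jan21 dec31 date9.;
   put jan21= dec31=;
RUN;
```

The log should show jan21=21JAN2008 dec31=31DEC2008.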
Let’s look at an example in which these five functions are used.
Example
The following SAS program creates a temporary SAS data set called createdates that contains six date variables. The variables current1 and current2 are assigned the current date using the date( ) and today( ) functions. The variable current3 is assigned the 95th day of the 2008 year using the datejul( ) function. The variables current4 and current5 are assigned the date April 4th, 2008 using the mdy( ) function. And, the variable current6 is assigned the date April 1st, 2008 using the yyq( ) function.
DATA createdates;
current1= date();
current2 = today();
current3 = datejul(2008095);
mon = 4; day = 4; year = 2008;
current4 = mdy(mon, day, year);
current5 = current4;
current6 = yyq(2008, 2);
format current1 current2 current3 current5 current6 date9.;
RUN;
PROC PRINT data=createdates;
title 'The createdates data set';
var current1 current2 current3 current4 current5 current6;
RUN;
Obs | current1 | current2 | current3 | current4 | current5 | current6 |
---|---|---|---|---|---|---|
1 | 26SEP2020 | 26SEP2020 | 04APR2008 | 17626 | 04APR2008 | 01APR2008 |
First, review the program to make sure that you understand how to use each of the five functions. Note, for example, that to tell SAS to determine the SAS date value of the 95th day of the 2008 year, you have to input 2008095 into the datejul( ) function. If you instead input 200895 into the datejul( ) function, SAS reports that you've provided an invalid argument to the datejul( ) function and therefore sets current3 to missing. Also, note that current5 is just the formatted version of the unformatted current4 variable. When you are satisfied that you understand the five functions, launch and run the SAS program. Review the output to convince yourself that createdates does indeed contain the six date variables as described.
8.2.2.2. Using functions to take apart date values¶
The functions that can be used to take apart date values include:
day(date) returns the day of the month from a SAS date value (date)
month(date) returns the month from a SAS date value (date)
year(date) returns the year from a SAS date value (date)
The date can be specified either as a variable name or as a SAS date constant. Otherwise, fairly self-explanatory! Let’s take a look at an example.
Example
The following SAS program uses the day( ), month( ) and year( ) functions to extract the month, day and year from the wt_date variable:
DATA takeapart;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
wt_mo = month(wt_date);
wt_day = day(wt_date);
wt_yr = year(wt_date);
format wt_date b_date date9.;
DATALINES;
1024 Alice       Smith  1 65 125 12/1/05  01/01/60
1167 Maryann     White  1 68 140 12/01/05 01/01/59
1168 Thomas      Jones  2    190 12/2/05  06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia     Ho     1 63 115 1/1/06   06/15/58
;
RUN;
PROC PRINT data=takeapart;
title 'The dissected weight dates';
var wt_date wt_mo wt_day wt_yr;
RUN;
Obs | wt_date | wt_mo | wt_day | wt_yr |
---|---|---|---|---|
1 | 01DEC2005 | 12 | 1 | 2005 |
2 | 01DEC2005 | 12 | 1 | 2005 |
3 | 02DEC2005 | 12 | 2 | 2005 |
4 | 30NOV2005 | 11 | 30 | 2005 |
5 | 01JAN2006 | 1 | 1 | 2006 |
First, review the program to make sure that you understand how to use each of the three functions. Then, launch and run the SAS program, and review the output to convince yourself that the program does as claimed.
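The three extraction functions are the inverse of the mdy( ) function from the previous section. The following sketch takes a date apart and puts it back together to confirm that nothing is lost. The data set name rebuild and the variable names are made up for illustration.

```sas
DATA rebuild;
   original = '30NOV2005'd;
   mo = month(original);              /* 11   */
   dy = day(original);                /* 30   */
   yr = year(original);               /* 2005 */
   rebuilt = mdy(mo, dy, yr);         /* reassemble the pieces */
   same = (rebuilt = original);       /* 1 when the two dates match */
   format original rebuilt date9.;
   put original= mo= dy= yr= rebuilt= same=;
RUN;
```

The log should show same=1, since rebuilt and original hold the same SAS date value.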
8.2.2.3. Using functions to massage date values¶
Okay, here’s that section with the intriguing title. The functions that can be used to massage date values include:
juldate(date) returns the Julian date in yyddd format from a SAS date value (date)
juldate7(date) returns the Julian date in yyyyddd format from a SAS date value (date)
qtr(date) returns the quarter of the year from a SAS date value (date) (1 = first three months, 2 = second three months, 3 = third three months, or 4 = last three months)
weekday(date) returns the number of the day of the week from a date value (date) (1 = Sunday, 2 = Monday, …, and 7 = Saturday)
Again, the date can be specified either as a variable name or as a SAS date constant. And, a Julian date in SAS is defined as a date in the form yyddd or yyyyddd, where yy or yyyy is a two-digit or four-digit integer that represents the year and ddd is the number of the day of the year (between 001 and 365 (or 366 for a leap year)).
Let’s look at an example in which these four functions are used.
Example
The following SAS program contains four assignment statements that "massage" the wt_date variable. The variable wt_jul1 is assigned the SAS Julian date in yyddd format. The variable wt_jul2 is assigned the SAS Julian date in yyyyddd format. The variable wt_qtr is assigned the quarter in which the wt_date occurs, and the variable wt_day is assigned the weekday on which the wt_date occurs:
DATA massaged;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
wt_jul1 = juldate(wt_date);
wt_jul2 = juldate7(wt_date);
wt_qtr = qtr(wt_date);
wt_day = weekday(wt_date);
format wt_date b_date date9.;
DATALINES;
1024 Alice       Smith  1 65 125 12/1/05  01/01/60
1167 Maryann     White  1 68 140 12/01/05 01/01/59
1168 Thomas      Jones  2    190 12/2/05  06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia     Ho     1 63 115 1/1/06   06/15/58
;
RUN;
PROC PRINT data = massaged;
title 'The massaged data set';
var wt_date wt_jul1 wt_jul2 wt_qtr wt_day;
RUN;
Obs | wt_date | wt_jul1 | wt_jul2 | wt_qtr | wt_day |
---|---|---|---|---|---|
1 | 01DEC2005 | 5335 | 2005335 | 4 | 5 |
2 | 01DEC2005 | 5335 | 2005335 | 4 | 5 |
3 | 02DEC2005 | 5336 | 2005336 | 4 | 6 |
4 | 30NOV2005 | 5334 | 2005334 | 4 | 4 |
5 | 01JAN2006 | 6001 | 2006001 | 1 | 1 |
First, review the program to make sure that you understand how to use each of the four functions. Then, launch and run the SAS program, and review the output to convince yourself that the program does as claimed.
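The numeric weekday codes are a natural fit for the user-defined formats from the first half of this lesson. The following sketch defines a format (called dayfmt here, a made-up name) with a VALUE statement and uses it to label the result of the weekday( ) function. The data set name daynames and the variable day_txt are also made up for illustration.

```sas
/* Map the weekday codes 1-7 to short day names */
PROC FORMAT;
   value dayfmt 1 = 'Sun' 2 = 'Mon' 3 = 'Tue' 4 = 'Wed'
                5 = 'Thu' 6 = 'Fri' 7 = 'Sat';
RUN;

DATA daynames;
   wt_date = '01DEC2005'd;
   wt_day  = weekday(wt_date);       /* 5, as in the output above */
   day_txt = put(wt_day, dayfmt.);   /* Thu */
   put wt_date= date9. wt_day= day_txt=;
RUN;
```

The log should show wt_date=01DEC2005 wt_day=5 day_txt=Thu.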
8.2.2.4. Using functions to calculate intervals¶
The functions that can be used to calculate intervals include:
yrdif(startdate, enddate, ‘method’) returns the difference in years between two SAS date values (startdate, enddate) using one of four methods (‘method’)
datdif(startdate, enddate, ‘method’) returns the difference in days between two SAS date values (startdate, enddate) using one of four methods (‘method’)
intck(‘interval’, fromdate, todate) returns the number of time intervals (‘interval’) that occur between two dates (fromdate, todate)
intnx(‘interval’, date, increment) applies multiples (increment) of a given interval (‘interval’) to a date value (date) and returns the resulting value, and hence can be used to identify past or future days, weeks, months, and so on
We’ll take a look at five examples here. The first one uses the yrdif( ) and datdif( ) functions, the next three use the intck( ) function, and the last one uses the intnx( ) function.
Example
The following SAS program uses the yrdif( ) function to calculate the difference between the subject's birth date (b_date) and first weight date (wt_date1) in order to determine the subject's age. And, the datdif( ) function is used to calculate days, the difference between the subject's first (wt_date1) and second (wt_date2) weight dates:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date1 mmddyy8. @43 wt_date2 mmddyy8. @52
b_date mmddyy8.;
age = yrdif(b_date, wt_date1, 'act/act');
days = datdif(wt_date1, wt_date2, 'act/act');
format wt_date1 wt_date2 b_date date9. age 4.1;
DATALINES;
1024 Alice Smith 1 65 125 12/1/05 03/04/06 01/01/60
1167 Maryann White 1 68 140 12/01/05 03/07/06 01/01/59
1168 Thomas Jones 2 190 12/2/05 3/30/06 06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 2/27/06 12/31/60
1302 Felicia Ho 1 63 115 1/1/06 4/1/06 06/15/58
;
RUN;
PROC PRINT data=diet;
TITLE "The calculation of subject's age";
var subj b_date wt_date1 age;
RUN;
PROC PRINT data=diet;
TITLE 'The calculation of days between weighings';
var subj wt_date1 wt_date2 days;
RUN;
Obs | subj | b_date | wt_date1 | age |
---|---|---|---|---|
1 | 1024 | 01JAN1960 | 01DEC2005 | 45.9 |
2 | 1167 | 01JAN1959 | 01DEC2005 | 46.9 |
3 | 1168 | 15JUN1960 | 02DEC2005 | 45.5 |
4 | 1201 | 31DEC1960 | 30NOV2005 | 44.9 |
5 | 1302 | 15JUN1958 | 01JAN2006 | 47.5 |
Obs | subj | wt_date1 | wt_date2 | days |
---|---|---|---|---|
1 | 1024 | 01DEC2005 | 04MAR2006 | 93 |
2 | 1167 | 01DEC2005 | 07MAR2006 | 96 |
3 | 1168 | 02DEC2005 | 30MAR2006 | 118 |
4 | 1201 | 30NOV2005 | 27FEB2006 | 89 |
5 | 1302 | 01JAN2006 | 01APR2006 | 90 |
Review the assignment statement that is used to calculate the values for the variable age. The first and second arguments of the yrdif( ) function tell SAS, respectively, the start and end date of the desired interval. Here, the start date is b_date and the end date is wt_date1. The third argument of the yrdif( ) function, which must be enclosed in single quotes, tells SAS how to calculate the difference. Here, 'act/act' tells SAS to calculate the difference using the actual number of years between the two dates. The four possible methods in calculating the number of years between two dates using the yrdif( ) function are:
- 'act/act' uses the actual number of days and years between two dates
- '30/360' specifies a 30-day month and a 360-day year
- 'act/360' uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 360)
- 'act/365' uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 365)
The 'act/act' method is the method that most people would consider to be the most accurate. The other methods are methods that are sometimes used by accountants.
Now, review the assignment statement that is used to calculate the values for the variable days. The first and second arguments of the datdif( ) function tell SAS, respectively, the start and end date of the desired interval. Here, the start date is wt_date1 and the end date is wt_date2. The third argument of the datdif( ) function, which must be enclosed in single quotes, tells SAS how to calculate the difference. Here, 'act/act' tells SAS to calculate the difference using the actual number of days between the two dates. The two possible methods in calculating the number of days between two dates using the datdif( ) function are:
- 'act/act' uses the actual number of days and years between two dates
- '30/360' specifies a 30-day month and a 360-day year
Again, the 'act/act' method is the method that most people would consider to be the most accurate. The other method is a method that is sometimes used by accountants.
When you are satisfied that you understand the two functions, launch and run the SAS program. Review the output to convince yourself that age and days are indeed calculated as described.
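To see how the choice of method changes the answer, here is a minimal sketch that applies every method listed above to a single pair of dates (subject 1024's values). The data set name basis_check and the new variable names are made up for this illustration. The days_sub variable is included only to show that, because SAS dates are stored as day counts, plain subtraction gives the same actual number of days as the 'act/act' method of datdif( ).

```sas
/* Sketch: one set of dates, four yrdif( ) methods and two datdif( ) methods. */
/* The data set and variable names here are hypothetical.                     */
DATA basis_check;
    b_date   = '01jan1960'd;   /* subject 1024's birth date      */
    wt_date1 = '01dec2005'd;   /* subject 1024's first weighing  */
    wt_date2 = '04mar2006'd;   /* subject 1024's second weighing */

    age_actact = yrdif(b_date, wt_date1, 'act/act');
    age_30360  = yrdif(b_date, wt_date1, '30/360');
    age_act360 = yrdif(b_date, wt_date1, 'act/360');
    age_act365 = yrdif(b_date, wt_date1, 'act/365');

    days_actact = datdif(wt_date1, wt_date2, 'act/act');
    days_30360  = datdif(wt_date1, wt_date2, '30/360');
    days_sub    = wt_date2 - wt_date1;   /* plain subtraction of day counts */

    format b_date wt_date1 wt_date2 date9.;
RUN;

PROC PRINT data=basis_check;
    TITLE 'One pair of dates under each method';
RUN;
```

Comparing the four age columns side by side is a quick way to judge whether the accounting-style methods differ enough to matter for your data.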
Example
Recall that the intck( ) function returns the number of time intervals, such as the number of days or years, that occur between two dates. The following SAS program is identical to the previous program, except here the subjects' ages at their first weigh-in are determined using both the yrdif( ) and intck( ) functions to get age_yrdif and age_intck, respectively. Similarly, the numbers of days between the subjects' two weigh-ins are determined using both the datdif( ) and intck( ) functions to get days_datdif and days_intck, respectively:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date1 mmddyy8. @43 wt_date2 mmddyy8. @52
b_date mmddyy8.;
age_yrdif = yrdif(b_date, wt_date1, 'act/act');
age_intck = intck('year', b_date, wt_date1);
days_datdif = datdif(wt_date1, wt_date2, 'act/act');
days_intck = intck('day', wt_date1, wt_date2);
format wt_date1 wt_date2 b_date date9.;
DATALINES;
1024 Alice Smith 1 65 125 12/1/05 03/04/06 01/01/60
1167 Maryann White 1 68 140 12/01/05 03/07/06 01/01/59
1168 Thomas Jones 2 190 12/2/05 3/30/06 06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 2/27/06 12/31/60
1302 Felicia Ho 1 63 115 1/1/06 4/1/06 06/15/58
;
RUN;
PROC PRINT data=diet;
TITLE "The calculation of subject's age";
var subj b_date wt_date1 age_yrdif age_intck;
RUN;
PROC PRINT data=diet;
TITLE 'The calculation of days between weighings';
var subj wt_date1 wt_date2 days_datdif days_intck;
RUN;
Obs | subj | b_date | wt_date1 | age_yrdif | age_intck |
---|---|---|---|---|---|
1 | 1024 | 01JAN1960 | 01DEC2005 | 45.9151 | 45 |
2 | 1167 | 01JAN1959 | 01DEC2005 | 46.9151 | 46 |
3 | 1168 | 15JUN1960 | 02DEC2005 | 45.4643 | 45 |
4 | 1201 | 31DEC1960 | 30NOV2005 | 44.9151 | 45 |
5 | 1302 | 15JUN1958 | 01JAN2006 | 47.5479 | 48 |
Obs | subj | wt_date1 | wt_date2 | days_datdif | days_intck |
---|---|---|---|---|---|
1 | 1024 | 01DEC2005 | 04MAR2006 | 93 | 93 |
2 | 1167 | 01DEC2005 | 07MAR2006 | 96 | 96 |
3 | 1168 | 02DEC2005 | 30MAR2006 | 118 | 118 |
4 | 1201 | 30NOV2005 | 27FEB2006 | 89 | 89 |
5 | 1302 | 01JAN2006 | 01APR2006 | 90 | 90 |
Review the assignment statement that is used to calculate the values for the variable age_intck. The first argument of the intck( ) function, which must appear in single quotes, tells SAS what time interval you are interested in counting. Although there are other intervals available, the most commonly used intervals include 'day', 'weekday', 'week', 'month', 'qtr', and 'year'. The second and third arguments of the intck( ) function tell SAS, respectively, the start and end date of the desired interval. Here, the start date is b_date, the end date is wt_date1, and the time interval is 'year'.
Now, review the assignment statement that is used to calculate the values for the variable days_intck. To calculate days, the start date is wt_date1, the end date is wt_date2, and the time interval is 'day'.
Theoretically, we should expect the yrdif( ) and intck( ) functions to get the same answers for age, and the datdif( ) and intck( ) functions to get the same answers for days. Launch and run the SAS program, and review the resulting output. Same answers or not? Hmmm ... you should see that the values for days_datdif and days_intck are the same, while the (rounded) values for age_yrdif and age_intck differ.
Why is that the case? It has to do with the fact that the intck( ) function counts intervals from fixed interval beginnings, not in multiples of an interval unit from the startdate value. Partial intervals are not counted. For example, 'week' intervals are counted by Sundays rather than seven-day multiples from the startdate value. 'Month' intervals are counted by the first day of each month, and 'year' intervals are counted from January 1st, not in 365-day multiples from the startdate value.
The next two examples are intended to help you understand how the intck( ) function counts intervals.
Example
The following SAS program uses the intck( ) function and SAS date constants to determine the number of days, weeks, months, and years between December 31, 2006 and January 1, 2007. It also calculates the number of years (years2) between January 1, 2007 and December 31, 2007, and the number of years (years3) between January 1, 2007 and January 1, 2008:
DATA timeintervals1;
days = intck('day', '31dec2006'd,'01jan2007'd);
weeks = intck('week', '31dec2006'd,'01jan2007'd);
months = intck('month', '31dec2006'd,'01jan2007'd);
years = intck('year', '31dec2006'd,'01jan2007'd);
years2 = intck('year', '01jan2007'd, '31dec2007'd);
years3 = intck('year', '01jan2007'd, '01jan2008'd);
RUN;
PROC PRINT data = timeintervals1;
TITLE 'Time intervals as calculated by intck function';
RUN;
Obs | days | weeks | months | years | years2 | years3 |
---|---|---|---|---|---|---|
1 | 1 | 0 | 1 | 1 | 0 | 1 |
First, review the program to make sure that you understand what it is doing. Then, launch and run the SAS program, and review the output. Are you surprised by any of the results? Let me venture to suggest that you find the results for days, weeks, and years3 to make sense, and the results for months, years, and years2 to be a little odd. Let's focus on the three odd results. In spite of only one day passing between 12/31/2006 and 01/01/2007, SAS assigns the variable months the value 1 because between 12/31/2006 and 01/01/2007, exactly one first day of the month is crossed (which happens to be January 1st). Similarly, in spite of only one day passing between 12/31/2006 and 01/01/2007, SAS assigns the variable years the value 1 because between 12/31/2006 and 01/01/2007, exactly one January 1st is crossed. And, in spite of 364 days passing between 01/01/2007 and 12/31/2007, SAS assigns the variable years2 the value 0 because no January 1st is crossed. Now, even though the results for days, weeks, and years3 might make intuitive sense to you, you should still make sure you understand why SAS assigns the values it does here based on how the intck( ) function works.
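The weeks result is easier to understand once you know which days of the week are involved. The following sketch (the data set and variable names are made up for this illustration) uses the weekday( ) function, which returns 1 for Sunday through 7 for Saturday, and then counts 'week' intervals over two different one-day spans:

```sas
/* Sketch: why a one-day span can count as 0 weeks or as 1 week. */
DATA week_check;
    dow_dec31 = weekday('31dec2006'd);   /* 1, a Sunday */
    dow_jan01 = weekday('01jan2007'd);   /* 2, a Monday */

    /* Sunday to Monday: no Sunday boundary is crossed, so 0 */
    weeks_a = intck('week', '31dec2006'd, '01jan2007'd);

    /* Saturday to Sunday: one Sunday boundary is crossed, so 1 */
    weeks_b = intck('week', '30dec2006'd, '31dec2006'd);
RUN;

PROC PRINT data=week_check;
    TITLE 'Week boundaries fall on Sundays';
RUN;
```

Both spans are exactly one day long, yet only the second one crosses the start of a new week.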
Example
In an attempt to explore the intck( ) function further, the following SAS program uses the intck( ) function and SAS date constants to determine the number of days, weeks, weekdays, months, qtrs, and years between March 15, 2007 and March 15, 2008:
DATA timeintervals2;
days = intck('day', '15mar2007'd,'15mar2008'd);
weeks = intck('week', '15mar2007'd,'15mar2008'd);
weekdays = intck('weekday', '15mar2007'd,'15mar2008'd);
months = intck('month', '15mar2007'd,'15mar2008'd);
qtrs = intck('qtr', '15mar2007'd,'15mar2008'd);
years = intck('year', '15mar2007'd,'15mar2008'd);
RUN;
PROC PRINT data = timeintervals2;
TITLE 'Time intervals as calculated by intck function';
RUN;
Obs | days | weeks | weekdays | months | qtrs | years |
---|---|---|---|---|---|---|
1 | 366 | 52 | 261 | 12 | 4 | 1 |
The main purpose of this program is to illustrate some of the intervals most commonly used in the intck( ) function. First, review the program to make sure that you understand what it is doing. Then, launch and run the SAS program, and review the output. Are you surprised by any of the results? In each case, I think you'll find the results to make intuitive sense.
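If boundary counting is not what you want, the intck( ) function also accepts an optional fourth argument that selects the counting method. The sketch below assumes a SAS release in which that argument is available (it is documented for SAS 9.2 and later). The data set and variable names are made up for this illustration. With 'continuous', intervals are measured from the start date rather than from fixed boundaries such as January 1st:

```sas
/* Sketch: default (discrete) counting versus 'continuous' counting. */
DATA method_check;
    /* Default method: counts the January 1sts that are crossed */
    yrs_d1 = intck('year', '31dec2006'd, '01jan2007'd);                /* 1 */
    yrs_d2 = intck('year', '01jan2007'd, '31dec2007'd);                /* 0 */

    /* 'continuous': counts whole years measured from the start date */
    yrs_c1 = intck('year', '31dec2006'd, '01jan2007'd, 'continuous');  /* 0 */
    yrs_c2 = intck('year', '01jan2007'd, '31dec2007'd, 'continuous');  /* 0 */
    yrs_c3 = intck('year', '15mar2007'd, '15mar2008'd, 'continuous');  /* 1 */
RUN;

PROC PRINT data=method_check;
    TITLE 'Discrete versus continuous interval counting';
RUN;
```

The continuous method is often the more natural choice for quantities like age in completed years, since it counts anniversaries of the start date rather than calendar boundaries.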
Example
Now, suppose that each subject appearing in the diet data set needs to be weighed again in three months. The following SAS program uses the subject's previous weight date (wt_date) and various versions of the intnx( ) function to determine various versions of each subject's next weight date:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.;
nxdate_b1 = intnx('month', wt_date, 3);
nxdate_b2 = intnx('month', wt_date, 3, 'beginning');
nxdate_m = intnx('month', wt_date, 3, 'middle');
nxdate_e = intnx('month', wt_date, 3, 'end');
nxdate_s = intnx('month', wt_date, 3, 'sameday');
format wt_date b_date nxdate_b1 nxdate_b2
nxdate_m nxdate_e nxdate_s date9.;
DATALINES;
1024 Alice Smith 1 65 125 12/1/05 01/01/60
1167 Maryann White 1 68 140 12/01/05 01/01/59
1168 Thomas Jones 2 190 12/2/05 06/15/60
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60
1302 Felicia Ho 1 63 115 1/1/06 06/15/58
;
RUN;
PROC PRINT data=diet;
TITLE 'The data set containing next weight dates';
VAR subj wt_date nxdate_b1 nxdate_b2
nxdate_m nxdate_e nxdate_s;
RUN;
Obs | subj | wt_date | nxdate_b1 | nxdate_b2 | nxdate_m | nxdate_e | nxdate_s |
---|---|---|---|---|---|---|---|
1 | 1024 | 01DEC2005 | 01MAR2006 | 01MAR2006 | 16MAR2006 | 31MAR2006 | 01MAR2006 |
2 | 1167 | 01DEC2005 | 01MAR2006 | 01MAR2006 | 16MAR2006 | 31MAR2006 | 01MAR2006 |
3 | 1168 | 02DEC2005 | 01MAR2006 | 01MAR2006 | 16MAR2006 | 31MAR2006 | 02MAR2006 |
4 | 1201 | 30NOV2005 | 01FEB2006 | 01FEB2006 | 14FEB2006 | 28FEB2006 | 28FEB2006 |
5 | 1302 | 01JAN2006 | 01APR2006 | 01APR2006 | 15APR2006 | 30APR2006 | 01APR2006 |
Let's review the five assignment statements that calculate five versions of the subjects' next weight dates (nxdate_b1, nxdate_b2, nxdate_m, nxdate_e, and nxdate_s). As you can see, SAS is told to use the 'month' interval in each of the calculations. Again, although there are other intervals available, the most commonly used intervals include 'day', 'weekday', 'week', 'month', 'qtr', and 'year'. SAS is also told to use wt_date as the startdate in each of the calculations. And in each case, SAS is told to advance the wt_date by 3 months. Okay, so the only thing that differs between the five calculations is the last (optional) argument, which is either omitted or set to 'beginning', 'middle', 'end', or 'sameday'. These so-called alignment arguments tell SAS to return either the beginning, middle, or end day of the resulting month. If an alignment is not specified, the beginning day is returned by default. If the 'sameday' alignment is specified, SAS returns the same day of the month, shifted by the specified number of intervals.
Let's launch and run the SAS program, and review the output. The dates should be as described. The contents of nxdate_b1 are the same as those of nxdate_b2, since the beginning day is returned by default. The variable nxdate_m contains the middle day of the resulting month, and nxdate_e contains the end day of the resulting month. And, for four of the subjects, nxdate_s contains the same numbered day but 3 months in the future. For subject #1201, you might expect SAS to return February 30th, because that is exactly 3 months from November 30th. Since February 30th doesn't exist, SAS returns February 28, 2006 instead. This illustrates how the intnx( ) function automatically adjusts the date if the resulting date doesn't exist.
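The increment passed to intnx( ) does not have to be a positive number. As a small sketch (the data set and variable names are made up for this illustration), an increment of 0 stays within the starting interval, which gives a handy way to find the last day of a date's own month, and a negative increment moves backward in time:

```sas
/* Sketch: zero and negative increments with intnx( ). */
DATA intnx_check;
    start = '31jan2006'd;

    /* One month ahead, same day: there is no February 31st, */
    /* so the result is adjusted to 28FEB2006                */
    same_next = intnx('month', start, 1, 'sameday');

    /* Increment of 0 with 'end': last day of the starting month, 31JAN2006 */
    end_this = intnx('month', start, 0, 'end');

    /* Negative increment, default alignment: beginning of the month */
    /* three months earlier, 01OCT2005                               */
    back_3 = intnx('month', start, -3);

    format start same_next end_this back_3 date9.;
RUN;

PROC PRINT data=intnx_check;
    TITLE 'Zero and negative increments with intnx';
RUN;
```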
8.2.3. SAS Date System Options¶
There are two system options that affect how SAS handles dates: the DATESTYLE= and YEARCUTOFF= options.
The DATESTYLE= system option tells SAS your intended sequence of month (M), day (D), and year (Y) when dates are ambiguous. Possible settings include MDY, MYD, YMD, YDM, DMY, DYM, and LOCALE. By default, the DATESTYLE system option is set to LOCALE, which tells SAS to use the form of dates that reflect the language and local conventions of the geographical region specified by the LOCALE system option. Yikes, this is sounding circular! Because LOCALE is by default set to ENGLISH for users in the United States, MDY is our default DATESTYLE option. We won’t spend any more time on the DATESTYLE system option, but it is something you’ll definitely want to know about if you ever get tempted to use the anydtdte. informats to read in dates. (Even though the anydtdte. informats are tempting to use as they allow you to read in different forms of the same date into one date variable, I chose not to present the informat, because I don’t like the way it makes SAS have to make decisions about my data!)
SAS developed the YEARCUTOFF= system option to provide users with a way to handle two-digit years. If we specify the date constant '13apr08'd, we could mean 2008, 1908, or even 1808. The YEARCUTOFF= system option eliminates this ambiguity by telling SAS the first year of a 100-year span to be used by date informats and functions when SAS encounters a two-digit year. In the SAS release used for this lesson, the default value of YEARCUTOFF is 1920; newer releases may use a later default, so check the setting on your own system. With a value of 1920, if SAS encounters a two-digit year in your program between 20 and 99, SAS assumes the date has a prefix of 19. And, if SAS encounters a two-digit year in your program between 00 and 19, SAS assumes the date has a prefix of 20. There are two things you can do if you don't like the way SAS is handling your two-digit years: either use four-digit years or use the OPTIONS statement to change the YEARCUTOFF= option. We'll take a look at two examples now just to make sure we understand how SAS handles two-digit years.
Example
The following SAS program uses the default YEARCUTOFF = 1920 to read in nine dates that contain two-digit years ranging from 20 to 99, and then from 00 to 19:
OPTIONS YEARCUTOFF=1920;
DATA twodigits1920;
INPUT date1 mmddyy8.;
FORMAT date1 worddatx20.;
DATALINES;
01/03/20
01/03/21
01/03/49
01/03/50
01/03/51
01/03/99
01/03/00
01/03/01
01/03/19
;
RUN;
PROC PRINT data=twodigits1920;
title 'Years with two-digits when YEARCUTOFF = 1920';
RUN;
Obs | date1 |
---|---|
1 | 3 January 1920 |
2 | 3 January 1921 |
3 | 3 January 1949 |
4 | 3 January 1950 |
5 | 3 January 1951 |
6 | 3 January 1999 |
7 | 3 January 2000 |
8 | 3 January 2001 |
9 | 3 January 2019 |
First, review the dates in the DATALINES statement to make sure you understand the range of two-digit years that we are trying to read into the twodigits1920 data set. Then, launch and run the SAS program, and review the resulting output. Note that the dates containing two-digit years between 20 and 99 are displayed as four-digit years between 1920 and 1999. And, the dates containing two-digit years between 00 and 19 are displayed as four-digit years between 2000 and 2019.
Example
The following SAS program is identical to the previous program except the YEARCUTOFF= system option has been changed to 1950. As before, SAS reads in nine dates that contain two-digit years ranging from 20 to 99, and then from 00 to 19:
OPTIONS YEARCUTOFF=1950;
DATA twodigits1950;
INPUT date1 mmddyy8.;
FORMAT date1 worddatx20.;
DATALINES;
01/03/20
01/03/21
01/03/49
01/03/50
01/03/51
01/03/99
01/03/00
01/03/01
01/03/19
;
RUN;
PROC PRINT data=twodigits1950;
title 'Years with two-digits when YEARCUTOFF = 1950';
RUN;
Obs | date1 |
---|---|
1 | 3 January 2020 |
2 | 3 January 2021 |
3 | 3 January 2049 |
4 | 3 January 1950 |
5 | 3 January 1951 |
6 | 3 January 1999 |
7 | 3 January 2000 |
8 | 3 January 2001 |
9 | 3 January 2019 |
Again, review the dates in the DATALINES statement to make sure you understand the range of two-digit years that we are trying to read into the twodigits1950 data set. Then, launch and run the SAS program, and review the resulting output. Note that now the dates containing two-digit years between 50 and 99 are displayed as four-digit years between 1950 and 1999. And, the dates containing two-digit years between 00 and 49 are displayed as four-digit years between 2000 and 2049.
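Two quick checks round out the picture. The sketch below first asks SAS to report the YEARCUTOFF= value currently in effect (PROC OPTIONS writes it to the log), and then reads dates with four-digit years to show that they are unaffected by the option. The data set name fourdigits is made up for this illustration, and note that the OPTIONS statement changes the setting for the rest of your SAS session:

```sas
/* Report the YEARCUTOFF value currently in effect (written to the log). */
PROC OPTIONS option=yearcutoff;
RUN;

/* Four-digit years are read as given, whatever YEARCUTOFF is set to. */
OPTIONS YEARCUTOFF=1950;
DATA fourdigits;
    INPUT date1 mmddyy10.;
    FORMAT date1 worddatx20.;
    DATALINES;
01/03/1920
01/03/2049
;
RUN;

PROC PRINT data=fourdigits;
    title 'Four-digit years are not affected by YEARCUTOFF';
RUN;
```

Here the first date is still read as 3 January 1920, even though a two-digit 20 would be read as 2020 under YEARCUTOFF=1950.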
8.2.4. SAS Time Basics¶
We won’t spend as much time (no pun intended!) learning how to handle times in SAS as we did learning how to handle dates, but we should still learn the basics. In this section, we’ll get a quick and broad overview of the fundamental things you need to know about working with times in SAS. We’ll learn how SAS defines time and datetime values, how to use an informat to read a time into a SAS data set, how to use a format to display a SAS time, how to use the most common time functions, and how to define a SAS time constant.
8.2.4.1. The Definition of a SAS Time and Datetime¶
SAS stores time values similar to the way it stores date values. Specifically, SAS stores time as a numeric value equal to the number of seconds since midnight. So, for example, SAS stores:
a 60 for 12:01 am, since it is 60 seconds after midnight
a 95 for 12:01:35 am, since it is 95 seconds after midnight
a 120 for 12:02 am, since it is 120 seconds after midnight
and so on …
Since there are 86,400 seconds in a day, a SAS time of day takes on a value between 0 (midnight) and 86,399 (one second before the next midnight). No matter how you read a time, SAS converts the time to a number as just defined.
A SAS datetime is a special value that combines both date and time values. A SAS datetime value is stored as the number of seconds between midnight on January 1, 1960, and a given date and time. Okay, I don't feel like calculating one of these datetimes out myself. I'll trust the SAS manual that I'm looking at that tells me, for example, that the SAS datetime for April 22, 1989 at 4:10:45 pm equals 924,883,845 seconds. I guess that if you need the added accuracy of working with seconds, then datetimes are for you. I personally have never needed to use them.
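If you would rather let SAS do the arithmetic, the following sketch builds the same datetime two ways, with a datetime constant and with the dhms( ) function, and then splits it back apart with the datepart( ) and timepart( ) functions. The data set and variable names are made up for this illustration:

```sas
/* Sketch: building and splitting a SAS datetime value.                  */
/* April 22, 1989 is 10,704 days after January 1, 1960, and 4:10:45 pm   */
/* is 58,245 seconds after midnight, so the stored value should be       */
/* 10704 * 86400 + 58245 = 924883845.                                    */
DATA dt_check;
    dt_const = '22apr1989:16:10:45'dt;          /* datetime constant          */
    dt_built = dhms('22apr1989'd, 16, 10, 45);  /* date, hour, minute, second */
    secs     = dt_const;                        /* unformatted copy           */
    d_part   = datepart(dt_const);              /* the SAS date portion       */
    t_part   = timepart(dt_const);              /* the SAS time portion       */
    format dt_const dt_built datetime20. d_part date9. t_part time8.;
RUN;

PROC PRINT data=dt_check;
    title 'A datetime value and its parts';
RUN;
```

Because secs carries no format, it prints as the raw number of seconds, while dt_const and dt_built print as readable dates and times.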
8.2.4.2. Using Informats and Formats to Input and Display a SAS Time¶
Just as we need to tell SAS what form a date should take, we need to tell SAS what form a time should take. As you’d probably expect, we use time informats in an INPUT statement to tell SAS the form of the times to be read in. And, we use time formats in a FORMAT statement to tell SAS the form of the times to be displayed.
Example
The following SAS program reads five observations into a SAS data set called diet. One of the variables, weight time (wt_time), is in hh:mm:ss format, and therefore SAS is told to read the times using the time8. informat:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.
@52 wt_time time8.;
wtm_fmt1 = wt_time;
wtm_fmt2 = wt_time;
wtm_fmt3 = wt_time;
format wtm_fmt1 hhmm.
wtm_fmt2 hour5.2
wtm_fmt3 time8.;
DATALINES;
1024 Alice Smith 1 65 125 12/1/05 01/01/60 00:01:00
1167 Maryann White 1 68 140 12/01/05 01/01/59 00:15:00
1168 Thomas Jones 2 190 12/2/05 06/15/60 12:00:00
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60 00:00:00
1302 Felicia Ho 1 63 115 1/1/06 06/15/58 23:59:59
;
RUN;
PROC PRINT data=diet;
title 'The diet data set with formatted weight times';
var subj wt_time wtm_fmt1 wtm_fmt2 wtm_fmt3;
RUN;
Obs | subj | wt_time | wtm_fmt1 | wtm_fmt2 | wtm_fmt3 |
---|---|---|---|---|---|
1 | 1024 | 60 | 0:01 | 0.02 | 0:01:00 |
2 | 1167 | 900 | 0:15 | 0.25 | 0:15:00 |
3 | 1168 | 43200 | 12:00 | 12.00 | 12:00:00 |
4 | 1201 | 0 | 0:00 | 0.00 | 0:00:00 |
5 | 1302 | 86399 | 24:00 | 24.00 | 23:59:59 |
First, review the program so that you understand what it is doing. Specifically, pay attention to the time8. informat used to read in the wt_time variable. Also, note that three new weight time variables — wtm_fmt1, wtm_fmt2, wtm_fmt3 — are assigned to equal the values of the wt_time variable. The three new variables are each formatted differently, however. The FORMAT statement tells SAS to format wtm_fmt1 as hhmm., wtm_fmt2 as hour5.2, and wtm_fmt3 as time8.
Now, launch and run the SAS program, and review the resulting output to familiarize yourself with the contents of the diet data set. Note, in particular, the numeric values that are stored for the unformatted wt_time variable. As expected, the 00:01:00 time is stored as a 60, the 00:00:00 time is stored as a 0, and the 00:15:00 time is stored as a 900. Then, note the formatted versions of the weight time variables. As you can see, the hhmm. format displays the time on a 24-hour clock. The hour5.2 format displays the time as hours and decimal fractions of hours. And, the time8. format displays the time as hours, minutes and seconds in the form hh:mm:ss.
8.2.4.3. Using SAS Time Functions¶
Just as is the case for SAS dates, the best thing about SAS times is that, because SAS time values are numeric values, you can easily sort them, subtract them, and add them. You can also compare times. Or, you can use them in any of the available time functions. The most commonly used time functions are:
time( ) returns the current time as a SAS time value
hms(h, m, s) returns a SAS time value for the given hour (h), minutes (m), and seconds (s)
hour(time) returns the hour portion of a SAS time value (time)
minute(time) returns the minute portion of a SAS time value (time)
second(time) returns the second portion of a SAS time value (time)
We won’t look at an example of their use here, but the interval functions intnx( ) and intck( ) that we explored on SAS dates can also be used on SAS times.
Example
The following SAS program illustrates the use of the five time functions mentioned above. Specifically, the variable curtime is assigned the current time using the time( ) function. Then, the hour( ), minute( ) and second( ) functions are used to extract the hours, minutes and seconds from the wt_time variable. And finally, the hms( ) function is used to put the hours, minutes, and seconds back together again to create a new variable called wt_time2 that equals the old wt_time variable:
DATA diet;
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.
@52 wt_time time8.;
curtime = time();
wt_hr = hour(wt_time);
wt_min = minute(wt_time);
wt_sec = second(wt_time);
wt_time2 = hms(wt_hr, wt_min, wt_sec);
format curtime wt_time wt_time2 time8.;
DATALINES;
1024 Alice Smith 1 65 125 12/1/05 01/01/60 00:01:00
1167 Maryann White 1 68 140 12/01/05 01/01/59 00:15:00
1168 Thomas Jones 2 190 12/2/05 06/15/60 12:00:00
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60 00:00:00
1302 Felicia Ho 1 63 115 1/1/06 06/15/58 23:59:59
;
RUN;
PROC PRINT data=diet;
title 'The diet data set with five new variables';
var subj curtime wt_time wt_hr wt_min wt_sec wt_time2;
RUN;
Obs | subj | curtime | wt_time | wt_hr | wt_min | wt_sec | wt_time2 |
---|---|---|---|---|---|---|---|
1 | 1024 | 8:11:32 | 0:01:00 | 0 | 1 | 0 | 0:01:00 |
2 | 1167 | 8:11:32 | 0:15:00 | 0 | 15 | 0 | 0:15:00 |
3 | 1168 | 8:11:32 | 12:00:00 | 12 | 0 | 0 | 12:00:00 |
4 | 1201 | 8:11:32 | 0:00:00 | 0 | 0 | 0 | 0:00:00 |
5 | 1302 | 8:11:32 | 23:59:59 | 23 | 59 | 59 | 23:59:59 |
First, review the program to make sure that you understand how to use each of the five functions. Then, launch and run the SAS program, and review the output to convince yourself that the program does as claimed.
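As mentioned above, the interval functions also work on SAS times. Here is a minimal sketch using the 'minute' and 'hour' intervals. The data set and variable names are made up for this illustration. Just as with dates, intck( ) counts interval boundaries crossed rather than elapsed time:

```sas
/* Sketch: intck( ) and intnx( ) applied to SAS time values. */
DATA time_intervals;
    start = '08:59:00't;
    stop  = '09:01:00't;

    elapsed = stop - start;                  /* 120 seconds by subtraction       */
    mins    = intck('minute', start, stop);  /* 2 minute boundaries are crossed  */
    hours   = intck('hour', start, stop);    /* 1, since 9:00 is crossed         */
    next_hr = intnx('hour', start, 2);       /* start of the hour 2 hours ahead: 10:00:00 */

    format start stop next_hr time8.;
RUN;

PROC PRINT data=time_intervals;
    title 'Interval functions applied to times';
RUN;
```

Only two minutes separate the two times, yet hours equals 1 because the 9:00 boundary falls between them.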
8.2.4.4. Comparing Times¶
Again, because SAS time values are numeric values, you can easily compare two or more times. The comparisons are made just as the comparisons between any two numbers would take place. For example, because the time 00:10:00 is stored as a 600 in SAS, it is considered smaller than the time 00:15:00, which is stored as a 900 in SAS.
Example
The following SAS program illustrates how to compare the values of a time variable, not to the values of some other time variable, but rather to a time constant. Specifically, the WHERE= option on the DATA statement tells SAS to output to the diet data set only those individuals whose wt_time is between midnight and noon, inclusive:
DATA diet (where = ((wt_time ge '00:00:00't)
and (wt_time le '12:00:00't)));
input subj 1-4 l_name $ 18-23 weight 30-32
+1 wt_date mmddyy8. @43 b_date mmddyy8.
@52 wt_time time8.;
time_int = abs((wt_time - '05:00:00't)/3600);
format wt_time time8. time_int 4.1;
DATALINES;
1024 Alice Smith 1 65 125 12/1/05 01/01/60 00:01:00
1167 Maryann White 1 68 140 12/01/05 01/01/59 00:15:00
1168 Thomas Jones 2 190 12/2/05 06/15/60 12:00:00
1201 Benedictine Arnold 2 68 190 11/30/05 12/31/60 00:00:00
1302 Felicia Ho 1 63 115 1/1/06 06/15/58 23:59:59
;
RUN;
PROC PRINT data=diet;
title 'The subsetted diet data set';
var subj l_name wt_time time_int;
RUN;
Obs | subj | l_name | wt_time | time_int |
---|---|---|---|---|
1 | 1024 | Smith | 0:01:00 | 5.0 |
2 | 1167 | White | 0:15:00 | 4.8 |
3 | 1168 | Jones | 12:00:00 | 7.0 |
4 | 1201 | Arnold | 0:00:00 | 5.0 |
First, review the program to make sure you understand what it is doing. Note, for example, the form of the SAS time constants:
'00:00:00't
and
'12:00:00't
used in the WHERE= option. In general, a SAS time constant takes the form 'hh:mm:ss't where hh is the hour in 24-hour time, mm is the minutes, and ss (optional) are the seconds. The letter t that follows the time in single quotes tells SAS to treat the time string like a constant. Note that regardless of how you have informatted or formatted your SAS times, the SAS time constant always takes the above form.
This program also illustrates how you can use SAS time variables easily in calculations. The variable time_int is assigned the absolute difference in time, in hours, between each individual's weight time and their expected weight time, say for example, 5 am.
Now, launch and run the SAS program. Then, review the resulting output to convince yourself that only those individuals whose weight time is between midnight and noon are included in the output diet data set.
8.3. Exercises¶
We will use the dataset Bike_Lanes.csv. Please download it from the course website.
Use proc freq to make a table of the different bike lane types.
Create a format that changes the bike lane types by mapping “SIDEPATH”, “BIKE BOULEVARD”, and “BIKE LANE” to themselves and all others to “ ” (a blank). Apply this format to type and make a table of type using PROC FREQ. What happens to the other bike lane types?
Create a format that changes the bike lane types to be “CONTRAFLOW”, “SHARED BUS BIKE”, “SHARROW”, “SIGNED ROUTE”, or “OTHER” if the lane type is anything else. Apply this format to type and create a frequency table with PROC FREQ to see the result of the translation.
Write a DATA step that can be used to read in the following small dataset with three dates and one time. Then print the dataset using DATE and TIME formats to confirm that the data were properly read in (see the SAS documentation pages as needed for DATE and TIME informats/formats).
DATA temp;
INPUT date1 /*informat1*/ +1 date2 /*informat2*/ +1
date3 /*informat3*/ @32 time /*informat4*/;
DATALINES;
2014/02/14 06Jan2018 4/5/2016 03:2:22
;
RUN;