{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data I/O\n", "\n", "In order to be able to analyze our data, we need to be able to read it into a data set that our SAS software understands. In this lesson, we learn methods that will work in many situations to get your data read into a SAS data set. Keep in mind though that when reading data into a SAS data set, you always need to tell SAS three things:\n", "\n", "* where your data reside — Are they **instream data**, that is, embedded within your SAS program? Are they contained in a **raw data file** external to your SAS program? Or are they contained already in **another SAS data set**?\n", "* the format of the data — Are the data values arranged in neatly defined columns so that they can be read in using **column input**? Are the data values separated by at least one blank space so that they can be read in using **list input**? Do the data values contain special characters so that they must be read in using **formatted input**?\n", "* the kind of SAS data set that you want to create — Do you want to create a **permanent SAS data set**? Or do you want to create a **temporary SAS data set**?\n", "\n", "In this lesson, we will learn how to tackle all of these cases. We will learn how to read in both instream, external data from a raw data file (e.g. a csv file), or from another SAS dataset. We'll learn how to use column, list and formatted input. And, we'll learn how to create both temporary and permanent SAS datasets." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading Instream Data\n", "\n", "Although the title of this section is reading instream data, it is hard to focus on just one method of reading data into SAS. As discussed in the introduction to this lesson, every time we read data into a SAS data set, we need to tell SAS three things — where our data reside, the form of the data, and the kind of SAS data set that we want to create. Let's jump right in and take a look at an example.\n", "\n", "
Obs | \n", "subj | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|
1 | \n", "1024 | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "2 | \n", "72 | \n", "190 | \n", "
5 | \n", "1302 | \n", "1 | \n", "63 | \n", "115 | \n", "
The DATALINES statement indicates that lines of data are to follow. All lines following DATALINES are treated as data until a single semicolon is encountered. This semicolon indicates the end of the dataset. Note that DATALINES must occur last in the DATA step. Anything after the semicolon ending the DATALINES statement will not be executed by SAS.
\n", " \n", "The INPUT statement identifies the columns that we want to read from the input data, and the names of the SAS variables to be created from the data values.
\n", " \n", "The RUN statement tells SAS to execute the DATA step. The PROC PRINT statement tells SAS to print the entire temporary dataset temp1.
\n", " \n", "The key things to not about the program are:
\n", "\n", "\n",
"\n",
" 1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;\n",
" 72 \n",
" 73 libname PHC6089 \"/folders/myfolders/SAS_Notes/data\";\n",
" NOTE: Libref PHC6089 was successfully assigned as follows: \n",
" Engine: V9 \n",
" Physical Name: /folders/myfolders/SAS_Notes/data\n",
" 74 \n",
" 75 \n",
" 76 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;\n",
" 88 \n",
"
\n",
"
\n",
"\n",
"Because the LIBNAME statement is a global statement, our PHC6089 libref remains in effect until we have modified it, canceled it, or ended the SAS session. That is, the LIBNAME statement assigns the libref for the current SAS session only. Each time you begin a SAS session, you must assign a libref to each permanent SAS data library that contains files that you want to access in that SAS session."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Reading Data into Permanent SAS Datasets\n",
"\n",
"Rather than having created a temporary data set in the previous Example that disappears at the end of our SAS session, we could have just as easily created a permanent data set that instead gets stored in a directory of our choosing and is therefore available to us permanently. Let's try that!\n",
"\n",
"Obs | \n", "subj | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|
1 | \n", "1024 | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "2 | \n", "72 | \n", "190 | \n", "
5 | \n", "1302 | \n", "1 | \n", "63 | \n", "115 | \n", "
Note that the only differences between this program and the program in previous Example is that:
\n", "Launch and run the SAS program. Don't forget that you must make sure that you have already created the /folders/myfolders/SAS_Notes/data subdirectory on your computer (or at least some other appropriately placed and named subdirectory). After running the program, note that:
\n", "The following SAS program illustrates how to create a temporary SAS data set called temp3 by using column input to read in data stored in a raw data file called temp3.dat:
\n", " \n", "The following SAS program illustrates the simplest example of column input.
\n", "Obs | \n", "subj | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|
1 | \n", "1024 | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "2 | \n", "72 | \n", "190 | \n", "
5 | \n", "1302 | \n", "1 | \n", "63 | \n", "115 | \n", "
Notice that the INFILE statement, which must precede the INPUT statement, merely replaces the DATALINES statement and the data that appeared in the previous two examples. The INFILE statement tells SAS where the raw data file is stored on your computer. The name and location of the raw data file must appear in single quotes and immediately follow the INFILE keyword. In this case, our raw data file temp3.dat is stored in the directory \"/folders/myfolders/SAS_Notes/data\". As you can see, the data values are the only items stored in the file.
\n", "\n", "In order to run this program, first save the temp3.dat file (see the CANVAS website for this file) to a convenient location on your computer. Then, edit the INFILE statement as necessary to reflect the correct location. Finally, launch and run the SAS program. Review the log and output windows to convince yourself that the data were properly read into the temporary data set temp3.
\n", "\n", "Instead of identifying the raw data file by specifying the entire filename and location in the INFILE statement, we can alternatively use what is called a fileref (for file reference). Just as we use a LIBNAME statement to assign a libref, we use a FILENAME statement to assign a filref. Filerefs perform the same function as librefs. That is, they temporarily point to a storage location for data. However, librefs point to SAS data libraries, whereas filerefs point to external data files.
\n", "The following SAS program illustrates the use of a fileref in the INFILE statement, in conjunction with a FILENAME statement, to read data stored in a raw data file called temp3.dat to create a temporary SAS data set called temp4:
\n", "\n", "The following SAS program illustrates the simplest example of column input.
\n", "Obs | \n", "subj | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|
1 | \n", "1024 | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "2 | \n", "72 | \n", "190 | \n", "
5 | \n", "1302 | \n", "1 | \n", "63 | \n", "115 | \n", "
The FILENAME statement in our program assigns the fileref patients to our temp3.dat file stored in our /folders/myfolders/SAS_Notes/data/ folder. In general, the fileref can be a nickname of our choosing, as long as it is between 1 and 8 characters long, begins with a letter or underscore, and contains only letters, numbers, or underscores. The specification of the physical name of the file, of course, adheres to the conventions of the Windows operating system.
\n", "\n", "If you haven't already done so for the previous example, save the temp3.dat file to a convenient location on your computer. Then, edit the FILENAME statement as necessary to reflect the correct location. Finally, launch and run the SAS program. Review the log and output windows to convince yourself that the data were properly read into the temporary data set temp4.
\n", "Obs | \n", "subj | \n", "name | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice Smith | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "Maryann White | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "Thomas Jones | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "Benedictine Arnold | \n", "2 | \n", "68 | \n", "190 | \n", "
5 | \n", "1302 | \n", "Felicia Ho | \n", "1 | \n", "63 | \n", "115 | \n", "
First, inspect the SAS code to make sure you understand how to set up the INPUT statement for column input.
\n", "\n", "The data points are line up in columns. Although the data values are separated by a space for easy readability, they need not be when using column input. Column # 1 starts at the left-most column on the page.
\n", " \n", "The INPUT statement tells SAS that the subject number appears in columns #1-4. Name is a character variable that appears in columns #6-23, so name is character variable of length 18. Using column input, character variables can have embedded spaces. The gender variable is a character variable that appears in column #25. The height variable appears in columns #27-28. The weight variable appears in columns #30-32.
\n", "\n", "Then, launch and run the SAS program.
\n", " \n", "Finally, review the output from the print procedure to convince yourself that the data are read in properly.
\n", "Obs | \n", "init | \n", "f_name | \n", "l_name | \n", "weight | \n", "height | \n", "
---|---|---|---|---|---|
1 | \n", "A | \n", "Alice | \n", "Smith | \n", "125 | \n", "65 | \n", "
2 | \n", "M | \n", "Maryann | \n", "White | \n", "140 | \n", "68 | \n", "
3 | \n", "T | \n", "Thomas | \n", "Jones | \n", "190 | \n", ". | \n", "
4 | \n", "B | \n", "Benedictine | \n", "Arnold | \n", "190 | \n", "68 | \n", "
5 | \n", "F | \n", "Felicia | \n", "Ho | \n", "115 | \n", "63 | \n", "
First, inspect the SAS code so that you can become familiar with some of the features of column input. Then, launch and run the SAS program.
\n", " \n", "Review the output from the print procedure to convince yourself that the data are read in properly. Note that the position of the variables within the temporary data set temp corresponds to the order in which the variables appear in the input statement, not the order in which the variables appear in the data set. Note that the first column from the dataset is not read in to the dataset temp.
\n", "The following SAS program illustrates the simplest example of list input. Note that there is one blank space between each of the data values. Also note that although the data values need not be lined up in columns, we still recommend doing so because of the difficulty otherwise in \"eyeing\" the data quickly.
\n", "Obs | \n", "subj | \n", "name | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "Maryann | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "Thomas | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "Benedict | \n", "2 | \n", "72 | \n", "190 | \n", "
5 | \n", "1302 | \n", "Felicia | \n", "1 | \n", "63 | \n", "115 | \n", "
The INPUT statement is how you tell SAS to read in the data using list input. For list input, simply list the variable names — leaving at least one space between names — in the order in which the variables appear in the data file. Remember to use the dollar sign (\\$) to distinguish character variables from numeric variables.
\n", "\n", "Launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are read in properly with the exception of row 4. Note that Benedictine is read in as Benedict because of the default character length of 8. We will learn how to fix this later with informats.
\n", "The following SAS program illustrates the necessary use of the numeric missing value (.) placeholder when a data value is missing:
\n", "Obs | \n", "subj | \n", "name | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "Maryann | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "Thomas | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "Benny | \n", "2 | \n", ". | \n", "190 | \n", "
5 | \n", "1302 | \n", "Felicia | \n", "1 | \n", "63 | \n", "115 | \n", "
Note that Benny's height is missing. Therefore, since we are using the list input style to read in the data, we have to put in a missing value (.) placeholder.
\n", "\n", "First, launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are read in properly. Then, edit the program by deleting the missing value (.) placeholder. Rerun the SAS program to see what happens when you fail to account for the missing value. In the log file, you should see a note that says:
\n", " \n", "\n", "NOTE: SAS went to a new line when INPUT statement reached past the end of a line\n", "\n", " \n", "
And the resulting output:
\n", "Obs | \n", "subj | \n", "name | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "Maryann | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "Thomas | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "Benny | \n", "2 | \n", "190 | \n", "1302 | \n", "
should indicate that something has clearly gone awry. What is going on here is that, by default, SAS goes to the next data line to find more data if there are more variable names in the INPUT statement then there are values in the data line. In this case, Benny's height becomes 190, the first number to appear in the data line after gender, and Benny's weight becomes 1302, the first number to appear in the next data line.
\n", "The following SAS program illustrates how you can use the DELIMITER option of the INFILE statement to use values separators other than blanks. This example, in particular, illustrates it for the commonly used comma (,) as a delimiter:
\n", "Obs | \n", "subj | \n", "name | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "Maryann | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "Thomas | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "Benny | \n", "2 | \n", ". | \n", "190 | \n", "
5 | \n", "1302 | \n", "Felicia | \n", "1 | \n", "63 | \n", "115 | \n", "
By default, SAS assumes data are space-delimited. The DELIMITER option of the INFILE statement here instead warns SAS that the data are comma-delimited — that is, that commas separate the data values rather than blank spaces. You might also have noted that although the INFILE statement typically directs SAS to externally-stored data, here the CARDS (or equivalently DATALINES) option included in the INFILE statement alerts SAS that the data are actually included in the code. Launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are indeed read in properly.
\n", "The following SAS program uses the @n column pointer control and standard numeric informats to read three numeric variables — subj, height, and weight — into a temporary SAS data set called temp:
\n", "Obs | \n", "subj | \n", "height | \n", "weight | \n", "
---|---|---|---|
1 | \n", "1024 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", ". | \n", "190 | \n", "
4 | \n", "1201 | \n", "68 | \n", "190 | \n", "
5 | \n", "1302 | \n", "63 | \n", "115 | \n", "
If you look at the INPUT statement, you'll see that it uses @n absolute pointer controls to move the input pointer first to column 1 to read subj, then to column 27 to read height, and finally to column 30 to read weight. In general, the @ moves the pointer to column n, the first column of the field that is being read.
\n", " \n", "The 4. that appears after subj, the 2. that appears after height, and 3. that appears after weight are the informats that tell SAS how to read each of the raw data values. In each case here, we are trying to read in standard numeric data values, and so we use what is called the w.d informat. The w tells SAS the width of the raw data value, that is how many columns it occupies. The d, which is optional, tells SAS the number of implied decimal places for the raw data value. The w and d must be connected by a period (.) delimiter. If the d is not present, you still need to make sure that you include the period in the informat name.
\n", " \n", "Making this all a little more concrete ... here, the subj values are four columns wide with no decimal places, and hence we use the 4. informat. We alternatively could have specified a 4.0 informat, but we could not have specified 4 (without the period) as the informat. The height values are two columns wide with no decimal places, and hence we use the 2. informat. Finally, the weight values are three columns wide with no decimal places, and hence we use the 3. informat.
\n", " \n", "Incidentally, the w.d informat ignores any d that we specify if the data value already contains a decimal point. So, for example, if we had a raw data value of 23.001 (occupying 6 columns with 5 digits, 1 decimal point, and 3 decimal places) and specified a 6. informat, SAS would still store the value as 23.001, even though we told SAS not to expect any decimal places.
\n", " \n", "Okay ... launch and run the SAS program, and review the output obtained from the print procedure to convince yourself that the three variables were read incorrectly. Oh, do you remember when I said that you needn't tell SAS to move the input pointer to the first column because it does so by default? You might want to convince yourself of this by removing the @1 that appears in the INPUT statement. Then, rerun the program to convince yourself that SAS still reads the data values properly.
\n", "The following SAS program uses the @n column pointer control and standard character and numeric informats to read, respectively, two character variables — l_name and f_name — and two numeric variables — weight and height — into a temporary SAS data set called temp:
\n", "Obs | \n", "l_name | \n", "f_name | \n", "weight | \n", "height | \n", "
---|---|---|---|---|
1 | \n", "Smith | \n", "Alice | \n", "125 | \n", "65 | \n", "
2 | \n", "White | \n", "Maryann | \n", "140 | \n", "68 | \n", "
3 | \n", "Jones | \n", "Thomas | \n", "190 | \n", ". | \n", "
4 | \n", "Arnold | \n", "Benedictine | \n", "190 | \n", "68 | \n", "
5 | \n", "Ho | \n", "Felicia | \n", "115 | \n", "63 | \n", "
The INPUT statement uses @n absolute pointer controls to move the input pointer first to column 18 to read l_name, then back to column 6 to read f_name, forward to column 30 to read weight, and back again to column 27 to read height. This example illustrates, therefore, how you can use @n pointer controls to read data fields in any order ... backwards, forwards, and backwards again.
\n", " \n", "The 3. that appears after weight and the 2. that appears after height should look familiar. They are again the numeric informats that tell SAS how to read each of the two numeric variables of interest. Now, if you look at the informats for the two character variables, l_name and f_name, you should see that they each begin with a dollar sign (\\$). Because we are trying to read in character data values, we use what is called the \\$w. informat. The w tells SAS the width of the raw data value, that is how many columns it occupies. The dollar sign (\\$) and period (.) are required delimiters. In our example, the last name (l_name) occupies as many as 6 columns and hence we use the \\$6. informat, and the first name (f_name) occupies as many as 11 columns and hence we use the \\$11. informat.
\n", "\n", "Okay ... launch and run the SAS program, and review the output obtained from the print procedure to convince yourself that the four variables were read in correctly.
\n", "In addition to using @n absolute pointer controls with numeric and character informats, the following SAS program uses +n relative pointer controls with nonstandard informats to create a temporary SAS data set called temp containing six variables:
\n", "Obs | \n", "subj | \n", "f_name | \n", "l_name | \n", "height | \n", "wt_date | \n", "calorie | \n", "
---|---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice | \n", "Smith | \n", "65 | \n", "13118 | \n", "2036 | \n", "
2 | \n", "1167 | \n", "Maryann | \n", "White | \n", "68 | \n", "13118 | \n", "1800 | \n", "
3 | \n", "1168 | \n", "Thomas | \n", "Jones | \n", ". | \n", "13119 | \n", "2302 | \n", "
4 | \n", "1201 | \n", "Benedictine | \n", "Arnold | \n", "68 | \n", "13117 | \n", "2432 | \n", "
5 | \n", "1302 | \n", "Felicia | \n", "Ho | \n", "63 | \n", "13149 | \n", "1972 | \n", "
You should now understand the specifications for reading subj, f_name and l_name. The INPUT statement tells SAS to go to column 1 to read the numeric field subj that is four columns wide, then to go to column 6 to read the character field f_name that is 11 columns wide, and then to go to column 18 to read the character field l_name that is 6 columns wide.
\n", " \n", "The +n relative pointer controls are used to read the remaining three variables — height, wt_date, and calorie. The +n pointer control moves the input pointer forward to a column number that is relative to the current position of the pointer. That is, the + moves the pointer forward n columns. In order to count correctly, it is important to understand where the column pointer is located after each data value is read. In general, when using formatted input, the column pointer moves to the first column that follows the field that was just read.
\n", " \n", "Again, let's make it more concrete. When SAS finishes reading the l_name field, the column pointer moves to the first column that follows that field, that is, to the column that immediately follows the d in Arnold (column 24). Now, the height field begins 3 columns to the right, so that's why we tell SAS to move +3 before reading the height data. When SAS finishes reading the height field, the column pointer moves to the first column that follows that field, that is, to the empty column that follows the heights (column 29). Now, the wt_date field begins 5 columns to the right, so that's why we tell SAS to move +5 before reading the weight dates. When SAS finished reading the wt_date field, the column pointer moves to the first column that follows the field, that is to the empty column just before the calorie data (column 42). Now, the calorie field begins 1 column to the right, so that's why we tell SAS to move +1 before reading the calorie data.
\n", " \n", "We'd be all done explaining the INPUT statement if it weren't for the wt_date and calorie fields containing nonstandard numeric data. The wt_date field contains forward slashes (/) for specifying the dates, and the calorie data contains commas (,). To read the wt_date field, we use the mmddyy8. informat to tell SAS to expect dates written in mm/dd/yy form. The 8, of course, tells SAS that the dates could occupy as many as eight columns. To read the calorie field, we use the comma6. informat to tell SAS to read numeric data containing commas and occupying as many as 6 columns.
\n", " \n", "In general, the COMMAw.d informat is used to read numeric values and to remove embedded blanks, commas, dashes, dollar signs, percent signs, right parentheses, and left parentheses (which are converted to minus signs). The w. tells SAS the width of the field (including dollar signs, decimal places, or other special characters). The d, which is optional, tells SAS the number of implied decimal places for a value.
\n", " \n", "Whew!!! Now, go ahead and launch and run the SAS program, and review the output obtained from the print procedure. In doing so, note that formatted input does not imply formatted output. For example, note that SAS stores the date 12/01/95 as the number 13118. We won't worry about this too much yet — it's just that, interestingly enough, SAS stores dates as numeric values equal to the number of days that have passed since January 1, 1960. Yup, that's right —dates after January 1, 1960 are stored as unique positive integers, and dates before January 1, 1960 are stored as unique negative integers. Also, note that SAS prints the values of the variable calorie without commas. In order to get SAS to print the values in a more understandable format, we must tell SAS how to format the output by using a SAS FORMAT statement.
\n", "\n", "Run the following code to see formatted output:
\n", "Obs | \n", "subj | \n", "f_name | \n", "l_name | \n", "height | \n", "wt_date | \n", "calorie | \n", "
---|---|---|---|---|---|---|
1 | \n", "1024 | \n", "Alice | \n", "Smith | \n", "65 | \n", "12/01/95 | \n", "2,036 | \n", "
2 | \n", "1167 | \n", "Maryann | \n", "White | \n", "68 | \n", "12/01/95 | \n", "1,800 | \n", "
3 | \n", "1168 | \n", "Thomas | \n", "Jones | \n", ". | \n", "12/02/95 | \n", "2,302 | \n", "
4 | \n", "1201 | \n", "Benedictine | \n", "Arnold | \n", "68 | \n", "11/30/95 | \n", "2,432 | \n", "
5 | \n", "1302 | \n", "Felicia | \n", "Ho | \n", "63 | \n", "01/01/96 | \n", "1,972 | \n", "
The following SAS program creates a temporary SAS data set called work.temp, which is identical to the permanent SAS data set called PHC6089.temp2:
\n", "Obs | \n", "subj | \n", "gender | \n", "height | \n", "weight | \n", "
---|---|---|---|---|
1 | \n", "1024 | \n", "1 | \n", "65 | \n", "125 | \n", "
2 | \n", "1167 | \n", "1 | \n", "68 | \n", "140 | \n", "
3 | \n", "1168 | \n", "2 | \n", "68 | \n", "190 | \n", "
4 | \n", "1201 | \n", "2 | \n", "72 | \n", "190 | \n", "
5 | \n", "1302 | \n", "1 | \n", "63 | \n", "115 | \n", "
The following SAS program creates a temporary SAS data set called work.yts by reading in the CSV file Youth_Tobacco_Survey_YTS_Data.csv:
\n", "579 ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
579! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
580
581 FILENAME yts_data
582 "/folders/myfolders/SAS_Notes/data/Youth_Tobacco_Survey_YTS_Data.csv";
583
584 PROC IMPORT datafile = yts_data out = yts DBMS = CSV replace;
585 getnames = yes;
586 guessingrows = max;
587 run;
588 /**********************************************************************
589 * PRODUCT: SAS
590 * VERSION: 9.4
591 * CREATOR: External File Interface
592 * DATE: 16SEP20
593 * DESC: Generated SAS Datastep Code
594 * TEMPLATE SOURCE: (None Specified.)
595 ***********************************************************************/
596 data WORK.YTS ;
597 %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
598 infile YTS_DATA delimiter = ',' MISSOVER DSD firstobs=2 ;
599 informat YEAR best32. ;
600 informat LocationAbbr $2. ;
601 informat LocationDesc $24. ;
602 informat TopicType $25. ;
603 informat TopicDesc $29. ;
604 informat MeasureDesc $57. ;
605 informat DataSource $4. ;
606 informat Response $8. ;
607 informat Data_Value_Unit $1. ;
608 informat Data_Value_Type $10. ;
609 informat Data_Value best32. ;
610 informat Data_Value_Footnote_Symbol $1. ;
611 informat Data_Value_Footnote $71. ;
612 informat Data_Value_Std_Err best32. ;
613 informat Low_Confidence_Limit best32. ;
614 informat High_Confidence_Limit best32. ;
615 informat Sample_Size best32. ;
616 informat Gender $7. ;
617 informat Race $9. ;
618 informat Age $8. ;
619 informat Education $13. ;
620 informat GeoLocation $43. ;
621 informat TopicTypeId $3. ;
622 informat TopicId $6. ;
623 informat MeasureId $6. ;
624 informat StratificationID1 $4. ;
625 informat StratificationID2 $4. ;
626 informat StratificationID3 $4. ;
627 informat StratificationID4 $4. ;
628 informat SubMeasureID $5. ;
629 informat DisplayOrder best32. ;
630 format YEAR best12. ;
631 format LocationAbbr $2. ;
632 format LocationDesc $24. ;
633 format TopicType $25. ;
634 format TopicDesc $29. ;
635 format MeasureDesc $57. ;
636 format DataSource $4. ;
637 format Response $8. ;
638 format Data_Value_Unit $1. ;
639 format Data_Value_Type $10. ;
640 format Data_Value best12. ;
641 format Data_Value_Footnote_Symbol $1. ;
642 format Data_Value_Footnote $71. ;
643 format Data_Value_Std_Err best12. ;
644 format Low_Confidence_Limit best12. ;
645 format High_Confidence_Limit best12. ;
646 format Sample_Size best12. ;
647 format Gender $7. ;
648 format Race $9. ;
649 format Age $8. ;
650 format Education $13. ;
651 format GeoLocation $43. ;
652 format TopicTypeId $3. ;
653 format TopicId $6. ;
654 format MeasureId $6. ;
655 format StratificationID1 $4. ;
656 format StratificationID2 $4. ;
657 format StratificationID3 $4. ;
658 format StratificationID4 $4. ;
659 format SubMeasureID $5. ;
660 format DisplayOrder best12. ;
661 input
662 YEAR
663 LocationAbbr $
664 LocationDesc $
665 TopicType $
666 TopicDesc $
667 MeasureDesc $
668 DataSource $
669 Response $
670 Data_Value_Unit $
671 Data_Value_Type $
672 Data_Value
673 Data_Value_Footnote_Symbol $
674 Data_Value_Footnote $
675 Data_Value_Std_Err
676 Low_Confidence_Limit
677 High_Confidence_Limit
678 Sample_Size
679 Gender $
680 Race $
681 Age $
682 Education $
683 GeoLocation $
684 TopicTypeId $
685 TopicId $
686 MeasureId $
687 StratificationID1 $
688 StratificationID2 $
689 StratificationID3 $
690 StratificationID4 $
691 SubMeasureID $
692 DisplayOrder
693 ;
694 if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
695 run;
NOTE: The infile YTS_DATA is:
Filename=/folders/myfolders/SAS_Notes/data/Youth_Tobacco_Survey_YTS_Data.csv,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=30Jul2020:12:31:00,
File Size (bytes)=2615810
NOTE: 9794 records were read from the infile YTS_DATA.
The minimum record length was 228.
The maximum record length was 355.
NOTE: The data set WORK.YTS has 9794 observations and 31 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds
9794 rows created in WORK.YTS from YTS_DATA.
NOTE: WORK.YTS data set was successfully created.
NOTE: The data set WORK.YTS has 9794 observations and 31 variables.
NOTE: PROCEDURE IMPORT used (Total process time):
real time 8.71 seconds
cpu time 7.34 seconds
696
697 ods html5 (id=saspy_internal) close;ods listing;
698
Note that when running PROC IMPORT SAS generates the DATA step code (shown in the log file output) to read in the dataset. Some of the key features of PROC IMPORT:
\n", " \n", "\n", "----1----2----3----4----5----6----7----8\n", "75+ 0-39g/day 0-9g/day 1 18\n", "75+ 0-39g/day 10-19 2 6\n", "75+ 0-39g/day 30+ 1 3\n", "75+ 40-79 0-9g/day 2 5\n", "75+ 40-79 10-19 1 3\n", "75+ 40-79 20-29 0 3\n", "75+ 40-79 30+ 1 1\n", "75+ 80-119 0-9g/day 1 1\n", "75+ 80-119 10-19 1 1\n", "75+ 120+ 0-9g/day 2 2\n", "75+ 120+ 10-19 1 1\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "SAS", "language": "sas", "name": "sas" }, "language_info": { "codemirror_mode": "sas", "file_extension": ".sas", "mimetype": "text/x-sas", "name": "sas" } }, "nbformat": 4, "nbformat_minor": 2 }