{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SAS Variables and Assignment Statements\n", "\n", "Once you've read your data into a SAS data set, surely you want to do something with it. A common thing to do is to change the original data in some way in an attempt to answer a research question of interest to you. You can change the data in one of two ways:\n", "\n", "1. You can use a basic **assignment statement** in which you add some information to all of the observations in the data set. Some assignment statements may take advantage of the numerous **SAS functions** that are available to make programming certain calculations easier (e.g., taking an average).\n", "2. Alternatively, you can use an **if-then-else statement** to add some information to some but not all of the observations. In this lesson, we will learn how to use assignment statements and numeric SAS functions to change your data, and then, we will learn how to use if-then-else statements to change a subset of your data.\n", "\n", "Modifying your data may involve not only changing the values of a particular variable, but also the type of the variable. That is, you might need to change a character variable to a numeric variable. For that reason, we'll investigate how to use the INPUT function to convert character data values to numeric values.\n", "\n", "## Assignment Statement Basics\n", "\n", "The fundamental method of modifying the data in a data set is by way of a basic assignment statement. Such a statement always takes the form:\n", "\n", "variable = expression;\n", "\n", "where variable is any valid SAS name and expression is the calculation that is necessary to give the variable its values. The variable must always appear to the left of the equal sign and the expression must always appear to the right of the equal sign. 
As always, the statement must end with a semicolon (;).\n", "\n", "Because assignment statements involve changing the values of variables, in the process of learning about assignment statements we'll get practice with working with both numeric and character variables. We'll also learn how using numeric SAS functions can help to simplify some of our calculations.\n", "\n", "
Obs | name | e1 | e2 | e3 | e4 | p1 | f1 |
---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 97 | 80 |
2 | John Simon | 88 | 72 | 86 | . | 100 | 85 |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 99 | 93 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 82 | 69 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 98 | 92 |

The data set contains student names (name), each of their four exam grades (e1, e2, e3, e4), their project grade (p1), and their final exam grade (f1).
\n", "A couple of comments. For the sake of the examples that follow, we'll use the DATALINES statement to read in the data. We could have just as easily used the INFILE statement. Additionally, for the sake of ease, we'll create temporary data sets rather than permanent ones. Finally, after each SAS DATA step, we'll use the PRINT procedure to print all or part of the resulting SAS data set for your perusal.
\n", "The following SAS program illustrates a very simple assignment statement in which SAS adds up the four exam scores of each student and stores the result in a new numeric variable called examtotal.
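
A minimal sketch of such a program, consistent with the output shown here. The INPUT statement and the layout of the DATALINES block are assumptions (name read from columns 1-15, the scores read with list input); only the assignment statement for examtotal is taken from the text:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* new variable on the left, expression on the right */
    examtotal = e1 + e2 + e3 + e4;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 e2 e3 e4 examtotal;
RUN;
```

The data lines must start in column 1, because name is read from fixed columns.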

Obs | name | e1 | e2 | e3 | e4 | examtotal |
---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 315 |
2 | John Simon | 88 | 72 | 86 | . | . |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 381 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 237 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 324 |

Note that, as previously described, the new variable name examtotal appears to the left of the equal sign, while the expression that adds up the four exam scores (e1+e2+e3+e4) appears to the right of the equal sign.
\n", "Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the new numeric variable examtotal is indeed the sum of the four exam scores for each student appearing in the data set. Also note what SAS does when it is asked to calculate something when some of the data are missing. Rather than add up the three exam scores that do exist for John Simon, SAS instead assigns a missing value to his examtotal. If you think about it, that's a good thing! Otherwise, you'd have no way of knowing that his examtotal differed in some fundamental way from that of the other students. The important lesson here is to always be aware of how SAS is going to handle the missing values in your data set when you perform various calculations!\n", "
In the previous example, the assignment statement created a new variable in the data set by simply using a variable name that didn't already exist in the data set. You need not always use a new variable name. Instead, you could modify the values of a variable that already exists. The following SAS program illustrates how the instructor would modify the variable e2, say for example, if she wanted to modify the grades of the second exam by adding 8 points to each student's grade:
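
A minimal sketch of such a program. The statement e2 = e2 + 8 comes from the text; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* the original e2 on the right is used to compute the new e2 on the left */
    e2 = e2 + 8;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 e2 e3 e4 p1 f1;
RUN;
```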

Obs | name | e1 | e2 | e3 | e4 | p1 | f1 |
---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 90 | 86 | 69 | 97 | 80 |
2 | John Simon | 88 | 80 | 86 | . | 100 | 85 |
3 | Patricia Jones | 98 | 100 | 92 | 99 | 99 | 93 |
4 | Jack Benedict | 54 | 71 | 71 | 49 | 82 | 69 |
5 | Rene Porter | 100 | 70 | 88 | 74 | 98 | 92 |

Note again that the name of the variable being modified (e2) appears to the left of the equal sign, while the arithmetic expression that tells SAS to add 8 to the second exam score (e2+8) appears to the right of the equal sign. In general, when a variable name appears on both sides of the equal sign, the original value on the right side is used to evaluate the expression. The result of the expression is then assigned to the variable on the left side of the equal sign.
\n", "Launch and run the SAS program. Review the output from the print procedure to convince yourself that the values of the numeric variable e2 are indeed eight points higher than the values in the original data set.

Operation | Symbol | Assignment Statement | Action Taken |
---|---|---|---|
addition | + | a = b + c; | add b and c |
subtraction | - | a = b - c; | subtract c from b |
multiplication | * | a = b * c; | multiply b and c |
division | / | a = b / c; | divide b by c |
exponentiation | ** | a = b ** c; | raise b to the power of c |
negative prefix | - | a = -b; | take the negative of b |

The following example contains a calculation that illustrates the standard order of operations. Suppose a statistics instructor calculates the final grade by weighting the average exam score by 0.6, the project score by 0.2, and the final exam by 0.2. The following SAS program illustrates how the instructor (incorrectly) calculates the students' final grades:
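
A minimal sketch of the flawed program. The assignment statement for final is inferred from the sum 46.8 + 82 + 86 + 17.25 + 19.4 + 16.0 dissected in this section, so treat its exact form as an assumption, along with the INPUT statement and the layout of the DATALINES block:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* INCORRECT: without parentheses, 0.6 multiplies only e1 */
    /* and only e4 is divided by 4                            */
    final = 0.6*e1+e2+e3+e4/4 + 0.2*p1 + 0.2*f1;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
RUN;
```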

Obs | name | e1 | e2 | e3 | e4 | p1 | f1 | final |
---|---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 97 | 80 | 267.45 |
2 | John Simon | 88 | 72 | 86 | . | 100 | 85 | . |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 99 | 93 | 305.95 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 82 | 69 | 208.85 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 98 | 92 | 266.50 |

Well, okay, so the instructor should stick to statistics and not mathematics. As you can see in the assignment statement, the instructor is attempting to tell SAS to average the four exam scores by adding them up and dividing by 4, and then multiplying the result by 0.6. Let's see what SAS does instead. Launch and run the SAS program, and review the output to see if you can figure out what SAS did, say, for the first student Alexander Smith. If you're still not sure, review the rules for the order of the operations again. The rules tell us that SAS first performs the multiplications and divisions, working from left to right: 0.6 × 78 = 46.8, 69 / 4 = 17.25, 0.2 × 97 = 19.4, and 0.2 × 80 = 16.0.
\n", "Then, SAS performs all of the addition:
46.8 + 82 + 86 + 17.25 + 19.4 + 16.0\n", "
to get his final score of 267.45. Now, maybe that's a final score that Alexander wants, but it is still fundamentally wrong. Let's see if we can help set the statistics instructor straight by taking advantage of that last rule that says operations in parentheses are performed first.
\n", "The following SAS program illustrates how the instructor, again weighting the average exam score by 0.6, the project score by 0.2, and the final exam by 0.2, correctly calculates the students' final grades by using parentheses:
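
A minimal sketch of the corrected program. The placement of the parentheses is inferred from the sum 47.25 + 19.4 + 16.0 dissected in this section; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* parentheses force the exam average to be computed first */
    final = 0.6*((e1+e2+e3+e4)/4) + 0.2*p1 + 0.2*f1;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
RUN;
```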

Obs | name | e1 | e2 | e3 | e4 | p1 | f1 | final |
---|---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 97 | 80 | 82.65 |
2 | John Simon | 88 | 72 | 86 | . | 100 | 85 | . |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 99 | 93 | 95.55 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 82 | 69 | 65.75 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 98 | 92 | 86.60 |

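Let's dissect the calculation of Alexander's final score again. The assignment statement for final tells SAS to first add the four exam scores inside the parentheses (78 + 82 + 86 + 69 = 315), divide that sum by 4 to get 78.75, and multiply the resulting average by 0.6 to get 47.25. SAS also multiplies the project score by 0.2 (0.2 × 97 = 19.4) and the final exam score by 0.2 (0.2 × 80 = 16.0).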
\n", "
Then, SAS performs the addition of the last three items:
\n", "
47.25 + 19.4 + 16.0\n", "
to get his final score of 82.65. There, that sounds much better. Sorry, Alexander.
\n", "Launch and run the SAS program to see how we did. Review the output from the print procedure to convince yourself that the final grades have been calculated as the instructor wishes. By the way, note again that SAS assigns a missing value to the final grade for John Simon.
\n", "In this last example, we calculated the students' average exam scores by adding up their four exam grades and dividing by 4. We could have instead taken advantage of one of the many numeric functions that are available in SAS, namely that of the MEAN function.
\n", "The following SAS program illustrates the calculation of the average exam scores in two ways, by definition and by using the MEAN function:
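
A minimal sketch of such a program. The variable names avg1 and avg2 come from the output table; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    avg1 = (e1 + e2 + e3 + e4)/4;     /* by definition: missing if any score is missing */
    avg2 = mean(e1, e2, e3, e4);      /* MEAN function: uses the non-missing scores     */
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 e2 e3 e4 avg1 avg2;
RUN;
```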

Obs | name | e1 | e2 | e3 | e4 | avg1 | avg2 |
---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 78.75 | 78.75 |
2 | John Simon | 88 | 72 | 86 | . | . | 82.00 |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 95.25 | 95.25 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 59.25 | 59.25 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 81.00 | 81.00 |

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the two methods of calculating the average exam scores do indeed yield the same results:
\n", "Oooops! What happened? SAS reports that the average exam score for John Simon is 82 when the average is calculated using the MEAN function, but reports a missing value when the average is calculated using the definition. If you study the results, you'll soon figure out that when calculating an average using the MEAN function, SAS ignores the missing values and goes ahead and calculates the average based on the available values.
\n", "We can't really make some all-conclusive statement about which method is more appropriate, as it really depends on the situation and the intent of the programmer. Instead, the (very) important lesson here is to know how missing values are handled for the various methods that are available in SAS! We can't possibly address all of the possible calculations and functions in this course. So ... you would be wise to always check your calculations out on a few representative observations to make sure that your SAS programming is doing exactly as you intended. This is another one of those good programming practices to jot down.
\n", "Although you can refer to SAS Help and Documentation (under \"functions, by category\") for a full accounting of the built-in numeric functions that are available in SAS, here is a list of just some of the numeric functions that can be helpful when performing statistical analyses:

Common Functions | Example |
---|---|
INT: the integer portion of a numeric value | a = int(x); |
ABS: the absolute value of the argument | a = abs(x); |
SQRT: the square root of the argument | a = sqrt(x); |
MIN: the minimum value of the arguments | a = min(x, y, z); |
MAX: the maximum value of the arguments | a = max(x, y, z); |
SUM: the sum of the arguments | a = sum(x, y, z); |
MEAN: the mean of the arguments | a = mean(x, y, z); |
ROUND: round the argument to the specified unit | a = round(x, 1); |
LOG: the log (base e) of the argument | a = log(x); |
LAG: the value of the argument in the previous observation | a = lag(x); |
DIF: the difference between the values of the argument in the current and previous observations | a = dif(x); |
N: the number of non-missing values of the argument | a = n(x); |
NMISS: the number of missing values of the argument | a = nmiss(x); |

I have used the INT function a number of times when dealing with numbers whose first few digits contain some additional information that I need. For example, the area code in this part of Pennsylvania is 814. If I have phone numbers that are stored as numbers, say, as 8142341230, then I can use the INT function to extract the area code from the number. Let's take a look at an example of this use of the INT function.
\n", "The following SAS program uses the INT function to extract the area codes from a set of ten-digit telephone numbers:
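
A minimal sketch of such a program. The expression int(phone/10000000) is the one discussed in the text; reading phone as a numeric variable with list input and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 phone;
    /* dividing by 10,000,000 moves the decimal point behind the area code; */
    /* INT then drops the fractional part                                   */
    areacode = int(phone/10000000);
    DATALINES;
Alexander Smith 8145551212
John Simon      8145562314
Patricia Jones  7175559999
Jack Benedict   5705551111
Rene Porter     8145542323
;
RUN;

PROC PRINT data=grades;
    var name phone areacode;
RUN;
```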

Obs | name | phone | areacode |
---|---|---|---|
1 | Alexander Smith | 8145551212 | 814 |
2 | John Simon | 8145562314 | 814 |
3 | Patricia Jones | 7175559999 | 717 |
4 | Jack Benedict | 5705551111 | 570 |
5 | Rene Porter | 8145542323 | 814 |

In short, the INT function returns the integer part of the expression contained within parentheses. So, if the phone number is 8145562314, then int(phone/10000000) becomes int(814.5562314) which becomes, as claimed, the area code 814. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the area codes are calculated as claimed.
\n", "One really cool thing is that you can nest functions in SAS (as you can in most programming languages). That is, you can compute a function within another function. When you nest functions, SAS works from the inside out. That is, SAS performs the action in the innermost function first. It uses the result of that function as the argument of the next function, and so on. You can nest any function as long as the function that is used as the argument meets the requirements for the argument. The following SAS program illustrates nested functions by rounding the students' exam average to the nearest unit:
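
A minimal sketch of such a program. The nested call round(mean(e1,e2,e3,e4),1) also appears in the log later in this lesson; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* inner function first: MEAN computes the average, */
    /* then ROUND rounds it to the nearest 1            */
    avg = round(mean(e1, e2, e3, e4), 1);
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 e2 e3 e4 avg;
RUN;
```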

Obs | name | e1 | e2 | e3 | e4 | avg |
---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 79 |
2 | John Simon | 88 | 72 | 86 | . | 82 |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 95 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 59 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 81 |

For example, the average of Alexander's four exams is 78.75 (the sum of 78, 82, 86, and 69 all divided by 4). Thus, in calculating avg for Alexander, 78.75 becomes the argument for the ROUND function. That is, 78.75 is rounded to the nearest one unit to get 79. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the exam averages avg are rounded as claimed.
\n", "When creating a new character variable in a data set, most often you will want to assign the values based on certain conditions. For example, suppose an instructor wants to create a character variable called status which indicates whether a student \"passed\" or \"failed\" based on their overall final grade. A grade below 65, say, might be considered a failing grade, while a grade of 65 or higher might be considered a passing grade. In this case, we would need to make use of an if-then-else statement. We'll learn more about this kind of statement in a later section, but you'll get the basic idea here. The following SAS program illustrates the creation of a new character variable called status using an assignment statement in conjunction with an if-then-else statement:
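
A minimal sketch of such a program. The cutoff of 65 comes from the text, and the MEAN function is assumed because the output shows a non-missing avg for John Simon; the INPUT statement and the layout of the DATALINES block are also assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    avg = mean(e1, e2, e3, e4);
    /* character values are enclosed in quotes */
    if (avg < 65) then status = 'Failed';
    else               status = 'Passed';
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 e2 e3 e4 avg status;
RUN;
```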

Obs | name | e1 | e2 | e3 | e4 | avg | status |
---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 78.75 | Passed |
2 | John Simon | 88 | 72 | 86 | . | 82.00 | Passed |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 95.25 | Passed |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 59.25 | Failed |
5 | Rene Porter | 100 | 62 | 88 | 74 | 81.00 | Passed |

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly. As you can see, to specify a character variable's value using an assignment statement, you enclose the value in quotes. Some comments:
\n", "The following SAS program illustrates how SAS tries to perform an automatic character-to-numeric conversion of standtest and e1, e2, e3, and e4 so that arithmetic operations can be performed on them:
258 DATA grades;
259 input name $ 1-15 e1 $ e2 $ e3 $ e4 $ standtest $;
260 avg = round(mean(e1,e2,e3,e4),1);
261 std = standtest/4;
262 DATALINES;
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
260:22 260:25 260:28 260:31 261:11
NOTE: Invalid numeric data, standtest='1,210' , at line 261 column 11.
RULE:----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
263 Alexander Smith 78 82 86 69 1,210
name=Alexander Smith e1=78 e2=82 e3=86 e4=69 standtest=1,210 avg=79 std=. _ERROR_=1 _N_=1
NOTE: Invalid numeric data, standtest='1,010' , at line 261 column 11.
265 Patricia Jones 98 92 92 99 1,010
name=Patricia Jones e1=98 e2=92 e3=92 e4=99 standtest=1,010 avg=95 std=. _ERROR_=1 _N_=3
NOTE: Invalid numeric data, standtest='1,180' , at line 261 column 11.
267 Rene Porter 100 62 88 74 1,180
name=Rene Porter e1=100 e2=62 e3=88 e4=74 standtest=1,180 avg=81 std=. _ERROR_=1 _N_=5
NOTE: Missing values were generated as a result of performing an operation on missing values.
Each place is given by: (Number of times) at (Line):(Column).
3 at 261:20
NOTE: The data set WORK.GRADES has 5 observations and 8 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
268 ;
269 RUN;
Okay, first note that for some crazy reason all of the data in the data set have been read in as character data. That is, even the exam scores (e1, e2, e3, e4) and the standardized test scores (standtest) are stored as character variables. Then, when SAS goes to calculate the average exam score (avg), SAS first attempts to convert e1, e2, e3, and e4 to numeric variables. Likewise, when SAS goes to calculate a new standardized test score (std), SAS first attempts to convert standtest to a numeric variable. Let's see how it does. Launch and run the SAS program, and before looking at the output window, take a look at the log window. You should see something that looks like the log shown above:
\n", "
The first NOTE that you see is a standard message that SAS prints in the log to tell you that it performed an automatic character-to-numeric conversion on your behalf. Then, you see three NOTEs about invalid numeric data concerning the standtest values 1,210, 1,010, and 1,180. In case you haven't figured it out yourself, it's the commas in those numbers that are throwing SAS for a loop. In general, the automatic conversion produces a numeric missing value from any character value that does not conform to standard numeric notation (digits 0, 1, ..., 9, a decimal point, and plus or minus signs). That's why the fifth NOTE reports that missing values were generated. The output itself:

Obs | name | e1 | e2 | e3 | e4 | standtest | avg | std |
---|---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 1,210 | 79 | . |
2 | John Simon | 88 | 72 | 86 |  | 990 | 82 | 247.50 |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 1,010 | 95 | . |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 875 | 59 | 218.75 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 1,180 | 81 | . |

shows the end result of the attempted automatic conversion. The calculation of avg went off without a hitch because e1, e2, e3, and e4 contain standard numeric values, whereas the calculation of std did not because standtest contains nonstandard numeric values. Let's take this character-to-numeric conversion into our own hands.
\n", "The following SAS program illustrates the use of the INPUT function to convert the character variable standtest to a numeric variable explicitly so that an arithmetic operation can be performed on it:
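
A minimal sketch of the full program. The INPUT statement and the assignment statement for std match the log shown in this section; the layout of the DATALINES block is an assumption:

```sas
DATA grades;
    input name $ 1-15 e1 $ e2 $ e3 $ e4 $ standtest $;
    /* explicit conversion: the COMMA5. informat reads values such as 1,210 */
    std = input(standtest, comma5.)/4;
    DATALINES;
Alexander Smith  78 82 86 69 1,210
John Simon       88 72 86  .   990
Patricia Jones   98 92 92 99 1,010
Jack Benedict    54 63 71 49   875
Rene Porter     100 62 88 74 1,180
;
RUN;

PROC PRINT data=grades;
    var name standtest std;
RUN;
```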

Obs | name | standtest | std |
---|---|---|---|
1 | Alexander Smith | 1,210 | 302.50 |
2 | John Simon | 990 | 247.50 |
3 | Patricia Jones | 1,010 | 252.50 |
4 | Jack Benedict | 875 | 218.75 |
5 | Rene Porter | 1,180 | 295.00 |

The only difference between the calculation of std here and of that in the previous example is that the standtest variable has been inserted here into the INPUT function. The general form of the INPUT function is:
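\n", "INPUT(source, informat)
\n",
" where source is the character variable, constant, or expression to be converted to a numeric value, and informat is the numeric informat that tells SAS how to read the values in source.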
\n", "In our case, standtest is the character variable we are trying to convert to a numeric variable. The values in standtest conform to the comma5. informat, and hence its specification in the INPUT function.
\n", "Let's see how we did. Launch and run the SAS program, and again before looking at the output window, take a look at the log window. You should see something that now looks like this:
322 DATA grades;
323 input name $ 1-15 e1 $ e2 $ e3 $ e4 $ standtest $;
324 std = input(standtest,comma5.)/4;
325 DATALINES;
NOTE: The data set WORK.GRADES has 5 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
331 ;
332 RUN;
Ahhaa! No warnings about SAS taking over our program and performing automatic conversions. That's because we are in control this time! Now, looking at the output (see above), we see that we successfully calculated std this time around. That's much better!
\n", "A couple of closing comments. First, I might use our discussion here to add another item to your growing list of good programming practices. Whenever possible, make sure that you are the one who is in control of your program. That is, know what your program is doing at all times, and if it's not doing what you'd expect it to do for all situations, then rewrite it in such a way to make sure that it does.
\n", "Second, you might be wondering \"geez, we just spent all this time talking about character-to-numeric conversions, but what happens if I have to do a numeric-to-character conversion instead?\" Don't worry ... SAS doesn't let you down. If you try to do something to a numeric variable that should only be done to a character variable, SAS automatically tries first to convert the numeric variable to a character variable. If that doesn't work, then you'll want to use the PUT function to convert your numeric values to character values explicitly. We'll address the PUT function in Stat 481 when we learn about character functions in depth.
\n", "There is nothing really new here. You've already seen an if-then(-else) statement in the previous lesson. Our focus there was primarily on the assignment statement. Here, we'll focus on the entire if-then statement, including the condition. The following SAS program creates a character variable status, whose value depends on whether or not the student's first exam grade is less than 65:
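
A minimal sketch of such a program. The condition e1 < 65 and the value 'Failed' come from the text and the output; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* no ELSE: status stays blank when the condition is false */
    if (e1 < 65) then status = 'Failed';
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 status;
RUN;
```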

Obs | name | e1 | status |
---|---|---|---|
1 | Alexander Smith | 78 |  |
2 | John Simon | 88 |  |
3 | Patricia Jones | 98 |  |
4 | Jack Benedict | 54 | Failed |
5 | Rene Porter | 100 |  |

First, note that we continue to work with the grades data set from the last section. Again, the data set contains student names (name), each of their four exam grades (e1, e2, e3, e4), their project grade (p1), and their final exam grade (f1). Then, launch and run the SAS program. Review the output from the print procedure to convince yourself that the values of the character variable status have been assigned correctly.

Comparison | SAS syntax | Alternative SAS syntax |
---|---|---|
less than | < | LT |
greater than | > | GT |
less than or equal to | <= | LE |
greater than or equal to | >= | GE |
equal to | = | EQ |
not equal to | ^= | NE |
equal to one of a list | in | IN |

The following SAS program uses the IN operator to identify those students who scored a 98, 99, or 100 on their project score. That is, students whose p1 value equals either 98, 99, or 100 are assigned the value 'Excellent' for the project variable:
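
A minimal sketch of such a program. The list of values and the value 'Excellent' come from the text; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* true when p1 equals any one of the listed values */
    if p1 in (98, 99, 100) then project = 'Excellent';
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name p1 project;
RUN;
```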

Obs | name | p1 | project |
---|---|---|---|
1 | Alexander Smith | 97 |  |
2 | John Simon | 100 | Excellent |
3 | Patricia Jones | 99 | Excellent |
4 | Jack Benedict | 82 |  |
5 | Rene Porter | 98 | Excellent |

Launch and run the SAS program and review the output from the PRINT procedure to convince yourself that the program performs as described.
\n", "
Note! After being introduced to the comparison operators, students are often tempted to use the syntax EQ in an assignment statement. If you try it, you'll soon learn that SAS will hiccup at you. Assignment statements must always use the equal sign (=).
\n", "The following SAS program creates a character variable status, whose value is \"Failed\" IF the student's first exam grade is less than 65, otherwise (i.e., ELSE) the value is \"Passed\":
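
A minimal sketch of such a program. The condition and the two values come from the text; the INPUT statement and the layout of the DATALINES block are assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* ELSE is evaluated only when the IF condition is false */
    if (e1 < 65) then status = 'Failed';
    else              status = 'Passed';
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name e1 status;
RUN;
```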

Obs | name | e1 | status |
---|---|---|---|
1 | Alexander Smith | 78 | Passed |
2 | John Simon | 88 | Passed |
3 | Patricia Jones | 98 | Passed |
4 | Jack Benedict | 54 | Failed |
5 | Rene Porter | 100 | Passed |

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly.
\n", "Note that, in general, using ELSE statements with IF-THEN statements can save resources:
\n", "For greater efficiency, you should construct your IF-THEN-ELSE statements with conditions of decreasing probabilities.

Operation | SAS syntax | Alternative SAS syntax |
---|---|---|
are both conditions true? | & | AND |
is either condition true? | \| | OR |
reverse the logic of a comparison | ^ or ~ | NOT |

The following SAS program illustrates the use of several mutually exclusive conditions within an if-then-else statement. The program uses the AND operator to define the conditions. Again, when comparisons are connected by AND, all of the comparisons must be true in order for the condition to be true.
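
A minimal sketch of such a program. The grade cutoffs (90, 80, 70, 65) are hypothetical values chosen to reproduce the letter grades in the output, and the FORMAT statement, the INPUT statement, and the layout of the DATALINES block are likewise assumptions:

```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    avg = (e1 + e2 + e3 + e4)/4;   /* missing if any exam score is missing */
    /* the missing-value check comes first: a missing value compares as    */
    /* less than any number, so it would otherwise satisfy avg < 65        */
    if      (avg = .)                  then overall = 'Incomplete';
    else if (avg >= 90)                then overall = 'A';
    else if (avg >= 80) and (avg < 90) then overall = 'B';
    else if (avg >= 70) and (avg < 80) then overall = 'C';
    else if (avg >= 65) and (avg < 70) then overall = 'D';
    else if (avg < 65)                 then overall = 'F';
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data=grades;
    var name avg overall;
    format avg 4.1;
RUN;
```

Because 'Incomplete' is the first value assigned to overall, SAS gives the variable a length of 10, which is long enough for every value.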

Obs | name | avg | overall |
---|---|---|---|
1 | Alexander Smith | 78.8 | C |
2 | John Simon | . | Incomplete |
3 | Patricia Jones | 95.3 | A |
4 | Jack Benedict | 59.3 | F |
5 | Rene Porter | 81.0 | B |

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the letter grades have been assigned correctly. Also note how the program in general, and the if-then-else statement in particular, is formatted in order to make the program easy to read. The conditions and assignment statements are aligned nicely in columns and parentheses are used to help offset the conditions. Whenever possible ... okay, make that always ... format (and comment) your programs. After all, you may actually need to use them again in a few years. Trust me ... you'll appreciate it then! Note that:
\n", "Oh, one more point. You may have noticed, after the condition that takes care of missing values, that the conditions appear in order from A, B, ... down to F. Is the instructor treating the glass as being half-full as opposed to half-empty? Hmmm ... actually, the order has to do with the efficiency of the statements. When SAS encounters the condition that is true for a particular observation, it jumps out of the if-then-else statement to the next statement in the DATA step. SAS thereby avoids having to needlessly evaluate all of the remaining conditions. Hence, we have ourselves another good programming habit ... arrange the order of your conditions (roughly speaking, of course!) in an if-then-else statement so that the most common one appears first, the next most common one appears second, and so on. You'll also need to make sure that your condition concerning missing values appears first in the if-then-else statement. SAS treats a missing numeric value as smaller than any number, so a missing average would otherwise satisfy a less-than comparison intended for the lowest grade.
\n", "In the previous program, the conditions were written using the AND operator. Alternatively, we could have just used straightforward numerical intervals. The following SAS program illustrates the use of alternative intervals as well as the alternative syntax for the comparison operators. Note that you get the same output as the previous program.
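
A sketch of the interval form, written with the mnemonic operators EQ, LE and LT. The hypothetical cutoffs of 90, 80, 70 and 65, the data set name, and the input layout are assumptions made for illustration. A compound comparison such as 80 LE avg LT 90 is true only when both parts hold.

```sas
/* Sketch only: cutoffs, data set name and input layout are assumptions */
data grades;
    infile datalines dlm=',';
    input name :$15. e1 e2 e3 e4 p1 f1;
    avg = round((e1 + e2 + e3 + e4) / 4, 0.1);
    /* EQ, LE and LT are the mnemonic forms of =, <= and < */
    if      (avg EQ .)         then overall = 'Incomplete';
    else if (90 LE avg LE 100) then overall = 'A';
    else if (80 LE avg LT 90)  then overall = 'B';
    else if (70 LE avg LT 80)  then overall = 'C';
    else if (65 LE avg LT 70)  then overall = 'D';
    else if (0 LE avg LT 65)   then overall = 'F';
    datalines;
Alexander Smith,78,82,86,69,97,80
John Simon,88,72,86,.,100,85
Patricia Jones,98,92,92,99,99,93
Jack Benedict,54,63,71,49,82,69
Rene Porter,100,62,88,74,98,92
;
run;

proc print data=grades;
    var name avg overall;
run;
```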
\n", "Obs | name | avg | overall
---|---|---|---
1 | Alexander Smith | 78.8 | C
2 | John Simon | . | Incomplete
3 | Patricia Jones | 95.3 | A
4 | Jack Benedict | 59.3 | F
5 | Rene Porter | 81.0 | B\n", "
Now, suppose an instructor wants to give bonus points to students who show some sign of improvement from the beginning of the course to the end of the course. Suppose she wants to add two points to a student's overall average if either the student's first exam grade is less than both the third and fourth exam grades or the second exam grade is less than both the third and fourth exam grades. (Don't ask why! I'm just trying to motivate something here.) The operative words here are \"either\" and \"or\". In order to accommodate the instructor's wishes, we need to take advantage of the OR logical operator. When comparisons are connected by OR, only one of the comparisons needs to be true in order for the condition to be true. The following SAS program illustrates the use of the OR operator, the AND operator, and the use of the OR and AND operators together:
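
A sketch of how the bonus rule might be coded, assuming inline comma-delimited data (the data set name and input layout are assumptions). Each parenthesized pair of comparisons is joined by AND, and the two pairs are joined by OR.

```sas
/* Sketch only: data set name and input layout are assumptions */
data grades;
    infile datalines dlm=',';
    input name :$15. e1 e2 e3 e4 p1 f1;
    avg = round((e1 + e2 + e3 + e4) / 4, 0.1);
    /* bonus if exam 1 or exam 2 is lower than both exam 3 and exam 4 */
    if ((e1 < e3) and (e1 < e4)) or ((e2 < e3) and (e2 < e4)) then adjavg = avg + 2;
    else adjavg = avg;
    datalines;
Alexander Smith,78,82,86,69,97,80
John Simon,88,72,86,.,100,85
Patricia Jones,98,92,92,99,99,93
Jack Benedict,54,63,71,49,82,69
Rene Porter,100,62,88,74,98,92
;
run;

proc print data=grades;
    var name e1 e2 e3 e4 avg adjavg;
run;
```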
\n", "Obs | name | e1 | e2 | e3 | e4 | avg | adjavg
---|---|---|---|---|---|---|---
1 | Alexander Smith | 78 | 82 | 86 | 69 | 78.8 | 78.8
2 | John Simon | 88 | 72 | 86 | . | . | .
3 | Patricia Jones | 98 | 92 | 92 | 99 | 95.3 | 95.3
4 | Jack Benedict | 54 | 63 | 71 | 49 | 59.3 | 59.3
5 | Rene Porter | 100 | 62 | 88 | 74 | 81.0 | 83.0\n", "
First, inspect the program to make sure you understand the code. In particular, note that logical comparisons that are enclosed in parentheses are evaluated as true or false before they are compared to other expressions. In this example:
\n", "Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that, where appropriate, two points were added to the student's average (avg) to get an adjusted average (adjavg). Also, note that we didn't have to worry about programming for missing values here, because the student's adjusted average (adjavg) would automatically be assigned missing if his or her average (avg) was missing. SAS calls this \"propagation of missing values.\"
\n", "Suppose our now infamous instructor wants to identify those students who either did not complete the course or failed. Because SAS compares character values in a case-sensitive way, any if-then-else statements written to identify the students have to check for those students whose status is 'failed' or 'Failed' or 'FAILED' or ... you get the idea. One rather tedious solution would be to check for all possible \"typings\" of the words \"failed\" and \"incomp\" (for incomplete). Alternatively, we could use the UPCASE function to first produce an uppercase value, and then make our comparisons only between uppercase values. The following SAS program takes such an approach:
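
A sketch of the two approaches side by side, assuming inline comma-delimited data (the data set name and input layout are assumptions). The LENGTH statement is included so that the value 'contact' is not truncated to the length of 'none', which would otherwise be the first value assigned.

```sas
/* Sketch only: data set name and input layout are assumptions */
data grades;
    infile datalines dlm=',';
    input name :$15. status :$6.;
    length action action2 $ 7;
    /* exact comparisons: only lowercase spellings match */
    if      status = 'passed' then action = 'none';
    else if status = 'failed' then action = 'contact';
    else if status = 'incomp' then action = 'contact';
    /* UPCASE makes the comparison independent of how status was typed */
    if      upcase(status) = 'PASSED' then action2 = 'none';
    else if upcase(status) = 'FAILED' then action2 = 'contact';
    else if upcase(status) = 'INCOMP' then action2 = 'contact';
    datalines;
Alexander Smith,passed
John Simon,incomp
Patricia Jones,PAssed
Jack Benedict,FAILED
Rene Porter,PASSED
;
run;

proc print data=grades;
    var name status action action2;
run;
```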
\n", "Obs | name | status | action | action2
---|---|---|---|---
1 | Alexander Smith | passed | none | none
2 | John Simon | incomp | contact | contact
3 | Patricia Jones | PAssed | | none
4 | Jack Benedict | FAILED | | contact
5 | Rene Porter | PASSED | | none\n", "
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the if-then-else statement that involves the creation of the variable action is inadequate while the one that uses the UPCASE function to create the variable action2 works like a charm.
\n", "By the way, when making comparisons that involve character values, you should know that SAS considers a missing character value (a blank space ' ') to be smaller than any letter, and so the good habit of programming for missing values holds when dealing with character variables as well.
\n", "Suppose our instructor wants to assign a grade of zero to any student who missed the fourth exam, as well as notify the student that she has done so. The following SAS program illustrates the use of the DO-END clause to accommodate the instructor's wishes:
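
A sketch of such a DO group, assuming inline comma-delimited data (the data set name and input layout are assumptions). Both statements between DO and END run only when the fourth exam grade is missing.

```sas
/* Sketch only: data set name and input layout are assumptions */
data grades;
    infile datalines dlm=',';
    input name :$15. e1 e2 e3 e4 p1 f1;
    /* DO ... END groups two actions under a single IF condition */
    if e4 = . then do;
        e4 = 0;
        notify = 'YES';
    end;
    datalines;
Alexander Smith,78,82,86,69,97,80
John Simon,88,72,86,.,100,85
Patricia Jones,98,92,92,99,99,93
Jack Benedict,54,63,71,49,82,69
Rene Porter,100,62,88,74,98,92
;
run;

proc print data=grades;
run;
```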
\n", "Obs | name | e1 | e2 | e3 | e4 | p1 | f1 | notify
---|---|---|---|---|---|---|---|---
1 | Alexander Smith | 78 | 82 | 86 | 69 | 97 | 80 | 
2 | John Simon | 88 | 72 | 86 | 0 | 100 | 85 | YES
3 | Patricia Jones | 98 | 92 | 92 | 99 | 99 | 93 | 
4 | Jack Benedict | 54 | 63 | 71 | 49 | 82 | 69 | 
5 | Rene Porter | 100 | 62 | 88 | 74 | 98 | 92 | \n", "
The DO statement tells SAS to treat all of the statements it encounters as one all-inclusive action until a matching END appears. If no matching END appears, SAS will hiccup. Launch and run the SAS program, and review the output of the PRINT procedure to convince yourself that the program accomplishes what we claim.
\n", "\n", "data school;\n", " input Age Quiz : $1. Midterm Final;\n", " /* Add your statements here */\n", "datalines;\n", "12 A 92 95\n", "12 B 88 88\n", "13 C 78 75\n", "13 A 92 93\n", "12 F 55 62\n", "13 B 88 82\n", ";\n", "\n", "\n", "Using If-Then-Else statements, compute two new variables as follows: \n", "\n", "* Grade (numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13.\n", "* The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65. Using this information, compute a course grade (Course) as a weighted average of the Quiz (20\\%), Midterm (30\\%) and Final (50\\%)." ] } ], "metadata": { "kernelspec": { "display_name": "SAS", "language": "sas", "name": "sas" }, "language_info": { "codemirror_mode": "sas", "file_extension": ".sas", "mimetype": "text/x-sas", "name": "sas" } }, "nbformat": 4, "nbformat_minor": 2 }