{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to PROC SQL\n", "\n", "PROC SQL is a powerful tool for data manipulation and querying. It can perform many functions such as the conventional DATA and PROC steps but with fewer statements and computer resources. In this lesson, we will investigate how to select, subset, sort, summarize and group data with SQL procedure.\n", "\n", "## PROC SQL Basics\n", "\n", "PROC SQL is a procedure that SAS developed for the implementation of Structured Query Language. You can use this procedure to modify, retrieve and report data in tables and views (created on tables). Just as with other SAS procedures, PROC SQL also has basic syntax structures. It takes the following general form:\n", "\n", "
\n",
    "PROC SQL;\n",
    "SELECT column-1<,…column-n>\n",
    "FROM table-1|view-1<,…table-n|view-n>\n",
    "<WHERE expression>\n",
    "<GROUP BY column-1<,…column-n>>\n",
    "<HAVING expression >\n",
    "<ORDER BY column-1<,…column-n >>;\n",
    "QUIT;\n",
    "\n",
    "
\n", "\n", "First of all, you may see differences in terminology between SQL and other SAS steps. For example, the data file is called data set in other SAS steps, but table in SQL. Correspondingly, records are called observations in the previous lessons, but rows in SQL tables; and we call a field of data set as variable, but column in this lesson.\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
Other SAS stepsSQL Procedure
data settable
observationrow
variablecolumn
\n", "\n", "Another thing that needs your attention is that, unlike other SAS procedures, there could be one or a few SELECT statements inside PROC SQL. One SELECT statement is called a query, which is composed of many clauses, like SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY. The order of these clauses is important. They must appear in the order as shown above.\n", "\n", "We will use the whole lesson to work our way through all these keywords in PROC SQL. Let’s start with the most basic one.\n", "\n", "
\n", "

Example

\n", "

The following SAS SQL code is just query that retrieves data from a single table:

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "LIBNAME phc6089 \"/folders/myfolders/SAS_Notes/data\";\n", "PROC SQL;\n", " select ID, SATM, SATV\n", " from phc6089.survey;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

To run the program above, you will need to save the SAS data file (survey.sas7bdat) to your computer first (see the data folder on the course website). Edit LIBNAME statement to reflect the directory in which you saved the survey data set. Then run the program and check the output.

\n", "

The SQL procedure in this code represents the most basic form of the procedure. Like other SAS procedures, you need to run PROC SQL at the beginning to invoke it. Inside the procedure, there is only one statement starting with SELECT, which chooses the columns you want. You can list as many columns as needed, separated by commas. Another clause is FROM, which is used to specify the table(s). PROC SQL follows the same protocol of SAS file names. Here we used a two-level name to reference the permanent file. Just as you read the code, this program is used to select three columns (student id, SAT Math score and SAS Verbal score) from the table.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following SAS program uses CREATE TABLE statement to create a new table named SAT_scores, which contains student id, SAT math scores and verbal scores.

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PROC SQL;\n", " CREATE TABLE SAT_Scores as\n", " select ID, SATM, SATV\n", " from phc6089.survey;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program. You may notice that there is no output displayed in the SAS output window or any open ODS destination. That’s because the CREATE TABLE statement suppresses the printed output of the query. However, if you check the SAS log window, it shows a message that indicates that the table has been created, and the number of rows and columns in the table (see output above). In this example, table SAT_scores has 226 rows and 3 columns. And the new table’s columns have the same attributes (type, length, format, label) as those of the selected source columns.

\n", "

From these two examples, you now have some idea about what PROC SQL is like to work with. Let’s summarize what makes it so unique from other SAS procedures.

\n", "
    \n", "
  1. Unlike other SAS procedures which contain many statements, the SQL procedure may consist of one or more than one SELECT statement. Each SELECT statement contains several clauses, like SELECT, FROM, WHERE, ORDER BY. But the SELECT and FROM clause are essential and indispensable. Other clauses are optional. All clauses have to be written in the order as listed in the syntax. For each one SELECT statement, only one semicolon is needed at the end of the statement.
  2. \n", "
  3. No RUN statement is required for PROC SQL to execute. SQL procedure will continue to run after you submit the program. To end it, you have to submit another PROC step, a DATA step, or a QUIT statement.
  4. \n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the SELECT Clause\n", "\n", "In the previous section, we learned the basics about PROC SQL. Next, we will investigate more details about the SELECT statement and how to use it to retrieve data, create new columns and what options are available for data manipulation.\n", "\n", "
\n", "

Example

\n", "

The following SAS program creates a new temporary table with all columns retrieved from permanent file traffic.sas7bdat (see the data folder on the course website):

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PROC SQL;\n", "CREATE TABLE traffic as\n", " select *\n", " from phc6089.traffic;\n", "QUIT;\n", " \n", "PROC CONTENTS data=traffic VARNUM; \n", "RUN;\n", " \n", "PROC CONTENTS data=phc6089.traffic VARNUM;\n", "RUN;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

First, you need to download the permanent SAS data file traffic to your own computer. Revise the libname statement as needed. Then run the program.

\n", "

One thing you need to know about this program is the shortcut, noted with an asterisk (*) after SELECT. The asterisk refers to all columns in the original table. So, this code is to select all columns in the permanent file into the temporary file, traffic.

\n", "

To check the data, you may use the other procedures we learned in previous lessons, such as the PRINT procedure. In the above program, PROC CONTENTS has been used to check the variable attributes in the original and the new table. As we mentioned in the previous section, the variables chosen from other table(s) keep the same attributes.

\n", "

Besides selecting original columns, the SELECT clause can also be used to create new columns, just as we used assignment statements in DATA step to create new variables.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program is to create new columns with the SELECT statement:

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PROC SQL;\n", " select id, count_location,\n", " scan(count_location,-1,' ') as orientation,\n", " street,\n", " passing_vehicle_volume * 0.5 as weekends_traffic_volume\n", " from traffic;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

As you can see, this code uses the traffic table we created previously. Using the SELECT statement you can create new columns that contain either characters or numbers. With valid statements within the SELECT clause, you can use any expression for new columns. And, the new columns can be named by using the keyword AS followed by the names you would like to use. (Column names also follow the rules for SAS names.) In the above code, the first new column is created by a character function scan(), which substring is the orientation information from the existing column, count_location. The name for this new column is orientation after AS. (It may make no sense, just for the use of example.) The second new column is a math expression that estimates the traffic volume during weekends by multiplying daily vehicle volume by 0.5. Its alias is weekends_traffic_volume.

\n", "

Launch and run the SAS program, and review the output to convince yourself that SAS does indeed create two new columns as you expect. But you should note that new columns only exist during the query, unless you created a table out of it.

\n", "

While observing the data in traffic, you may notice that some data are not formatted as you want. Fortunately, SAS provides many options in SELECT statement so you can enhance the appearance of the query output.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program adds the format to dates, labels columns and add titles to the output:

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PROC SQL;\n", " TITLE \"Traffic volume in Area SS\";\n", " TITLE2 \"During weekdays and weekends\";\n", " select id, \n", " Date_of_count label='Date of Count' format=mmddyy10.,\n", " count_location label='Location',\n", " street,\n", " passing_vehicle_volume label='Daily Volume' format=comma6.,\n", " passing_vehicle_volume * 0.5 as weekends_traffic_volume label='Weekends Volume' format=comma6.\n", " from traffic;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program and then review the resulting output to convince yourself that the data has been formatted and labeled as you expect. Except for titles, you can also add a footnote to the output using footnote statement. But unlike using title and footnote statements with other SAS steps, both statements have to be placed either before the PROC SQL statement, or between the PROC SQL statement and SELECT statement.

\n", "
\n", "

One more thing we will talk about in this section is the CASE operator, which just follows the SELECT clause to create new columns conditionally. You must remember that this applies only to IF-THEN-ELSE statements that are available in DATA step. In PROC SQL, the CASE operator can perform the equivalent functions. First, let’s look at the syntax for the CASE construct.

\n", "
\n",
    "CASE\n",
    "WHEN when-condition THEN result-expression\n",
    "<… WHEN when-condition THEN\n",
    "result-expression>\n",
    "<ELSE result-expression>\n",
    "END AS < column name>\n",
    "
\n", "

As in IF-THEN statements, you can add as many WHEN conditions as you want. The conditions can be any valid SAS expression, including calculations, functions, and logical operators. It works as IF-THEN statements, too. If the conditions have been met, SAS will carry out the corresponding actions following the keyword THEN. If the WHEN condition is false, then PROC SQL executes the ELSE expression. You can create a new column and name it with AS keywords after END. The ELSE and AS keywords are optional. But it’s good practice to keep original columns while creating new ones.

\n", "
\n", "

Example

\n", "

The following SAS program uses CASE operator to assign different salary raise plans for each salary range:

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PROC SQL ;\n", " select Name,\n", " Department,\n", " employee_annual_salary label='salary' format=dollar12.2,\n", " 'next year raise:',\n", " case\n", " when employee_annual_salary=. then .\n", " when employee_annual_salary < 85000 then 0.05\n", " when 85000 <= employee_annual_salary < 125000 then 0.03\n", " when employee_annual_salary >=125000 then 0.01\n", " else 0\n", " end as raise format=percent8.\n", " from phc6089.salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

You already know format and label options from the previous explanations. There are a couple of new things in this example, however. First, you can insert a character(or numeric) constant as a new column in the table. Here a character string “next year raise” has been added between salary and raise. Raise is also a new column which has been created by the CASE operator based on the current annual salary of each person.

\n", "

Download the SAS data set salary.sas7bdat (see the data folder on the course website) on your computer and revise the libname statement to reflect the directory where you save the file. Then launch and run the program. Review the query result to convince yourself that the raise values have been assigned correctly.

\n", "
\n", "

The CASE operator has two forms of syntax. In fact, if you use only one column for WHEN condition(s), this column’s name can be put after CASE and before WHEN. So you don’t have to repeat the column’s name in each WHEN condition. Below is the syntax for this form:

\n", "
\n",
    "CASE <column-name>\n",
    "when-condition THEN result-expression\n",
    "<… WHEN when-condition THEN\n",
    "result-expression>\n",
    "<ELSE result-expression>\n",
    "END AS < column name>\n",
    "
\n", "
\n", "

Example

\n", "

The following program uses the simpler form of CASE construct to decide compensation (Yes or N/A) based on departments:

\n", "
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NameDepartmentsalaryCompensation
WARNERPROCUREMENT$76,980.00N/A
EDWARDSPOLICE$83,616.00Yes
PENDARVISPOLICE$103,590.00Yes
CLARKFIRE$85,680.00Yes
FOLINOPOLICE$83,616.00Yes
SAWYERCITY COUNCIL$117,333.00N/A
WALTONPOLICE$89,718.00Yes
QUINLANTRANSPORTN$95,888.04N/A
SOTOWATER MGMNT$79,040.00N/A
RUSSWATER MGMNT$79,040.00N/A
MC GUIREPOLICE$86,520.00Yes
GAWRISCHPOLICE$86,520.00Yes
CANOFIRE$86,520.00Yes
CLARKSTREETS & SAN$72,862.40N/A
GIBOWICZPOLICE$83,616.00Yes
MANCILLAPOLICE$90,456.00Yes
WYATTWATER MGMNT$65,686.40N/A
VALLEFIRE$54,114.00Yes
SISSACPUBLIC LIBRARY$12,407.20N/A
ARMSTEADPOLICE$80,778.00Yes
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL outobs=20;\n", " select Name,\n", " Department,\n", " employee_annual_salary label='salary' format=dollar12.2,\n", " case department\n", " when 'POLICE' then 'Yes'\n", " when 'FIRE' then 'Yes'\n", " else 'N/A'\n", " end as Compensation\n", " from phc6089.salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

The above code uses the same data set as the previous example, salary. It assigns the different compensation plans based on which department people work for and creates a new column, Compensation, for the result. This time, the column name Department has been put outside the WHEN conditions and into CASE operator. So we don’t need coding like “WHEN department=’POLICE’” any more.

\n", "

Another feature is the option you can use in the PROC SQL statement, OUTOBS=n. It can be used to limit the number of rows displayed in the output. So in this case, we would expect the table in the output window shows the first 20 rows of the data. And such a warning message will be delivered in the log file.

\n", "
WARNING: Statement terminated early due to OUTBOS=20 option.
\n", "

Note that OUTOBS= will also affect tables that are created by the CREATE TABLE statement.

\n", "

Launch and run the program. Then check the query result to make sure the records have been processed as expected. Note that you have to be cautious with this simpler form. For instance, if you move Employee_annual_salary out of the WHEN conditions in the program of the previous example, SAS will report an error and not execute!

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the WHERE Clause\n", "\n", "As you know, the WHERE statement or option in DATA step or other procedures is very useful in selecting observations from a data set based on some criteria. In PROC SQL, the WHERE clause in the SELECT statement can also be used to subset data based on specified conditions. Any valid SAS expression can be put inside WHERE clause, including functions, comparison or logical operators and even some special operators. Making the good use of it can increase programming efficiency and save computing resources greatly. As always, we will work through this subject with examples.\n", "\n", "
\n", "

Example

\n", "

The following example uses the WHERE clause to select employees who work at a police department and have the job title as sergeant:

\n", "
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NameDepartmentEmployee Annual Salary
PENDARVISPOLICE103590
ODUMPOLICE106920
DANIELSPOLICE106920
TATE JRPOLICE103590
GOLDSMITHPOLICE110370
FRANKOPOLICE106920
RADDATZPOLICE110370
FLEMINGPOLICE100440
CASEYPOLICE110370
MC COYPOLICE110370
JACOBSPOLICE106920
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Name, Department, Employee_annual_salary \n", " from phc6089.salary\n", " where Department='POLICE' AND Position_title='SERGEANT';\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Reading through the program, you must have known that it selects the name, department and annual salary information from salary data for police sergeants. Note that the columns in the WHERE clause do not have to be specified in the SELECT clause, (such as Position title), which is used in the WHERE clause but not in the SELECT clause. However, for the sake of the results checking, I would suggest to keep these columns in the query until verified.

\n", "
\n", "

Launch and run the SAS program, and review the output to convince yourself that the records have been selected as described.

\n", "

We saw two types of operators used in the above program, the comparison (=) and the logical (and). Besides these common ones, another type that could be very useful in your programming is called a conditional operator. You may know some of them already, like IN, CONTAINS and MISSING. You can find the complete list of operators in the SAS documentation. Next, let’s look at a couple of examples on this using BETWEEN AND and LIKE.

\n", "
BETWEEN value-1 AND value-2
\n", "

Both value-1 and value-2 are end values. So you can use the BETWEEN AND operator to specify a range of values, such as from one date to another, or from lower limit to upper limit. The smaller value does not have to be the first.

\n", "
\n", "

Example

\n", "

The following program uses the operator, BETWEEN AND, to select observations from salary data whose annual salary is between \\$65,000 and \\$70,000, and also works in Fire department:

\n", "
" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NameDepartmentSalary
KELLYFIRE$65,946.00
KOCHANEYFIRE$65,946.00
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Name, Department,\n", " Employee_annual_salary label='Salary' format=DOLLAR12.2\n", " from phc6089.salary\n", " where Employee_annual_salary between 65000 and 70000\n", " and Department='FIRE';\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program, and review the query output to convince yourself that the SAS yield the result as expected.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another useful operator is the LIKE operator:\n", "\n", "`Column LIKE ‘pattern’`\n", "\n", "With the LIKE operator, you have to specify a column name and the pattern to be matched. Regarding the pattern, first it is case-sensitive and has to be enclosed in quotation marks; secondly, it may contain a special character, either an underscore(_) and/or percent sign(%). The underscore character stands for any single character and the percent sign for any sequence of zero or more characters. For example, assume that you are working with a table containing these values for a column.\n", "\n", "* Cathy\n", "* Kathy\n", "* Kathie\n", "* Katherine\n", "\n", "Now using different patterns, the selection results are different:\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
Patterns Results
Kath_Kathy
Kath__Kathie
Kath%Kathy, Kathie, Katherine
_ath%All of the names above
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program shows the use of the LIKE operator in a WHERE clause to select name, department, position title and annual salary information for people whose name starts with R and the third letter is B:

\n", "
" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NameDepartmentPosition TitleSalary
ROBINSONWATER MGMNTOPERATING ENGINEER-GROUP C$93,745.60
RABANALESFINANCEAUDITOR II$87,912.00
ROBERTSFIREPARAMEDIC I/C$79,404.00
ROBINSONWATER MGMNTCONSTRUCTION LABORER$79,040.00
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Name, Department, Position_title,\n", " Employee_annual_salary label='Salary' format=DOLLAR12.2\n", " from phc6089.salary\n", " where Name like 'R_B%';\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program, and review the query output to convince yourself that the SAS behaves as described.

\n", "
\n", "
\n", "Another point worthy of being made here is the CALCULATED keywords. In the last section you learned that we can perform calculations in SELECT statement and assign an alias to that new column. However, because SAS processes the WHERE clause prior to the SELECT clause, you will run into a problem if the calculated column is used in a WHERE clause as condition. Therefore, the keyword CALCULATED has to be inserted into the WHERE clause along with the alias to inform SAS that the value is calculated within the query. This point will be illustrated by the following programs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program attempts to calculate the bonus for every employee, then select ones who has more than \\$2,000 as bonus:

\n", "
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
176  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
176! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
177
178 PROC SQL;
179 select Name, Department,
180 Employee_annual_salary label='Salary' format=DOLLAR12.2,
181 Employee_annual_salary * 0.02 as Bonus
182 from phc6089.salary
183 where Bonus >2000 ;
NOTE: Data file PHC6089.SALARY.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
ERROR: The following columns were not found in the contributing tables: Bonus.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
184 QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

185
186 ods html5 (id=saspy_internal) close;ods listing;

187
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Name, Department,\n", " Employee_annual_salary label='Salary' format=DOLLAR12.2,\n", " Employee_annual_salary * 0.02 as Bonus\n", " from phc6089.salary\n", " where Bonus > 2000 ;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program. You may want to see what’s going wrong yourself. In the log window, SAS delivered an error message that the column Bonus cannot be found (see output above). That’s because SAS processes the WHERE clause before the SELECT clause. To make it right, add CALCULATED in the WHERE clause as shown below.

\n", "
" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
NameDepartmentSalaryBonus
PENDARVISPOLICE$103,590.002071.8
SAWYERCITY COUNCIL$117,333.002346.66
ODUMPOLICE$106,920.002138.4
PLANTZGENERAL SERVICES$102,060.002041.2
FORDFIRE$100,440.002008.8
DANIELSPOLICE$106,920.002138.4
MALONEYFIRE$127,566.002551.32
MOLLOYCOMMUNITY DEVELOPMENT$102,060.002041.2
NIEGOFIRE$143,682.002873.64
PRICEFIRE$103,590.002071.8
TATE JRPOLICE$103,590.002071.8
JIMENEZPROCUREMENT$113,448.002268.96
CEBALLOSWATER MGMNT$113,448.002268.96
FERMANFIRE$131,466.002629.32
SHUMTRANSPORTN$104,736.002094.72
FUNKFIRE$101,688.002033.76
WRZESINSKIFIRE$105,918.002118.36
GOLDSMITHPOLICE$110,370.002207.4
FRANKOPOLICE$106,920.002138.4
RADDATZPOLICE$110,370.002207.4
MC NABBFIRE$105,918.002118.36
FLEMINGPOLICE$100,440.002008.8
CASEYPOLICE$110,370.002207.4
MACELLAIO JRWATER MGMNT$101,920.002038.4
DARLINGLAW$149,160.002983.2
NASHFINANCE$103,740.002074.8
MC COYPOLICE$110,370.002207.4
HOLDERWATER MGMNT$108,534.402170.688
TAYLORFIRE$108,462.002169.24
PERFETTIPOLICE$112,068.002241.36
JACOBSPOLICE$106,920.002138.4
HENRYAVIATION$108,534.402170.688
IRELANDFIRE$113,400.002268
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Name, Department,\n", " Employee_annual_salary label='Salary' format=DOLLAR12.2,\n", " Employee_annual_salary * 0.02 as Bonus\n", " from phc6089.salary\n", " where CALCULATED Bonus > 2000 ;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Now it’s working! Make the same change to your program. Check the output to make sure that SAS processes the data properly.

\n", "

An alternative to using the keyword CALCULATED is to repeat the calculation expression in the WHERE clause. In the preceding program, the WHERE clause can be rewritten as:

\n", "
where Employee_annual_salary *0.02 >2000;
\n", "

But note that this is not an efficient way to do this because SAS has to do the calculation twice.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting Data\n", "\n", "The SELECT and FROM clauses are indispensable in the SQL query. Other clauses may be optional but very useful when querying a table. Last section introduced the WHERE clause and how to use it to select rows conditionally. From previous examples, you may have noticed that the order of observations might remain the same as they were in original data set. If, however, you want to specify the order of data, you will need the ORDER BY clause to sort the data as you want.\n", "\n", "
\n", "

Example

\n", "

The following SAS program uses ORDER BY inside PROC SQL to sort the data in the file survey.sas7bdat by the values of gender and GPA:

\n", "
" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
idGenderGPASATMSATV
1219Female3.01630590
1039Female3.02560560
1125Female3.08470510
1203Female3.1500610
1139Female3.1530600
1068Female3.1550610
1116Female3.1560550
1072Female3.1570570
1138Female3.12560580
1120Female3.16680670
1102Female3.2620630
1142Female3.2720500
1020Female3.2600630
1201Female3.21760660
1089Female3.25500600
1133Female3.27450550
1144Female3.27640600
1163Female3.3600590
1038Female3.3680650
1069Female3.3600490
1033Female3.3650600
1115Female3.3580620
1057Female3.3600600
1037Female3.3500400
1109Female3.31500490
1215Female3.33650630
1078Female3.33570490
1060Female3.36540550
1030Female3.36450450
1129Female3.4500540
1100Female3.4490520
1196Female3.41480560
1046Female3.42660580
1165Female3.45560500
1148Female3.46620640
1082Female3.48550690
1096Female3.48750550
1168Female3.5650560
1091Female3.5800650
1018Female3.5600750
1181Female3.5600550
1123Female3.51560530
1177Female3.53520500
1063Female3.53560590
1015Female3.55400600
1058Female3.55585590
1160Female3.57700700
1159Female3.59640440
1202Female3.6460540
1004Female3.6710560
1086Female3.6540570
1035Female3.61500550
1187Female3.62780660
1149Female3.63640700
1110Female3.63700600
1011Female3.67690690
1044Female3.7570560
1145Female3.7500450
1207Female3.7640670
1052Female3.7600620
1218Female3.71540560
1174Female3.74680600
1118Female3.74650700
1031Female3.75640620
1073Female3.76600550
1005Female3.76600520
1055Female3.76760600
1204Female3.77650630
1199Female3.77620640
1134Female3.78650600
1208Female3.78550400
1010Female3.8580540
1197Female3.8600700
1205Female3.8550550
1077Female3.81560610
1105Female3.81510680
1179Female3.83660660
1023Female3.88670680
1094Female3.89640710
1067Female3.9640560
1065Female3.9575600
1081Female3.94620600
1014Female4700700
1075Female4650550
1214Female4590500
1157Male3.02400400
1047Male3.02700570
1084Male3.03690690
1140Male3.04540600
1152Male3.05600500
1130Male3.06660620
1151Male3.08590590
1053Male3.08420490
1097Male3.1640430
1137Male3.1580640
1090Male3.12560330
1185Male3.13600600
1101Male3.13570490
1209Male3.14580580
1041Male3.16620760
1220Male3.19480480
1153Male3.2550630
1071Male3.2650660
1119Male3.2640630
1045Male3.2680710
1217Male3.21620400
1200Male3.21650590
1222Male3.24510680
1175Male3.3640600
1170Male3.3610590
1016Male3.3640600
1095Male3.33690650
1212Male3.35570480
1098Male3.36540520
1027Male3.36550600
1017Male3.38720580
1087Male3.4580570
1161Male3.4600630
1036Male3.42530760
1221Male3.42590610
1194Male3.46640600
1009Male3.48690620
1029Male3.5680680
1048Male3.51650590
1024Male3.51600730
1034Male3.53670630
1206Male3.53620630
1111Male3.54540570
1150Male3.55630650
1106Male3.57620500
1042Male3.66640640
1169Male3.67650670
1019Male3.7600720
1080Male3.72660660
1079Male3.72700640
1059Male3.72800750
1173Male3.73580590
1166Male3.74700520
1070Male3.76600610
1062Male3.76670670
1226Male3.78630520
1162Male3.83710710
1006Male3.86610720
1180Male3.86720500
1172Male3.87780580
1122Male3.88670510
1131Male3.92730800
1007Male3.94710670
1028Male4610600
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select ID, Gender, GPA, SATM, SATV\n", " from phc6089.survey\n", " where SATV is not null and GPA > 3\n", " order by Gender, GPA ;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program, and review the output to convince yourself that the query result is in order first by gender and then by GPA.

\n", "

Several things need to be pointed out regarding the above program:

\n", "
    \n", "
  1. You can use one or more column in ORDER BY to sort the data. Comma is used to separate multiple column names. In this example, two columns have been used, Gender and GPA. So the data will be sorted by Gender first, then by GPA in order.
  2. \n", "
  3. By default, the values of column(s) will be sorted ascendingly. For example, there are two values in Gender, Female and Male. In the query result, Female records are listed first, then male ones because SAS sorted them by the first letter in alphabetical order. As to GPA order, since it’s numeric, SAS sorted observations by number values of GPA inside each gender group.
  4. \n", "
  5. The WHERE clause is used to select observations that her/his SAT verbal score is not missing and GPA greater than 3. “is not null” and “is not missing” are interchangeable to indicate no missing values included.
  6. \n", "
\n", "

As in PROC SORT, if you want to change the default ascending order into descending order, you just need to specify DESC following the column name.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following SAS program sorts the data survey.sas7bdat by the values of gender in descending order then by GPA ascendingly:

\n", "
" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
idGenderGPASATMSATV
1047Male3.02700570
1157Male3.02400400
1084Male3.03690690
1140Male3.04540600
1152Male3.05600500
1130Male3.06660620
1053Male3.08420490
1151Male3.08590590
1137Male3.1580640
1097Male3.1640430
1090Male3.12560330
1185Male3.13600600
1101Male3.13570490
1209Male3.14580580
1041Male3.16620760
1220Male3.19480480
1045Male3.2680710
1071Male3.2650660
1119Male3.2640630
1153Male3.2550630
1200Male3.21650590
1217Male3.21620400
1222Male3.24510680
1175Male3.3640600
1016Male3.3640600
1170Male3.3610590
1095Male3.33690650
1212Male3.35570480
1098Male3.36540520
1027Male3.36550600
1017Male3.38720580
1161Male3.4600630
1087Male3.4580570
1221Male3.42590610
1036Male3.42530760
1194Male3.46640600
1009Male3.48690620
1029Male3.5680680
1048Male3.51650590
1024Male3.51600730
1034Male3.53670630
1206Male3.53620630
1111Male3.54540570
1150Male3.55630650
1106Male3.57620500
1042Male3.66640640
1169Male3.67650670
1019Male3.7600720
1080Male3.72660660
1079Male3.72700640
1059Male3.72800750
1173Male3.73580590
1166Male3.74700520
1070Male3.76600610
1062Male3.76670670
1226Male3.78630520
1162Male3.83710710
1180Male3.86720500
1006Male3.86610720
1172Male3.87780580
1122Male3.88670510
1131Male3.92730800
1007Male3.94710670
1028Male4610600
1219Female3.01630590
1039Female3.02560560
1125Female3.08470510
1072Female3.1570570
1139Female3.1530600
1068Female3.1550610
1116Female3.1560550
1203Female3.1500610
1138Female3.12560580
1120Female3.16680670
1020Female3.2600630
1142Female3.2720500
1102Female3.2620630
1201Female3.21760660
1089Female3.25500600
1144Female3.27640600
1133Female3.27450550
1163Female3.3600590
1038Female3.3680650
1037Female3.3500400
1069Female3.3600490
1033Female3.3650600
1057Female3.3600600
1115Female3.3580620
1109Female3.31500490
1078Female3.33570490
1215Female3.33650630
1060Female3.36540550
1030Female3.36450450
1129Female3.4500540
1100Female3.4490520
1196Female3.41480560
1046Female3.42660580
1165Female3.45560500
1148Female3.46620640
1082Female3.48550690
1096Female3.48750550
1181Female3.5600550
1168Female3.5650560
1018Female3.5600750
1091Female3.5800650
1123Female3.51560530
1177Female3.53520500
1063Female3.53560590
1058Female3.55585590
1015Female3.55400600
1160Female3.57700700
1159Female3.59640440
1086Female3.6540570
1004Female3.6710560
1202Female3.6460540
1035Female3.61500550
1187Female3.62780660
1149Female3.63640700
1110Female3.63700600
1011Female3.67690690
1044Female3.7570560
1145Female3.7500450
1052Female3.7600620
1207Female3.7640670
1218Female3.71540560
1174Female3.74680600
1118Female3.74650700
1031Female3.75640620
1073Female3.76600550
1055Female3.76760600
1005Female3.76600520
1204Female3.77650630
1199Female3.77620640
1134Female3.78650600
1208Female3.78550400
1010Female3.8580540
1205Female3.8550550
1197Female3.8600700
1077Female3.81560610
1105Female3.81510680
1179Female3.83660660
1023Female3.88670680
1094Female3.89640710
1067Female3.9640560
1065Female3.9575600
1081Female3.94620600
1075Female4650550
1014Female4700700
1214Female4590500
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select ID, Gender, GPA, SATM, SATV\n", " from phc6089.survey\n", " where SATV is not null and GPA > 3\n", " order by Gender desc, 3 ;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

There are only two places that are different from the program in the previous example. DESC has been added after Gender to tell SAS to sort the data descending. Another way to refer to the column rather than its name is its location in the SELECT clause. GPA is listed as the third one so that we can use 3 to specify GPA.

\n", "

Launch and run the SAS program, and then review the output to convince yourself that the output from this query is in descending order of Gender and in ascending order of GPA.

\n", "

Up until now, you might think that ORDER BY can perform the same as PROC SORT. Actually, it can do more than that. Let’s find out with the next example.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program sorts the survey data first by gender in descending order as before, then by mean values of SAT math and verbal scores in ascending order:

\n", "
" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
idGenderGPASmokeCigarettesSATMSATV
1157Male3.02No400400
1090Male3.12No560330
1053Male3.08No420490
1220Male3.19No480480
1217Male3.21No620400
1212Male3.35No570480
1101Male3.13No570490
1098Male3.36No540520
1097Male3.1No640430
1152Male3.05No600500
1111Male3.54No540570
1106Male3.57No620500
1140Male3.04Yes540600
1226Male3.78No630520
1087Male3.4No580570
1027Male3.36No550600
1209Male3.14No580580
1173Male3.73No580590
1122Male3.88No670510
1153Male3.2No550630
1151Male3.08No590590
1222Male3.24No510680
1185Male3.13No600600
1221Male3.42Yes590610
1170Male3.3No610590
1070Male3.76No600610
1028Male4No610600
1137Male3.1No580640
1166Male3.74No700520
1180Male3.86No720500
1161Male3.4No600630
1175Male3.3Yes640600
1016Male3.3No640600
1194Male3.46No640600
1048Male3.51No650590
1200Male3.21No650590
1206Male3.53No620630
1119Male3.2No640630
1047Male3.02No700570
1150Male3.55No630650
1042Male3.66No640640
1130Male3.06No660620
1036Male3.42No530760
1034Male3.53No670630
1017Male3.38No720580
1071Male3.2No650660
1009Male3.48No690620
1080Male3.72No660660
1169Male3.67No650670
1019Male3.7No600720
1006Male3.86No610720
1024Male3.51No600730
1079Male3.72No700640
1062Male3.76No670670
1095Male3.33No690650
1172Male3.87Yes780580
1029Male3.5No680680
1041Male3.16No620760
1007Male3.94No710670
1084Male3.03No690690
1045Male3.2No680710
1162Male3.83No710710
1131Male3.92No730800
1059Male3.72No800750
1037Female3.3No500400
1030Female3.36No450450
1145Female3.7No500450
1208Female3.78No550400
1125Female3.08No470510
1109Female3.31No500490
1133Female3.27No450550
1015Female3.55No400600
1202Female3.6No460540
1100Female3.4No490520
1177Female3.53No520500
1196Female3.41No480560
1129Female3.4No500540
1035Female3.61No500550
1078Female3.33No570490
1165Female3.45No560500
1159Female3.59No640440
1069Female3.3No600490
1060Female3.36No540550
1123Female3.51No560530
1214Female4No590500
1089Female3.25Yes500600
1218Female3.71No540560
1205Female3.8No550550
1086Female3.6No540570
1116Female3.1No560550
1203Female3.1No500610
1039Female3.02No560560
1010Female3.8No580540
1005Female3.76No600520
1044Female3.7No570560
1139Female3.1No530600
1072Female3.1No570570
1138Female3.12No560580
1181Female3.5No600550
1073Female3.76No600550
1063Female3.53No560590
1068Female3.1No550610
1077Female3.81No560610
1065Female3.9No575600
1058Female3.55No585590
1163Female3.3No600590
1105Female3.81No510680
1075Female4No650550
1067Female3.9No640560
1057Female3.3No600600
1115Female3.3No580620
1168Female3.5No650560
1081Female3.94No620600
1142Female3.2No720500
1052Female3.7No600620
1219Female3.01No630590
1020Female3.2No600630
1082Female3.48No550690
1144Female3.27No640600
1046Female3.42No660580
1033Female3.3No650600
1134Female3.78No650600
1102Female3.2No620630
1148Female3.46No620640
1031Female3.75No640620
1199Female3.77No620640
1004Female3.6No710560
1174Female3.74No680600
1215Female3.33No650630
1204Female3.77No650630
1110Female3.63No700600
1096Female3.48No750550
1197Female3.8No600700
1207Female3.7No640670
1179Female3.83No660660
1038Female3.3No680650
1149Female3.63No640700
1023Female3.88No670680
1018Female3.5No600750
1120Female3.16No680670
1118Female3.74No650700
1094Female3.89No640710
1055Female3.76No760600
1011Female3.67Yes690690
1160Female3.57No700700
1014Female4No700700
1201Female3.21Yes760660
1187Female3.62No780660
1091Female3.5No800650
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select *\n", " from phc6089.survey\n", " where SATV is not null and GPA>3\n", " order by Gender desc, MEAN(SATM,SATV) ;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Since all columns will be used in the query, * is used to specify all columns after SELECT. The WHERE clause remains the same. In the ORDER BY clause, besides Gender, one function is used to calculate the average scores of SATM and SATV, then uses the calculation results to sort the data inside each gender group. To get the same result, could you try other SAS steps and count how many of them will be needed?

\n", "

Launch and run the SAS program, and review the output to convince yourself that the data has been sorted in desired order.

\n", "

One more thing, you may notice a note in log window when running this program.

\n", "
NOTE: The query as specified involves ordering by an item that doesn't appear in its SELECT clause.
\n", "

That’s because MEAN(SATM,SATV) is not listed in the SELECT clause, only in the ORDER BY clause.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summarizing and Grouping Data\n", "\n", "In previous sections, we use the SQL procedure to generate detailed reports. Sometimes, the summarized report is also necessary for us to explore data. To do that, we are going to need summary functions and/or the GROUP BY clause in PROC SQL.\n", "\n", "Many summary functions that are used in other SAS steps can also work well in PROC SQL. Below is the table of summary functions you can request:\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
Summary function Description
AVG, MEANmean or average of values
COUNT, FREQ, Nnumber of non-missing values
CSScorrected sum of squares
CVcoefficient of variation(percent)
MAXlargest value
MINsmallest value
NMISSnumber of missing values
PRTprobability of a greater absolute value of student's t
RANGErange of values
STDstandard deviation
STDERRstandard error of the mean
SUMsum of values
Tstudent's t value for testing hypothesis
USSuncorrected sum of squares
VARvariance
\n", "\n", "Note: some functions have multiple names. The first listed is the SQL name.\n", "\n", "Next we will work through examples to see how these functions perform calculations in PROC SQL. Along the way, the GROUP BY clause will be introduced and work with the functions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program uses the AVG() function to calculate the mean scores of SAT math and verbal test:

\n", "
" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
average_Mathaverage_Verbal
599.0046580.3256
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select avg(SATM) as average_Math,\n", " avg(SATV) as average_Verbal\n", " from phc6089.survey;\n", "QUIT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

First launch and run the SAS program. When checking the output you will see two overall average scores have been calculated for SATM and SATV separately. There is only one observation in the output window.

\n", "

Let’s review the function in the code. To calculate average, either MEAN() or AVG() can be used in this case. Note that there is only one argument (column) inside the function AVG(). So the statistic is calculated across all rows for one column.

\n", "
    \n", "
  • AVG(STAM)—the overall average score of SATM
  • \n", "
  • AVG(STAV)—the overall average score of SATV
  • \n", "
\n", "

Quite simple, right? Let’s add one more argument into the function. Can you guess how many observations will be in the output?

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

In the following program, two columns are the arguments of the function MEAN():

\n", "
" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
average
700
600
470
635
560
665
690
595
655
560
690
.
515
700
500
620
650
675
660
615
450
650
675
665
620
560
575
605
680
450
630
350
625
650
525
645
450
665
560
.
690
640
615
565
695
620
635
620
630
450
600
610
455
605
680
550
600
587.5
775
545
615
670
575
625
587.5
425
600
580
545
605
655
570
575
475
600
580
585
530
670
660
610
620
570
690
710
555
575
.
550
445
725
575
560
675
670
650
535
530
650
505
530
625
475
.
595
560
580
525
495
650
555
.
600
530
600
555
525
675
635
675
475
590
545
435
490
645
625
610
520
640
765
420
500
625
585
625
610
570
565
570
.
610
525
620
475
590
600
630
670
640
590
550
590
485
.
660
400
540
540
700
615
710
595
625
530
610
535
605
660
600
500
680
585
640
620
595
510
650
660
610
575
550
505
.
600
530
720
610
590
540
530
.
500
620
555
520
650
525
630
620
710
500
555
640
550
625
655
475
580
625
610
525
425
545
640
640
510
550
610
480
600
595
.
640
635
575
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select mean(SATM, SATV) as average\n", " from phc6089.survey;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

We changed the program a little bit. Both SATM and SATV are put inside the function as arguments. Launch and run the SAS program. You will see there are 226 observations, which is the same as in the original survey data.

\n", "

If you add more than one column as arguments of summary functions, SAS will perform the calculation across the columns for each row to generate the above output.

\n", "

In this case, the summary function is not performing aggregation anymore. SAS then looks for a like-named function in BASE SAS. If yes, the calculation will be performed for each row; if not, an error message will be output in the log window. You can try to change MEAN() to AVG() to see what will happen.

\n", "
ERROR: Function AVG could not be located.
\n", "
\n", "
\n", "

Example

\n", "

The following program uses only one argument for MEAN(), but add one more column in the SELECT clause:

\n", "
" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Genderaverage_Math
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Female599.0046
Female599.0046
Male599.0046
Female599.0046
Female599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Male599.0046
Female599.0046
Male599.0046
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Gender, \n", " mean(SATM) as average_Math\n", " from phc6089.survey;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

In the above program, the the SELECT statement changed again. This time, only one argument is for the MEAN() function to calculate the overall average score of SAT math grades. Outside the function, another column has been selected as well. What output will it produce?

\n", "

Launch and run the SAS program. You may be surprised that the output contains 226 rows. Review the output you will see two things that have been done by the above code:

\n", "
    \n", "
  1. It calculated the overall average math score for all students.
  2. \n", "
  3. It displayed all rows in the output because Gender is not an argument of MEAN() function.
  4. \n", "
\n", "

Note that the overall average math score is just repeated for each row. You can find a message like the one below in the log window. When you submit such a program, SAS calculate the statistic first. Then merge it back with other columns. That’s how “remerging” happens.

\n", "
NOTE: The query requires remerging summary statistics back with the original data
\n", "

The above result is not what we wanted. Now, let’s see how to use the GROUP BY clause to make it reasonable.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following example calculates the average SAT math score for each gender group:

\n", "
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Genderaverage_Math
Female589.5082
Male611.3298
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Gender,\n", " mean(SATM) as average_Math\n", " from phc6089.survey\n", " group by Gender;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

The above program seems identical to the program in the previous example except for one more clause: GROUP BY. Finally, we get it right and obtain the desired result: the average SAT math scores for female and male students. Of course, you can make further use of GROUP BY by adding multiple columns. Let’s find out with the next example.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program uses both Gender and SmokeCigarettes in the GROUP BY clause to calculate the average SAT math scores:

\n", "
" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
GenderSmokeCigarettesaverage_Math
FemaleNo589.6552
FemaleYes586.6667
MaleNo613.2353
MaleYes593.3333
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Gender, SmokeCigarettes,\n", " mean(SATM) as average_Math\n", " from phc6089.survey\n", " group by 1, 2;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Launch and run the SAS program, then review the output. As you can see, the average math scores are calculated for each smoking group (Yes or No) inside each gender group (Female or Male).

\n", "

Just one more thing about this program, the columns can also be referred to by their locations in the SELECT clause as in the WHERE clause. Here, 1 and 2 are used to refer to Gender and SmokeCigarettes.

\n", "

Next, we will pay attention to one special summary function in SQL, which is COUNT(). You can use the COUN() function to count the non-missing values.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following example count the number of rows in survey data, the number of non-missing records for math and verbal test scores, and the distinct values of gender:

\n", "
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
No_obsNo_Math_recordsNo_Verbal_recordsGender_group
2262162152
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select count(*) as No_obs,\n", " count(SATM) as No_Math_records,\n", " count(SATV) as No_Verbal_records,\n", " count(distinct Gender) as Gender_group\n", " from phc6089.survey;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

The above code reveals three different common ways of using the COUNT() function.

\n", "
    \n", "
  1. Count(*) is to count total number of rows in a table. COUNT() is the only function that allows you to use * as an argument.
  2. \n", "
  3. Count(column) is to count the number of non-missing values in a column. In the program, we count the number of non-missing values for math and verbal scores.
  4. \n", "
  5. Count(distinct column) is to count the total number of unique values in a column. In the above example, we count the number of gender categories.
  6. \n", "
\n", "

Launch and run the SAS program, then review the output. With knowledge of some of the missing values inside the table, we are not surprised to see the first three numbers unmatched. The total number of rows in survey data is 226. The total numbers of non-missing values of math and verbal scores are 216 and 215, separately. Both numbers are less than 226, which means there are missing values in each column, and SATV has one more value missing. There are only two categories in Gender, Male and Female. So the last count is 2.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using the HAVING Clause\n", "\n", "Previously we learned how to use the GROUP BY clause to group and summarize data. Sometimes, we want to select certain groups from the result. That’s when the HAVING clause comes into play.\n", "\n", "
\n", "

Example

\n", "

The following program calculates the average salary for each department, then select three departments as needed in the query output:

\n", "
" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
DepartmentAvg_salary
LAW$71,082.20
FINANCE$82,184.00
FIRE$90,742.33
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Department,\n", " avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2\n", " from phc6089.salary\n", " group by Department\n", " having Department in ('LAW','FINANCE','FIRE')\n", " order by Avg_salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Let’s review the program first. The code selects the column Department and uses the summary function AVG() to compute the average salaries. Since the GROUP BY clause also is also present in the SELECT statement, the averages are for each department. The user is only interested in three departments, Law, Finance and Fire. So we use the HAVING clause to select only these three to be output. Finally, we ask SAS to sort the data by average salaries. This program contains every clause we have learned so far except the WHERE clause, which we will address later.

\n", "

Launch and run the SAS program and review the output to make sure you understand the output.

\n", "

You may wonder if WHERE can do the same thing as HAVING does in the above program. You can try replacing Having with WHERE clause as following. You will get identical output as before.

\n", "
" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
DepartmentAvg_salary
LAW$71,082.20
FINANCE$82,184.00
FIRE$90,742.33
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Department,\n", " avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2\n", " from phc6089.salary\n", " where Department in ('LAW','FINANCE','FIRE')\n", " group by Department\n", " order by Avg_salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

However, let’s not assume that WHERE and HAVING are the same based on this. There are some big differences between them. Generally speaking, HAVING has control on grouped data during output; WHERE controls input data row by row. Let’s see more examples about these two commands.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program calculates the average salary for each department and choose ones having more than \\$70,000:

\n", "
" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
DepartmentAvg_salary
LAW$71,082.20
BOARD OF ELECTION$74,988.00
HEALTH$75,066.86
TRANSPORTN$79,438.18
POLICE$81,850.26
FINANCE$82,184.00
WATER MGMNT$84,780.42
PROCUREMENT$89,236.00
COMMUNITY DEVELOPMENT$90,096.00
DoIT$90,252.00
FIRE$90,742.33
ADMIN HEARNG$91,980.00
BUILDINGS$94,793.01
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Department,\n", " avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2\n", " from phc6089.salary\n", " group by Department\n", " having Avg_salary > 70000\n", " order by Avg_salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Only a small change has been made to this program. The condition in the HAVING clause changed the department average salary more than \\$70,000. So, the expression used in the HAVING statement is a summary function. And, the data is sorted by average values.

\n", "

Launch and run the SAS program and review the output. As we expect, all departments having more than \\$70,000 average salary are listed as the query result.

\n", "

Next, let’s try using WHERE to perform the same task.

\n", "
" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
349  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
349! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
350
351 PROC SQL;
352 select Department,
353 avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2
354 from phc6089.salary
355 where calculated Avg_salary > 70000
356 group by Department
357 order by Avg_salary;
NOTE: Data file PHC6089.SALARY.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
ERROR: Summary functions are restricted to the SELECT and HAVING clauses only.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
358 QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

359
360 ods html5 (id=saspy_internal) close;ods listing;

361
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Department,\n", " avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2\n", " from phc6089.salary\n", " where calculated Avg_salary > 70000\n", " group by Department\n", " order by Avg_salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

You must remember that to use the computed result in the WHERE clause, the keyword “CALCULATED” should be inserted. Oops! SAS gives us an error message like this:

\n", "
ERROR: Summary functions are restricted to the SELECT and HAVING clauses only.
\n", "

This example illustrates a big difference between HAVING and WHERE. The summary functions can be used in a HAVING clause but not in a WHERE clause, because HAVING works on grouped data, but WHERE evaluates existing or calculated data row by row.

\n", "

Based on our current experiences with these two clauses, you might prefer to use HAVING since it can be used for both situations. However, don’t rush to this conclusion either. You will find out more in the next example.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following two SAS program are similar. The only difference is that the first program uses a WHERE clause and the second program uses a HAVING clause. They try to accomplish the same task: count how many employees at each position inside Police Department:

\n", "
" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Position TitleEmployees
ACCOUNTANT II1
CLINICAL THERAPIST III1
CROSSING GUARD5
CROSSING GUARD - PER AGREEMENT3
DIR OF POLICE RECORDS1
FISCAL ADMINISTRATOR1
POLICE OFFICER85
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER / FLD TRNG OFFICER1
SENIOR DATA ENTRY OPERATOR2
SERGEANT11
\n", "
\n", "
\n", "
\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Position TitleEmployees
ACCOUNTANT II1
CLINICAL THERAPIST III2
CROSSING GUARD5
CROSSING GUARD5
CROSSING GUARD5
CROSSING GUARD5
CROSSING GUARD5
CROSSING GUARD - PER AGREEMENT3
CROSSING GUARD - PER AGREEMENT3
CROSSING GUARD - PER AGREEMENT3
DIR OF POLICE RECORDS1
FISCAL ADMINISTRATOR1
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER85
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER (ASSIGNED AS DETECTIVE)13
POLICE OFFICER / FLD TRNG OFFICER1
SENIOR DATA ENTRY OPERATOR4
SENIOR DATA ENTRY OPERATOR4
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
SERGEANT11
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Position_Title,\n", " count(*) as Employees\n", " from phc6089.salary\n", " where Department='POLICE'\n", " group by Position_Title;\n", "QUIT;\n", " \n", "PROC SQL;\n", " select Position_Title,\n", " count(*) as Employees\n", " from phc6089.salary\n", " group by Position_Title\n", " having Department='POLICE';\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Now, Launch and run both programs. The output on the top is from the program using WHERE clause; the output on the bottom is the partial output from the program using HAVING clause.

\n", "

You might be surprised to see how different these two results are. One would expect a result like the output on the top. But the output on the bottom has so many more rows, and even some numbers do not match! Let’s review the code to understand what happened. There are two columns in the SELECT clause, Position_Title and a summary function, count(*), which counts total number of rows for each position group since we specify Position_Title in the GROUP BY clause. Unlike the programs in the previous example, the expression used inside WHERE and HAVING references another column, Department, which is not in the SELECT clause. Therefore, SAS handles them differently in the two programs.

\n", "

The first program uses the WHERE clause. Since SAS processes the WHERE clause before SELECT and on a row-by-row basis, the records from Police department are selected from the data first. Then SAS counts the number of employees under each position title inside the department. For example, there is only one person who is a “CLINICAL THERAPIST III” in the Police Department. So the count is 1. We obtained the desired output.

\n", "

On the other hand, the second program uses the HAVING clause. It is equivalent to the following program but without Department column in the output:

\n", "
" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
Position TitleDepartmentEmployees
ACCOUNTANT IIPOLICE1
CLINICAL THERAPIST IIIPOLICE2
CROSSING GUARDPOLICE5
CROSSING GUARDPOLICE5
CROSSING GUARDPOLICE5
CROSSING GUARDPOLICE5
CROSSING GUARDPOLICE5
CROSSING GUARD - PER AGREEMENTPOLICE3
CROSSING GUARD - PER AGREEMENTPOLICE3
CROSSING GUARD - PER AGREEMENTPOLICE3
DIR OF POLICE RECORDSPOLICE1
FISCAL ADMINISTRATORPOLICE1
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICERPOLICE85
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER (ASSIGNED AS DETECTIVE)POLICE13
POLICE OFFICER / FLD TRNG OFFICERPOLICE1
SENIOR DATA ENTRY OPERATORPOLICE4
SENIOR DATA ENTRY OPERATORPOLICE4
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
SERGEANTPOLICE11
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Position_Title,\n", " Department,\n", " count(*) as Employees\n", " from phc6089.salary\n", " group by Position_Title\n", " having Department='POLICE';\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

In this program, SAS counts employee numbers on each position across all departments because of GROUP BY clause. For example, there is each one person titled “CLINICAL THERAPIST III” in POLICE department and HEALTH department. So the total count on this position is 2. Since there is an extra column in SELECT clause besides the summary function and a GROUP BY column, all rows are in the output with counts on each job position. For instance, under position title “CLINICAL THERAPIST III”, both records have 2 as value of “Employees”. At last, SAS evaluates the condition (Department=POLICE) in HAVING clause to select rows for the output. That’s why you see Employees=2 for position title “CLINICAL THERAPIST III” in the output from the second query.

\n", "

We have seen two examples that show the differences between HAVING and WHERE so far. Since SAS handles them so differently, when it comes to WHERE or HAVING, pick one that fits your needs the best.

\n", "

Last but not the least, let’s check out one more cool feature of HAVING clause.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program selects the departments whose average salary is lower than the overall salary level:

\n", "
" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
DepartmentAvg_salary
DISABILITIES$36,264.00
OEMC$49,116.80
FAMILY & SUPPORT$53,642.00
PUBLIC LIBRARY$59,030.00
GENERAL SERVICES$65,600.80
BUSINESS AFFAIRS$65,652.00
CITY COUNCIL$66,983.00
AVIATION$67,704.48
STREETS & SAN$68,625.08
LAW$71,082.20
BOARD OF ELECTION$74,988.00
HEALTH$75,066.86
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " select Department,\n", " avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2\n", " from phc6089.salary\n", " group by Department\n", " having Avg_salary < (select avg(Employee_annual_salary) from phc6089.salary)\n", " order by Avg_salary;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Going through this program, you may not find anything unusual until HAVING clause. Inside the clause it’s not a standard expression as before, but a query:

\n", "
(select avg(Employee_annual_salary) from stat482.salary)
\n", "

Such kind of query is called subquery, inner query or nested query. You can use this query-expression in a HAVING or WHERE clause. The subquery used in this example is to calculate the overall average salary. The result is compared with average salaries of each department. Then SAS evaluates the condition “Less than” in HAVING clause to select departments who have less average salaries to output.

\n", "

Launch and run the SAS program, and review the query result. Convince yourself that the departments’ information has been selected as described.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Querying Multiple Tables\n", "\n", "So far, all the examples in this lesson are querying a single table. However, as matter of fact, you can specify multiple tables in the FROM clause. Querying more than more table at a time makes PROC SQL even more powerful in data manipulation.\n", "\n", "The following examples use two tables:\n", "\n", "Survey Data (survey.sas7bdat) contains:\n", "\n", "ID, Gender, GPA, SmokeCigarrets, SATM, SATV\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
idGenderGPASmokeCigSATMSATV
1001Male2.67No700700
1002Female2.98No700500
1003Female2.67No470470
1004Female3.6No710560
1005Female3.76No600520
1006Male3.86No610720
1007Male3.94No710670
1008Male2.8Yes610580
1009Male3.48No690620
\n", "\n", "Survey2 Data (survey2.sas7bdat) contains:\n", "\n", "ID, Seating, DiveInfluence, Height, Weight\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
idSeatingDriverInfluenHeightWeight
1001MiddleNo68190
1002MiddleNo54110
1003MiddleNo65225
1004MiddleNo52135
1005BackNo72128
1006MiddleNo70188
1007BackNo70155
1008MiddleYes68160
1009FrontNo72160
\n", "\n", "Download these two tables if you have not done so. Revise the libnameto reflect the directory that you save the files.\n", "\n", "
\n", "

Example

\n", "

The following program attempts to get demographic information about students from two separate tables, survey and survey2:

\n", "
" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
423  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
423! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
424
425 PROC SQL;
426 create table demo_info as
427 select ID, Gender, Height, Weight
428 from phc6089.survey, phc6089.survey2;
NOTE: Data file PHC6089.SURVEY.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
NOTE: Data file PHC6089.SURVEY2.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
ERROR: Ambiguous reference, column ID is in more than one table.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
429 QUIT;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

430
431 ods html5 (id=saspy_internal) close;ods listing;

432
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " create table demo_info as\n", " select ID, Gender, Height, Weight\n", " from phc6089.survey, phc6089.survey2;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Let’s review the code. In this SQL procedure, we used the CREATE TABLE clause to save and name the new table as demo_info. The subsequent SELECT clause chooses ID, gender, height and weight columns from two tables. In FROM clause, two tables’ names are listed.

\n", "

Launch and run the SAS program. You should expect no result in the output window because the CREATE TABLE clause suppresses output. On the other hand, check the log window and you will find the error message: “Ambiguous reference, column ID is in more than one table”.

\n", "

As you observed two tables, ID is in both tables and contains the same information. If a column in the SELECT statement appears in multiple tables, the table it is chosen from has to be specified by adding the table’s name in front as this:

\n", "
Table.Column
\n", "

So to make it right, we revise the previous program a little bit: change ID to survey.ID, which means that we use ID from survey data. The other change is the tables’ names. You can give a table an alias with or without the keyword AS after its original name. In the following program, we use S1 for survey data and S2 to survey2 data. And as you can see, it’s okay to use one level alias even for a permanent file. This makes life easier! In this way, ID can be specified as S1.ID.

\n", "
" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", " \n", " \n", " \n", "\n", "\n", "

\n", "\n", "
434  ods listing close;ods html5 (id=saspy_internal) file=stdout options(bitmap_mode='inline') device=svg style=HTMLBlue; ods
434! graphics on / outputfmt=png;
NOTE: Writing HTML5(SASPY_INTERNAL) Body file: STDOUT
435
436 PROC SQL;
437 create table demo_info as
438 select s1.ID, Gender, Height, Weight
439 from phc6089.survey as s1, phc6089.survey2 as s2;
NOTE: Data file PHC6089.SURVEY.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
NOTE: Data file PHC6089.SURVEY2.DATA is in a format that is native to another host, or the file encoding does not match the session
encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.DEMO_INFO created, with 51076 rows and 4 columns.

440 QUIT;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

441
442 ods html5 (id=saspy_internal) close;ods listing;

443
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " create table demo_info as\n", " select s1.ID, Gender, Height, Weight\n", " from phc6089.survey as s1, phc6089.survey2 as s2;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Everything seems good. Now launch and run the SAS program. As before, there is no output because of the CREATE TABLE statement. Check the log file in which there are two notes that need your attention (see the last two notes above).

\n", "

The first is “The execution of this query involves performing one or more Cartesian product joins that can not be optimized”. What is a Cartesian product? It refers to a query result in which each row in the first table is combined with every row in the second table. If you specify multiple tables in FROM clause but do not use a WHERE clause to choose needed rows, a Cartesian product is generated. For example, if we submit the following program:

\n", "
\n",
    "PROC SQL;\n",
    "Select *\n",
    "from table1, table2;\n",
    "
\n", "

Table1 has 3 rows; Table2 has 3 rows as well. Their Cartesian product contains (3*3)9 rows.

\n", "\n", "
\n", "
\n", "
\n", "

Table1

\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
namevalue1
x1
y2
z3
\n", "
\n", "\n", "
\\(\\times \\)
\n", "\n", "
\n", "

Table2

\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
namevalue2
A4
B5
C6
\n", "
\n", "\n", "
\\(=\\)
\n", "\n", "
\n", "

Result:

\n", "\n", "\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\t\n", "\t\t\n", "\t\n", "
namevalue1namevalue2
x1A4
x1B5
x1C6
y2A4
y2B5
y2C6
z3A4
z3B5
z3C6
\n", "
\n", "
\n", "

In the program for this example, there is no WHERE clause. So SAS generated a Cartesian product and gave you the note. Both Survey and Survey2 have 226 rows in the table. The query should have (226*226) = 51076 rows as the result. That’s why you got the other note, “Table Work.demo_info created, with 51076 rows and 4 columns.” Clearly, this can’t be correct. How do we get the desired result? Let’s make a final push.

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Example

\n", "

The following program selects the demographic information of students (ID, gender, height and weight) from two tables, survey and survey2:

\n", "
" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\n", "\n", "SAS Output\n", "\n", "\n", "\n", "
\n", "
\n", "

Traffic volume in Area SS

\n", "

During weekdays and weekends

\n", "
\n", "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
idGenderHeightWeight
1001Male67190
1002Female54110
1003Female65225
1004Female52135
1005Female72128
1006Male70188
1007Male70155
1008Male68160
1009Male72160
\n", "
\n", "
\n", "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PROC SQL;\n", " create table demo_info as\n", " select s1.ID, Gender, Height, Weight\n", " from phc6089.survey as s1, phc6089.survey2 as s2\n", " where s1.ID = s2.ID;\n", " select *\n", " from demo_info\n", " where ID < 1010;\n", "QUIT;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

Let’s check through the code. Only one more clause has been added to the query, WHERE. We use the WHERE clause to subset the whole Cartesian product by only selecting the rows with matched ID numbers. Note that the column names in the WHERE clause do not have to be the same. At last, to be able to check the table in person, another query is added to display the data in the output window.

\n", "

Launch and run the SAS program, and review the log file and the output.

\n", "
NOTE: Table WORK.DEMO_INFO creates, with 226 rows and 4 columns.
\n", "

Finally, we got what we want. As you can see from the query result, it’s like combining two columns from each table horizontally. SAS also call it join. In this particular case, since we only chose the matched rows, it’s also called the inner join. Such type of join is very similar to Merge By in the DATA step but requiring less computing resources and less coding. There are other types of join and data union (a vertical combination of rows) in PROC SQL which are beyond this lesson’s scope. If you are interested, you can explore them yourself with the foundation of this lesson!

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "SAS", "language": "sas", "name": "sas" }, "language_info": { "codemirror_mode": "sas", "file_extension": ".sas", "mimetype": "text/x-sas", "name": "sas" } }, "nbformat": 4, "nbformat_minor": 2 }