Two
SAS: An Introduction
2.1 WHAT IS SAS?
SAS is a programming language composed of statements that specify
how data are to be processed and analyzed.
A SAS program consists of a sequence of SAS statements grouped
together into blocks, referred to as steps.
Two major types of steps:
Data steps
Procedure steps (proc)
A data step creates and modifies a SAS data set.
A proc step is used to perform a particular type of statistical analysis
on the SAS data set.
2.2 HOW TO START PC SAS IN WINDOWS 7
Double-click the SAS icon on the desktop to start SAS.
Alternatively, choose
"Start" > "All Programs" > "SAS" > "SAS 9.3 (English)".
After SAS starts, five main windows are opened, namely:
Editor
Log
Output
Results
Explorer
2.3 EDITOR WINDOW
The Editor window is for typing in, editing and running programs.
The program currently in the Editor window can be run by choosing
the "Submit" option from the "Run" menu.
A program can also be run by clicking the running man icon in the
toolbar.
The "Run" menu is specific to the Editor window and is not available
if any other window is active.
2.4 LOG AND OUTPUT WINDOWS
The Log window shows the SAS statements that have been submitted,
together with information about the execution of the program,
including warning and error messages.
The Output window shows the printed results of any procedures.
2.5 EXPLORER AND RESULTS WINDOWS
The Explorer window shows the contents of the SAS environment:
Libraries; File shortcut; Favourite Folders; My Computer
The Results window organizes outputs in a way similar to a file explorer:
Double-click on a name to open the output related to that name.
EXAMPLE 2.1 (A SIMPLE SAS PROGRAM)
data ex2_1;
input subject gender $ exam1 exam2 hw_grade $;
datalines;
10 m 80 84 a
7 m 85 89 a
4 f 90 86 b
20 m 82 85 b
25 f 94 94 a
14 f 88 84 c
;
proc means data=ex2_1;
var exam1 exam2;
run;
proc means data=ex2_1 n mean std stderr maxdec=1;
var exam1 exam2;
run;
2.6 A FEW SIMPLE RULES IN SAS PROGRAMMING
A variable name must not be more than 32 characters in length.
A variable name must start with a letter or the underscore character
(_).
Blanks and special characters, such as commas and semicolons,
are not allowed in variable names.
Every SAS statement must end with a semicolon (;) except the lines
of data.
End a program with the statement run;.
2.7 GENERAL FORM OF SIMPLE DATA STEP
data data_set_name;
input variables;
datalines;
the lines of data
;
run;
2.8 INPUT FORMAT
There are several ways of entering data in SAS.
Free format
data ex2_1;
input subject gender $ exam1 exam2 hw_grade $;
datalines;
10 m 80 84 a
7 m 85 89 a
......
"$" after a variable name indicates that the variable is a character variable
Use "." for a missing value
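As a minimal sketch of the "." convention, the data step below reuses the layout of ex2_1 but replaces the exam1 score of the second subject with "." purely for illustration. The data set name ex2_1miss and the missing score are assumptions, not part of the original example.

```sas
/* Sketch only: ex2_1miss is a hypothetical data set name */
data ex2_1miss;
    input subject gender $ exam1 exam2 hw_grade $;
    datalines;
10 m 80 84 a
7 m . 89 a
;
run;

/* exam1 is printed as . (missing) for subject 7 */
proc print data=ex2_1miss;
run;
```

Without the ".", free format input would shift the remaining values of that line one variable to the left.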
Fixed format
data ex2_1fix1;
input subject 1-2 gender $ 4 exam1 6-7 exam2 9-10 hw_grade $ 12;
datalines;
10 m 80 84 a
 7 m 85 89 a
......
Or
data ex2_1fix2;
input @1 subject 2.0 @4 gender $1. @6 exam1 2.0
@9 exam2 2.0 / @1 hw_grade $1.;
datalines;
10 m 80 84
a
 7 m 85 89
a
......
@x goes to column x
$1. means a character variable with width 1
W.d means a numeric variable with width W and d decimal places
123 in 3.2 format is 1.23
You can omit the 0 in W.0
/ goes to the next line
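The W.d rule can be checked with a small sketch. The data set name ex_wd, the variable names and the data values are hypothetical and chosen only to illustrate the notes above.

```sas
/* Sketch only: ex_wd, x, y and z are hypothetical names */
data ex_wd;
    input @1 x 3.2   /* columns 1-3, two implied decimal places */
          @5 y 3.    /* columns 5-7, same as 3.0                */
          @9 z 4.1;  /* columns 9-12, data has its own decimal  */
    datalines;
123 456 7.25
;
run;

/* x = 1.23 and y = 456; z = 7.25 because a decimal point
   that is present in the data overrides the d in W.d */
proc print data=ex_wd;
run;
```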
2.9 USING DATA FROM EXTERNAL FILES
We can read external data files into SAS:
data ex2_1;
infile "C:\ST2137\data\ex2_1.txt";
input subject 1-2 gender $ 4 exam1 6-7 exam2 9-10 hw_grade $ 12;
run;
Remarks:
(a) If "," is used as a delimiter instead of a space in free format data, then
we add delimiter="," to the infile statement as follows:
data ex2_1;
infile "C:\ST2137\data\ex2_1c.txt" delimiter=",";
input subject gender $ exam1 exam2 hw_grade $;
run;
(b) By default, SAS treats consecutive delimiters as a single delimiter. Hence the data line
10,m,,84,a
is read as having only 4 values, not 5 with the third missing. We should put a
"." in place of a missing value. Hence the data line should be
10,m,.,84,a
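The source does not mention it, but the infile statement also has a dsd option, which treats two consecutive delimiters as a missing value and uses the comma as the default delimiter. The sketch below reads the data in-stream with infile datalines; the data set name ex_dsd is hypothetical.

```sas
/* Sketch only: ex_dsd is a hypothetical data set name */
data ex_dsd;
    infile datalines dsd;   /* ",," is now read as a missing value */
    input subject gender $ exam1 exam2 hw_grade $;
    datalines;
10,m,,84,a
7,m,85,89,a
;
run;

/* exam1 is missing for subject 10; exam2 = 84 and hw_grade = a */
proc print data=ex_dsd;
run;
```

With dsd the data file does not have to be edited to insert "." for each missing value.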
Limiting the number of observations read from external files
data ex2_1;
infile "C:\ST2137\data\ex2_1.txt" firstobs=2 obs=4;
input subject 1-2 gender $ 4 exam1 6-7 exam2 9-10 hw_grade $ 12;
run;
This asks SAS to begin with the second row of the data file and end with the
fourth row of the data file. Three observations will therefore be found in the
data set ex2_1.
2.10 COMMENT LINES
Sometimes we want to include remarks or comments in our program
that will not be executed. There are two ways to write comments
in a SAS program:
(a) Begin a SAS statement with an * and end it with a ;
(b) Begin the comment with /* and end it with */
data sales;
input item price;
* compute sales tax;
tax = price * 0.07;
datalines;
001 15
......
;
Remark: A new variable tax is created whose value equals 7% of
the price.
Alternatively, we may write it as
data sales;
input item price;
tax = price * 0.07 /* compute sales tax; */;
datalines;
001 15
....
;
2.11 IMPORTING AN EXCEL FILE
"File" > "Import Data".
Choose "Microsoft Excel Workbook (*.xls, *.xlsb, *.xlsm, *.xlsx)". Then
click "Next".
Type the location of the Excel file or click on "Browse" to select the
file.
Click "OK".
The default option is to read variable names from the first line.
If there are no variable names in the first line of the Excel file, then do the
following:
Click "Options ..."
Uncheck "Use data in the first row as SAS variable names"
Click "OK"
Click on "Next". Enter a name (e.g. ex2_1) in the "Member" field. The SAS
data set will be created under the name ex2_1 in the Work directory.
The SAS data sets in the Work directory are temporary and will be
lost once the SAS session ends.
Click on "Finish".
2.12 VIEWING AND MODIFYING A DATA SET
All the data sets created or imported are stored temporarily in the Work
directory. They can be modified in the following way.
"Explorer" > "Libraries" > "Work"
Double-click on the file name ex2_1.
In the menu bar, click "Edit" > "Edit Mode"
Click on a particular cell to modify the data in that cell
2.13 SELECT A SUBSET OF A DATA SET
Suppose we want to analyze those students who scored at least 85 in exam1
and obtained homework grade a.
One way to select a subset is to use the "where (condition)" or "if (condition)"
statement, so that only those data records satisfying the condition are
included in the new data set.
data subset1;
set ex2_1;
/* provided that a SAS data set named ex2_1 has been created or imported */
where (exam1 >= 85 and hw_grade = "a");
/* "if" can be used in the place of "where" */
run;
The set statement reads data from the specified data set.
The data are read in from the data set ex2_1 instead of from an
external file or from a datalines statement.
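The "if" variant mentioned in the comment could be sketched as follows. It assumes that the data set ex2_1 of Example 2.1 already exists; the name subset2 is hypothetical.

```sas
/* Sketch only: subset2 is a hypothetical data set name */
data subset2;
    set ex2_1;
    /* subsetting if: only observations satisfying the condition are kept */
    if exam1 >= 85 and hw_grade = "a";
run;

/* with the data of Example 2.1, subjects 7 and 25 remain */
proc print data=subset2;
run;
```

One difference worth knowing: where is applied while the observations are read from ex2_1, whereas if is applied after each observation has been read, so if can also refer to variables created earlier in the same data step.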
2.14 CREATING AND RETRIEVING A SAS DATA SET
By default, SAS data sets are temporary and disappear after the session ends.
In order to use the same SAS data set again in later sessions, we can create a
permanent SAS data set in the library under the "sasuser" directory. By
default, the "sasuser" folder is located at
c:\Users\username\My Documents\My SAS Files\9.3\.
Creating a permanent data set
Example:
data "C:\ST2137\data\ex2_1";
/* The SAS data set <ex2_1> will be permanently
   stored in the directory C:\ST2137\data\ */
set ex2_1;
/* provided that a SAS data set named ex2_1
   has been created or imported */
run;
Retrieving a permanent SAS data set
data try;
set "C:\ST2137\data\ex2_1";
run;
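An alternative that the source does not show is to assign a libref to the directory with a libname statement and then use two-level names. The sketch below assumes that the directory C:\ST2137\data exists and that ex2_1 is in the Work library; the libref st2137 is a hypothetical name.

```sas
/* Sketch only: st2137 is a hypothetical libref (at most 8 characters) */
libname st2137 "C:\ST2137\data";

/* store a permanent copy of the temporary data set ex2_1 */
data st2137.ex2_1;
    set ex2_1;
run;

/* retrieve it again, for example in a later session
   after re-submitting the libname statement */
data try;
    set st2137.ex2_1;
run;
```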
2.15 MANIPULATING DATA SETS
Keeping a subset of variables
Suppose we want to keep only subject, gender and the 2 exam marks.
data ex2_1keep;
set ex2_1;
keep subject gender exam1 exam2;
run;
Hence only those variables on the list will be kept in the new data set.
Dropping a subset of variables
Suppose we want to drop the homework grade from the data set.
data ex2_1drop;
set ex2_1;
drop hw_grade;
run;
Concatenating data sets
Suppose we have a data set for male students (e.g. ex2_1m) and a data set for
female students (e.g. ex2_1f). We need to combine the two data sets for analysis.
data ex2_1concat;
set ex2_1m ex2_1f;
run;
A new data set is formed which consists of all the observations in ex2_1m
followed by all the observations in ex2_1f.
Question:
How can we create the data sets ex2_1m and ex2_1f from ex2_1?
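One possible approach, given here only as a sketch, is to name both data sets in a single data statement and direct each observation with an output statement. It assumes that ex2_1 exists and that gender is coded "m" or "f" as in Example 2.1.

```sas
/* Sketch only: splits ex2_1 by the value of gender */
data ex2_1m ex2_1f;
    set ex2_1;
    if gender = "m" then output ex2_1m;
    else if gender = "f" then output ex2_1f;
run;
```

Two separate data steps, each with its own where statement, would give the same result.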
Match merging data sets
Suppose we have another data set ex2_1iq that consists of the IQ scores
of the subjects in the data set ex2_1. We want to combine observations
from these two data sets into a single observation in a new data set
according to the values of a variable that is specified in the by statement.
proc sort data=ex2_1;
by subject;
data ex2_1iq;
input subject iq @@;
datalines;
10 106 7 112 4 119
20 102 25 125 14 101
;
proc sort data=ex2_1iq;
by subject;
data ex2_1merge;
merge ex2_1 ex2_1iq;
by subject;
run;
Remark: The @@ at the end of the input statement holds the current data line,
so that SAS keeps reading further observations from the same line until the
end of the line is reached.
Always remember to sort the data sets by the common variable before merging.
A variable common to the data sets must be specified in the by statement for
the purpose of merging.
2.16 TRANSFORMING DATA & CREATING A NEW VARIABLE
data ex2_1transf;
set ex2_1;
ave_mark= (exam1 + exam2)/2;
if missing(hw_grade) then final_grade = ""; /* a "." read into a character variable is stored as blank */
else if ave_mark >= 90 and hw_grade = "a" then final_grade = "a";
else if ave_mark >= 85 and (hw_grade = "a" or hw_grade= "b") then final_grade= "b";
else final_grade= "c";
run;
2.17 LOGICAL OPERATORS
"EQ" or = means equal
"LT" or < means less than
"LE" or <= means less than or equal
"GT" or > means greater than
"GE" or >= means greater than or equal
"NE" or ^= means not equal
"NOT" or ^ means negation
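The mnemonic forms can be used wherever the symbols can. A minimal sketch, assuming that the data set ex2_1 of Example 2.1 exists (the name subset_ops is hypothetical):

```sas
/* Sketch only: subset_ops is a hypothetical data set name */
data subset_ops;
    set ex2_1;
    /* same as: where exam1 >= 85 and hw_grade ^= "c"; */
    where exam1 ge 85 and hw_grade ne "c";
run;

/* with the data of Example 2.1, subjects 7, 4 and 25 remain */
proc print data=subset_ops;
run;
```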
2.18 A SPECIAL FUNCTION
Suppose we want to rank the subjects based on the average exam mark.
data ex2_1transf;
set ex2_1;
ave_mark = (exam1 + exam2)/2;
proc sort data=ex2_1transf;
by descending ave_mark;
data ex2_1rank;
set ex2_1transf;
position = _n_;
run;
Remark:
The value of _n_ represents the number of times the data step has iterated.
2.19 DO LOOP
The general form of a do loop is
do index_variable = beginning_number to ending_number;
statements;
end;
Here's an example:
data exdo1;
do factora = 1 to 3;
do factorb = 1 to 5;
input y @@;
output;
end;
end;
datalines;
22 24 16 18 19
15 21 26 16 25
14 28 21 19 24
;
run;
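To see what the nested loops produce, the resulting data set can be printed. The sketch below assumes that the data step above has been run; the obs= data set option limits the listing to the first five observations.

```sas
/* Sketch only: list the first five of the 15 observations in exdo1 */
proc print data=exdo1(obs=5);
run;

/* factora  factorb   y
      1        1     22
      1        2     24
      1        3     16
      1        4     18
      1        5     19  */
```

Each pass through the inner loop reads one value of y and writes one observation, so the three data lines give 3 x 5 = 15 observations in total.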