KEMBAR78
SAS Data Sets for Data Analysts | PDF | Microsoft Excel | Computers
0% found this document useful (0 votes)
65 views10 pages

SAS Data Sets for Data Analysts

Creating SAS Data Sets from External Files for SAS base programming exam

Uploaded by

Kent
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
65 views10 pages

SAS Data Sets for Data Analysts

Creating SAS Data Sets from External Files for SAS base programming exam

Uploaded by

Kent
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 10

5 Creating SAS Data Sets from External Files

1. Raw Data Files

 To read the raw data file, the DATA step must provide the following instructions to
SAS:
o the location or name of the external text file
o a name for the new SAS data set
o a reference that identifies the external file
o a description of the data values to be read.

 FILENAME statement to point to the location of a data file or an aggregate storage


location that contains data files.

FILENAME filref 'path';

where
 filref is 1 to 8 characters long, begins with a letter or underscore, and
contains only letters, numbers, or underscores.
 'path' is the fully qualified name or location of the file(s).

 DATA statement indicates the beginning of the DATA step and names the SAS data
set to be created:

DATA SAS-data-set-1 <…SAS-data-set-n>;

where SAS-data-set names (in the format libref.flename) the data set or data sets
to be created.

 INFILE statement indicates the file where the data is in:

INFILE file-specification <options>;

where
 file-specification can take the form fileref to name a previously
defined file reference or 'filename' to point to the actual name and location
of the file
 To reference an external file with a fileref that points to an aggregate
storage location, you specify the fileref followed by the individual
filename in parentheses (e.g. fileref (xxx.dat)).
 options describe the input file's characteristics and specify how it is to be
read with the INFILE statement.
 Adding OBS=n option to the INFILE statement enables you to process
only records 1 through n, so you can verify that the correct fields are read

1
before reading the entire data file.

 Column input specifies actual column locations for values. When you use column
input, your data must be
o character or standard numeric values
o in fixed fields.
 When you use column input, you can
o read any or all fields from the raw data file
o read the fields in any order
o specify only the starting column for values that occupy only one column.
 Standard numeric data values can contain only
o numerals
o decimal points
o numbers in scientific or E-notation (e.g. 2.3E4)
o plus or minus signs.
 Nonstandard numeric data includes
o values that contain special characters, such as percent signs (%), dollar signs
($), and commas (,)
o date and time values
o data in fraction, integer binary, real binary, and hexadecimal forms.

 INPUT statement describes the fields of raw data to be read and placed into the SAS
data set:

INPUT variable <$>startcol-endcol…;

where
 variable is the SAS name that you assign to the field
 the dollar sign ($) identifies the variable type as character (if the variable
is numeric, then nothing appears here)
 startcol represents the starting column for this variable
 endcol represents the ending column for this variable.

 Unlike syntax errors, invalid data errors do not cause SAS to stop processing a
program.

 Whenever you use the DATA step to read raw data, remember the steps that you
followed in this chapter, which help ensure that you don't waste resources when
accessing data:
o write the DATA step using the OBS= option in the INFILE statement
o submit the DATA step
o check the log for messages
o view the resulting data set
o remove the OBS= option and resubmit the DATA step
o check the log again
o view the resulting data set again.

2
Example 1

Can column input to be used for the following data set?

1---+----10---+----20---+----30---+
MARY JACKSON 12 29,296.63
SUSAN JEFFERSON 15 12,367.23
MARCO LEE 11 56,309.00
SERGE LANG 13 32,446.78
MOORE ABRAHAM 12 28,945.12

Example 2

Write a SAS step that create a SAS dataset salary which contains data from
salary.txt:

1---+----10---+----20---+----30---+
MARY JACKSON 12 29296.63
SUSAN JEFFERSON 15 12367.23
MARCO LEE 11 56309.00
SERGE LANG 13 32446.78
MOORE ABRAHAM 12 28945.12

Example 3

Write a SAS step that create a SAS dataset honours which contains the first two
observations from honours.txt:

1---+----10---+
23FIRST M

3
12SECONDUF
13SECONDLM
11FIRST F
59THIRD M
81SECONDLF

2. Creating and Modifying Variables

 To modify existing values or to create new variables, you can use an assignment
statement in any DATA step:

variable=expression;

where
 variable names a new or existing variable
 expression is any valid SAS expression

 You use SAS expressions in assignment statements and many other SAS
programming statements to
o transform variables
o create new variables
o conditionally process variables
o calculate new values
o assign new values

 Below lists arithmetic operators:


Operator Action Example Priority
- negative prefix negative=-x; I
** exponentiation raise=x**y; I
* multiplication mult=x*y; II
"/" division divide=x/y; II
+ addition sum=x+y; III
- subtraction diff=x-y; III

 When you use more than one arithmetic operator in an expression,


o operations of priority I are performed before operations of priority II, and so
on

4
o consecutive operations that have the same priority are performed
 from right to left within priority I
 from left to right within priority II and III
o you can use parentheses to control the order of operations.
 You can assign date values to variables in assignment statements by using date
constants. To represent a constant in SAS date form, specify the date
as 'ddmmmyy' or 'ddmmmyyyy', immediately followed by a D or d where
o dd is a one-or two-digit value for the day
o mmm is a three-letter abbreviation for the month (JAN, FEB, and so on)
o yy or yyyy is a two-or four-digit value for the year, respectively.

Example 4

Write a SAS step that creates a SAS dataset salary4 which contains the following dataset
(salary.txt):

1---+----10---+----20---+----30---+
MARY JACKSON 12 29296.63
SUSAN JEFFERSON 15 12367.23
MARCO LEE 11 56309.00
SERGE LANG 13 32446.78
MOORE ABRAHAM 12 28945.12

The first field should be stored in firstName, second in lastName, third in age, and
forth in salary. Also create a new variable savingHKD which is 7.8 times salary and a
new variable update which is July 28, 2015.

3. Subsetting Data

 IF statement causes the DATA step to continue processing only those observations
that meet the condition of the expression specified in the IF statement:

IF expression;

5
where expression is any valid SAS expression.
 If the expression is true, the DATA step continues to process that
observation.
 If the expression is false, no further statements are processed for that
observation, and control returns to the top of the DATA step.

Example 5

Write a SAS step that creates a SAS dataset salary5 which contains the following dataset
(salary.txt):

1---+----10---+----20---+----30---+
MARY JACKSON 12 29296.63
SUSAN JEFFERSON 15 12367.23
MARCO LEE 11 56309.00
SERGE LANG 13 32446.78
MOORE ABRAHAM 12 28945.12

The first field should be stored in firstName, second in lastName, third in age, and
forth in salary. In addition, only observations with age larger than 12 are read.

4. Reading Instream Data

 To read instream data, you use


o a DATALINES statement as the last statement in the DATA step and
immediately preceding the data lines
o a null statement (a single semicolon) to indicate the end of the input data.

Example 6

Continue with example 5. Write a SAS program that creates a SAS dataset salary6 using
instream data.

6
5. Creating Raw Data

 To create a raw data file and not a SAS data set, it is inefficient to print a data set
name in the DATA statement. Instead, use the keyword _NULL_ as the data set name,
which enables you to use the DATA step without actually creating a SAS data set.
 A SET statement specifies the SAS data set that you want to read from.
 FILE statement to specify the output file.

FILE file-specification <options><operating-environment-options>;

where
 file-specification can take the form fileref to name a previously defined
file reference or 'filename' to point to the actual name and location of the
file
 options names options that are used in creating the output file
 operating-environment-options names options that are specific to an
operating environment (for more information, see the SAS documentation
for your operating environment).

 PUT statement describes the columns to write values to the raw data file:

PUT variable startcol-endcol…;

where
 variable is the name of the variable whose value is written
 startcol indicates where in the line to begin writing the value
 endcol indicates where in the line to end the value.

Example 7

Continue with example 6. Write a SAS program that create an external data file
salary6.txt with salary in column 1 to 20, lastName in 22 to 32, firstname in
33 to 43, and age in 45 to 50.

7
6. Reading Microsoft Excel Data

 To read in Excel data you use one of the following methods:


 SAS/ACCESS LIBNAME statement
 Import Wizard

 LIBNAME statement associates a SAS name with an Excel workbook file by pointing
to its location. The Excel workbook becomes a new library in SAS, and the
worksheets in the workbook become the individual SAS data sets in that library.

 To read the Excel workbook file, the DATA step must provide the following
instructions to SAS:
 a libref to reference the Excel workbook to be read
 the name and location (using another libref) of the new SAS data set
 the name of the Excel worksheet that is to be read

 The general form of the SAS/ACCESS LIBNAME statement is as follows:

LIBNAME libref 'location-of-Excel-workbook '<options>;

where
 libref is a name that you associate with an Excel workbook.
 'location-of-Excel-workbook' is the physical location of the Excel
workbook.

 To read a worksheet in an Excel file you specify the worksheet as follows:

results.'tests$'n

where results is the libref and tests is the name of the worksheet in the
Excel file. To write a worksheet to an Excel file, you, however, only need to specify
the worksheet as other SAS data file (that is, results.tests).
 To disassociate a libref, use a LIBNAME statement, specifying the libref and
the CLEAR option. For example, libname results clear;

Example 8

Write a SAS program that prints the dataset in Sheet1 of salary.xlsx.

8
7. Answers of Examples
1 No. This is because it contains non-standard numeric data.

2
filename example2 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data salary;
infile example2(salary.txt);
input firstname $ 1-5 lastname $ 9-17 age 20-21 salary 24-29;
run;
filename example3 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data honours;
infile example3(honours.txt) obs=2;
input age 1-2 honours $ 3-9 gender $ 10;
run;

3
filename example3 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data honours;
infile example3(honours.txt) obs=2;
input age 1-2 honours $ 3-9 gender $ 10;
run;

4
filename example4 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data salary;
infile example4(salary.txt);
input firstname $ 1-5 lastname $ 9-17 age 20-21 salary 24-29;
salaryHKD = 7.8 * salary;
run;

5
filename example5 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data salary;
infile example5(salary.txt);
input firstname $ 1-5 lastname $ 9-17 age 20-21 salary 24-29;
if age > 12;
run;

6
filename example6 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data salary6;
input firstname $ 1-5 lastname $ 9-17 age 20-21 salary 24-29;
datalines;
MARY JACKSON 12 29296.63

9
SUSAN JEFFERSON 15 12367.23
MARCO LEE 11 56309.00
SERGE LANG 13 32446.78
MOORE ABRAHAM 12 28945.12
;

7
filename example7 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set';

data _null_;
set salary6;
file example7(salary6.txt);
put firstname $ 33-43 lastname $ 22-32 age 45-50 salary 1-20;
run;

8
libname exefile 'D:\DropBox\Dropbox\SAS\SAS14_15s3\Data Set\salary.xlsx';
proc print data=exefile.'Sheet1$'n;
run;

10

You might also like