An Intro to PROC SQL: Simple & Complex Queries
Melissa Navarro, Individual Contributor, San Diego, CA
ABSTRACT
Structured Query Language (SQL) is the universal programming language for relational database management.
SAS ® implementation of SQL is PROC SQL, which is an essential tool for managing and manipulating data. A query
is an inquiry to retrieve data from one or more database tables by using a SELECT clause and other clauses such as
FROM, WHERE, ORDER BY. This paper provides an introduction to both simple and complex PROC SQL queries.
Additionally, I will present some tips that can benefit programmers when using PROC SQL. These skills are essential
for anyone involved in analysis using SAS, and can serve as a foundation for those working with large data sets.
INTRODUCTION
SAS programmers are always looking for techniques to help manage and manipulate data more efficiently. PROC
SQL is a powerful tool that provides an easy, flexible way to manipulate, and combine data. PROC SQL queries
retrieve data from one or more database tables. A query can be used to answer a simple question from a table or a
more complex question from multiple tables. This paper illustrates the basics of PROC SQL, the difference between
simple and complex queries, and certain techniques that can help a SAS programmer when using PROC SQL.
Examples and techniques illustrated:
Simple Queries:
• Retrieving data from a table
• Selecting desired columns from a table
• Creating tables and subsetting rows from a table
• Performing basic calculations using summary functions
Complex Queries:
• Inner Join
• Left Outer Join
• Right Outer Join
DATA
1
An Intro to PROC SQL: Simple & Complex Queries, continued
PROC SQL BASICS
The following table lists the different terms used between SAS and SQL programming languages.
SAS SQL
SAS Data File Table
Observation Row
Variable Column
Syntax
The following illustrates basic PROC SQL syntax to perform a query:
PROC SQL;
SELECT < column names>
FROM < table names>;
< WHERE >
< GROUP BY >
< ORDER BY >
QUIT;
Brief explanation of the syntax:
PROC SQL – required, invokes the SQL procedure
SELECT – required, specifies the selected columns
FROM- required, specifies the tables to be read
WHERE- optional, subsets desired data based on one or more conditions
GROUP BY- optional, classifies the data into groups based on selected columns
ORDER BY- optional, sorts the rows in ascending or descending order
QUIT- required, used to terminate the query once it’s done
SIMPLE QUERY
Retrieving data from a table
A simple query is described as retrieving data from one table for quick results. Most of the time, the PROC SQL
syntax is short and easy to understand. The following syntax illustrates how to retrieve rows of data from a table. The
rows of data are read from the table, not updated/modified.
Example 1.
PROC SQL;
SELECT *
FROM AUTHOR;
QUIT;
Example 1 Output:
2
An Intro to PROC SQL: Simple & Complex Queries, continued
Selecting desired columns from a table
PROC SQL lets you display one or more selected columns in a specific order by specifying the names of the columns
after the SELECT statement.
Example 2.
PROC SQL;
SELECT BOOK_ID, PRICE, GENRE
FROM BOOK;
QUIT;
Example 2 output:
3
An Intro to PROC SQL: Simple & Complex Queries, continued
Creating tables and subsetting rows from a table
Example 3.
PROC SQL;
CREATE TABLE BOOK_PRICE AS
SELECT BOOK_ID, NAME, GENRE
FROM BOOK
WHERE GENRE = 'Poetry';
QUIT;
Example 3 output:
Performing basic calculations using summary functions
The following table lists the available PROC SQL summary functions:
Using summary functions can come in handy when basic information from a data set is desired. Results are
displayed quick, time is saved and keystrokes are less. Highly recommend when analyzing large data sets.
Example 4.
PROC SQL;
SELECT GENRE, MIN(PRICE) AS CHEAP
FROM BOOK
GROUP BY GENRE;
QUIT;
4
An Intro to PROC SQL: Simple & Complex Queries, continued
Example 4 output:
COMPLEX QUERIES
A complex query is described, as retrieving data from two or more tables for more complex questions. PROC SQL
syntax can be extensive and hard to understand depending on desired results.
Joins are used for joining two or more tables of data together. A minimum of two tables is required and columns from
each table are used to join tables. Joins are extremely valuable when joining large, complex tables. The data does
not need to be sorted and common variable names are not required. The chart below describes the different types of
joins in PROC SQL. Inner, left outer, and right outer joins are illustrated in this paper.
JOINS
Cartesian Product/Cross Join All possible combinations of rows and
columns from two or more tables
Inner Join Only retrieves rows with matching values
from two or more tables
Reflexive Join Table is joined with itself
Left Outer Join Retrieves matching rows while
preserving any unmatched rows from the
left table
Right Outer Join Retrieves matching rows while
preserving any unmatched rows from the
right table
Full Outer Join Retrieves matching rows while
preserving any unmatched rows from the
left and right tables
Inner Joins
An inner join is the most common type of join. It is also referred as an equijoin because of the “=” sign used in the
where clause.
5
An Intro to PROC SQL: Simple & Complex Queries, continued
Example 5.
PROC SQL;
SELECT BOOK.A_ID, BOOK.TITLE, AUTHOR.NAME
FROM BOOK,
AUTHOR
WHERE BOOK.A_ID = AUTHOR.A_ID;
QUIT;
Example 5 output:
6
An Intro to PROC SQL: Simple & Complex Queries, continued
Left Outer Join
Example 6.
PROC SQL;
SELECT BOOK.A_ID, BOOK.TITLE, AUTHOR.GENDER
FROM BOOK LEFT JOIN
AUTHOR
ON BOOK.A_ID = AUTHOR.A_ID;
QUIT;
Example 6 output:
7
An Intro to PROC SQL: Simple & Complex Queries, continued
Right Outer Join
Example 7.
PROC SQL;
SELECT BOOK.A_ID, BOOK.TITLE, AUTHOR.NAME
FROM BOOK RIGHT JOIN
AUTHOR
ON BOOK.A_ID = AUTHOR.A_ID
ORDER BY A_ID DESC;
QUIT;
Example 7 output:
8
An Intro to PROC SQL: Simple & Complex Queries, continued
TIPS
PROC SQL is a useful tool when analyzing complex tables. Based on my experience with PROC SQL and many
other programming languages, I would to share some techniques.
1. Use Views to reduce data redundancy and hide sensitive information from unauthorized users.
2. Use Indexes which may be able to improve performance by providing faster and more efficient access to
data
3. Try to eliminate redundant columns and rows from tables to improve CPU time
4. When writing complex queries, using table aliases can simplify and consolidate syntax
5. Use WHERE statements to subset tables when possible
6. Use the NOPRINT option when possible
7. The COMPRESS function can help clean data and increase program efficiency
CONCLUSION
PROC SQL is an amazing tool when dealing with big data. Joins are essential when combining multiple tables. The
simplistic, flexible, and user-friendly syntax makes PROC SQL unique and powerful. PROC SQL offers many
techniques and topics not described in this paper.
REFERENCES
Lafler, Kirk. 2015, PROC SQL Programming: The Basics and Beyond Course Notes
SAS Institute Inc.2014, SAS Certification Prep Guide: Advanced Programming for SAS 9, Fourth Edition, Cary NC:
SAS Institute Inc.
ACKNOWLEDGMENTS
A special thanks to Abigail Hernandez, Otto Colombani, and Cynthia Simon-Arndt for their support and helpful
comments on this paper.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Melissa Navarro
Individual Contributor
San Diego, CA 91942
melissa.navarro1109@gmail.com
www.linkedin.com/in/melissanavarro1
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS
Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.