KEMBAR78
Unit 4 Tutorials Advanced Database Commands | PDF | Table (Database) | Databases
0% found this document useful (0 votes)
5 views99 pages

Unit 4 Tutorials Advanced Database Commands

Uploaded by

parrishd2015
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views99 pages

Unit 4 Tutorials Advanced Database Commands

Uploaded by

parrishd2015
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 99

Unit 4 Tutorials: Advanced Database Commands

INSIDE UNIT 4

Querying Multiple Tables

JOINs
JOINs That Specify Which Columns to Use
NATURAL JOINs
JOIN ON to Link Tables
AS/ALIAS to Rename Tables and Columns
OUTER JOINs
LEFT JOINs
RIGHT JOINs
CROSS JOINs

Constructing Related Tables

Foreign Keys & Creating Tables


Foreign Keys & Altering Tables
Foreign and Primary Keys
Adding and Deleting Foreign Keys
Foreign Key Errors

Specialized Queries

Subqueries
Subquery Performance
Find Duplicate Rows
UNION to Combine Results
ANY and ALL Operators

Views

Calculations in SELECT Statements


VIEW to Provide a Subset
VIEW to Simplify Queries
VIEW & Complex Queries
CREATE OR REPLACE VIEW to Update Views
DROP VIEW to Remove Views

JOINs
by Sophia

 WHAT'S COVERED

In this lesson, you will learn how JOIN clauses work and how you can use them to find meaningful facts and interesting associations in the data. Specifically, this
lesson will cover:
1. JOIN Basics
2. INNER JOINs
3. OUTER JOINs

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 1
1. JOIN Basics
JOINs are one of the most fundamental concepts in SQL. They enable you to retrieve data from multiple related tables in a single query. A JOIN enables you to combine rows
from several tables, pulling data from different sources into a cohesive dataset. JOINs can help you avoid data duplication and maintain a well-structured schema.

 KEY CONCEPT

One common mistake is to confuse a JOIN with a relationship. Relationships are permanent connections between tables based on common fields. A JOIN is an operation
used in a query to specify how data from multiple related tables will be included in a query. JOINs rely on existing relationships within the database and are specific to
the query in which they are referenced.
PostgreSQL supports several types of JOINs, each serving a specific purpose. Common JOIN types include INNER JOINs, LEFT JOINs (or LEFT OUTER JOINs), RIGHT JOINs
(or RIGHT OUTER JOINs), and FULL JOINs (or FULL OUTER JOINs). You will work with these in upcoming lessons, as well as NATURAL JOINs, CROSS JOINs, and self JOINs.
Each JOIN type has its own unique behaviors, as you will discover.

2. INNER JOINs
The INNER JOIN is a JOIN in SQL in which rows from multiple tables are combined based on a condition. In this way, only rows matching the condition are included in the
result set. The purpose of this type of JOIN is to retrieve data from the participating tables with matching values in the specified columns.

 HINT

An INNER JOIN returns only rows from both tables with corresponding values, effectively removing the nonmatching records.
An INNER JOIN is commonly used when combining data from related tables, where the columns being joined have a defined relationship. A database with two tables, such
as "Customers" and "Orders," can be used to fetch records indicating which customers have placed orders, and their details. By reducing the result set to only the data that
meets the specified criteria, the INNER JOIN operation enhances query efficiency. As a result, complex datasets can be queried in a precise and organized manner with its
help.

EXAMPLE First, we will create two tables for representatives and departments. The common attribute between the two tables is the representative_id from the
representative table and the manager_id in the manager table.
Notice that in this case, the two column names are different from one another but linked through the primary key.

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department ( department_id INT PRIMARY KEY, department_name VARCHAR (100) NOT NULL, manager_id INT, constraint fk_manager FORE
);
INSERT INTO representative (representative_id, first_name, last_name)
VALUES (1, 'Bob','Evans'), (2, 'Tango','Rushmore'), (3, 'Danika','Arkane'), (4, 'Mac','Anderson');
INSERT INTO department (department_id, department_name,manager_id)
VALUES (1, 'Sales', 1), (2, 'Marketing', 3), (3, 'IT', 4), (4, 'Finance', null), (5, 'Support', null);
We can represent this data through the use of a Venn diagram, where the left circle represents the data from the representative table, and the right circle represents the data
from the department table:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 2
By running a SELECT statement for each table individually, we can see that Tango Rushmore is not a department manager (because his ID does not appear in the
manager_id column) and that the Finance and Support departments still need a manager (because the manager_id column is null for their records).

Individually, the data looks like the following:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 3
We can complete an INNER JOIN by identifying the column name. The basic INNER JOIN between the two tables should look like the following:

SELECT *
FROM representative
JOIN department
ON representative.representative_id = department.manager_id;
With this INNER JOIN, we find that the data in the representative_id column in the representative table matches the department’s manager_id. There should be three that
match:

The Venn diagram below illustrates the INNER JOIN:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 4
This is the most common type of JOIN, as we typically want to be able to identify where data between two or more tables matches.

 TERM TO KNOW

Venn Diagram
A diagram that illustrates logical relationships between two or more sets of items by using overlapping circles or shapes. In many cases, they serve as a visual way to
summarize similar and different aspects of items.

3. OUTER JOINs
OUTER JOINs include nonmatching rows from one or both of the participating tables, extending INNER JOIN capabilities. An OUTER JOIN includes rows where the condition
is not met, unlike an INNER JOIN, which retrieves only rows with matching values. This ensures that the result set contains data from both tables. OUTER JOINs come in three
varieties: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. This lesson explains the basic concepts behind OUTER JOINs; you will learn their specifics and
how to create them in later lessons.

 KEY CONCEPT

LEFT OUTER JOINs retrieve all rows from the left table and the matching rows from the right table. In the event that there is no match in the right table, the result will still
show the corresponding row from the left table along with NULL values for the columns in the right table.

 KEY CONCEPT

In contrast, a RIGHT OUTER JOIN includes all rows from the right table and the matching rows from the left table. NULL values are also included in the left table columns
when there is no match.

 KEY CONCEPT

In a FULL OUTER JOIN, both LEFT and RIGHT OUTER JOINs are combined. All rows from both tables are included, and NULL values are filled in where there are no
matches. Using an OUTER JOIN provides a comprehensive view of data relationships in your PostgreSQL database, even when there are no matches in the specified
JOIN condition.
If you think about your left hand and right hand, the LEFT OUTER JOIN and RIGHT OUTER JOIN make more sense. If you have your customer table in your left hand and your
orders table in your right hand, you can join those tables using the OUTER JOIN, LEFT, RIGHT or FULL to determine what data you want to get. You want the match in the

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 5
customer table (LEFT OUTER JOIN), the match in the orders table (RIGHT OUTER JOIN), and then the data between the two of them (FULL OUTER JOIN).

We may use a LEFT JOIN to find data that exists in both tables and also data in the left table that does not match. This Venn diagram illustrates a LEFT JOIN.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 6
We may also want to find data that exists in the left table but does not match with data in the right table. This would be considered a LEFT OUTER JOIN:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 7
Using a RIGHT JOIN, we can get the data that exists in both tables, along with the data that does not match from the right table:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 8
We can also find data that exists in the right table but does not match with data in the left table, using a RIGHT OUTER JOIN:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 9
 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you were introduced to the basics of JOINs, which are query operations that enable you to retrieve data from multiple related tables based on
specified criteria. This enables you to create complex queries that generate meaningful insights. There are several kinds of JOINs, including INNER JOIN, LEFT JOIN,
RIGHT JOIN, and FULL JOIN.

An INNER JOIN combines rows from two or more tables, returning only those rows that match the condition. Using this type of JOINs allows you to retrieve data that
have matching values in the specified columns of the participating tables. OUTER JOINs, including LEFT JOIN, RIGHT JOIN, or FULL JOIN, include nonmatching rows
from one or both tables as well as matching rows from one or both tables. FULL JOIN retrieves all rows from both tables, filling in NULL values where there are no
matches in either table. LEFT JOIN retrieves all rows from both tables, while RIGHT JOIN does the opposite. In each of the upcoming lessons in this challenge, you
will learn about the potential uses of each JOIN type and how to create them.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Venn Diagram
Venn diagrams illustrate logical relationships between two or more sets of items by using overlapping circles or shapes. In many cases, they serve as a visual
way to summarize similar and different aspects of items.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 10
JOINs That Specify Which Columns to Use
by Sophia

 WHAT'S COVERED

In this lesson, you will explore the INNER JOIN clause with USING to join tables, in two parts. Specifically, this lesson will cover:
1. INNER JOIN With USING
2. INNER JOIN USING Examples

1. INNER JOIN With USING


An INNER JOIN in SQL retrieves data from two or more related tables based on a specified condition. A USING clause is often used in conjunction with INNER JOIN to
simplify the JOIN condition by specifying which column or columns should be used for the JOIN. Avoiding repetitive column names in the result set helps streamline the
query syntax and reduce redundancy.

If the INNER JOIN clause is used with the USING clause, the database engine matches rows where the specified columns have equal values. This process eliminates
nonmatching rows from the result set by including only the rows that match the condition. Although the USING clause can simplify the JOIN process, it is important to note
that it restricts joining to columns with identical names. When columns have the same name but represent different data, or when joining conditions are more complex, using
the ON clause might be a better approach. You will learn about the ON clause in the next lesson.

Let's take a look at the product and category tables that we created in the prior lesson, which had an issue with the NATURAL JOIN:

CREATE TABLE category ( category_id serial PRIMARY KEY, name VARCHAR (100) NOT NULL );
CREATE TABLE product ( product_id serial PRIMARY KEY, name VARCHAR (100) NOT NULL, category_id INT NOT NULL, FOREIGN KEY (category_id) REFE
VALUES ('Game'), ('Movie'), ('CD');
INSERT INTO product (name, category_id)
VALUES ('Call of Duty', 1), ('Final Fantasy', 1), ('Wizard of Oz', 2), ('Jaws', 2), ('Great Hits', 3), ('Journey', 3);
Consider when we tried to run a NATURAL JOIN:

SELECT *
FROM product
NATURAL JOIN category;
We got the following result:

This was due to the fact that the tables had two common attributes. However, the name columns did not have values that matched between the two tables. The JOIN and
USING clauses will help with this and enable us to pick which common attribute to join.

The syntax for the command looks like this:

SELECT <columnlist>
FROM <table1>
INNER JOIN <table2> USING (<commonattribute>);
In our set of tables, we can use the JOIN and USING clauses to focus on category_id:

SELECT *
FROM product
INNER JOIN category USING (category_id);

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 11
Since “name” also exists in both tables, we can also use “name”:

SELECT *
FROM product
INNER JOIN category USING (name);

However, since none of the names between the tables match, no rows are returned.

Let us return to some of our other tables, like the album and artist table.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 12
So far, we have only focused on each individual table. We don’t know which album has which artist unless we look up the artist_id. However, since the artist_id exists in both
tables, we can use the JOIN and USING clause to join the tables on the artist_id:

SELECT *
FROM album
INNER JOIN artist USING (artist_id);

Now our data is starting to make a bit more sense, as we join it together.

2. INNER JOIN USING Examples


We can join more than just two tables by adding additional INNER JOIN statements with USING. If we take the example of artist and album, we can also identify the tracks on
each:

SELECT *
FROM track

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 13
INNER JOIN album USING (album_id)
INNER JOIN artist USING (artist_id);

As you can see, by using the * as we start to join more tables, we may have too many columns being returned. We can specify in the SELECT clause which columns should
be returned. It is a best practice to use the format <tablename>.<columnname> when we list the columns. Otherwise, if a column name exists in multiple tables, the database
does not know which column you want to display and returns an ambiguous error. For example, “name” exists in the track and artist table. Consider if we try to simply include
the name column in the SELECT clause:

SELECT clause:
SELECT name
FROM track
INNER JOIN album USING (album_id)
INNER JOIN artist USING (artist_id);
We will get the following error message:

If a particular column name appears in only one of the tables involved, you can reference it by its name only, omitting the table name. For example, you could write the
following statement where only certain columns have the table name as a prefix:

SELECT album_id, artist_id, track_id, album.title, artist.name, track.name


FROM track
INNER JOIN album USING (album_id)
INNER JOIN artist USING (artist_id)
ORDER BY album_id, artist_id, track_id;
However, this is not a good practice because at some point the table(s) or the query may be changed so that there are duplicate names. It is a better practice to use
consistent naming, with all the table names explicitly specified, like this:

SELECT album.album_id, artist.artist_id, track.track_id, album.title, artist.name, track.name


FROM track
INNER JOIN album USING (album_id)
INNER JOIN artist USING (artist_id)
ORDER BY album.album_id, artist.artist_id, track.track_id;
 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that using an INNER JOIN, you can combine data from two or more related tables based on a defined condition. The USING clause
enables you to specify a single column that should be used for the INNER JOIN operation when you use INNER JOIN. It simplifies the query syntax by removing the
need to specify redundant column names.

In an INNER JOIN with USING, rows from joined tables are matched based on the specified column. The column name and data type must be the same in both
tables. As a result, only rows with matching values in the specified column will be included, effectively eliminating nonmatching records. Using the USING clause
offers a convenient way to perform INNER JOINs, but it's important that the column being used has consistent and meaningful data across tables, especially when
the complexity increases. For accurate and intended results, it's imperative to pay attention to the column characteristics to simplify queries and make them easier to
read, especially when dealing with tables with common attributes.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 14
© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 15
NATURAL JOINs
by Sophia

 WHAT'S COVERED

In this lesson, you will use a NATURAL JOIN between tables, in two parts. Specifically, this lesson will cover:
1. Getting Started
2. Problems With NATURAL JOINs

1. Getting Started
NATURAL JOINs in PostgreSQL combine rows from two or more tables based on columns with matching names and data types. NATURAL JOINs differ from other JOIN
types in that PostgreSQL determines the matching columns automatically rather than you defining them explicitly. Using this method simplifies query syntax since columns do
not have to be explicitly specified.

 BIG IDEA

Despite their efficiency, NATURAL JOINs also pose certain risks and limitations. Changing column names or adding new columns could unintentionally affect query
results owing to the JOIN's exclusive reliance on column names. A NATURAL JOIN can also produce incorrect results if two or more columns share the same name but
contain different data. In order to avoid these potential pitfalls, many database professionals prefer explicitly defined JOIN conditions in order to control the JOIN process
and ensure a more accurate and predictable result.
Behind the scenes, a NATURAL JOIN goes through three stages:

The first stage creates the product of the two tables together.
The next stage takes the output from the prior stage and only displays the rows in which the common attributes are equal to one another.
In the last stage, a PROJECT is performed on the results to have a single copy of each attribute, which removes the duplicate column.

The user executing the JOIN operation doesn't have to issue commands for each of those stages, though; a single NATURAL JOIN clause handles it all. The final result based
on the NATURAL JOIN provides a result set that only has the matches between the two tables. It does not include any unmatched pairs from either table.
The syntax of a NATURAL JOIN looks like the following:

SELECT <column_list>
FROM <table1>
NATURAL [INNER, LEFT, RIGHT] JOIN <table2>
With a NATURAL JOIN, the default type is an INNER JOIN, but it can also perform a LEFT or RIGHT JOIN. Instead of a column list, we can also use the asterisk (*). This will
include all columns from both tables that have the same name.

Let's build a simple set of tables of product and categories:

CREATE TABLE category ( category_id serial PRIMARY KEY, category_name VARCHAR (100) NOT NULL );
CREATE TABLE product ( product_id serial PRIMARY KEY, product_name VARCHAR (100) NOT NULL, category_id INT NOT NULL, FOREIGN KEY (category_
INSERT INTO category (category_name)
VALUES ('Game'), ('Movie'), ('CD');
INSERT INTO product (product_name, category_id)
VALUES ('Call of Duty', 1), ('Final Fantasy', 1), ('Wizard of Oz', 2), ('Jaws', 2), ('Great Hits', 3), ('Journey', 3);
To join them using a NATURAL JOIN, we can set it as the following:

SELECT *
FROM category
NATURAL JOIN product;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 16
Note that we could also swap the two tables in the query:

SELECT *
FROM product
NATURAL JOIN category;

In both cases, the result set will display the common column first, which in this example is the category_id. Then, it will display all of the columns from the first table (other
than the common column) and all of the columns from the second table (other than the common column).

 TERM TO KNOW

NATURAL JOINs
SQL JOINs based on columns with identical names in the related tables. They are easier to set up than other JOINs but can result in unintended connections when
the tables share multiple common column names.

2. Problems With NATURAL JOINs


There are potential issues to be aware of with a NATURAL JOIN, and they can sometimes create unexpected results. Let’s recreate these tables, but instead of having
product_name and category_name, let's just call these columns “name” in both tables:

CREATE TABLE category ( category_id serial PRIMARY KEY, name VARCHAR (100) NOT NULL );
CREATE TABLE product ( product_id serial PRIMARY KEY, name VARCHAR (100) NOT NULL, category_id INT NOT NULL, FOREIGN KEY (category_id) REFE
INSERT INTO category (name)
VALUES ('Game'), ('Movie'), ('CD');

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 17
INSERT INTO product (name, category_id)
VALUES ('Call of Duty', 1), ('Final Fantasy', 1), ('Wizard of Oz', 2), ('Jaws', 2), ('Great Hits', 3), ('Journey', 3);
Consider if we now tried to run the NATURAL JOIN:

SELECT *
FROM product
NATURAL JOIN category;
We get the following result:

Although we have the category_id in both tables, we also have name columns in both tables. This common column has different meanings in each table, and since the
values in the two do not match, no rows are returned.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that in SQL, a NATURAL JOIN combines rows from two or more tables based on columns with the same name or data type. When you use
a NATURAL JOIN, the columns to join on are automatically determined by comparing their names, unlike other types of JOINs. With this approach, you don't have to
specify the JOIN conditions explicitly, which is especially useful when dealing with tables that share attributes.

NATURAL JOINs are convenient, but they also present certain problems. When joining tables that contain columns with the same names but different meanings or
data types, the JOIN might lead to unexpected or incorrect results. Any future changes to column names could also affect query behavior. Some database
professionals prefer using explicit JOIN conditions because they give them more control and clarity over the joining process. When used carefully, NATURAL JOINs
can be powerful tools, but there are limitations and dangers to be aware of.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

NATURAL JOINs
SQL JOINs based on columns with identical names in the related tables. They are easier to set up than other JOINs but can result in unintended connections
when the tables share multiple common column names.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 18
JOIN ON to Link Tables
by Sophia

 WHAT'S COVERED

In this lesson, you will explore using the JOIN ON clause to link table data across two tables, in two parts. Specifically, this lesson will cover:
1. JOIN ON
2. USING vs. ON

1. JOIN ON
By using JOIN ON, you can determine how rows from the participating tables are matched and combined. ON is usually associated with INNER JOIN, but in some cases, it
can also be used with other JOIN types that you will learn about in upcoming lessons.

Specifying the columns to be used in the ON clause, you can join tables on the basis of those columns. This enables you to establish more complex relationships between
tables than just columns with the same name. Specifying a JOIN condition explicitly enables you to work with tables that might have semantically related columns with
different names. It's particularly useful when dealing with tables that don't share identical column names to join datasets based on specific criteria, such as matching IDs or
dates. JOIN ON enables you to create comprehensive datasets and gain valuable insights from your relational database through precise and targeted data retrieval.

Table columns with relationships between them do not always have the same names. For example, if we look at the customer and employee tables, the customer table has a
support_rep_id which references the employee_id of the employee table:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 19
© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 20
This makes sense because a support representative would be an employee, but it wouldn’t make contextual sense to have employee_id as the column's name in the
customer table. Since the column names are different, we could not use the USING clause. Rather, we must use the ON clause.

The structure of the statement looks like the following:

SELECT <columnlist>
FROM <table1>
INNER JOIN <table2> ON <table1column> = <table2column>
We can start the query by filling in the table and column names that we’re joining:

SELECT <columnlist>
FROM customer
INNER JOIN employee ON support_rep_id = employee_id;
We then need to determine what columns we want to return as part of the query. We don’t want to return all of the columns. For example, we may want to have all of the
customer and respective employee emails so that the employees can send out an email to their customers. Since the email addresses are in an email column in both tables,
we need to prefix the column with the table name, like this:

SELECT customer.email, employee.email


FROM customer
INNER JOIN employee ON support_rep_id = employee_id;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 21
2. USING vs. ON
The core difference between JOIN ON and JOIN USING in SQL lies in how they handle the specification of columns for the JOIN operation.

When you use JOIN ON, you can specify the conditions under which tables are joined. It involves specifying a comparison using one or more columns from the involved
tables. With this approach, you can join tables based on any column with related data, even if they don't have the same name. The feature is particularly useful when there is
a complex relationship between the columns or when the columns are named differently but represent the same data type.

With JOIN USING, you can specify a single column name common to both tables, simplifying the JOIN process. As a result, column names are not repeated in the result set.
By doing this, you can create a more concise query syntax and a more easily readable code. JOIN USING, however, requires that the column with the specified name and
data type exist in both tables. Despite simplifying the process for tables with the same column names, it might not be suitable for tables with more varied data or for joining
based on more than one column. As a result, your choice of JOIN ON or JOIN USING will depend on the complexity of your JOIN condition and your query's requirements.

Although the ON clause exists primarily to handle situations where the column names between the tables do not match, it does also work if the columns match (and
therefore, we could use the USING clause). Let's take a look at an example of the artist and album table with an INNER JOIN with the USING clause:

SELECT title, name


FROM album
INNER JOIN artist USING (artist_id);
We could convert this by removing the USING clause and adding the artist_id prefixed with the table name:

SELECT title, name


FROM album
INNER JOIN artist ON album.artist_id = artist.artist_id;
In both cases, the result set is the same:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 22
The only case where the result set will be different is if we use the * in the SELECT clause:

SELECT *
FROM album
INNER JOIN artist USING (artist_id);

SELECT *
FROM album
INNER JOIN artist ON album.artist_id = artist.artist_id;

Notice that in the ON clause JOIN, the artist_id appears twice (one for each table), whereas artist_id only appears once in the USING clause. This is because the USING
performs an equality JOIN and can only be used when the column names are identical. It is optional and not necessary to include the column twice. Other than that, they
function the same.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 23
 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that columns and conditions are explicitly specified in JOIN ON, determining how the tables are joined. It provides flexibility by matching
tables on any column with related data, even if the columns have different names. In cases where tables do not share identical column names or when relationships
between them are complex, this method is ideal. In JOIN ON, you can fully control and customize the JOIN conditions according to your data and analysis.

JOIN USING simplifies the syntax of the JOIN and requires only the specification of one column name in both tables. Using this method eliminates column name
repetition in the result set, resulting in cleaner and more concise code. Both tables must contain the same column name and data type for this to work. JOIN USING
is well-suited for cases where tables have identical column names, making queries more readable and less prone to errors. You would choose between ON vs.
USING based on the data complexity and the query's specific requirements.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 24
AS/ALIAS to Rename Tables and Columns
by Sophia

 WHAT'S COVERED

In this lesson, you will learn how to use AS to create aliases for tables and columns. Specifically, this lesson will cover:
1. Table Aliases
2. Column Aliases

1. Table Aliases
A table alias enables us to assign tables or columns new names when a query is running. For a table in a SELECT statement, it could look like this:

SELECT *
FROM customer as c;
Essentially, we’ve renamed the table “c” instead of the customer. This can be very useful for making our queries easier to read when we start to join tables together,
especially when we prefix our tables. For example:

SELECT album.album_id, artist.artist_id, track.track_id, album.title, artist.name, track.name


FROM track
INNER JOIN album USING (album_id)
INNER JOIN artist USING (artist_id)
ORDER BY album.album_id, artist.artist_id, track.track_id;
By using a shorter alias name for the tables, we can simplify the query, making it easier to read the column list:

SELECT al.album_id, ar.artist_id, t.track_id, al.title, ar.name, t.name


FROM track as t
INNER JOIN album as al USING (album_id)
INNER JOIN artist as ar USING (artist_id)
ORDER BY al.album_id, ar.artist_id, t.track_id;

 KEY CONCEPT

When defining table aliases, you do not have to put the alias names in quotation marks unless they contain spaces. If you do use quotation marks, they must be double
ones ("). Using single quotes will generate an error.
In our case, the table names are quite short. But imagine if much longer table names are used in the database. Alias names would be much easier to follow. Let’s revisit the
representative and department tables we created previously to see an example:

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department ( department_id INT PRIMARY KEY, department_name VARCHAR (100) NOT NULL, manager_id INT, constraint fk_manager FORE
INSERT INTO representative (representative_id, first_name, last_name)
VALUES (1, 'Bob','Evans'), (2, 'Tango','Rushmore'), (3, 'Danika','Arkane'), (4, 'Mac','Anderson');
INSERT INTO department (department_id, department_name,manager_id)
VALUES (1, 'Sales', 1), (2, 'Marketing', 3), (3, 'IT', 4), (4, 'Finance', null), (5, 'Support', null);
If we needed to use the prefixes for the column names, a query to list the departments and their managers would look like this:

SELECT department.department_name, representative.first_name, representative.last_name


FROM representative
JOIN department ON representative.representative_id = department.manager_id;
Through the use of table aliases, we can shorten all of the instances where we have the representative or department table names:

SELECT d.department_name, r.first_name, r.last_name


FROM representative as r
JOIN department as d ON r.representative_id = d.manager_id;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 25
 TERM TO KNOW

Table Alias
An alternative name assigned to a table for use in a query's result set.

2. Column Aliases
We can also create aliases for column names within the SELECT clause. Column aliases can be especially useful for expressions. You may remember our lessons on
aggregate functions, where the results would just show the function name. To a user reviewing the data, it may sometimes be confusing regarding what it represents.

For example, let's look back at one of the complex queries we created to find the max totals being greater than 15 and the customer_id being between 20 and 30:

SELECT customer_id, SUM(total), MAX(total)


FROM invoice
WHERE billing_country = 'USA'
GROUP BY customer_id
HAVING MAX(total) > 15
AND customer_id BETWEEN 20 AND 30;
The result set shows the following:

We could change each of the column names so that it is clearer what each of the expressions represents:

SELECT customer_id, SUM(total) as "Sum of Total", MAX(total) as "Max of Total"


FROM invoice
WHERE billing_country = 'USA'
GROUP BY customer_id

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 26
HAVING MAX(total) > 15
AND customer_id BETWEEN 20 AND 30;
Notice that in this case, we used double quotes around the alias column names since there are spaces in them:

This makes what the columns represent much clearer to the user.

Another example is the emails of the customers and the employees that support them:

SELECT customer.email, employee.email


FROM customer
INNER JOIN employee ON support_rep_id = employee_id;

Looking at the result set, it's unclear which email represents the customer's and the employee's emails, based on the column names. Here we can use both the table and
alias to simplify the query and change the column names:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 27
SELECT c.email as "Customer Email", e.email "Employee Email"
FROM customer AS c
INNER JOIN employee AS e ON support_rep_id = employee_id;

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 TERM TO KNOW

Column Alias
An alternative name assigned to a column for use in a query's result set.

 SUMMARY

In this lesson, you learned that using column and table aliases in SQL queries can enhance their readability and conciseness, especially when dealing with long or
complex table and column names. You can assign temporary names to the result set's columns using column aliases, resulting in a more intuitive and descriptive
output. For example, you can rename calculated columns, aggregate functions, or columns with ambiguous names. It improves the clarity of query results and makes
it easier for users to understand the data they are retrieving. In the same way, table aliases refer to renaming tables in a query with short and meaningful names. With
aliases, long table names do not have to be repeatedly typed out, which is especially useful in queries involving multiple tables. SQL statements can be more
succinct and easier to understand when using aliases.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Column Alias
An alternative name assigned to a column for use in a query's result set.

Table Alias
An alternative name assigned to a table for use in a query's result set.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 28
OUTER JOINs
by Sophia

 WHAT'S COVERED

In this lesson, you will explore using OUTER JOINs to query data from two or more tables, in two parts. Specifically, this lesson will cover:
1. FULL OUTER JOINs
2. Examples

1. FULL OUTER JOINs


OUTER JOINs can identify data that doesn't exist in one table by checking for NULL values in columns from the table where there is no match. In OUTER JOINs, there are
unmatched rows from one or both tables, and these unmatched rows are represented in the result set with NULL values in the columns that correspond to the missing data.

In SQL, a FULL OUTER JOIN combines data from two tables based on a specified condition while including all records from both tables, regardless of whether they have
matching counterparts. All rows from both tables are included in the result set, with NULL values filling in columns where there are no matches. A FULL OUTER JOIN
provides a complete view of the combined data and allows for thorough analysis and reporting when merging data from two sources without omitting any details.

For example, you could do a FULL OUTER JOIN between the Customers and Orders tables. To identify customers without orders, you would search for NULL values in the
OrderID column from the Orders table.

SELECT Customers.CustomerID, Customers.Name


FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID = Orders.CustomerID
WHERE Orders.OrderID IS NULL;
The FULL OUTER JOIN retrieves all customers and their orders (if any), and the WHERE clause filters the results to include only those customers for whom there is no
matching order (Orders.OrderID is NULL). Knowing what customers are not placing orders can help you backtrack that customer through the website and figure out where
they are abandoning the website or offer an incentive to shop, like a 10% off coupon.

 TERM TO KNOW

FULL OUTER JOIN


A JOIN that retrieves all records from both tables, even if there is no matching record in the other table; the missing column entries appear as NULL.

2. Examples
Let's come back to our initial dataset with the representative and department tables to help illustrate the OUTER JOINs:

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department (
department_id INT PRIMARY KEY,
department_name VARCHAR(100) NOT NULL,
manager_id INT,
CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES representative(representative_id)
);
INSERT INTO representative (representative_id, first_name, last_name)
VALUES (1, 'Bob', 'Evans'), (2, 'Tango', 'Rushmore'), (3, 'Danika', 'Arkane'), (4, 'Mac', 'Anderson');
INSERT INTO department (department_id, department_name, manager_id)
VALUES (1, 'Sales', 1), (2, 'Marketing', 3), (3, 'IT', 4), (4, 'Finance', NULL), (5, 'Support', NULL);

 HINT

In this lesson, we are working with FULL OUTER JOIN, but there are other types of OUTER JOINs as well, such as LEFT and RIGHT OUTER JOIN. You will learn about
them in upcoming lessons.
The FULL OUTER JOIN returns the data from all rows in both the left and right tables. If they match, it will return data from both sides. If they don't match, the columns of the
table will be filled with NULL.

Let's first look at the structure of the statement:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 29
SELECT <columnlist>
FROM <table1>
FULL OUTER JOIN <table2> ON <table1>.<table1column1> = <table2>.<table2column1>;
Notice that this is similar to a regular JOIN ON statement, with only the FULL OUTER added. Using our two tables, we can run the FULL OUTER JOIN like this:

SELECT *
FROM representative
FULL OUTER JOIN department ON representative.representative_id = department.manager_id;
This will return the three records that match the representative_id from the representative table and the department_id from the department table. It will also return the rows
from the department table that do not have a matching row in the representative table (4th and 5th row in the result set). It will also return the rows from the representative
table that do not have a match in the department table (6th row):

It's important to note that the main purpose of an OUTER JOIN is to find the areas where there are no matches. If every row in both tables matched up perfectly, a FULL
OUTER JOIN would have the same result as an INNER JOIN.

The following Venn diagram shows what the data would return:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 30
 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned how to use a FULL OUTER JOIN to retrieve a result set from multiple tables that includes both matched and unmatched records. A FULL
OUTER JOIN includes all the records from both tables, even if there are no corresponding records in the other table. This is in contrast to INNER JOINs, which fetch
only matching records. Any columns for which there is no matching value appear with a NULL value in the result set. OUTER JOINs are particularly useful for
scenarios involving incomplete data sets. In the example, you learned the command syntax for using a FULL OUTER JOIN to return data from the tables.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

FULL OUTER JOIN


A JOIN that retrieves all records from both tables, even if there is no matching record in the other table; the missing column entries appear as NULL.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 31
LEFT JOINs
by Sophia

 WHAT'S COVERED

In this lesson, you will explore a LEFT JOIN, in two parts. Specifically, this lesson will cover:
1. Understanding LEFT JOINs
2. Examples

1. Understanding LEFT JOINs


A LEFT JOIN combines data from two tables based on a specified condition. The result set includes all the records from the left (or first) table, plus matching records from the
right (or second table. Any unmatched rows from the left table will have NULL values. Using a LEFT JOIN, you can keep the context of all the left-hand records while
incorporating relevant data from the right-hand table.

The structure of the query looks like this:

SELECT <columnlist>
FROM <table1>
LEFT JOIN <table2> ON <table1>.<table1column1> = <table2>.<table2column1>;

 TERM TO KNOW

LEFT JOIN
A clause that combines data from two tables based on a specified condition. It retrieves all records from the left (or first) table and matching records from the right (or
second) table. In the result set, any unmatched rows from the left table will have NULL values.

2. Examples
Let's revisit our data set with the representatives and departments again:

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department (
department_id INT PRIMARY KEY,
department_name VARCHAR(100) NOT NULL,
manager_id INT,
CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES representative(representative_id)
);
INSERT INTO representative (representative_id, first_name, last_name)
VALUES (1, 'Bob', 'Evans'), (2, 'Tango', 'Rushmore'), (3, 'Danika', 'Arkane'), (4, 'Mac', 'Anderson');
INSERT INTO department (department_id, department_name, manager_id)
VALUES (1, 'Sales', 1), (2, 'Marketing', 3), (3, 'IT', 4), (4, 'Finance', NULL), (5, 'Support', NULL);
The LEFT JOIN clause is used to join the representative table with the department table. The first table listed in the FROM clause is considered to be the left table, and the
second table is considered to be the right table.

SELECT *
FROM representative
LEFT JOIN department ON representative.representative_id = department.manager_id;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 32
The LEFT JOIN starts to select data from the left table. It compares the representative_id from the representative table with the manager_id in the department table. If those
values are equal, the LEFT JOIN creates a new row that contains the columns of both tables and adds the new row in the result set, as you can see in the first three rows
returned in the image above. If the values are not equal, the LEFT JOIN also creates a new row containing columns from both tables but fills in the columns of the right table
(department) with a NULL value, as you can see in the 4th row. The Venn diagram of the LEFT JOIN looks like the following:

We can optionally add a WHERE clause to find only records from the left-hand table that do not have a matching value in the right-hand table. For example, you could use the
following query to return only records where manager_id is NULL—in other words, where there is no matching manager_id in the right-hand table:

SELECT *
FROM representative
LEFT JOIN department ON representative.representative_id = department.manager_id
WHERE manager_id IS NULL;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 33
This lesson's video references this type of JOIN as a "LEFT OUTER JOIN," but don't be confused by that. There is no separate LEFT OUTER JOIN clause in SQL (although SQL
will accept that syntax; it will just process it as a regular LEFT JOIN). The type of JOIN shown in the above query, and in the Venn diagram below, is accomplished by starting
with LEFT JOIN and adding the WHERE clause, not by using a different JOIN clause.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned how to use a LEFT JOIN to retrieve all the records from the left table and only the records from the right table that match them. This type
of query can help analyze data based on a specific condition. For example, an employee table and a department table could be joined with a LEFT JOIN to retrieve
data about all employees and their associated departments, including employees who do not yet have a department assigned.

In the examples section, you learned the syntax for two types of LEFT JOINs. In the first example, a LEFT JOIN is used to include all records from the left-hand table

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 34
and the data from any corresponding records from the right-hand table. In the second example, you learned how to add a WHERE clause to show only records from
the left-hand table that have no corresponding records on the right. This is sometimes called a LEFT OUTER JOIN, and it is referenced as such in this lesson's video.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

LEFT JOIN
A clause that combines data from two tables based on a specified condition. It retrieves all records from the left (or first) table and matching records from the
right (or second) table. In the result set, any unmatched rows from the left table will have NULL values.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 35
RIGHT JOINs
by Sophia

 WHAT'S COVERED

In this lesson, you will explore using the RIGHT JOIN to query data from tables, in two parts. Specifically, this lesson will cover:
1. RIGHT JOINs
2. Examples

1. RIGHT JOINs
A RIGHT JOIN is the complement of a LEFT JOIN. It uses the same command syntax and options, but the right (or second) table is the one from which all records are
retrieved, whereas the left (or first) table is the one from which appears only matching records.

Most data analysis scenarios prioritize preserving data from the left table, so RIGHT JOINs are rarely used in practice. Just remember that they are available to you, in case
you encounter a situation where you need to focus on the right-hand table due to the data structure or query requirements.

The structure of the query looks like this:

SELECT <columnlist>
FROM <table1>
RIGHT JOIN <table2>
ON <table1>.<table1column1> = <table2>.<table2column1>

 TERM TO KNOW

RIGHT JOIN
A clause that combines data from two tables based on a specified condition. It retrieves all records from the right (or second) table and matching records from the left
(or first) table. In the result set, only matching rows from the left and right tables are included, and any unmatched rows from the right table will have NULL values.

2. Examples
Let's again revisit our data set with the representatives and departments:

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department (
department_id INT PRIMARY KEY,
department_name VARCHAR(100) NOT NULL,
manager_id INT,
CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES representative(representative_id)
);
INSERT INTO representative (representative_id, first_name, last_name)
VALUES (1, 'Bob', 'Evans'), (2, 'Tango', 'Rushmore'), (3, 'Danika', 'Arkane'), (4, 'Mac', 'Anderson');
INSERT INTO department (department_id, department_name, manager_id)
VALUES (1, 'Sales', 1), (2, 'Marketing', 3), (3, 'IT', 4), (4, 'Finance', NULL), (5, 'Support', NULL);
Let's see what they would look like with the RIGHT JOIN:

SELECT *
FROM representative
RIGHT JOIN department ON representative.representative_id = department.manager_id;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 36
The query itself isn’t that different from the LEFT JOIN, other than specifying RIGHT instead of LEFT. Notice that the last two rows are from the department table, where there
is not a match with the representative table.

Recall that the table listed in the FROM clause (representative) is considered to be the left table, and the table after the JOIN clause (department) is considered to be the right
table. The RIGHT JOIN starts by selecting data from the right table (department). It compares the manager_id from the department table to the representative_id in the
representative table. If those values are equal, the RIGHT JOIN creates a row containing both tables' columns and adds the new row in the result set. You can see that in the
first three rows in the table. If the values are not equal, the RIGHT JOIN also creates a row containing the columns from both of the tables but fills the columns from the left
table (representative) with NULL values.

The Venn diagram for the RIGHT JOIN looks like the following:

As with the LEFT JOIN in the previous lesson, you can add a WHERE clause to retrieve the rows from the right table that do not match any of the rows in the left table.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 37
SELECT *
FROM representative
RIGHT JOIN department ON representative.representative_id = department.manager_id
WHERE manager_id IS NULL;

The following Venn diagram illustrates the result of the RIGHT JOIN with the WHERE clause that specifies only unmatched rows from the right-hand table should be included.
This is sometimes called a RIGHT OUTER JOIN (including in this lesson's video below), but it's created using a WHERE clause; there is not a separate RIGHT OUTER JOIN
clause in SQL.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 38
In this lesson, you learned how to use a RIGHT JOIN to retrieve all the records from the right table and only the records from the left table that match them. This type
of query can help analyze data based on a specific condition.

In the examples section, you learned the syntax for two types of RIGHT JOINs. In the first example, a RIGHT JOIN is used to include all records from the right-hand
table and the data from any corresponding records from the left-hand table. In the second example, you learned how to add a WHERE clause to show only records
from the right-hand table that have no corresponding records on the left. This is sometimes called a RIGHT OUTER JOIN, and it is referenced as such in this lesson's
video.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

RIGHT JOIN
A clause that combines data from two tables based on a specified condition. It retrieves all records from the right (or second) table and matching records from
the left (or first) table. In the result set, only matching rows from the left and right tables are included, and any unmatched rows from the right table will have
NULL values.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 39
CROSS JOINs
by Sophia

 WHAT'S COVERED

In this lesson, you will explore using CROSS JOINs to query data from tables. Specifically, this lesson will cover:
1. CROSS JOINs
2. Examples With Simple Tables

1. CROSS JOINs
CROSS JOINs are SQL operations that combine every row of one table with every row of another. A CROSS JOIN creates a new table with the same number of rows as the
product of the original table sizes, which differs from other types of JOINs that match conditions. It is also called a cross product or CARTESIAN JOIN since all row
combinations are included in the result. One possible use of a CROSS JOIN is to explore all possible combinations between two data sets.

This JOIN type differs from other JOIN types since it does not require specific conditions for joining. Although CROSS JOINs are powerful tools for generating all possible
combinations of rows between two tables, they should be used sparingly. CROSS JOINs can result in a huge number of rows being returned without the proper filtering
mechanisms. Data overload and performance issues may result from this.

Because CROSS JOIN can result in large data sets, it's helpful to apply filters or combine other JOIN types with it. By doing this, you can narrow your search results to only
show the information you seek.

The structure of the CROSS JOIN looks like this:

SELECT <columnlist>
FROM <table1>
CROSS JOIN <table2>;
Notice that there are no details of how the tables are joined. This statement would result in the same result set as if we did the following:

SELECT <columnlist>
FROM <table1>,<table2>;
Or, we could even use an INNER JOIN with a condition that always evaluates to true to force a CROSS JOIN:

SELECT <columnlist>
FROM <table1>
INNER JOIN <table2> ON true;

 TERM TO KNOW

CROSS JOIN
A clause in a query that creates a new table with the same number of rows as the product of the original table sizes.

2. Examples With Simple Tables


Let's try this by creating a couple of tables for size and color and joining them together.

CREATE TABLE color ( color_id INT PRIMARY KEY, color_name VARCHAR (50) NOT NULL );
CREATE TABLE size ( size_id INT PRIMARY KEY, size_name VARCHAR (30) NOT NULL );
INSERT INTO color (color_id, color_name)
VALUES (1, 'Blue'), (2, 'Red'), (3, 'Yellow');
INSERT INTO size (size_id, size_name)
VALUES (1, 'Small'), (2, 'Medium'), (3, 'Large');
Now that the tables have been created, we should expect to see nine rows returned from the CROSS JOIN, as there are three rows in the color table and three in the size
table:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 40
SELECT color_id, color_name, size_id, size_name
FROM color
CROSS JOIN size;

To understand the inner workings of CROSS JOIN, it's helpful to look at what is happening behind the scenes. You could run the following statement to do the same thing that
CROSS JOIN does much more compactly.

SELECT color_id, color_name, size_id, size_name


FROM color, size;
SELECT color_id, color_name, size_id, size_name
FROM color
INNER JOIN size ON true;

As you can see, the CROSS JOIN has taken each color and matched it to each size to display all possible combinations. This type of combination can become problematic as
the size of the data gets larger. In our example, we only have three rows in the color table and three rows in the size table. But imagine if we had 1,000 rows in one table and
1,000 rows in another table. The result would have 1,000,000 rows in the result. This is why CROSS JOIN is useful only in rare, specific scenarios.

 WATCH

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 41
 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that a CROSS JOIN, also called a CARTESIAN JOIN, combines every row in a table with every row in another table. There are no specific
criteria, unlike with other types of JOINs. CROSS JOINs are useful when you need to generate all possible combinations of rows from two simple tables. Use them
with caution, as they can quickly result in a large result set. You can reduce the result set size by applying filtering conditions or combining with other JOIN types.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

CROSS JOIN
A clause in a query that creates a new table with the same number of rows as the product of the original table sizes.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 42
Foreign Keys & Creating Tables
by Sophia

 WHAT'S COVERED

This lesson explores the use of foreign keys in more detail and the steps needed to ensure the keys are in the right tables, in two parts. Specifically, this lesson will
cover:
1. One-to-One Relationships
2. One-to-Many Relationships

1. One-to-One Relationships
A foreign key is a column, or a set of columns, that refers to the primary key of another table. As we have seen in prior lessons, foreign keys are used to establish
relationships between tables. Typically, we use a primary key in one table and a foreign key in another table to create a one-to-one or a one-to-many relationship between
those two tables.

In a one-to-one relationship, one table serves as a parent table, and the other table serves as a child table. With a foreign key constraint, a record must exist in the parent
table before a related record can be added to the child table. In other words, a record in the child table must have a related record in the parent table.

An example of this could be an employee table and a benefits table. In the employee table, we would have information specific to the employee, such as their name,
address, position, and date hired. In the benefits table, we could have an investment plan, medical plan, dental plan, and life insurance plan information. In this scenario, it
would not make sense to enter the benefits data first, as it depends on the employee. It would be illogical to have a record in the benefits table that is unrelated to a record
in the employee table.

The foreign key would be placed in the benefits table to reference the primary key of the employee table. We would take the primary key in the parent table, which is
employee_id, and copy it as a foreign key in the benefits table. This is where the term “foreign key” comes from, as the child table would have a primary key of its own, but
the primary key that we are introducing from the parent table (employee) is foreign to the child table (benefit).

We can also see this one-to-one relationship in the set of representative and department tables that we previously created:

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department ( department_id INT PRIMARY KEY, department_name VARCHAR (100) NOT NULL, manager_id INT, constraint fk_manager FORE
);
A single manager (representative) is associated with only one department, and a single department is associated with only one manager. Interestingly, this is one case where
you could make the argument that either the department table or the representative table could be the parent table because both representatives and departments can exist
without the other in place. For example, we could have a department that recently lost its manager. It does not mean that we must remove the department because it still
exists, even though the manager is gone. Likewise, if we create a new department and still need a manager for it, it is still able to exist. So, you could make the argument that
the following relationship is also valid:

CREATE TABLE department (


department_id INT PRIMARY KEY,
department_name VARCHAR(100) NOT NULL
);

CREATE TABLE representative (


representative_id INT PRIMARY KEY,
first_name VARCHAR(30) NOT NULL,
last_name VARCHAR(30) NOT NULL,
department_id INT,
CONSTRAINT fk_department FOREIGN KEY
(department_id) REFERENCES department(department_id)
);

 TERM TO KNOW

Foreign Key
A column or set of columns that refers to the primary key of another table.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 43
2. One-to-Many Relationships
The one-to-many relationship is similar to the one-to-one relationship when it comes to foreign keys. However, it is clearer which table is the parent table and which table is
the child table. Let's take a look at the artist and album table in our database. The one-to-many relationship follows the same guideline where we take a copy of the primary
key from the table on the “one” side of the relationship (the parent table) and then incorporate it into the table on the “many” side (the child table). In this example, the
artist_id that is the primary key in the artist table becomes the foreign key in the album table.

For every one-to-many relationship, the primary key of the “one” table will be a foreign key in the “many” table. It should never be the case that we have a foreign key on the
“one” side of a one-to-many relationship.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that foreign keys play a pivotal role in maintaining database relational integrity because they establish and enforce relationships between
tables. A foreign key is a column or set of columns that refers to the primary key of another table.

You learned that in a one-to-one relationship, each of the two tables involved uses the same information for its primary key, so the relationship can be set up so that
either table is the primary key and the other the foreign key. That's because in a one-to-one relationship, determining which table is the parent and which is the child
is situational. In contrast, in a one-to-many relationship, the "one" side is always the primary key, and the "many" side is always the foreign key. You also learned how
to set up a table as the foreign key of another table during the table's creation. You will learn more about creating foreign keys in upcoming lessons.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Foreign Key
A column or set of columns that refers to the primary key of another table.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 44
Foreign Keys & Altering Tables
by Sophia

 WHAT'S COVERED

This lesson explores using the ALTER TABLE command to add a foreign key, in two parts. Specifically, this lesson will cover:
1. ALTER TABLE
2. ON DELETE Clause

1. ALTER TABLE
An ALTER TABLE statement can be used to add a foreign key to a table. Although typically, the foreign key is added during a CREATE TABLE statement, there are instances
when it can be beneficial to add foreign keys after the table has been created and the data has been loaded. When we use a CREATE TABLE statement, we do not have to
define the foreign key constraint name, as PostgreSQL and other databases will assign an auto-generated name. However, when adding a foreign key with ALTER TABLE, we
do, using the ADD CONSTRAINT clause. The basic syntax to add a foreign key constraint to an existing table looks like this:

ALTER TABLE <child table name>


ADD CONSTRAINT <constraint_name>
FOREIGN KEY (<fk_column>)
REFERENCES <parent table name> (<parent_key_column>);
Consider the following tables:

CREATE TABLE representative ( representative_id


INT PRIMARY KEY,
first_name VARCHAR (30) NOT NULL, last_name
VARCHAR (30) NOT NULL );

CREATE TABLE department ( department_id INT


PRIMARY KEY,
department_name VARCHAR (100) NOT NULL,
manager_id INT );
We can add a foreign key to the department table for the manager_id by doing the following:

ALTER TABLE department


ADD CONSTRAINT fk_manager
FOREIGN KEY (manager_id)
REFERENCES representative (representative_id);
It is not that different from the syntax used in the CREATE TABLE statement, but the ordering of the statements is important.

 TERM TO KNOW

ALTER TABLE
A clause that changes the structure of a table.

2. ON DELETE Clause
You can optionally add other clauses in the ALTER TABLE statement that specify what to do with a record in the child table when the parent table's corresponding record is
deleted. For example, adding ON DELETE SET NULL will set the value in the child table to NULL if the parent record is deleted.

We can also use the ON DELETE SET DEFAULT clause and pass in a specific value. This means that if the parent record is deleted, the child record would update to a set
value. For example, if a manager is deleted, the department could automatically be moved under a specific generic employee and reallocated later on.

The ON DELETE CASCADE clause can be dangerous, but it has common uses. Using this clause, a delete on a parent record would also delete all of the referencing rows in
the child table. For example, in our database, let's assume we used the ON DELETE CASCADE clause on the support_rep_id in the customer table. If this was the case, and
we deleted employee_id 1 from the employee table, rather than throwing an error due to the foreign key constraint, it would instead attempt to delete all of the customer
records that had the support_rep_id. Subsequently, if we used ON DELETE CASCADE on all of our tables' foreign keys, the database would then try to delete all the invoices

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 45
that reflected those customers with the support_rep_id equal to 1. Then it would also delete the invoice_line records that reflect the deleted invoices. You can quickly see
why this could create many potential issues in the database if we were not careful.

If we wanted to add these additional clauses to the table, we would have to first drop the constraint:

ALTER TABLE department


DROP CONSTRAINT fk_manager;
Then we could add it back on with any changes to the clauses:

ALTER TABLE department


ADD CONSTRAINT fk_manager
FOREIGN KEY (manager_id)
REFERENCES representative (representative_id)
ON DELETE CASCADE;
 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned how to use an ALTER TABLE statement to add a foreign key to an existing table. This involves defining the foreign key constraint name,
unlike when you create a foreign key at the same time that you create the table (in which case, the name is auto-generated). The clauses involved in this operation
are ADD CONSTRAINT, FOREIGN KEY, and REFERENCES.

You also learned about some optional ON DELETE clauses that you can use with ALTER TABLE to define what will happen in the child table (the foreign key side) if
the parent record is deleted. The ON DELETE SET NULL clause sets the value in the child table to NULL, for example, and ON DELETE CASCADE deletes the
referencing rows in the child table.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

ALTER TABLE
A clause that changes the structure of a table.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 46
Foreign and Primary Keys
by Sophia

 WHAT'S COVERED

This lesson explores if foreign keys can only derive from primary keys, in two parts. Specifically, this lesson will cover:
1. Introduction
2. Alternative Links for Foreign Keys

1. Introduction
Up to this point, we have focused on the concept that a foreign key should link to the primary key of another table. We use the foreign key to control the data that can be
stored in the foreign key table as well as control changes to the data in the primary key table. This constraint enforces referential integrity by guaranteeing that changes
cannot be made to the data in the primary key table if the changes invalidate the link to the data in the foreign key table.

There are some similarities and some differences between primary keys and foreign keys. Both a primary key and a foreign key can consist of more than one column.
However, every value in a primary key column must be unique and have a value (not NULL). A foreign key, on the other hand, can have a duplicate value in a one-to-many
relationship and can contain NULL values. A foreign key can also be temporarily disabled to simplify dropping tables or moving data, and they can be added after tables have
been created to avoid having to generate them in the right order.

2. Alternative Links for Foreign Keys


If we do not want to link a foreign key to a primary key, the other alternative is to link it to a column with a unique constraint. Note that a unique constraint can contain NULL
values unless a NOT NULL constraint is set. It is important to note that the foreign key has to reference a unique column, at the very least. As such, a foreign key could be
linked to a candidate key. A candidate key is not a specific type of constraint that can be set in a database. Instead, it is a set of attributes that uniquely identifies a table
record. A table could have multiple candidate keys, with one being set as the primary key for a table. However, any of those candidate keys could potentially have been
selected as the primary key. The candidate key would have all of the constraints similar to a primary key, but there may be better options to select as the primary key. The
properties of the candidate key include the following:

The column data should be unique.


The key can consist of multiple columns.
It should not contain any NULL values.

There are other instances where this can also work to link to a one-to-zero/one relationship, which is rare but possible. One example could be the employee and benefits
tables that we discussed in a prior lesson. As a reminder, the employee table would have information specific to the employee, such as their name, address, position, and
date hired. The benefits table would contain information on investment, medical, dental, and life insurance plans. Each employee would only have one benefit record.
However, it is possible that an employee would not have any correlating benefits record if they had yet to choose a benefit plan.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson you learned that in some cases, foreign keys do not always have relationships with primary keys in other tables. There are alternative links for foreign
keys—situations where a foreign key might reference another column in the same table, for example, in a hierarchical or recursive relationship. Another example
might be when a foreign key has a relationship to a candidate key in another table. Recall that a candidate key is a column containing unique values that could have
been selected as the primary key in the table but wasn't. Such situations are not typical, but the flexibility they represent offers database designers an option for
handling unusual relationship situations.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 47
Adding and Deleting Foreign Keys
by Sophia

 WHAT'S COVERED

This lesson describes the constraints involved in inserting or deleting data in tables with foreign key relationships and describes the correct order in which the
statements must be executed, in two parts. Specifically, this lesson will cover:
1. Inserting With Foreign Keys
2. Deleting With Foreign Keys

1. Inserting With Foreign Keys


When it comes to foreign keys, it can be challenging to determine the order in which data must be inserted into tables, especially when we have multiple tables that relate to
one another. In the example PostgreSQL database we have been working with so far in this course, we have the following foreign keys in place across the various tables:

In the above list, the first column contains the constraint name. Notice that the default naming scheme for foreign key constraints is the following:

_<column></column>_fkey
The second and third columns contain the information about the child table with a foreign key. The fourth and fifth columns show information about the parent table and the
primary key that the child is referencing.

At first glance, this seems backwards, given the column headings: You would think that the child table would be the one with "foreign" in the column headings, but it is the
reverse of that. Think of it this way: The first column lists the foreign key, and the second and third columns list the table and attribute that contain that foreign key.

Now, let's say that we want to insert records into both the parent and child tables. The order in which these insert operations occur is important. We must begin by inserting
records into the tables that do NOT have foreign keys to other tables. If we look at the set of table constraints, we can see that those tables are artist, genre, media_type,
playlist, and employee, as shown below. These are the tables that are not referenced in any of the constraints in the first column.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 48
Note that the employee table links to itself, as the manager is linked using the reports_to column in the same table. So, the first five tables that we would insert records into
are:

artist
employee
genre
media_type
playlist

The order among the five tables does not matter, as they are all on the first level of tables. Next, we look at the tables that reference those five tables.

Here we have the album, employee, customer, playlist_track, and track table. However, if we look at the second row, we see that the track table also has a foreign key to the
album_id in the album table. The album table was not in the first set of tables, so the track table has yet to be added. Likewise, in the last row, the playlist_track has a link to
the track table, which isn’t available yet. So, the next set of tables that can be inserted into is:

album
customer

Next, we can identify that the following tables are linked to the ones at the prior level and do not have other dependencies:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 49
Now that we have added the records to the album table, we can add each album's tracks to the track table. The invoice table depends only on the customer table, so
invoices can also be entered at this point. So, on the third level, we can insert records into the track and invoice tables, in any order.

Then, on the final level, we can now insert into invoice_line and playlist_track, as all dependencies have been accounted for.

In review, we would have to insert into any grouping of the following levels of tables:

Level 1

artist
employee
genre
media_type
playlist

Level 2
album
customer

Level 3
track

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 50
invoice

Level 4
invoice_line
playlist_track

Looking at the above structure, suppose that we need to create an invoice for an existing customer. We start out at Level 3, where the invoice table is located, and then after
creating the invoice, we move to the child table at Level 4, invoice_line, and enter the records that detail the invoice's contents.

2. Deleting With Foreign Keys


When deleting records, it's also important to do so in the right order. It's the same order as when inserting—but backwards. We want to delete the items at the lowest level
first, and work upward.

As you learned in the previous lesson, there are options you can set for foreign keys that can ensure that deleted data does not lead to a data integrity problem.

A foreign key constraint that specifies the ON DELETE CASCADE option will automatically delete all rows that depend on the referenced table when a row is deleted in the
referenced table. The purpose of this is to prevent orphaned records from being left behind.

The ON DELETE SET NULL option sets NULL the foreign key values in the referencing table when a row is deleted in the referenced table. When the deletion of data
shouldn't lead to data loss but rather nullify the relationship, it might be appropriate to do this.

Most database systems set ON DELETE NO ACTION as the default, which prevents the referenced row from being deleted if it has dependent rows in the referencing table.
As a result, the relationships between the data are maintained. The ON DELETE RESTRICT option is very similar to ON DELETE NO ACTION, and in most cases, it does the
same thing—it restricts the referenced row from being deleted.

In our sample database, the tables do not have ON DELETE CASCADE options set up for the foreign keys, so NO ACTION is the default. That means our delete operations
will fail unless we delete the records in the correct order: the opposite direction from when we inserted. In other words, we must delete the child rows before the parent
rows.

Records from the tables would have to be deleted in the following order by level:

Level 4

Invoice_line
playlist_track

Level 3
track
invoice

Level 2
album
customer

Level 1
artist
employee
genre
media_type
playlist

For example, if we wanted to delete a customer, we would have to delete from the invoice_line of the invoices that belong to that customer, then delete the invoices in the
invoice table that belong to the customer. Then, finally, delete the customer.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that SQL databases rely heavily on foreign key constraints to maintain data integrity and enforce referential integrity between tables.
When inserting new records in tables with foreign keys, we must insert starting with the parent tables and work our way down to the child tables. When deleting
records in tables with foreign keys, we go in the opposite direction: first the records from the child tables and then the corresponding records from the parent

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 51
tables. You were also reminded of the CASCADE options you learned about earlier, which can be used to specify what should happen when someone tries to delete
records in an order different from that. Some of these options include SET NULL and NO ACTION.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 52
Foreign Key Errors
by Sophia

 WHAT'S COVERED

This lesson explores the foreign key errors that can occur with SQL statements, in two parts. Specifically, this lesson will cover:
1. Foreign Key Errors When Inserting Records
2. Foreign Key Errors When Deleting Records

1. Foreign Key Errors When Inserting Records


As we have seen in the prior lesson, when we have a set of tables with foreign keys, we must insert, update, and delete in a specific order. Data changes can result in an
error if not done correctly.

When we insert data into a table, the most common issue with the foreign keys is the order in which we do it. However, another issue can arise if we're inserting data into a
table that does not have the referenced value from the foreign key in the parent table.

Let's revisit our data set with the representatives and departments again:

CREATE TABLE representative ( representative_id INT PRIMARY KEY, first_name VARCHAR (30) NOT NULL, last_name VARCHAR (30) NOT NULL );
CREATE TABLE department (
department_id INT PRIMARY KEY,
department_name VARCHAR(100) NOT NULL,
manager_id INT,
CONSTRAINT fk_manager FOREIGN KEY (manager_id) REFERENCES representative(representative_id)
);
);

INSERT INTO representative (representative_id, first_name, last_name)


VALUES (1, 'Bob','Evans'), (2, 'Tango','Rushmore'), (3, 'Danika','Arkane'), (4, 'Mac','Anderson');
INSERT INTO department (department_id, department_name,manager_id)
VALUES (1, 'Sales', 1), (2, 'Marketing', 3), (3, 'IT', 4), (4, 'Finance', null), (5, 'Support', null);
Suppose we then tried to insert a value for the manager_id that did not exist into the department table:

INSERT INTO department (department_id, department_name,manager_id)


VALUES (6, 'Test', 100);
We would get this error:

Query failed because of: error: insert or update on table "department" violates foreign key constraint "fk_manager"

This is because we attempted to pass 100 in the manager_id. But if we look at the representative table, we only have values 1–4 available. Likewise, if we attempted to
update a row in the department table’s manager_id to a value that didn’t exist, we would get the same error:

UPDATE department
SET manager_id = 100
WHERE department_id = 5;
We indeed do get the same error message:

Query failed because of: error: insert or update on table "department" violates foreign key constraint "fk_manager"

Again, this is because we're trying to update the department table from a valid value (NULL, in this case) to 100, which does not exist in the representative table. Both the
insert and update statements would be invalid in this case.

2. Foreign Key Errors When Deleting Records


Errors can also occur if we delete from a parent table when there are records associated with it in the child table. For example, the sales department has the manager_id set
to 1 in the department table, referencing Bob Evans in the representative table. Suppose we tried to delete that record from the representative table:

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 53
DELETE FROM representative
WHERE representative_id = 1;
We will get this error:

Query failed because of: error: update or delete on table "representative" violates foreign key constraint "fk_manager" on table "department"

This is because we’re trying to delete from the parent table representative when the department table still has that value as a foreign key. Consider if we tried to update the
representative_id in the representative table to another value:

UPDATE representative
SET representative_id = 100
WHERE representative_id = 1;
We would get the same error. If we needed to delete that particular representative, we would either have to delete the records that reference the representative_id in the
department table or update the representative_id to another value.

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned about some data insertion and deletion errors involving foreign keys. The foreign key error occurs when a value is inserted into a column
that references a primary key in another table, but the corresponding value does not exist in the referenced table. This error ensures referential integrity by
preventing inconsistent or disconnected data from being introduced. If you insert an order record without the associated customer ID in the customers' table, a
foreign key error would occur, safeguarding database coherence.

Foreign key errors when deleting records occur when a record from a referenced table is deleted while dependent records still exist in the referencing table. As a
result of this error, data loss is prevented, and relationships are maintained. If you try to delete a department from an organizational structure but there are
employees assigned to that department, a foreign key error would occur, prompting you to handle the dependencies first. The foreign key error acts as a safeguard
to maintain accurate and reliable data relationships in a database, regardless of whether the data is inserted or deleted.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 54
Subqueries
by Sophia

 WHAT'S COVERED

This lesson explores subqueries and how to use them in SELECT statements, in three parts. Specifically, this lesson will cover:
1. Subquery Basics
2. Using IN Operator in Clauses in Subqueries
3. Referring to Multiple Columns

1. Subquery Basics
Subqueries in SQL are queries nesting within each other. One query can be used in another query as part of the results. Using subqueries, you can retrieve data that will be
used for filtering, comparing, or computing in the main query. Combining and manipulating data from multiple tables and sources provides a powerful tool to check data in
multiple dimensions in one query.

SQL statements can contain subqueries in various places:

In the SELECT clause, a subquery can retrieve a single value used in the main query. For example, using a subquery, you might calculate an aggregate value and display
it alongside each row.
The FROM clause can contain derived tables, which are temporary tables that the main query can reference.
Subqueries in the WHERE clause are commonly used to filter results based on conditions that involve data from other tables.
The IN, ANY, and ALL operators are often used with subqueries to compare values returned by the subquery with values in the main query.
In a correlated subquery, columns from the outer query are referenced from the inner query. You can perform operations based on the outer query's values using related
subqueries.

EXAMPLE

SELECT product_name, price FROM products WHERE price > (SELECT AVG(price) FROM products);

This example retrieves the names and prices of products whose prices are greater than the average price calculated by the subquery (SELECT AVG(price) FROM
products).

By allowing you to operate on multiple tables and conditions simultaneously, subqueries enhance the flexibility of SQL queries. However, subqueries should be used with
caution, as not optimizing them can adversely affect performance.

There are many potential uses for subqueries in the PostgreSQL database you have been working with in this class. For example, we could use a subquery to find all of the
invoices that have a total amount larger than the average total amount.

We could do this in two separate steps based on what we’ve learned so far. We could first calculate the average total:

SELECT ROUND(AVG(total),2)
FROM invoice;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 55
Next, we can take that value and query the invoice table to find those that are larger than 5.71.

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE total > 5.71;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 56
Although this approach works, we can also do it in a way that can pass the result from the first query (that calculated the average) into the second query by using a subquery.
The subquery can be nested in a SELECT, INSERT, DELETE, or UPDATE statement, but we mostly see it in a SELECT statement.

To construct the subquery, we would enclose the second query in round brackets and place it in the WHERE clause as an expression. So, consider our original second
statement:

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE total > 5.71;
We would remove the 5.71 and replace it with round brackets enclosing the first statement, like this:

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE total > (SELECT ROUND(AVG(total),2) FROM invoice);
The query inside the round brackets is called the subquery, or inner query. The query that contains the subquery is called the outer query. Notice that we don’t include the
semicolon at the end of the inner statement in the subquery.

The database would execute the subquery first, then take the result from that subquery and pass it to the outer query. Then, the database executes the outer query.

 TERMS TO KNOW

Subquery
Also called an inner query, a query that is nested inside another query, so its results can be used as input for the outer query.

Outer Query
A query that uses the output of a subquery (inner query) as input.

2. Using IN Operator in Clauses in Subqueries


A subquery could potentially return 0 or more results, so it is important to consider the operator used to compare with its results. We can use any valid SQL operator in this
type of query. For example, if we use a = operator, we would expect that the subquery would return 0 or 1 row(s) as a result.

Let's look at an obvious subquery that is querying on the customer_id to return the customer_id:

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE customer_id =
(SELECT customer_id
FROM customer
WHERE customer_id = 1);

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 57
Since the primary key of the customer table is the customer_id, querying on the customer_id would only return one value.

Consider if we tried to select a value that doesn’t exist:

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE customer_id =
(SELECT customer_id
FROM customer
WHERE customer_id = 0);
The results would still run, but show that 0 rows were displayed:

However, consider the case in which we have multiple rows being returned:

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE customer_id =
(SELECT customer_id
FROM customer
WHERE country = 'USA');
Since there are 13 customers that live in the country USA, we will end up getting this error:

In order to avoid this error, we have to use the IN operator instead of the equal sign. This will allow 0, 1, or many results to be returned:

SELECT invoice_id, invoice_date, customer_id, total


FROM invoice
WHERE customer_id IN
(SELECT customer_id
FROM customer
WHERE country = 'USA');

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 58
3. Referring to Multiple Columns
We can also use multiple columns as criteria to compare with the subquery. In order to do so, we must use the round brackets around the columns within the WHERE clause
and have them match up with the columns that we want to compare within the subquery. For example, if we wanted to compare the customer_id and the billing_country in
the invoice table with the customer_id and country in the customer table, we could do the following:

SELECT invoice_id, invoice_date, customer_id, total


FROM invoice
WHERE (customer_id, billing_country) IN
(SELECT customer_id, country
FROM customer
WHERE country = 'USA');

If we did not put parentheses around the columns named in the first WHERE clause, we would get an error.

SELECT invoice_id, invoice_date,customer_id, total


FROM invoice
WHERE customer_id, billing_country IN
(SELECT customer_id, country
FROM customer
WHERE country = 'USA');

 WATCH

 TRY IT

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 59
Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned the basics of subqueries, which allow data from multiple tables or data sources to be integrated within a single query. They are nested
queries that are embedded within larger queries to retrieve specific data for use in the main query. A SQL subquery can appear in various parts of the statement,
such as the SELECT, FROM, WHERE, or HAVING clause, and it provides a means of manipulating and analyzing data more intricately.

It is possible to make SQL queries more dynamic and adaptable by leveraging subqueries. With them, you can filter, compare, and compute based on values from
other tables or even previous rows in your query. Subqueries can also be used to create derived tables that can be referenced like regular tables, simplifying
calculations, or data transformations. You explored using IN operator in clauses in subqueries to compare values returned by the subquery with values in the main
query. You also learned that SQL subqueries can be used to refer to multiple columns. Despite their flexibility, subqueries require optimization for performance, as
poorly constructed subqueries can cause slow execution. Using subqueries efficiently and accurately in SQL operations requires careful consideration of their
structure and purpose.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Outer Query
A query that uses the output of a subquery (inner query) as input.

Subquery
Also called an inner query, a query that is nested inside another query, so its results can be used as input for the outer query.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 60
Subquery Performance
by Sophia

 WHAT'S COVERED

This lesson explores the use of the EXPLAIN command to analyze query performance, in two parts. Specifically, this lesson will cover:
1. EXPLAIN Explained
1a. Node Types
1b. Analyzing a Query Plan
2. Subqueries vs. JOIN

1. EXPLAIN Explained
When using subqueries, you have to consider performance as it relates to each part of the query. This can and will be different each time you run a query. However, you can
get a general idea of how efficient a query will be in PostgreSQL because a query plan is created for every query that is attempted to run in the database. The database
looks at the query structure and the properties of the data and tables to create a good plan before it is executed.

Understanding all of the details of a plan can be quite complex, so our focus is on the basics of what to look for within a query execution plan. We can use the EXPLAIN
command on a query to display some of those details.

Let's query the invoice table and see what the results tell us:

EXPLAIN
SELECT *
FROM invoice;

Using just EXPLAIN, the database will not run the query. It will create an estimation of the execution plan based on the statistics that it has. This means that the plan can differ
a bit from reality. In PostgreSQL, though, we can execute the query as well. To do so, we can use the EXPLAIN ANALYZE at the beginning of the query instead of just
EXPLAIN.

EXPLAIN ANALYZE
SELECT *
FROM invoice;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 61
Note that this is an actual run process, so the timing will most likely be different each time you run it. For example, running it two more times yields the following results:

As such, we must rely on more than just the planning and execution time when comparing various statements.

Each query plan consists of nodes that can be nested and executed from the inside out. This means that the innermost node is executed before the outer node. Each node
has a set of associated statistics, like the cost, the number of rows that are returned, the number of loops that are performed (if needed), and some other choices. From the
last execution above, we can see that the cost shows 0.00..10.12, and we estimate that there are 412 rows returned. The width also shows the estimated width of each row in
bytes, which in our case is 65. The cost field shows how expensive the node was—in other words, how much processing time and other system resources it required to
execute. This can be a bit complex, as it is measured using different settings that start to become quite technical. The type of node appears at the left end of the first line. In
the above example, it is “Seq Scan.”

 TERMS TO KNOW

Query Plan
The results of an EXPLAIN command, describing the cost of each operation in a query.

EXPLAIN
A command that provides insight into the query execution plan for a given SQL query.

Node
A section of a query plan that describes a single operation within the query.

1a. Node Types


There are different types of nodes, and some are more efficient than others, all other factors being equal. Here is a generalized ranking from most efficient to least efficient:

Index Only Scan: This is the most efficient because it accesses all the required columns directly from their indexes without needing to access any unindexed table data.
This reduces processing time.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 62
Index Scan: This type is typically efficient, especially when the query can leverage an index to quickly locate and retrieve the necessary rows based on the indexed
columns.
Bitmap Index Scan: Bitmap index scans can be efficient when querying multiple columns or performing set operations (AND, OR, NOT) that can leverage bitmap indexes
effectively.
Hash JOIN (Hash Map): A hash JOIN is a type of join that combines data from two tables based on a join condition. It is typically faster than a sequential scan and is quite
effective for joining large data sets.
Sequential Scan: This is typically the least efficient because it involves scanning the entire table sequentially. This can be slow, especially for large tables.

Because index scans are the most efficient, queries are most efficient when they reference only indexed columns, or when the queries are able to pull the needed records
from the table based only on indexed columns. Keep in mind that primary key columns are automatically indexed, so querying on primary key columns can be very efficient
compared to querying on other columns.

1b. Analyzing a Query Plan


When reading the cost in a query plan, note that it is expressed as two numbers separated by two dots, like this: 0.00..10.12. The first number is the lower bound—in other
words, the minimum amount it could cost. The second number is the upper bound—in other words, the maximum amount it could cost. When comparing the cost of two
different queries, you would want to look both at the upper bound (that is, the worst-case scenario) and at the size of the range between lower and upper bounds.

When comparing the overall cost of two queries, you could sum all of the lower bounds for the various nodes and then sum all of the upper bounds, to derive an expected
range for the entire query. However, this does not always produce accurate information because a query execution plan can involve multiple operations that are not directly
additive, because the operations may interact with each other in nonlinear ways.

A better way to use cost data is to see which of the nodes has the highest cost and then look at the clause in the query that it represents to see if there is any way to make
that clause more efficient, such as by substituting a non-indexed column for an indexed one.

2. Subqueries vs. JOIN


Subqueries are more complex because you are using nested queries. For example, here is a subquery similar to what we used in the prior lesson:

EXPLAIN
SELECT invoice_id, invoice_date,customer_id, total
FROM invoice
WHERE customer_id IN
(SELECT customer_id FROM customer
WHERE city LIKE '%A');

Now let's try getting the same data using a subquery:

EXPLAIN
SELECT *
FROM customer
WHERE city IN
(SELECT billing_city
FROM invoice
WHERE COUNTRY like '%m')
As you can see below, this version has a much higher cost. The first Seq Scan node's upper bound is 331.81. That's a lot higher than any of the upper bounds in the other
version.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 63
 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that the EXPLAIN command provides insight into the query execution plan for a given query. Each query plan consists of different types of
nodes that can be nested and executed from the inside out. You explored analyzing a query plan by adding the EXPLAIN command to the beginning of a query. The
database engine uses this information to determine the order in which table scans, joins, and indexes are used to retrieve the requested data most efficiently. Lastly,
you compared subqueries vs. JOIN, understanding that you can optimize query performance by modifying clauses so that they have a lower cost as reported in the
query plan.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

EXPLAIN
A command that provides insight into the query execution plan for a given SQL query.

Node
A section of a query plan that describes a single operation within the query.

Query Plan Node


The results of an EXPLAIN command, describing the cost of each operation in a query.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 64
Find Duplicate Rows
by Sophia

 WHAT'S COVERED

This lesson explores finding duplicate rows of data in a table, in three parts. Specifically, this lesson will cover:
1. Introduction
2. Finding Duplicates
3. Duplicates for Counting

1. Introduction
Finding duplicate data in a table can be quite useful, as it can help us identify potential issues or matches. A duplicate row is one that refers to the same thing or person as a
whole other row. However, it is important to note that not all duplicate rows will have completely identical information, as it will depend on what columns of data we want to
search on. For example, we may have a large employee table that stores the employee’s Social Security number that uniquely identifies the employee in the U.S.

We could run a query to identify any instances of the same Social Security number appearing in multiple records. If we had customers, we could search for duplicate
accounts with an email address, as traditionally, an email address should only belong to a single customer for most e-commerce sites. If we had multiple records for a single
customer, it would be difficult to get a full order history for the customer, as they would have several rows that we would have to compare against.

 TERM TO KNOW

Duplicate Row
A row that refers to the same instance (such as a person, thing, or event) as another row.

2. Finding Duplicates
The structure of the query to find duplicates looks like the following:

SELECT <columnlist>
FROM <tablename>
GROUP BY <columnlist>
HAVING COUNT(*) > 1;
Note that the column list in the SELECT clause should match the column list in the GROUP BY clause. In addition, we could add a COUNT(*) in the SELECT clause to identify
how many duplicates there are of the same criteria that we grouped by.

For example, we want to verify that all of our customers have a unique phone number. We could do so like this:

SELECT phone, COUNT(*)


FROM customer
GROUP BY phone
HAVING COUNT(*) > 1;
We can verify that there are no customers that meet this criterion:

Similarly, we might want to know if there are customers who have placed multiple orders, and if so, how many. Here is a query that would produce that information.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 65
SELECT customer_id, COUNT(*)
FROM invoice
GROUP BY customer_id
HAVING COUNT(*) > 1;

We are quickly and easily able to identify those types of scenarios.

3. Duplicates for Counting


Another way to find duplicates is to count the number of rows that meet certain criteria. For example, in the following query, we are counting the number of customers
assigned to each support rep and listing the reps who have more than one customer assigned.

SELECT support_rep_id, COUNT(*)


FROM customer
GROUP BY support_rep_id
HAVING COUNT(*) > 1;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 66
And here's how we could list the states and countries that have more than one customer:

SELECT state, country, COUNT(*)


FROM customer
GROUP BY state, country
HAVING COUNT(*) > 1;

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, in the introduction you learned that the process of finding duplicates in a table involves identifying and quantifying rows with similar values. In this
process, you can gain insights into data quality, uncover potential anomalies, or support data cleansing efforts. SQL queries utilizing aggregation functions like
COUNT() and GROUP BY are commonly used in conjunction with filtering conditions to isolate duplicate records. This would be accomplished by constructing a SQL
query that groups the rows by the columns you're interested in and then applying the COUNT() function to each group. A group with a count greater than one
represents duplicate records since the specified columns have the same values. When you execute the query, you will receive a list of duplicate values and their
occurrence counts. You can also find duplicates by counting the number of rows that meet certain criteria. Using data patterns and potential data entry errors to
identify duplicate records is useful for data analysis and decision making.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 67
UNION to Combine Results
by Sophia

 WHAT'S COVERED

This lesson explores the UNION operator to combine result sets of various queries, in five parts. Specifically, this lesson will cover:
1. Introduction
2. Examples
3. Multiple UNION
4. Retaining Duplicates
5. Column Names

1. Introduction
You can use the UNION operator to combine data from multiple tables or queries. This can be useful when you want to temporarily combine multiple tables for some specific
purpose.

The syntax of the statement is as follows:

SELECT <columnlist>
FROM <table1>
UNION
SELECT <columnlist>
FROM <table2>;
The SELECT statements must use the same number of columns in the same order, and the corresponding columns must have the same data types. In the result set, column
names are determined by the column names in the first SELECT statement.

 KEY CONCEPT

A UNION operation deduplicates the result set—that is, it removes any duplicates. If you don't want duplicates to be removed, you can use UNION ALL instead.

 TERMS TO KNOW

Union Operator
An operator that combines the results of two or more SELECT statements and removes duplicate rows.

Deduplication
The removal of duplicates.

2. Examples
Let's look at an example in which our organization may want to email all users (customers and employees) an email to inform them of an upcoming sale. The same message
would be sent to all individuals. Instead of querying the customer and employee tables separately and having two result sets merge together, we can pull all the names and
email addresses into a single result set:

SELECT first_name, last_name, email


FROM employee
UNION
SELECT first_name, last_name, email
FROM customer;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 68
Note that this will not distinguish which individual is a customer or employee. We could add an extra string in the column list to distinguish the difference:

SELECT first_name, last_name, email, 'Employee'


FROM employee
UNION
SELECT first_name, last_name, email, 'Customer'
FROM customer;

To use the UNION operator, the tables we are querying from should have the same attribute characteristics, meaning that the number of columns and data types between the
two SELECT statements should match. Here's an example of a situation where they do NOT match:

SELECT customer_id
FROM invoice
UNION
SELECT first_name
FROM customer;
In the example above, the customer_id is an integer, while the first_name is a character string. As a result, we get the following error:

file:37233-RD 4.3.4.3_c.pngalt=A screenshot of a display with text: 'Query results. Query failed because of: error: UNION types integer and character varying cannot be
matched.'

If we have a different number of columns in each of the SELECT statements, we would run into another error. For example, in the following query, there is only one column
referenced in the first SELECT statement but two columns referenced in the second one.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 69
SELECT customer_id
FROM invoice
UNION
SELECT customer_id, first_name
FROM customer;

3. Multiple UNION
We could also create a UNION for more than two queries. For example, we may want to look at all the countries we operate in. We would need to look at the customer table,
invoice, and employee table.

SELECT billing_country
FROM invoice
UNION
SELECT country
FROM customer
UNION
SELECT country
FROM employee;
Notice that the result set only has 24 rows, as it excludes all duplicate values:

4. Retaining Duplicates
If we wanted to retain the duplicate values, we would use UNION ALL instead of UNION:

SELECT billing_country

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 70
FROM invoice
UNION ALL
SELECT country
FROM customer
UNION ALL
SELECT country
FROM employee;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 71
© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 72
You should see now that the duplicates are included, and we have 479 results returned.

5. Column Names
Note as well that the column name that is used as the output reflects the first SELECT statement in the list. If we swapped the first SELECT statement with the second, the
country column displays as the output:

SELECT country
FROM customer
UNION ALL
SELECT billing_country
FROM invoice
UNION ALL
SELECT country
FROM employee;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 73
© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 74
As such, if we wanted to use aliases to rename a column, we would only need to do so for the first SELECT statement:

SELECT first_name as "First Name", last_name as "Last Name", email as "Email", 'Employee' as "Type"
FROM employee
UNION
SELECT first_name, last_name, email, 'Customer'
FROM customer;

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 75
In this lesson, in the introduction you learned that the UNION command is a powerful SQL command that combines the results of multiple SELECT statements. Data
is combined efficiently, and duplicate rows are automatically removed, resulting in only unique records in the final result.

You saw several examples demonstrating the UNION operator, in which two SELECT statements are run in a single statement, with the word UNION between them.
For successful merging, the column names, order, and data types must be the same in both SELECT statements. Multiple UNION operations are possible in the
same statement, including any number of tables. The UNION command deduplicates the result set; if you want to retain duplicates, you can use UNION ALL instead.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Deduplication
The removal of duplicates.

Union Operator
An operator that combines the results of two or more SELECT statements and removes duplicate rows.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 76
ANY and ALL Operators
by Sophia

 WHAT'S COVERED

This lesson explores using ANY and ALL operators to compare values with a list of values returned by a subquery. Specifically, this lesson will cover:
1. Introduction
2. Subquery Example
3. Potential Error
4. Operators for Comparisons

1. Introduction
The ALL and ANY operators allow us to query data by comparing a value with a list of values that are returned by a subquery. This is an important distinction for the ANY and
ALL operators, as they are focused on the lists from a subquery.

The syntax of the operator looks like the following:

<columnname> <operator> [ANY/ALL] (subquery);


The ANY operator is less restrictive than the ALL when it comes to comparisons. The ANY operator returns true if any of the values in the subquery meet the condition;
otherwise, it returns false. The ALL operator returns true if ALL of the values in the subquery meet the condition; otherwise, it returns false.

 TERMS TO KNOW

ANY
An operator that returns true if any of the values in the subquery meet its condition.

ALL
An operator that returns true only if all of the values in the subquery meet the condition.

2. Subquery Example
Let's take a look at an example where we’re needing to compare the average invoice totals per country:

SELECT AVG(total)
FROM invoice
GROUP BY billing_country;
We will use that average by country as our subquery. If we wanted to find invoices that have a value higher than the average of any of the totals from the country, we would
use the following:

SELECT *
FROM invoice
WHERE total > ANY(
SELECT AVG(total)
FROM invoice
GROUP BY billing_country);

To find invoices that have a value higher than all of the averages from all countries, use the following:

SELECT *
FROM invoice
WHERE total > ALL(

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 77
SELECT AVG(total)
FROM invoice
GROUP BY billing_country);

Notice the count difference between the two queries. The first query (with the ANY operator) is less restrictive.

3. Potential Error
With the ANY and ALL operators, the subquery must return a single column to compare. If we return more than one column in the subquery, only the first result set is
returned:

SELECT *
FROM invoice
WHERE (total,total) >= ALL(
SELECT AVG(total),max(total)
FROM invoice
GROUP BY billing_country);

Consider if we just compared the total to the max of the totals:

SELECT *
FROM invoice
WHERE total >= ALL(
SELECT max(total)
FROM invoice
GROUP BY billing_country);
One record would be returned:

4. Operators for Comparisons


You may notice that the query above uses >= instead of >. We can compare ANY and ALL using a variety of operators, as shown in the following table.

Operator The Expression Evaluates to True if...

>ALL A value in the main query is greater than the largest value returned by the subquery.

>=ALL A value in the main query is greater than or equal to the largest value returned by the subquery.

<ALL A value in the main query is less than the smallest value returned by the subquery.

<=ALL A value in the main query is less than or equal to the smallest value returned by the subquery.

=ALL Every value in the main query is equal to every value returned by the subquery.

!=ALL A value in the main query is not equal to every value returned by the subquery. This is equivalent to the <>ALL operator.

>ANY A value in the main query is greater than the largest value returned by the subquery.

>=ANY A value in the main query is greater than or equal to the smallest value returned by the subquery.

<ANY A value in the main query is less than the largest value returned by the subquery.

<=ANY A value in the main query is less than or equal to the largest value returned by the subquery.

=ANY A value in the main query is equal to any value returned by the subquery. This is equivalent to the IN operator.

 WATCH

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 78
 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned that SQL's ANY and ALL operators make complex comparisons easier and more efficient by offering a concise and flexible way to
compare values. They compare a value with a set of values returned by a subquery. These operators enable you to compare multiple values from a subquery without
specifying each value explicitly. You saw some subquery examples of the correct syntax for using these operators and looked at some potential errors. Finally, you
reviewed a table comparing the different operators used with ANY and ALL and what results they generate.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

Any and All


The ANY and ALL operators compare a value with a set of values returned by a subquery. These operators are particularly useful if you want to compare multiple
values from a subquery without specifying each value explicitly.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 79
Calculations in SELECT Statements
by Sophia

 WHAT'S COVERED

This lesson explores using calculations in SELECT statements to create more complex data results, in two parts. Specifically, this lesson will cover:
1. How Calculations Aid in Data Analysis and Reporting
2. Using a Calculation in a SELECT Statement

1. How Calculations Aid in Data Analysis and Reporting


Calculations in SELECT statements are important because they enable you to transform, manipulate, and derive meaningful insights from your data directly within your SQL
queries. They are a powerful tool for data analysis, reporting, and presentation. Here's why calculations in SELECT statements are crucial:

 BIG IDEA

By transforming and deriving data with calculations, you can perform various mathematical, logical, or string operations. You can create new columns representing
aggregated values, percentages, averages, differences, etc. This transformation converts raw data into a more meaningful and interpretable format, facilitating decision
making.
You can get more focused customized reporting by performing calculations within the SELECT statement; you can generate customized reports tailored to your specific
needs. This flexibility allows you to present data in a format that suits your requirements, whether it's summarizing sales data, calculating growth rates, or computing ratios.
Performing calculations directly in the database via SQL can be more efficient than exporting and processing data externally. Database engines are optimized for such
computations, potentially resulting in faster query execution times.

Calculations enable you to gain real-time insights into your data without preprocessing it outside the database. This is especially valuable when working with large datasets
or performing frequent and iterative analyses. Calculations within SQL queries maintain data integrity by ensuring that all calculations are consistent and accurate. It reduces
the risk of errors that might occur if data is manipulated in separate external tools. Calculations can be combined with joins to create more complex analyses that involve data
from multiple tables. This is especially useful when working with relational databases. Calculations are often used in combination with aggregation functions and grouping to
summarize data at different levels and produce meaningful insights for different segments.

 KEY CONCEPT

In effect, calculations in SELECT statements empower you to extract actionable insights directly from your data within the database environment. They save time,
enhance data accuracy, and contribute to better decision making by providing you with the tools to analyze and present data in ways that address your specific business
needs.
In most cases, the computed values are not stored in the database. Rather, these values are typically calculated at the time of the query so that they reflect the most up-to-
date data.

SELECT invoice_id, SUM(quantity * unit_price)


FROM invoice_line
GROUP BY invoice_id;
EXAMPLE If we wanted to calculate the total amount by invoice_id in the invoice_line table, we would need a query like the following: This query would calculate the
total of the invoices by adding up the product of the quantity multiplied by the unit_price per item. In this particular example, we already have that information available in the
total column of the invoice table, but in the invoice table, the value is rounded up to the nearest integer, whereas here in the calculated results, it appears with two decimal
places.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 80
2. Using a Calculation in a SELECT Statement
We can also do the same for a more complex query with dates to find the employee’s age when they were hired, based on their birthdate. At a very basic level, we could
simply use a minus sign to subtract the birth_date from the hire_date, like this:

SELECT employee_id, hire_date, birth_date, hire_date - birth_date


FROM employee;

However, the result is based on days, as shown above. To convert this to a year, we would have to do some further calculations. First, we will need to pull out just the number
of days by using the date_part function, like this:

SELECT employee_id, hire_date, birth_date, date_part('day',hire_date - birth_date)


FROM employee;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 81
Next, we need to convert the days to years by dividing the age in days by 365.25 (with the .25 used to account for leap years). However, if we simply divide by 365.25, the
result is still displayed in days instead of years:

SELECT employee_id, hire_date, birth_date, (hire_date - birth_date)/365.25


FROM employee;

Instead, let's divide the number of days by 365.25 and then round it to the nearest integer:

SELECT employee_id, hire_date, birth_date, ROUND(date_part('day',hire_date - birth_date)/365.25)


FROM employee;

This is a great example of calculation being used. You could even convert the hire_date to use now() to get the current date to find out the employee’s current age at the
time that the query is run:

SELECT employee_id, hire_date, birth_date, ROUND(date_part('day',now() - birth_date)/365.25)


FROM employee;
Give it a try and see what happens. This is not a calculation you could easily store in the database, as the value would constantly change. If it were to be stored, you would
have to update the table daily. Consider that aspect of live data as we look ahead to using views.

 WATCH

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 82
 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned how calculations aid in data analysis and reporting—more specifically, that calculations used in SELECT statements enable data
transformation, analysis, and customization. In these calculations, various mathematical, logical, and string functions are applied to data retrieved from a database.
This leads to valuable insights that can be gained from the derived values. You learned that they can also be used to keep queries current. Calculations allow for
data transformation, enabling aggregation, computation, and transformation of values to produce new columns. Business performance can be comprehensively
understood by calculating total sales, average prices, or growth rates. You learned that calculations help tailor data presentation to meet specific needs, facilitating
customized reporting. This flexibility is essential for generating insightful reports that address specific questions or provide insights into trends and patterns.

Data analysis is made more efficient by using calculations within SELECT statements. This avoids the time-consuming process of exporting data and external
processing by processing it directly within the database. With real-time processing, even large datasets can be analyzed quickly and iteratively. A SELECT statement
enables data transformation, custom reporting, and efficient real-time insights, ultimately facilitating informed decision making and a deeper understanding of data-
driven narratives.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 83
VIEW to Provide a Subset
by Sophia

 WHAT'S COVERED

This lesson explores using CREATE VIEW to provide a useful report on a subset of data, in two parts. Specifically, this lesson will cover:
1. Introduction to Views
2. View Examples

1. Introduction to Views
So far in this course, every time you wanted to query a database, you have had to manually type the statement. That's a lot of work! One way to avoid all that typing and
make accessing the database easier for people who aren't SQL experts is to create views.

A view is a saved query that acts as if it were a table. Users can query the view the same way they query a table. Views can simplify complex queries, abstract underlying
data structures, and provide some security.

Here are some of the benefits of using views:

Abstraction. Views enable you to present data in a user-friendly way. For example, a view can combine data from multiple tables and perform calculations on it without
the end user needing to know how that happens.
Security. By creating views that expose only the relevant data, you can control what data users are allowed to see.
Reusability. You can use a view to save a query that you run frequently so you don't have to retype it each time.
Performance. When you query a view, the underlying data has already been processed, potentially reducing the execution time of your queries.
Simplified access: Views can provide a unified interface to data that is stored in different tables. Instead of directly querying multiple tables, users can query views that
consolidate the necessary information.
Integrity: Views can enforce data integrity by ensuring that certain conditions are always met when accessing the data. For example, a view can filter out invalid or
inconsistent data.

 KEY CONCEPT

Although views are primarily used for reading data, some database systems support updateable views that allow limited data modifications.

 TERM TO KNOW

View
In a database, a view is the result set of a stored query, which can be queried like a persistent database collection object.

2. View Examples
A view can help simplify the complexity of a query, as we will see in the upcoming lessons, by allowing us to bypass entering the complex query each time. In addition, like a
table, you can grant users permission through a view containing specific data. For example, if you have an employee table with sensitive personal information like salary, you
could create a view that does not include the salary and allow users to query that view rather than the base table.

The statement would look like the following:

CREATE VIEW <viewname>


AS
SELECT <statement>;
For example, if we wanted to only have a list of the employee names and who they report to, we could create a view like this:

CREATE VIEW employee_report


AS
SELECT first_name, last_name, employee_id, reports_to
FROM employee;
In PostgreSQL, views appear in the navigation pane on the left, like tables do.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 84
To query from this view, we would create a SELECT statement as if it were querying from a table:

SELECT *
FROM employee_report;

If this were a query that we ran often, it would simplify having to write out the query each time.

If different employees wanted to query their customer list (through the support_rep_id), we could create a view for each:

CREATE VIEW customer_contact_3


AS
SELECT first_name, last_name, email
FROM customer
WHERE support_rep_id = 3;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 85
Instead of having to query that entire customer list and filtering out the data each time, the support_id_rep equal to 3 can query the customer_contact_3 view directly, like
this:

SELECT *
FROM customer_contact_3;
Otherwise, they would have to enter in the entire SELECT statement each time:

SELECT first_name, last_name, email


FROM customer
WHERE support_rep_id = 3;
In this case, the view filters the columns (first_name, last_name, and email) as well as the rows (those that have the support_rep_id equal to 3). Since the view does not store
any data, if anything changes in the customer table, the updates would be reflected in the view immediately when queried.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In the introduction to views section, you learned that a database view plays a crucial role in enhancing database usability, security, and efficiency. It enables the
logical abstraction of complex data structures, providing tailored insights into the underlying data. Users benefit from this abstraction because it simplifies user
interactions, makes queries more intuitive, and shields them from the complexities of the database schema. Users can utilize views when they need specific subsets
of data or calculations for specific tasks since they can be designed to present only those elements of information that they need. Views reduce the processing load
by performing calculations and aggregations in advance. They also reduce the amount of typing needed to execute SQL statements that must be run frequently.

You learned in the view examples that the security of data is also significantly enhanced by views. Views enable database administrators to limit user access to
sensitive or confidential data by creating views that omit the columns containing the data to be kept private. Users can then query the view instead of the table.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

View
In a database, a view is the result set of a stored query, which can be queried like a persistent database collection object.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 86
VIEW to Simplify Queries
by Sophia

 WHAT'S COVERED

This lesson explores using CREATE VIEW to join multiple table data to simplify queries, in two parts. Specifically, this lesson will cover:
1. Using CREATE VIEW to Combine Tables
2. Creating a View With More Than Two Tables

1. Using CREATE VIEW to Combine Tables


One helpful use for CREATE VIEW is to combine data from multiple tables. This enables us to make data available in a single result set that would normally not be located in
the same place. For example, seeing the support_rep_id may not be extremely useful in an organization unless you know who that value belongs to. Instead, you could
include the name of the support rep, similar to the following:

CREATE VIEW customer_contact


AS
SELECT customer.*, employee.first_name as support_first_name, employee.last_name as support_last_name
FROM customer, employee
WHERE customer.support_rep_id = employee.employee_id;
If we queried the customer_contact view, it would look like the following:

SELECT *
FROM customer_contact;

We could also specify the column names desired rather than displaying all columns:

SELECT first_name, last_name, support_first_name, support_last_name


FROM customer_contact;

Using CREATE VIEW can be a great time-saver because it helps you avoid having to re-type a query each time you want to run it. If not for VIEW, we would have to type and
run the following longish query every time we wanted its information:

SELECT customer.first_name, customer.last_name, employee.first_name as support_first_name, employee.last_name as support_last_name


FROM customer, employee
WHERE customer.support_rep_id = employee.employee_id;

2. Creating a View With More Than Two Tables


In most of the query examples in this course so far that have involved multiple tables, we have only included the primary and foreign key columns. For example, when we
query the track table, we have been focused on track_id. However, sometimes we might also want to look at the artist's name, album title, and track name, all at the same

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 87
time. Creating a view for this purpose can simplify that process:

CREATE VIEW artist_album_track


AS
SELECT artist.name as artist_name, album.title as album_title, track.name as track_name
FROM artist
INNER JOIN album ON artist.artist_id = album.artist_id
INNER JOIN track ON album.album_id = track.album_id;
Then, rather than querying the tables each time we want that list, as shown below:

SELECT artist.name as artist_name, album.title as album_title, track.name as track_name


FROM artist
INNER JOIN album ON artist.artist_id = album.artist_id
INNER JOIN track ON album.album_id = track.album_id;
We can simply query the view directly, like this:

SELECT *
FROM artist_album_track;

Consider if we wanted to add some filters into our SELECT statement, such as only listing the rows that belong to AC/DC. Instead of doing this:

SELECT artist.name as artist_name, album.title as album_title, track.name as track_name


FROM artist
INNER JOIN album ON artist.artist_id = album.artist_id
INNER JOIN track ON album.album_id = track.album_id
WHERE artist.name = 'AC/DC';
We would query the view like this:

SELECT *
FROM artist_album_track
WHERE artist_name = 'AC/DC';

The second option greatly simplifies the query without having to join each of the tables together.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned how to combine data from multiple tables using CREATE VIEW. The view can then be used for other queries, making the complex data
relationships easier for them to understand. Depending on the desired relationship between the tables, the query may involve various types of joins and creating a

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 88
view with more than two (i.e., any number) of tables.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 89
VIEW & Complex Queries
by Sophia

 WHAT'S COVERED

This lesson explores using views to provide a useful report on a complex set of data. Specifically, this lesson will cover:
1. Basing a Complex Query on an Existing View

1. Basing a Complex Query on an Existing View


Views can simplify processing aggregated data by precomputing and storing aggregated data, making it readily accessible to users. This saves time for both the humans
typing the commands and the database software processing them. Calculations for complex aggregations can be performed ahead of time and stored as a virtual table. This
eliminates the need to recompute aggregations with each query.

Querying views using simple SELECT statements makes querying more intuitive and reduces the chance of errors. Precomputing aggregate values in a view ensures
consistent aggregation logic across multiple queries, which helps maintain data accuracy and ensures that all users retrieve the same aggregate values.

For example, we may have a view created to list the calculated total of the amount per country:

CREATE VIEW invoice_country


AS
SELECT billing_country, SUM(quantity*unit_price) AS calculated_total, MAX(quantity*unit_price) AS max_total, MIN(quantity*unit_price) AS m
FROM invoice
INNER JOIN invoice_line ON invoice.invoice_id = invoice_line.invoice_id
GROUP BY invoice.billing_country;
It can get complex to look at specific criteria on the aggregate calculations, but querying the view would result in the following results:

SELECT *
FROM invoice_country
ORDER BY calculated_total ASC, billing_country;

From there, we may want to filter that data out further by doing the following:

SELECT *
FROM invoice_country
WHERE max_total = 1.99
ORDER BY calculated_total ASC, billing_country;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 90
Compare this to the full statement on the base tables to get that same information:

SELECT billing_country, SUM(quantity*unit_price) AS calculated_total, MAX(quantity*unit_price) AS max_total, MIN(quantity*unit_price) AS m


FROM invoice
INNER JOIN invoice_line
ON invoice.invoice_id = invoice_line.invoice_id
GROUP BY invoice.billing_country
HAVING MAX(quantity*unit_price) = 1.99;

We can take the data from the view and use the information to query other tables. Using the same example above, if we find that the max_total = 1.99, we may want to list all
of the customers in those countries while listing all of the countries' calculated criteria:

SELECT *
FROM invoice_country
INNER JOIN customer ON invoice_country.billing_country = customer.country
WHERE max_total = 1.99;

Think about what that query would need to look like if you queried this using just the base tables. Consider the complexity of that type of statement. Aggregate data
scenarios like this would be very difficult to create without using views.

Imagine, too, if you wanted to link all of the tables in our database together through a consistent dataset to get a list of the track names that a customer has purchased.
Rather than writing that query each time, having a view that joins all of the tables together may make it much simpler to query.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned the benefits of creating views to perform complex aggregation operations, and then use the views as a basis for future queries you might
create that require those aggregate values. Some of the benefits of this approach include consistency, reduced processing time, and reduced labor. Then you
looked at some examples of basing complex queries on existing views, making queries simple that would otherwise be complex and difficult to create.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 91
CREATE OR REPLACE VIEW to Update Views
by Sophia

 WHAT'S COVERED

This lesson explores making changes to views, in two parts. Specifically, this lesson will cover:
1. Introduction
2. Replacing or Altering a View

1. Introduction
If a view needs to be changed, you have two options.

First, you can use a command that replaces or alters the existing view statement. This method ensures atomicity: Either the view is replaced with the new definition, or it
remains unchanged. There is no point at which the view doesn't exist. It also preserves the existing view's dependencies, as long as those dependencies don't rely on
columns that have been changed or removed. However, different DBMSs have differing commands and syntax for doing this, and in some systems, the changes you can
make are limited; if you need to make big structural changes, you may have to delete and recreate.

In PostgreSQL, you use CREATE OR REPLACE VIEW to make changes to the view's structure, and you use ALTER VIEW to make changes to its non-content properties such
as its name and its security permissions.

 KEY CONCEPT

You cannot use CREATE OR REPLACE VIEW to drop columns from a view; you just drop and recreate the view if you need that.
Second, you can DROP (delete) and recreate the view. This approach is simple, and it provides complete control over the view's definition and structure. You will learn about
that method in the next lesson.

2. Replacing or Altering a View


Some DBMSs (including PostgreSQL, MySQL, and MariaDB) have a CREATE OR REPLACE VIEW statement that you can use to substitute a new view for an old one of the
same name. Here's the syntax for doing that:

CREATE OR REPLACE VIEW my_view AS


SELECT column1, column2, ..
FROM some_table
WHERE condition;
There are limitations to its use, though, and those limitations can vary depending on the DBMS.

 HINT

Some DBMSs call this operation CREATE OR ALTER VIEW. SQL Server is one example.
PostgreSQL also has an ALTER VIEW statement, but it is limited. It does not change the content of the underlying query; it only changes metadata about the view, such as its
name, owner, or security policies. The basic syntax is:

ALTER VIEW view_name OPTIONS (view_options);


Let's work through an example that demonstrates how to use both the CREATE OR REPLACE VIEW and the ALTER VIEW statements.

Let's create a view that has the invoice_id, total, and first and last name of the customer:

CREATE VIEW invoice_information


AS
SELECT invoice_id, total, first_name, last_name
FROM invoice
INNER JOIN customer ON invoice.customer_id = customer.customer_id;
To query the view, we can run the following SELECT statement:

SELECT *

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 92
FROM invoice_information;

If we wanted to have the employee’s first and last name instead of the customer’s first and last name, we could drop the view and then recreate it. However, if there were
objects that depended on this view, we would have to drop those as well. Using the CREATE OR REPLACE VIEW allows us to recreate it without getting rid of the dependent
objects.

Let us make the change in the view:

CREATE OR REPLACE VIEW invoice_information


AS
SELECT invoice_id, total, employee.first_name, employee.last_name
FROM invoice
INNER JOIN customer ON invoice.customer_id = customer.customer_id
INNER JOIN employee ON customer.support_rep_id = employee.employee_id;
Even though the names of the columns are the same and are in the same order, we will get this error:

This is because the first_name in the employee table is 20 characters, so the data size does not match. In this case, we could resize the employee table’s first_name column
to 40 characters, as well as the last_name:

ALTER TABLE employee


ALTER COLUMN first_name TYPE VARCHAR (40);
ALTER TABLE employee
ALTER COLUMN last_name TYPE VARCHAR (40);
Now, let's attempt to make the modification again:

CREATE OR REPLACE VIEW invoice_information


AS
SELECT invoice_id, total, employee.first_name, employee.last_name
FROM invoice
INNER JOIN customer ON invoice.customer_id = customer.customer_id
INNER JOIN employee ON customer.support_rep_id = employee.employee_id;
We are successful, as the data type, column names, and order all match:

SELECT *
FROM invoice_information

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 93
Since the data has changed, we may want to also change the name of the view. We can use the ALTER VIEW statement to rename it:

ALTER VIEW invoice_information RENAME TO invoice_employee_info;


 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In the introduction, you learned that when you want to make changes to a view, you have two options: replacing or altering a view. You can use the CREATE OR
REPLACE VIEW statement (to make changes to the underlying query) or the ALTER VIEW statement (to make changes to the view's metadata, such as its name).
Using CREATE OR REPLACE VIEW, the existing query in the view is completely replaced with a new one. You saw some difficulties that might arise due to column
size, and how to overcome them. You also learned that another option is to DROP (delete) and recreate a view, a topic that will be covered in the next lesson.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 94
DROP VIEW to Remove Views
by Sophia

 WHAT'S COVERED

This lesson explores using the DROP VIEW statement to remove views, in three parts. Specifically, this lesson will cover:
1. Introduction
2. Dropping a View
3. Managing the View's Dependencies

1. Introduction
When a view is no longer needed, or if you want to delete it so you can recreate it, you can use the DROP VIEW statement to remove it from the database.

The syntax for dropping a view in PostgreSQL is straightforward:

DROP VIEW [IF EXISTS] view_name [, ...] [CASCADE | RESTRICT];


The view_name is the name of the view you want to remove.

IF EXISTS is an optional clause that prevents an error from occurring if the view doesn't exist.

CASCADE allows you to drop dependent objects like rules or triggers associated with the view.

RESTRICT prevents the view from being dropped if there are any dependent objects, providing a safety mechanism to avoid accidental deletions.

When you execute the DROP VIEW command, PostgreSQL will permanently delete the specified view and all its associated metadata, including permissions. It's important to
exercise caution when using this command, especially in production environments, as it can impact applications relying on the view's existence and structure.

 TERM TO KNOW

DROP VIEW
A statement that deletes a view.

2. Dropping a View
Let's look at an example of dropping a view. First, suppose we create the following view for the album and artist names:

CREATE VIEW album_artist_names


AS
SELECT album.title, artist.name
FROM album
INNER JOIN artist ON album.artist_id = artist.artist_id;

If we wanted to drop the view, it would look like this:

DROP VIEW album_artist_names;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 95
If you try to drop a view that doesn't exist, an error message will appear. For example, if we tried to drop the album_artist_names view a second time, we would see the
following:

DROP VIEW album_artist_names;

However, by using the IF EXISTS parameter, the database will only drop the view if it exists. If not, no error message will be generated:

DROP VIEW IF EXISTS album_artist_names;

3. Managing the View's Dependencies


By default, the RESTRICT parameter is in effect in a DROP VIEW statement. RESTRICT prevents the view from being dropped if there are any dependencies upon it. For
example, suppose we created the album_artist_names view and then created another view called temp_album_artist_names that queried from the album_artist_names
view:

CREATE VIEW album_artist_names


AS
SELECT album.title, artist.name
FROM album
INNER JOIN artist ON album.artist_id = artist.artist_id;
CREATE VIEW temp_album_artist_names
AS
SELECT *

FROM album_artist_names;
If we tried to drop the album_artist_names view, we would get an error, as the temp_album_artist_names table uses that view:

DROP VIEW IF EXISTS album_artist_names


RESTRICT;

Alternatively, we could use the CASCADE option to have the statement automatically drop all of the objects that depend on the view, and all objects that also then depend on
those other objects. The statement with the cascade looks like this:

DROP VIEW IF EXISTS album_artist_names


CASCADE;

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 96
Be careful when using CASCADE, as there could be many unintended items dropped. A lengthier but more cautious approach would be to drop views one at a time:

DROP VIEW IF EXISTS temp_album_artist_names;


DROP VIEW IF EXISTS album_artist_names;
You can drop multiple views using a single DROP VIEW statement, like this:

DROP VIEW IF EXISTS album_artist_names, temp_album_artist_names;

Notice that even though there is a dependency between the two views that were dropped in the preceding statement, they could both be dropped together.

 WATCH

 TRY IT

Your turn! Open the SQL tool by clicking on the LAUNCH DATABASE button below. Then, enter in one of the examples above and see how it works. Next, try your own
choices for which columns you want the query to provide.

 SUMMARY

In this lesson, you learned in the introduction that the DROP VIEW statement removes views from a database, followed by exploring an example of dropping a view.
You might remove a view because it's no longer needed, or because you plan to recreate it in a different way.

The IF EXISTS clause can optionally be added to prevent an error from appearing if the view does not exist. You also learned about managing the view's
dependencies; for example, you can use RESTRICT to prevent the view from being dropped if any dependent objects exist, but it is unnecessary to add it to the
statement because it is the default behavior anyway. The CASCADE option does the opposite: It deletes any objects that are dependent upon the view being
deleted.

Source: THIS TUTORIAL WAS AUTHORED BY DR. VINCENT TRAN, PHD (2020) AND FAITHE WEMPEN (2024) FOR SOPHIA LEARNING. PLEASE SEE OUR TERMS OF USE.

 TERMS TO KNOW

DROP VIEW
A statement that deletes a view.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 97
Terms to Know
ALTER TABLE
A clause that changes the structure of a table.

Any and All


The ANY and ALL operators compare a value with a set of values returned by a subquery. These operators are particularly useful if you want to
compare multiple values from a subquery without specifying each value explicitly.

CROSS JOIN
A clause in a query that creates a new table with the same number of rows as the product of the original table sizes.

Column Alias
An alternative name assigned to a column for use in a query's result set.

DROP VIEW
A statement that deletes a view.

Deduplication
The removal of duplicates.

EXPLAIN
A command that provides insight into the query execution plan for a given SQL query.

FULL OUTER JOIN


A JOIN that retrieves all records from both tables, even if there is no matching record in the other table; the missing column entries appear as
NULL.

Foreign Key
A column or set of columns that refers to the primary key of another table.

LEFT JOIN
A clause that combines data from two tables based on a specified condition. It retrieves all records from the left (or first) table and matching
records from the right (or second) table. In the result set, any unmatched rows from the left table will have NULL values.

NATURAL JOINs
SQL JOINs based on columns with identical names in the related tables. They are easier to set up than other JOINs but can result in unintended
connections when the tables share multiple common column names.

Node
A section of a query plan that describes a single operation within the query.

Outer Query
A query that uses the output of a subquery (inner query) as input.

Query Plan Node


The results of an EXPLAIN command, describing the cost of each operation in a query.

RIGHT JOIN
A clause that combines data from two tables based on a specified condition. It retrieves all records from the right (or second) table and matching
records from the left (or first) table. In the result set, only matching rows from the left and right tables are included, and any unmatched rows from
the right table will have NULL values.

Subquery
Also called an inner query, a query that is nested inside another query, so its results can be used as input for the outer query.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 98
Table Alias
An alternative name assigned to a table for use in a query's result set.

Union Operator
An operator that combines the results of two or more SELECT statements and removes duplicate rows.

Venn Diagram
Venn diagrams illustrate logical relationships between two or more sets of items by using overlapping circles or shapes. In many cases, they
serve as a visual way to summarize similar and different aspects of items.

View
In a database, a view is the result set of a stored query, which can be queried like a persistent database collection object.

© 2025 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 99

You might also like