Join
A SQL join clause combines records from two or more tables in a database. It creates a set that
can be saved as a table or used as it is. A JOIN is a means for combining fields from two tables
by using values common to each. ANSI-standard SQL specifies five types of JOIN: INNER,
LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS. As a special case, a table (base table, view, or joined table) can
JOIN to itself in a self-join.
A programmer writes a JOIN predicate to identify the records for joining. If the evaluated
predicate is true, the combined record is then produced in the expected format, a record set or a
temporary table.
Sample tables
Relational databases are often normalized to eliminate duplication of information when objects
may have one-to-many relationships. For example, a Department may be associated with many
different Employees. Joining two tables effectively creates another table which combines
information from both tables. This is at some expense in terms of the time it takes to compute the
join.
In the following tables the DepartmentID column of the Department table (which can be
designated as Department.DepartmentID) is the primary key, while Employee.DepartmentID is
a foreign key.
CREATE TABLE department(
DepartmentID INT,
DepartmentName VARCHAR(20)
);
CREATE TABLE employee (
LastName VARCHAR(20),
DepartmentID INT
);
INSERT INTO department(DepartmentID, DepartmentName) VALUES(31, 'Sales');
INSERT INTO department(DepartmentID, DepartmentName) VALUES(33, 'Engineering');
INSERT INTO department(DepartmentID, DepartmentName) VALUES(34, 'Clerical');
INSERT INTO department(DepartmentID, DepartmentName) VALUES(35, 'Marketing');
INSERT INTO employee(LastName, DepartmentID) VALUES('Rafferty', 31);
INSERT INTO employee(LastName, DepartmentID) VALUES('Jones', 33);
INSERT INTO employee(LastName, DepartmentID) VALUES('Steinberg', 33);
INSERT INTO employee(LastName, DepartmentID) VALUES('Robinson', 34);
INSERT INTO employee(LastName, DepartmentID) VALUES('Smith', 34);
INSERT INTO employee(LastName, DepartmentID) VALUES('John', NULL);
CROSS JOIN
CROSS JOIN returns the Cartesian product of rows from tables in the join. In other words, it will
produce rows which combine each row from the first table with each row from the second table.
Example of an explicit cross join:
SELECT *
FROM employee CROSS JOIN department;
Example of an implicit cross join:
SELECT *
FROM employee, department;
Employee.LastName Employee.DepartmentID Department.DepartmentName Department.DepartmentID
Rafferty 31 Sales 31
Jones 33 Sales 31
Steinberg 33 Sales 31
Smith 34 Sales 31
Robinson 34 Sales 31
John NULL Sales 31
Rafferty 31 Engineering 33
Jones 33 Engineering 33
Steinberg 33 Engineering 33
Smith 34 Engineering 33
Robinson 34 Engineering 33
John NULL Engineering 33
Rafferty 31 Clerical 34
Jones 33 Clerical 34
Steinberg 33 Clerical 34
Smith 34 Clerical 34
Robinson 34 Clerical 34
John NULL Clerical 34
Rafferty 31 Marketing 35
Jones 33 Marketing 35
Steinberg 33 Marketing 35
Smith 34 Marketing 35
Robinson 34 Marketing 35
John NULL Marketing 35
The cross join does not apply any predicate to filter records from the joined table. Programmers
can further filter the results of a cross join by using a WHERE clause.
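As a rough check of the row counts involved, the sketch below loads the sample tables into an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite and Python is an assumption made only to keep the example runnable; the article itself is not tied to any particular database system.

```python
import sqlite3

# In-memory SQLite database holding the sample tables from this article.
# (SQLite is an assumption here; any SQL database would serve.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INT, DepartmentName VARCHAR(20));
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

explicit_rows = conn.execute(
    "SELECT * FROM employee CROSS JOIN department"
).fetchall()
implicit_rows = conn.execute(
    "SELECT * FROM employee, department"
).fetchall()

# 6 employees x 4 departments = 24 combined rows in either notation.
print(len(explicit_rows), len(implicit_rows))

# A WHERE clause filters the Cartesian product after it has been formed.
sales_rows = conn.execute(
    "SELECT * FROM employee CROSS JOIN department "
    "WHERE department.DepartmentName = 'Sales'"
).fetchall()
print(len(sales_rows))  # one row per employee, each paired with Sales
```

Because the result size is the product of the two table sizes, an unfiltered cross join grows quickly as the tables grow.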
Inner join
An 'inner join' is the most common join operation used in applications and can be regarded as
the default join-type. Inner join creates a new result table by combining column values of two
tables (A and B) based upon the join-predicate. The query compares each row of A with each
row of B to find all pairs of rows which satisfy the join-predicate. When the join-predicate is
satisfied, column values for each matched pair of rows of A and B are combined into a result
row. The result of the join can be defined as the outcome of first taking the Cartesian product (or
cross join) of all records in the tables (combining every record in table A with every record in
table B) and then returning all records which satisfy the join predicate.
SQL specifies two different syntactical ways to express joins: "explicit join notation" and
"implicit join notation".
The "explicit join notation" uses the JOIN keyword, optionally preceded by the INNER
keyword, to specify the table to join, and the ON keyword to specify the predicates for the join,
as in the following example:
SELECT *
FROM employee INNER JOIN department
ON employee.DepartmentID = department.DepartmentID;
The "implicit join notation" simply lists the tables for joining, in the FROM clause of the
SELECT statement, using commas to separate them. Thus it specifies a cross join, and the
WHERE clause may apply additional filter-predicates (which function comparably to the join-
predicates in the explicit notation).
The following example is equivalent to the previous one, but this time using implicit join
notation:
SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;
The queries given in the examples above will join the Employee and Department tables using the
DepartmentID column of both tables. Where the DepartmentID of these tables match (i.e. the
join-predicate is satisfied), the query will combine the LastName, DepartmentID and
DepartmentName columns from the two tables into a result row. Where the DepartmentID does
not match, no result row is generated.
Thus the result of the execution of either of the two queries above will be:
Employee.LastName Employee.DepartmentID Department.DepartmentName Department.DepartmentID
Robinson 34 Clerical 34
Jones 33 Engineering 33
Smith 34 Clerical 34
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
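The equivalence of the two notations can be checked with a short sketch. It assumes an in-memory SQLite database driven from Python's sqlite3 module, which is not part of the original text and is used only so the example runs on its own.

```python
import sqlite3

# Sample tables from this article in an in-memory SQLite database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INT, DepartmentName VARCHAR(20));
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

explicit_rows = conn.execute(
    "SELECT * FROM employee INNER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
).fetchall()
implicit_rows = conn.execute(
    "SELECT * FROM employee, department "
    "WHERE employee.DepartmentID = department.DepartmentID"
).fetchall()

# Row order is not guaranteed without ORDER BY, so compare sorted copies.
same_rows = sorted(explicit_rows) == sorted(implicit_rows)
print(same_rows, len(explicit_rows))

# John (no department) and Marketing (no employees) produce no result row.
joined_names = {row[0] for row in explicit_rows}
joined_departments = {row[3] for row in explicit_rows}
print(sorted(joined_names), sorted(joined_departments))
```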
Note: Programmers should take special care when joining tables on columns that can contain
NULL values, since NULL will never match any other value (not even NULL itself), unless the
join condition explicitly uses the IS NULL or IS NOT NULL predicates.
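The NULL behaviour described in this note can be observed directly. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module (an assumption made for runnability); it joins the employee table to itself purely to show that John's NULL DepartmentID does not match even itself.

```python
import sqlite3

# Only the employee table is needed for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

# NULL = NULL evaluates to NULL (Python None), not true; IS NULL evaluates to true.
null_comparison = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(null_comparison)

# With a plain equality predicate, John's NULL DepartmentID matches nothing.
plain_matches = conn.execute(
    "SELECT COUNT(*) FROM employee a JOIN employee b "
    "ON a.DepartmentID = b.DepartmentID "
    "WHERE a.LastName = 'John'"
).fetchone()[0]

# Adding an explicit IS NULL test lets the two NULLs pair up.
null_aware_matches = conn.execute(
    "SELECT COUNT(*) FROM employee a JOIN employee b "
    "ON a.DepartmentID = b.DepartmentID "
    "OR (a.DepartmentID IS NULL AND b.DepartmentID IS NULL) "
    "WHERE a.LastName = 'John'"
).fetchone()[0]
print(plain_matches, null_aware_matches)
```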
Equi-join
An equi-join is a specific type of comparator-based join that uses only equality comparisons in
the join-predicate. Using other comparison operators (such as <) disqualifies a join as an
equi-join. The query shown above has already provided an example of an equi-join:
SELECT *
FROM employee JOIN department
ON employee.DepartmentID = department.DepartmentID;
The same equi-join can also be written in implicit join notation:
SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;
Natural join
A natural join is a type of equi-join where the join predicate arises implicitly by comparing all
columns in both tables that have the same column-names in the joined tables. The resulting
joined table contains only one column for each pair of equally named columns.
The above sample query for inner joins can be expressed as a natural join in the following way:
SELECT *
FROM employee NATURAL JOIN department;
As with a join that names the shared column in an explicit USING clause, only one
DepartmentID column occurs in the joined table, with no qualifier:
DepartmentID Employee.LastName Department.DepartmentName
34 Smith Clerical
33 Jones Engineering
34 Robinson Clerical
33 Steinberg Engineering
31 Rafferty Sales
PostgreSQL, MySQL and Oracle support natural joins; Microsoft T-SQL and IBM DB2 do not.
The columns used in the join are implicit, so the join code does not show which columns are
expected, and a change in column names may change the results.
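The difference in result columns between a natural join and an ordinary inner join can be seen in the sketch below. It assumes SQLite through Python's sqlite3 module (an assumption, chosen because SQLite accepts NATURAL JOIN) and reads the column names from the cursor description.

```python
import sqlite3

# Sample tables from this article in an in-memory SQLite database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INT, DepartmentName VARCHAR(20));
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

# Natural join: the shared DepartmentID column appears only once.
cursor = conn.execute("SELECT * FROM employee NATURAL JOIN department")
natural_columns = [column[0] for column in cursor.description]
natural_rows = cursor.fetchall()

# Inner join with ON: both DepartmentID columns are kept.
cursor = conn.execute(
    "SELECT * FROM employee JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
)
inner_columns = [column[0] for column in cursor.description]

print(natural_columns)
print(inner_columns)
print(len(natural_rows))  # same five matched rows as the inner join
```

Since the join columns are chosen by name alone, adding an unrelated column with a shared name (for example a Name column in both tables) would silently change the predicate, which is the risk described above.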
Outer join
An outer join does not require each record in the two joined tables to have a matching record.
The joined table retains each record—even if no other matching record exists. Outer joins
subdivide further into left outer joins, right outer joins, and full outer joins, depending on which
table's rows are retained (left, right, or both).
(In this case left and right refer to the two sides of the JOIN keyword.)
No implicit join-notation for outer joins exists in standard SQL.
Left outer join
The result of a left outer join (or simply left join) for tables A and B always contains all records
of the "left" table (A), even if the join-condition does not find any matching record in the "right"
table (B). This means that if the ON clause matches 0 (zero) records in B (for a given record in
A), the join will still return a row in the result (for that record)—but with NULL in each column
from B. A left outer join returns all the values from an inner join plus all values in the left table
that do not match to the right table.
For example, this allows us to find an employee's department, but still shows the employee(s)
even when they have not been assigned to a department (contrary to the inner-join example
above, where unassigned employees were excluded from the result).
Example of a left outer join (the OUTER keyword is optional). Compared with the inner join,
the result has one additional row, for John, who has no department:
SELECT *
FROM employee LEFT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;
Employee.LastName Employee.DepartmentID Department.DepartmentName Department.DepartmentID
Jones 33 Engineering 33
Rafferty 31 Sales 31
Robinson 34 Clerical 34
Smith 34 Clerical 34
John NULL NULL NULL
Steinberg 33 Engineering 33
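The extra row can be confirmed with a small sketch, again assuming SQLite via Python's sqlite3 module (an assumption made for runnability). The second query is a common use of the left outer join that goes slightly beyond the example above: keeping only the rows where the right side is NULL lists the employees without a department.

```python
import sqlite3

# Sample tables from this article in an in-memory SQLite database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INT, DepartmentName VARCHAR(20));
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

left_rows = conn.execute(
    "SELECT employee.LastName, employee.DepartmentID, "
    "       department.DepartmentName, department.DepartmentID "
    "FROM employee LEFT OUTER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID"
).fetchall()

# Every employee appears; John's department columns are NULL (None in Python).
print(len(left_rows))
print([row for row in left_rows if row[0] == "John"])

# Filtering on a NULL right-hand column keeps only the unmatched employees.
unassigned = conn.execute(
    "SELECT employee.LastName "
    "FROM employee LEFT OUTER JOIN department "
    "ON employee.DepartmentID = department.DepartmentID "
    "WHERE department.DepartmentID IS NULL"
).fetchall()
print(unassigned)
```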
Right outer join
A right outer join (or right join) closely resembles a left outer join, except with the treatment of
the tables reversed. Every row from the "right" table (B) will appear in the joined table at least
once. If no matching row from the "left" table (A) exists, NULL will appear in columns from A
for those rows of B that have no match in A.
A right outer join returns all the values from the right table and matched values from the left
table (NULL in the case of no matching join predicate). For example, this allows us to find each
employee and his or her department, but still show departments that have no employees.
Below is an example of a right outer join (the OUTER keyword is optional). Compared with the
inner join, the result has one additional row, for Marketing, which has no employees:
SELECT *
FROM employee RIGHT OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;
Employee.LastName Employee.DepartmentID Department.DepartmentName Department.DepartmentID
Smith 34 Clerical 34
Jones 33 Engineering 33
Robinson 34 Clerical 34
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
NULL NULL Marketing 35
Right and left outer joins are functionally equivalent. Neither provides any functionality that the
other does not, so right and left outer joins may replace each other as long as the table order is
switched.
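The table-order swap can be sketched as follows, assuming SQLite via Python's sqlite3 module (an assumption). The query deliberately uses LEFT OUTER JOIN with department on the left, partly because older SQLite releases do not accept RIGHT OUTER JOIN at all; the select list is written out so the columns come back in the same order as the right outer join table above.

```python
import sqlite3

# Sample tables from this article in an in-memory SQLite database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INT, DepartmentName VARCHAR(20));
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

# "employee RIGHT OUTER JOIN department" rewritten with the tables swapped.
swapped_rows = conn.execute(
    "SELECT employee.LastName, employee.DepartmentID, "
    "       department.DepartmentName, department.DepartmentID "
    "FROM department LEFT OUTER JOIN employee "
    "ON employee.DepartmentID = department.DepartmentID"
).fetchall()

# Every department appears, including Marketing with NULL employee columns;
# John, who has no department, does not appear.
for row in swapped_rows:
    print(row)
```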
Full outer join
Conceptually, a full outer join combines the effect of applying both left and right outer joins.
Where records in the FULL OUTER JOINed tables do not match, the result set will have NULL
values for every column of the table that lacks a matching row. For those records that do match,
a single row will be produced in the result set (containing fields populated from both tables).
For example, this allows us to see each employee who is in a department and each department
that has an employee, but also see each employee who is not part of a department and each
department which doesn't have an employee.
Example of a full outer join (the OUTER keyword is optional):
SELECT *
FROM employee FULL OUTER JOIN department
ON employee.DepartmentID = department.DepartmentID;
Employee.LastName Employee.DepartmentID Department.DepartmentName Department.DepartmentID
Smith 34 Clerical 34
Jones 33 Engineering 33
Robinson 34 Clerical 34
John NULL NULL NULL
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
NULL NULL Marketing 35
Some database systems do not support the full outer join functionality directly, but they can
emulate it through the use of an inner join and UNION ALL selects of the "single table rows"
from left and right tables respectively.
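One way to write such an emulation is sketched below. It assumes SQLite via Python's sqlite3 module (an assumption), and the use of NOT EXISTS to pick out the unmatched "single table rows" is one possible formulation rather than the only one.

```python
import sqlite3

# Sample tables from this article in an in-memory SQLite database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (DepartmentID INT, DepartmentName VARCHAR(20));
    CREATE TABLE employee (LastName VARCHAR(20), DepartmentID INT);
    INSERT INTO department VALUES
        (31, 'Sales'), (33, 'Engineering'), (34, 'Clerical'), (35, 'Marketing');
    INSERT INTO employee VALUES
        ('Rafferty', 31), ('Jones', 33), ('Steinberg', 33),
        ('Robinson', 34), ('Smith', 34), ('John', NULL);
""")

full_rows = conn.execute("""
    -- 1. Matched pairs, exactly as in the inner join.
    SELECT e.LastName, e.DepartmentID, d.DepartmentName, d.DepartmentID
    FROM employee e INNER JOIN department d
      ON e.DepartmentID = d.DepartmentID
    UNION ALL
    -- 2. Employees with no matching department, padded with NULLs.
    SELECT e.LastName, e.DepartmentID, NULL, NULL
    FROM employee e
    WHERE NOT EXISTS (
        SELECT 1 FROM department d WHERE e.DepartmentID = d.DepartmentID)
    UNION ALL
    -- 3. Departments with no matching employee, padded with NULLs.
    SELECT NULL, NULL, d.DepartmentName, d.DepartmentID
    FROM department d
    WHERE NOT EXISTS (
        SELECT 1 FROM employee e WHERE e.DepartmentID = d.DepartmentID)
""").fetchall()

for row in full_rows:
    print(row)
```

UNION ALL is safe here because the three parts cannot overlap: a row is either matched, an unmatched employee, or an unmatched department.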
Self-join
A self-join is joining a table to itself.
Example
Suppose we want a query that finds all pairings of two employees in the same country. If the
employees were held in two separate tables, a normal join could find the employees in the first
table having the same country as employees in the second table. However, all the employee
information is contained within a single large table.
Consider a modified Employee table such as the following:
Employee Table
EmployeeID  LastName   Country        DepartmentID
123         Rafferty   Australia      31
124         Jones      Australia      33
145         Steinberg  Australia      33
201         Robinson   United States  34
305         Smith      Germany        34
306         John       Germany        NULL
An example solution query could be as follows:
SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country
FROM Employee F INNER JOIN Employee S ON F.Country = S.Country
WHERE F.EmployeeID < S.EmployeeID
ORDER BY F.EmployeeID, S.EmployeeID;
This query generates the following table.
Employee Table after Self-join by Country
EmployeeID LastName EmployeeID LastName Country
123 Rafferty 124 Jones Australia
123 Rafferty 145 Steinberg Australia
124 Jones 145 Steinberg Australia
305 Smith 306 John Germany
For this example:
- F and S are aliases for the first and second copies of the employee table.
- The condition F.Country = S.Country excludes pairings between employees in different
  countries. The example question only wanted pairs of employees in the same country.
- The condition F.EmployeeID < S.EmployeeID excludes pairings where the EmployeeID
  of the first employee is greater than or equal to the EmployeeID of the second employee.
  In other words, the effect of this condition is to exclude duplicate pairings and
  self-pairings. Without it, the following less useful table would be generated (the table below
  displays only the "Germany" portion of the result):
EmployeeID LastName EmployeeID LastName Country
305 Smith 305 Smith Germany
305 Smith 306 John Germany
306 John 305 Smith Germany
306 John 306 John Germany
Only one of the two middle pairings is needed to satisfy the original question, and the topmost
and bottommost are of no interest at all in this example.
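The self-join example can be run end to end with the sketch below. It assumes SQLite via Python's sqlite3 module and a CREATE TABLE statement with guessed column types, neither of which appears in the original text; the query itself is the one given above.

```python
import sqlite3

# Modified Employee table from this section; the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        EmployeeID INT, LastName VARCHAR(20),
        Country VARCHAR(20), DepartmentID INT);
    INSERT INTO Employee VALUES
        (123, 'Rafferty',  'Australia',     31),
        (124, 'Jones',     'Australia',     33),
        (145, 'Steinberg', 'Australia',     33),
        (201, 'Robinson',  'United States', 34),
        (305, 'Smith',     'Germany',       34),
        (306, 'John',      'Germany',       NULL);
""")

pairs = conn.execute(
    "SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country "
    "FROM Employee F INNER JOIN Employee S ON F.Country = S.Country "
    "WHERE F.EmployeeID < S.EmployeeID "
    "ORDER BY F.EmployeeID, S.EmployeeID"
).fetchall()
for pair in pairs:
    print(pair)

# Dropping the EmployeeID condition adds self-pairings and mirrored duplicates:
# 3*3 rows for Australia, 1 for the United States, 2*2 for Germany.
unfiltered_count = conn.execute(
    "SELECT COUNT(*) FROM Employee F INNER JOIN Employee S "
    "ON F.Country = S.Country"
).fetchone()[0]
print(unfiltered_count)
```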