Using INNER and OUTER JOINs in SQL
 
 

Joining Tables

SQL has a JOIN operator that makes querying an external table more efficient. Joining tables provides better performance than using subqueries, so the JOIN operator is preferred over other methods. With the JOIN statement, you can link one or several tables and consolidate SELECT queries into one returned data set. The result is one data set, faster queries, and an easier method of showing data and reports to your users. 

Table Linking and INNER JOIN Conditions

Before you write a JOIN statement, you must understand your table structure. The previous Customer and Order tables are a perfect example of tables with a link. A link contains a column within two tables and the values match between tables. With the Customer and Order tables, the CustomerId column is located in both. The CustomerId values match. If a customer has an entry in the Order table and you delete the customer record from the Customer table, you create an orphaned record. For this reason, most database administrators avoid deleting records and instead opt for updates to existing records. 

We'll take the Customer and Order table, which are listed below. 

Customer

CustomerId

First_name

Last_name

City

State

321

Frank

Loe

Dallas

TX

455

Ed

Thompson

Atlanta

GA

456

Ed

Thompson

Atlanta

GA

457

Joe

Smith

Miami

FL

458

Frank

Doe

Dallas

TX

Order

OrderId

CustomerId

Total

OrderDate

1

321

10

1/2/2014

2

455

40

3/2/2014

3

456

20

3/10/2014

Just browsing the content, you can see that the two CustomerId columns share the same data. Suppose you want to get a list of customers that also have orders with you. The following SQL statement is what you could write.

SELECT c.CustomerId, c.First_name, c.Last_name FROM

Customer c JOIN Order o ON c.CustomerId=o.CustomerId 

Interested in learning more? Why not take an online SQL course?

There are two new SQL techniques in the above statement. First, the tables are aliased. Just like column aliases mentioned in previous chapters, you can also set aliases for tables. Aliases make it easier for you to build SQL statements and reduce the amount of code. Aliases make long SQL statements easier to read and type, because you use a few letters instead of long table names. In this example, the alias for Customer is "c" and the alias for Order is "o." After you alias your tables, you can use this letter or phrase in place of the true table name. 

Next is the JOIN statement. The JOIN statement tells the SQL engine to connect one table with another. It his case, the two tables joined are the customer (the alias "c") and the order table (the alias "o"). The two tables are joined on the CustomerId fields. These fields are named the same in both tables, but you can have some table designs that link columns with different names. When designing tables, most database administrators use similar or the same column names where developers join data. 

The JOIN keyword comes after the initial table name, the "ON" keyword is next with the linked columns. Notice that the link uses the table aliases. The result is the following data set. 

CustomerId

First_name

Last_name

321

Frank

Loe

455

Ed

Thompson

456

Ed

Thompson

The data set only contains three records, because the JOIN operator only has three records to link to. When using JOIN, you're using an INNER JOIN. An INNER JOIN gets all records from the first table and then links to the second table. If a matching record is not found, the record is removed from the data set.  Since there are only three records in the Order table, only three records are returned. 

The other issue with the JOIN statement is duplicate records. Let's add another order to the Order table and give it a CustomerId that matches a previous record. In other words, suppose your customer created two orders, so you have two orders with the same customer ID value in your table. This is very likely to happen in a real-world scenario, so you should account for duplicate records in a JOIN. 

Let's add a record to the Order table. 

OrderId

CustomerId

Total

OrderDate

1

321

10

1/2/2014

2

455

40

3/2/2014

3

456

20

3/10/2014

4

456

50

4/2/2014

All of the data is different except for the CustomerId field, which has the value of 456 just like the previous record. Now, when you run the same query with the JOIN statement, the following records are produced. 

CustomerId

First_name

Last_name

321

Frank

Loe

455

Ed

Thompson

456

Ed

Thompson

456

Ed

Thompson

Notice that the same record is shown twice. The reason is because the same customer made two orders, so the JOIN statement links to the same customer twice.

To fix the issue with this particular query, you can add the DISTINCT keyword. The following SQL statement will list each customer only once.

SELECT DISTINCT c.CustomerId, c.First_name, c.Last_name FROM

Customer c JOIN Order o ON c.CustomerId=o.CustomerId

You can also add the WHERE clause to your JOIN statements. For instance, you might want to get a list of orders from the month of March. If you remember, we used the IN statement with a subquery in previous chapters. You can streamline your SQL code instead and use a JOIN statement with the WHERE clause. The following statement searches for customers with an order in March.

SELECT DISTINCT c.CustomerId, c.First_name, c.Last_name FROM

Customer c JOIN Order o ON c.CustomerId=o.CustomerId

WHERE o.OrderDate BETWEEN ‘3/1/2014' AND ‘3/31/2014'

Notice the alias name is specified in the WHERE clause. When you start joining tables, you must include the alias name with the columns you want to return, the columns you join to, and the WHERE clause columns. If you don't include alias names, the SQL engine will return an error that you have more than one column with the same name. For instance, the following code returns an "ambiguous column name" error.

SELECT DISTINCT CustomerId, First_name, Last_name FROM

Customer c JOIN Order o ON c.CustomerId=o.CustomerId

WHERE OrderDate BETWEEN ‘3/1/2014' AND ‘3/31/2014'

In the first part of the SELECT statement, CustomerId is located in both the Order and Customer tables. The SQL engine doesn't know which column to return, so it returns an error. 

OUTER JOIN Statements

The SQL language also has an OUTER JOIN clause that you can use instead of an INNER JOIN. Other than technical differences, the results of an OUTER JOIN are different than those of an INNER JOIN and the different data sets you get for each statement are your main concern as a beginner. 

Remember with the INNER JOIN, the SQL engine excluded customer records that did not have a matching order record. With an OUTER JOIN, you return all records from the outer join and any matching records from the following tables. Let's look at the following OUTER JOIN statement. 

SELECT c.CustomerId, c.First_name, c.Last_name FROM

Customer c OUTER JOIN Order o ON c.CustomerId=o.CustomerId 

The OUTER keyword is specified in front of the JOIN statement. The result from the above SQL statement is the following. 

CustomerId

First_name

Last_name

321

Frank

Loe

455

Ed

Thompson

456

Ed

Thompson

456

Ed

Thompson

457

Joe

Smith

458

Frank

Doe

The OUTER JOIN query tells SQL to take all records from the Customer table and link the data with the Order table. Notice again that the customer with an ID of 456 is listed twice. Even though there is no link for customers 457 and 458 with the Order table, the record is still returned. 

Adding Conditions to Your Joined Tables

Standard JOIN conditions ensure that you have a link between tables, but the SQL language also lets you include conditions with your JOIN statements. These conditions are similar to a WHERE clause except you place them with the JOIN instead of in the WHERE clause. Consider the following SQL statement.

SELECT c.CustomerId, c.First_name, c.Last_name FROM

Customer c OUTER JOIN Order o ON c.CustomerId=o.CustomerId AND o.OrderDate > ‘3/1/2014'

The JOIN condition is similar to the WHERE clause and returns the same records as if you put the condition into a WHERE clause. The above statement tells the SQL engine only to return records from the Order table that have an OrderDate older than 3/1/2014.

In this example, the same number of values are returned if you put the same statement in the WHERE clause. When you have multiple JOIN statements, you limit the records returned by the JOIN, so you receive a different record set than what you would normally get from our sample data. When you build your JOIN statements and place conditions on the JOINs, consider the JOIN condition as a filter for your inner table, which passes on the filtered data to the outer, main SQL statement.

Aggregate Functions and JOIN Statements

The SQL language lets you use any function or standard SQL condition with a JOIN statement as long as you follow SQL syntax rules. Remember, you need to return distinct records with your grouping, so aggregate functions must include columns in a GROUP BY clause. The following SQL statement is an example of using an aggregate function with a JOIN clause. 

SELECT COUNT(*), c.City FROM

Customer c JOIN Order o ON c.CustomerId=o.CustomerId

GROUP BY c.City

The above example SQL query is a bit different than former GROUP BY statements. The GROUP BY column (city) is given the alias prefix, and the count includes the Order table. With the above statement, the JOIN is an INNER JOIN that returns only 4 records. The records are counted and grouped by City, but the COUNT function is only able to count and group records returned by the main INNER JOIN, which does not include records where there is no match in the Order table.

You can combine JOIN statements with any SQL SELECT statement. You can even put JOIN statements in SELECT subqueries. Using JOIN is faster and more efficient than using subqueries, so these statements make your SQL code faster. They also make them more efficient by combining two SELECT queries into one.