How to Do Advanced Queries in MySQL
 
 

Advanced queries are a part of every database administrator or developer job. Advanced queries must be handled delicately, because improperly coded SQL or poorly performing SQL can create bugs and application crashes. Advanced queries are typically used for reporting, joining multiple tables, nesting queries, and transaction locking. All of these concepts are covered in this article.

Using Aliases in Your Queries

Aliases let you create a shortcut name for different table options. With aliasing, you can use one letter for a table name instead of typing the full name throughout your entire query. Aliases make queries easier to read, and they make them faster to code for developers.

You have the option to alias either table columns or table names. Aliasing tables is probably the most common shorthand form in everyday queries, but some coders alias columns as well.

If you remember from previous chapters, we typed the full Customer table and column names when we queried more than one table with a WHERE clause. The above query is the same as the following query:

SELECT first_name, last_name, order_id

FROM Customer

INNER JOIN Order on Customer.customer_id = Order.customer_id

WHERE Customer.customer_id = 1;

You need to specify a table name in the WHERE clause to avoid ambiguous column names. Both the Order table and the Customer table have a customer_id column. If you leave out the alias or full table name, the MySQL engine doesn't know which table to use in the WHERE clause. The result is an error from MySQL that indicates ambiguous column names.

To specify tables without being forced to type the full table name for every column, MySQL lets you use aliases. You use aliases with the AS command.

Let's first take a look at table aliasing. Here is an example query.

SELECT first_name, last_name, order_id

FROM Customer AS c

INNER JOIN Order AS o on c.customer_id = c.customer_id

WHERE c.customer_id = 1;

The result from this query is the following result set:

first_name, last_name, order_id

-------------------------------------------------

John   Smith 1

Notice the AS keyword just before the alias name. We've used the letter "c" to create an alias for the table Customer.

If you alias one table, it's best to alias the rest of the tables in your query. We aliased the table Order with the letter "o." If you compare the previous query with the one above it, you can see that the code is cleaner, shorter, and easier to read.

You can also alias column names. This is common when you have aggregate functions in your queries and need to give the results a column name. Let's take the following query as an example:

SELECT COUNT(customer_id) FROM Customer;

The above statement counts the number of customer records in your Customer database. Without an alias name, you rely on MySQL to generate the column name, and it becomes difficult to reference the results in your applications. To overcome this hurdle, you use aliases to define the column name.

Let's modify the previous example to use aliases:

SELECT COUNT(customer_id) AS customer_count FROM Customer;

Now, when we run the above query, the result is the following data set:

customer_count

----------------------------------------------------

3

Aliasing column names makes it much easier to identify aggregate counts and functions within your queries.

Aggregating Your Data

The COUNT function above is just one example of aggregating data. MySQL has numerous functions that make it easier to group, add, average, and count data.

There are five main MySQL internal functions, but you can create your own functions. The five aggregate functions include:

  • COUNT

  • SUM

  • AVG

  • MIN

  • MAX

We covered COUNT in the previous section, so this section will focus on the other four. You'll run into all of them as you create MySQL queries.

The SUM function adds values together. For SUM to work properly, you should only run it on numeric columns. You can run SUM on decimal or integer columns. Any NULL values are excluded from the summation. Consider NULL as a 0 value when you add your column values.

Let's assume we have a total_value column in the Order table. The total_value contains the total amount for the customer's order. We want to know the total amount in revenue for the month of August. We can use the SUM function to total all order revenue for the month.

Take a look at the following SQL query:

SELECT SUM(total_value) AS total_revenue

FROM Order AS o

WHERE o.create_date >= ‘8/1/2015' and o.create_date <= ‘8/31/2015';

We used an alias, so the result has a proper label for the column. The result is the following data set:

total_revenue

------------------------------------------

200.45

Had we not used an alias, the result would be the same except the column name would be chosen by MySQL. You can also remove the WHERE clause, and the SUM function will total all revenue in the entire table.

Suppose you want to add a column to the results that indicated a total for each customer. You want to know the total spent by each customer in the Order table for the month of August. To group values by a specific column, you need to add the GROUP BY statement.

Let's take the following example:

SELECT SUM(total_value) AS total_revenue, customer_id

FROM Order AS o

WHERE o.create_date >= ‘8/1/2015' and o.create_date <= ‘8/31/2015'

GROUP BY customer_d;

The example above groups each summed value with the customer_id column, so you can see how much a customer spent for the month. The resulting data set looks like the following:

total_revenu, customer_id

------------------------------------------------

100.45                        1

100.00                        2

The result shows that customer with the ID of 1 bought 100.45 in product, and the customer with the ID of 2 bought 100.00 in product. The GROUP BY statement groups each customer first, and then the SUM function runs a total for each of those customers. With this example, only customers who bought product in August are included.

Averaging Your Data

When you create reports, you'll need to use the AVG function to display averages. You might want to average the total amount of orders each day, average out total revenue each day, or display the average number of new customers you receive each month. Averages are an important part of writing reports, and luckily MySQL has an AVG function that makes it easy to calculate without writing complicated functions. Just like the SUM function, the AVG function only works with numerical values. NULL values are excluded, but there is more of a trick with AVG. AVG divides the total summed value by the number of columns it finds. When the value is NULL, the record is completely excluded from the calculation. This is a point of confusion for new SQL programmers.

Let's assume you have NULL values in your Order table. You can review your data first by using a SELECT statement. Let's take the following SELECT statement as an example:

SELECT customer_id, total_value

FROM Order

The result displays the following data:

total_value, customer_id

------------------------------------------------

100.00                        1

100.00                        2

100.00                        2

NULL              1

The SELECT statement shows that you have 4 orders, and each order cost the customer $100.00. The fourth order contains a NULL value. When you average these four orders, the AVG function disregards the fourth value, because it's NULL. Let's look at how AVG calculations these values.

SELECT AVG(o.total_value) as avg_sales

FROM Order AS o;

The result is the following:

avg_sales

-------------------------

100.00

If you were expecting 75 as the return value, this is an important aspect to note when working with the AVG function. The total number of records in the calculation is 3 records and not 4. The AVG function just drops the fourth NULL value from the calculation. When working with reports, you might want to include these NULL values. You could tell the MySQL engine to first turn all NULL values to 0, which in turn would include NULL columns in the calculation. Here is an example:

Interested in learning more? Why not take an online MySQL course?

SELECT AVG(IFNULL(o.total_value, 0)) as avg_sales

FROM Order AS o;

Now, let's look at the results.

avg_sales

-------------------------

75.00

Notice that four records were included in the averaging calculation instead of 3, which makes your reports and queries more accurate.

The IFNULL function detects if the column contains a NULL value, and if it does sets it as 0. You can set any value as the default value, but integer values are typically converted to 0 when they contain NULL. You can also use IFNULL with other data types. It's a handy function to create a default value when NULL is detected. Note that the table's physical value is not changed. IFNULL functionality only affects the query output results and not the physical data.

MAX and MIN Functions

In some cases, you need the maximum and minimum values in a data set. You might want to know the smallest order you received for the month and the largest order you processed. The MIN and MAX functions help you find these values.

Again, when you set up your query, you should run a SELECT statement to do a quick review of table structure and data. Let's use the SELECT statement again.

SELECT customer_id, total_value

FROM Order ;

The result displays the following data:

total_value, customer_id

------------------------------------------------

200.00                        1

100.00                        2

300.00                        2

We changed the values from the previous example to better explain these two functions.  Now, let's first run the MAX function using the following SELECT statement.

SELECT MAX(total_value) AS max_value

FROM Order ;

The result is the following data set:

max_value

-------------------------

300.00

Since the largest order is 300.00, the MAX function returns the highest value. The MIN function works similarly. The following SQL statement finds the smallest number in the table:

SELECT MIN(total_value) AS min_value

FROM Order ;

The result is the following data set:

min_value

-------------------------

100.00

Using Advanced JOINs

When you work with long, advanced SQL queries, you'll need multiple JOIN statements. The most advanced JOIN statements are OUTER joins. MySQL includes RIGHT OUTER JOIN statements, and these statements are more difficult to understand.

A LEFT JOIN statement takes all table rows from the left side of the ON command and matches records it finds on the right side of the ON command. If no record is found, the result displays as NULL.

A RIGHT OUTER JOIN is the opposite. This join takes all records from the right side of the ON command, and then attempts to link records on the left side of the ON command.

Let's first take the example to show the LEFT OUTER JOIN statement.

SELECT first_name, order_id FROM Customer

LEFT JOIN Order ON Customer.customer_id = Order.customer_id;

The data result is the following:

first_name, order_id

---------------------------------------

Jake    NULL

John   NULL

Jane   1

Jane   2

Notice that we used shorthand this time and removed the OUTER specification. Using just LEFT or RIGHT in the JOIN statement is enough for MySQL to identify that you want to use an outer join.

Let's now change the JOIN statement to a RIGHT JOIN.

SELECT first_name, order_id FROM Customer

RIGHT JOIN Order ON Customer.customer_id = Order.customer_id;

Take a look at how the data changes when you change join types.

first_name, order_id

---------------------------------------

Jane   1

Jane   2

The difference in data is because the Order table does not have any orders for Jake and John. The statement takes all Order records and matches them with customers. Since there are no records for Jake and John, the entire data set changes to only two records.

This advanced technique is important when you need accurate result sets. While this query only had a difference in two records, large databases have huge changes in results when the wrong join type is used.

Nesting Your JOIN Statements

With complicated reports, you'll need to join multiple tables together. MySQL lets you nest your JOIN statements, so you can join several tables together while still maintaining performance. With nested joins, you join the first table and create an alias for it. The next table is joined to the aliased table, and then results are returned for the merged results. If you have a third table, you can use the first two joined results to then merge with a third table.

Let's take a look at the first SQL statement that only joins two tables.

SELECT c.customer_id, o.order_id

FROM Customer AS c

INNER JOIN Order AS o ON c.customer_id = o.customer_id;

This query displays the following results:

customer_id, order_id

--------------------------------------------

1          2

2          3

2          4

Suppose you want to use this result to then join to your Product table. You can nest your statements to create a second result set.

The following code is an example of a nested JOIN statement.

SELECT p.product_id

(SELECT c.customer_id, o.order_id

FROM Customer AS c

INNER JOIN Order AS o ON c.customer_id = o.customer_id) AS Temp1

JOIN Product AS p ON Temp1.product_id = p.product_id;

The result is the list of products customers ordered and stored in the Order table. The result is the following data set:

product_id

-----------------------------

1

2

2

User Variables

To make your MySQL database truly dynamic for applications, you need to define user variables. User variables are just like any other application variable. You give the variable a name, and then use it when a user passes input. You can also use user variables to store values returned from a table query. User variables let you store dynamic data depending on your input either from the database or application users.

User variables are indicated with an @ symbol. They only hold one value, so even if your queries return multiple values, the user variable only contains the last value found. All other values are overwritten.

Let's look at a simple user variable.

SET @order_id = 2;

SELECT @order_id;

The first statement is the SET command. The SET command tells MySQL to store a value in the variable designated. In this example, we create a variable named @order_id and assign it with a value of 2. The SELECT statement then displays the results. In this example, the value 2 is displayed from the database.

We used the equal sign to assign values to variables using the SET statement, but you must use the := assignment operator when you use the SELECT statement to assign a user variable value.

Let's create two variables and then add them in a SELECT statement.

SET @num1 = 2, @num2 = 3;

SELECT @num3 := @num1 + @num2;

The result data set is the following:

@num3

-------------------

5

The first statement in our SQL code creates two variables named @num1 and @num2. They are assigned the values 2 and 3 respectively.

The second statement creates a third variable named @num3. Notice the := assignment operator. If you use the equal sign, MySQL assumes you are using it as a comparison operator, so make sure you switch between operators depending on your SQL statement.

You can also use user variables with table queries. For instance, you might want to store the order ID for a specific order. Let's look at an example SELECT query.

SELECT @id := order_id

FROM Order

WHERE order_id = 1;

SELECT @id;

The first statement gets the order_id value from the Order table. The last statement displays the result data set.

You can also use user variables in SELECT statements that return more than one result. The variable won't contain all values returned, but it holds the final value that you can use in your stored procedures.

Take a look at the following query:

SELECT order_id FROM Order;

The data set returned is the following result:

order_id

--------------------

1

2

3

Let's take a look at a SELECT query that uses a user variable.

SELECT @id := order_id FROM Order;

Because there are three values returned and user variables only contain one value, MySQL decides to use the final value. Let's see what happens when we return the variable value:

SELECT @id;

And the result is:

@id

----------------

3

One last benefit of user variables in SELECT queries is obtaining the most recent ID inserted into a table. For instance, when a customer places an order, you need to the latest order_id record to update it as the customer pays and reviews the order. If the order_id column is set as an auto-incrementing column, then you can use the LAST_INSERT_ID() function.

Take a look at the following example:

INSERT INTO Order (product_id, customer_id) VALUES (3, 2);

SELECT @id := LAST_INSERT_ID();

The result is the following

@id

----------------

4

 The order_id column is auto-incrementing, so the database uses the last number inserted and increments it by 1. The result is then stored in the @id user variable using the LAST_INSERT_ID() function, which is an internal MySQL function available to programmers and database administrators.

Transactions and Locking

When you use InnoDB, you have the option to use constraints and rollback procedures. This advantage lets you check specific criteria and only commit a transaction when all criteria are met. The process helps keep data integrity throughout your database, so you don't have orphaned records, mismatched data, or other malformed information that could ultimately cause bugs in your applications.

The three main steps for transactional procedures include:

  • Start transaction

  • Commit

  • Rollback

This section discusses how to work with transactions, and helps you understand the process.

Let's look at the following basic transaction example:

START TRANSACTION;

SELECT @id := order_id FROM Order where order_id = 2;

UPDATE Customer_Log SET @order_id = @id WHERE order_id=3;

COMMIT;

The first statement is the beginning of your transaction. Think of the START TRANSACTION statement as the beginning of your verification block. Everything after the START TRANSACTION statement is held in a holding pattern until you tell the database server to commit to it or roll it back. In this example, we decide to COMMIT to the transaction regardless of what happens.

The above statement commits no matter what happens, but we can use the ROLLBACK feature to avoid updating the Customer_Log table depending on certain criteria. For instance, suppose we don't want to update the Customer_Log table if the @id user variable is NULL. If no record is found in the Order table, we want to rollback without committing. Let's add some fault tolerance to our previous statement.

START TRANSACTION;

SELECT @id := order_id FROM Order where order_id = 2;

UPDATE Customer_Log SET @order_id = @id WHERE order_id=3;

IF (@id = NULL)

            ROLLBACK;

ELSE

COMMIT;

END IF

The above statement adds some criteria to the COMMIT statement. If the @id user variable is NULL, then the transaction rolls back without causing any harm to your current table data or structure. This transactional technique is what makes InnoDB valuable over MyISAM tables. If you need this type of checks and balances, InnoDB is the only table type that offers this type of data integrity.

You cannot roll back all statements in SQL. You can't roll back any statements that create, drop or alter tables.

When MySQL sends queries to a table, the table record is locked to avoid "dirty reads." Dirty reads is an uncommon but possible phenomenon where a record is read before it is properly updated by an update or delete query in the queue. MySQL performs locks on records to ensure that two statements don't manipulate a record at the same time. If two records are manipulated at the same time, they cause a deadlock in MySQL.

To avoid deadlines and dirty reads, MySQL has a LOCK TABLES statement. This statement locks the table from other session interaction, so only the current session can manipulate the data. You can lock tables for read or write commands.

The following SQL statement is an example of locking a table for read access only:

LOCK TABLES Order READ;

SELECT COUNT(order_id) AS order_count FROM Order;

The result is that this session can only read data from the Order table. If you attempt to add or change records, MySQL returns an error.

One issue with locking tables is that a session can then only access tables that were locked. Take the previous example. If you tried to run a SELECT statement on the Customer table without locking it first, MySQL returns an error. If you lock one table, you must lock all tables in use during your MySQL session.

Knowing how to work with advanced queries, transactions, and table locking greatly improves your ability to create fast reports with accurate results. If you program for any length of time in MySQL, you'll run into aggregate functions often. These advanced queries and table transactional methods will ensure that your data is accurately reported and data integrity is kept throughout all code modules.