Previous Table of Contents Next


Complex Joins As Subqueries

When thinking about which method to use, consider that writing subqueries often takes more work on your part than writing joins to accomplish the same goal. Consider this example, which joins the authors table to the titles table in order to find authors who have written books with “computer” in the title:

select     au_fname, au_lname
from           authors a, titleauthor ta, titles t
where       a.au_id = ta.au_id
                and ta.title_id = t.title_id
              and t.title like “%[Cc]omputer%”
au_lname                              au_fname
---------------------------                  --------------------
O’Leary                               Michael
MacFeather                             Stearns
Green                                  Marjorie
Straight                             Dean
MacFeather                             Stearns
Karsen                                 Livia

(6 row(s) affected)

To develop that query took five lines, with two JOIN clauses and one WHERE condition. Now, consider the subquery approach:

select      au_fname
from         authors
where       au_id in(
             select   au_id
             from     titleauthor
             where    title_id in(
                       select   title_id
                     from     titles
                        where    title like “%[Cc]omputer%”))

You can make your own decision as to which is better, but, as I said before, the join syntax is more my cup of tea.

The most notorious way that the subquery approach can hurt you is with regard to the join order. Remember from yesterday that by providing multiple join orders to the optimizer, you allow the server to make the best available choice (performance-wise) for the inner table. When you write subqueries, you are “hard coding” the join order by forcing the innermost query to run first, followed by the second innermost query, and so on. This could result in a dramatically different query time.

Subqueries and Aggregates

Subqueries are extremely useful for dealing with aggregates. Recall that an aggregate may not be used in a WHERE clause. Without subqueries, it is not possible to ask for books that cost more than the average in a single step:

/*
This code does not work!
*/
select      title
from         titles
where   price > avg(price)
Msg 147, Level 15, State 1
An aggregate may not appear in the WHERE clause unless it is
      in a subquery contained in a HAVING clause or a select list,
       and the column being aggregated is an outer reference.

You can’t get this information by grouping, because that wouldn’t allow for the proper calculation of the whole average. Grouping on title_id, for example, is meaningless because each average will be the average of a single row. However, subqueries let you solve this problem:

select      price,  title
from        titles
where         price >  (select avg(price) from titles)
price     title
------                           ---------------------------------------
19.99     The Busy Executive’s Database Guide
19.99     Straight Talk About Computers
19.99      Silicon Valley Gastronomic Treats
22.95     But Is It User Friendly?
20.00      Secrets of Silicon Valley
21.59      Computer Phobic AND Non-Phobic Individuals: Behavior Variations
19.99      Prolonged Data Deprivation: Four Case Studies
20.95      Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean
14.99     Sushi, Anyone?
(9   row(s)  affected)

The subquery calculates the average price from the titles table. You don’t see this result in the result set because it is used to feed the outer query. (The average is $14.77.) Once this average is obtained, the server compares this value to all the rows in titles and selects those rows whose prices are greater than the average.

In effect, after the subquery has completed, the outer query runs with these values:

select      price, title
from        titles
where      price > $14.77

The HAVING Clause with Subqueries

Remember from Day 6 that the HAVING clause allows you to evaluate rows in an intermediate result set, then makes a determination about which rows to ultimately return. The HAVING clause may include an aggregate if it is contained in the SELECT, which you saw a couple of days ago. All that the HAVING does is compare against a column in the intermediate result set.

The HAVING clause may also include a subquery, for comparison to those aggregates. Suppose you want to know which book types have at least one book priced above the average. To get this information, you could use this HAVING query:

select       type, max(price)
from         titles
group  by  type
having       max(price) > (
                  select avg(price) from titles)
           )
order  by   max(price)
type
------------                 --------------------------
business        19.99
mod_cook        19.99
trad_cook       20.95
psychology      21.59
popular_comp    22.95
(5    row(s)  affected)

In fact, every book type has at least one book priced above the average of $14.77. The UNDECIDED type, which has only null as a price, is excluded from the final list. Because it has no actual prices, the server cannot tell for sure if the UNDECIDED rows satisfy the condition in the HAVING clause.

Let me take each clause in the example and explain what is going on. First, the GROUP BY: each type of book is searched, and the maximum price is obtained, by type. The SELECT list specifies that each type should appear in the result set with its maximum price. This results in six rows: one for each type of book. The UNDECIDED type actually does appear in this intermediate result set, which you can confirm by running this query without the HAVING restriction.

Now the HAVING clause is evaluated. The HAVING contains a subquery, so that subquery runs, returning $14.77 for the average price. There is no GROUP BY in the subquery, so the average for all books is returned. Now, HAVING compares each row in the intermediate result set to see if it is greater than $14.77. All rows pass this test except for the null priced book type. These five rows are returned in the final result set.


Previous Table of Contents Next
Используются технологии uCoz