Previous | Table of Contents | Next |
To use the ORDER BY clause with the GROUP BY, use it after the GROUP BY. However, when you order your grouped result set, you must order by a column that appears in the results.
This is because the rows in your result set are calculated, aggregate rows. Consider this output, from the first example:
select type, avg(price) Average from titles group by type
type Average ------- ------------------------ UNDECIDED (null) business 13.73 mod_cook 11.49 popular_comp 21.48 psychology 13.50 trad_cook 15.96
To ORDER BY title, for instance, would make no sense. In fact, if you attempt to do so, the server returns this error:
select type, avg(price) Average from titles group by type order by title
Msg 8127, Level 16, State 1 Column name titles.title is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.
The averages that appear in this result set are not an actual part of the table. You have two ordering options here: you could order by type to alphabetically sort the categories, or you could order by the averages themselves:
select type, avg(price) Average from titles group by type order by Average
type Average ------- ------------------------ UNDECIDED (null) mod_cook 11.49 psychology 13.50 business 13.73 trad_cook 15.96 popular_comp 21.48
When the server receives this query, it calculates the sum of prices for each type of book and a count of the number of titles in each type. These results are placed into a temporary worktable that the server creates behind the scenes and destroys after solving the query. Once the worktable has all the results, the server handles the division of sum and count and sorts the result by this calculated Average column.
I could have asked for ORDER BY avg(price) to achieve the same result, but in this case I used a column alias and named that in the ORDER BY. I also could have used ORDER BY type.
Lets take a look at another grouping example. This time, well group by multiple columns.
The sales table has a qty column to indicate the numbers of books ordered, a stor_id column to indicate which store ordered the books, and an ord_num column (order number).
The purpose of grouping with the GROUP BY clause is to calculate aggregates from rows that share a common value. In the sales table, a store that places an order and orders several different books has multiple lines in the table. For example, for order number P2121, store #7067 ordered 40 TC3218s, 20 TC4203s, and 20 TC7777s.
Note: To see a complete list of these orders, use this query against the sales table:select stor_id, ord_num, qty from sales order by stor_id, ord_num, qty
If you want a report that listed the total quantity of books ordered on a particular order number (in this case it would be 40 + 20 + 20 = 80), together with the store id, you use this query:
select stor_id, ord_num, sum(qty) Books Ordered from sales group by stor_id, ord_num
stor_id ord_num Books Ordered ------ --------------------- --------- 6380 6871 5 6380 722a 3 7066 A2976 50 7066 QA7442.3 75 7067 D4482 10 7067 P2121 80 7131 N914008 20 7131 N914014 25 7131 P3087a 85 7896 QQ2299 15 7896 TQ456 10 7896 X999 35 8042 423LL922 15 8042 423LL930 10 8042 P723 25 8042 QA879.1 30 (16 row(s) affected)
GROUP BY groups rows together based on their sharing a common value among one or more columns. In this exercise, you asked for rows sharing stor_id and ord_num. For each group, the SUM of (qty) was calculated under the Books Ordered column.
Starting with MS SQL Server 6.5 the CUBE and ROLLUP extensions were available. These are used with the GROUP BY clause and allow you to calculate a super-aggregate on columns for which regular aggregates are being found. These operators are useful because they provide information on non-grouped columns without requiring you to run an additional query. They also include the information as a normal (that is, not a COMPUTE) row in the result set.
Previous | Table of Contents | Next |