Previous Table of Contents Next


Using COUNT(*)

One piece of information that people using SQL Server always want is, “How many rows are in my table?” To answer that question, the COUNT(*) query was invented. This counts the number of rows in a table without regard to their null status. Even if all columns for a row contain null, the COUNT(*) will include them. This special case is the only time an aggregate may include a null value.

So to get the number of rows in the titles table, use this query:

      select   count(price)  “Priced  titles”,  count(*)   “Total  titles”
      from     titles
      Priced   titles     Total   titles
      -------------------------------
      16                 18
      (1    row(s)   affected)

This query demonstrates the difference between using COUNT(<column>) and COUNT(*). If you did the exercises at the end of Day 3, you may recall that there are two titles that don’t have price information. Those titles aren’t included in the COUNT(price) calculation.

Using COUNT(DISTINCT …)

There is one more form of the COUNT aggregate. The COUNT(DISTINCT<column>) aggregate will count the individually distinct values in a particular column. You cannot use the regular DISTINCT operator on the results of COUNT, because COUNT returns only a single row, which is, of course, distinct. So to get a count of the different values in a column, this third form of COUNT was developed.

To see how many different prices are in the titles table, use this query:

      select     count(distinct price)
      from    titles
              ----------------
      11

This query displays the number of different prices in the titles table. Like other aggregates, it does not consider nulls.


Warning:  In earlier releases of MS SQL Server, you could ask for a column name and an aggregate value in the same SELECT. For example, you could ask for select price, avg(price) from titles to get a listing of each price followed by the average price of all books. The second column would contain $14.77 for each row.

This feature was commonly misunderstood and confused with the GROUP BY clause. (We’ll get to GROUP BY tomorrow.) The ANSI standard dictates that any non-aggregate, non-constant (that is, a column) that appears in the SELECT list must appear in the GROUP BY for the query to make sense. To comply with this behavior, Microsoft now considers it an error to try a query like this:

select     price,   avg(price)
from       titles

Msg  8118,  Level  16,  State  1:
Column  ‘titles.price’  is  invalid  in  the  select  list  because  it  is
      not  contained  in  an  aggregate  function  and  there  is  no  GROUP
     BY clause

Aggregates with Null

Only the count(*) aggregate will count nulls in its calculation. All others simply do not consider them. This is because the count(*) calculation is operating on complete rows. All other aggregates operate on columns.

Suppose that you have a table (t1) with one column (c1 integer null) that has the integers zero through five in it, plus a single null value, as shown in Table 5.2.

Table 5.2. A simple table.

c1

null
0
1
2
3
4
5

What are the aggregate values for this table? Max(c1) is easy—it’s 5, because that’s the highest value in the column. Min(c1) is 0 because, although nulls might be sorted before zeroes, null values aren’t counted in aggregate evaluation.

The sum of c1 is 0+1+2+3+4+5 = 15.

The average is the sum divided by the number of elements, or 15 / 6, which is 2.5, so you might expect 2.5 to be returned. Because this is an integer column, however, the average must keep the same data type, so the average will actually be returned as 2. Any decimal portion left over is dropped. Also, the average is calculated based on six values in the column, not seven, because the null is not considered.

Both COUNT(c1) and COUNT(DISTINCT c1) return six. There are six distinct, non-null values in the c1 column. A COUNT(*) returns seven, because there are seven rows in the t1 table.


Note:  If you want to be sure that the remainder is not dropped, you can use a simple data type conversion trick:
select  avg(c1  *  1.0)  from  t1

The integer stored in c1 is first multiplied by a floating point 1.0, which means that the data type returned by the average will also be a floating point. This query will return 2.5.


The Aggregate of No Rows

To calculate the average of a set of numbers, you add the numbers together and divide by the number of numbers in the set. What happens if you ask for the average, but your WHERE clause excludes all rows?

In this case, the server calculates the sum of no rows, which is null. To find the average, the server divides null by 0. Any mathematical operation on null will also return null. (If I have some unknown number and add one to it, what is the result? It is also unknown.)

Any time that an aggregate operates on the null set (a set containing no data), that aggregate will return null.


Previous Table of Contents Next
Используются технологии uCoz