Previous | Table of Contents | Next |
Using COUNT(*)
One piece of information that people using SQL Server always want is, How many rows are in my table? To answer that question, the COUNT(*) query was invented. This counts the number of rows in a table without regard to their null status. Even if all columns for a row contain null, the COUNT(*) will include them. This special case is the only time an aggregate may include a null value.
So to get the number of rows in the titles table, use this query:
select count(price) Priced titles, count(*) Total titles from titles
Priced titles Total titles ------------------------------- 16 18 (1 row(s) affected)
This query demonstrates the difference between using COUNT(<column>) and COUNT(*). If you did the exercises at the end of Day 3, you may recall that there are two titles that dont have price information. Those titles arent included in the COUNT(price) calculation.
Using COUNT(DISTINCT )
There is one more form of the COUNT aggregate. The COUNT(DISTINCT<column>) aggregate will count the individually distinct values in a particular column. You cannot use the regular DISTINCT operator on the results of COUNT, because COUNT returns only a single row, which is, of course, distinct. So to get a count of the different values in a column, this third form of COUNT was developed.
To see how many different prices are in the titles table, use this query:
select count(distinct price) from titles
---------------- 11
This query displays the number of different prices in the titles table. Like other aggregates, it does not consider nulls.
Warning: In earlier releases of MS SQL Server, you could ask for a column name and an aggregate value in the same SELECT. For example, you could ask for select price, avg(price) from titles to get a listing of each price followed by the average price of all books. The second column would contain $14.77 for each row.This feature was commonly misunderstood and confused with the GROUP BY clause. (Well get to GROUP BY tomorrow.) The ANSI standard dictates that any non-aggregate, non-constant (that is, a column) that appears in the SELECT list must appear in the GROUP BY for the query to make sense. To comply with this behavior, Microsoft now considers it an error to try a query like this:
select price, avg(price) from titles Msg 8118, Level 16, State 1: Column titles.price is invalid in the select list because it is not contained in an aggregate function and there is no GROUP BY clause
Only the count(*) aggregate will count nulls in its calculation. All others simply do not consider them. This is because the count(*) calculation is operating on complete rows. All other aggregates operate on columns.
Suppose that you have a table (t1) with one column (c1 integer null) that has the integers zero through five in it, plus a single null value, as shown in Table 5.2.
c1 | |
---|---|
null | |
0 | |
1 | |
2 | |
3 | |
4 | |
5 | |
What are the aggregate values for this table? Max(c1) is easyits 5, because thats the highest value in the column. Min(c1) is 0 because, although nulls might be sorted before zeroes, null values arent counted in aggregate evaluation.
The sum of c1 is 0+1+2+3+4+5 = 15.
The average is the sum divided by the number of elements, or 15 / 6, which is 2.5, so you might expect 2.5 to be returned. Because this is an integer column, however, the average must keep the same data type, so the average will actually be returned as 2. Any decimal portion left over is dropped. Also, the average is calculated based on six values in the column, not seven, because the null is not considered.
Both COUNT(c1) and COUNT(DISTINCT c1) return six. There are six distinct, non-null values in the c1 column. A COUNT(*) returns seven, because there are seven rows in the t1 table.
Note: If you want to be sure that the remainder is not dropped, you can use a simple data type conversion trick:select avg(c1 * 1.0) from t1The integer stored in c1 is first multiplied by a floating point 1.0, which means that the data type returned by the average will also be a floating point. This query will return 2.5.
To calculate the average of a set of numbers, you add the numbers together and divide by the number of numbers in the set. What happens if you ask for the average, but your WHERE clause excludes all rows?
In this case, the server calculates the sum of no rows, which is null. To find the average, the server divides null by 0. Any mathematical operation on null will also return null. (If I have some unknown number and add one to it, what is the result? It is also unknown.)
Any time that an aggregate operates on the null set (a set containing no data), that aggregate will return null.
Previous | Table of Contents | Next |