Previous | Table of Contents | Next |
Im going to bring up referential integrity here because it will be useful during the exercises as you formulate joins. Dont worry if these details are sketchy or you feel as though theres more going on that I havent provided. You learn more about this on Day 9.
Referential integrity is an important but very simple concept. It has one rule.
Referential integrity exists when all values in foreign key columns have corresponding primary key values.
Thats it.
Referential integrity is useful for ensuring that the columns in your tables and the relationships they share with other tables contain meaningful data. If there were a record in sales for a title_id that didnt exist in titles, what would that mean? (It would mean there was some kind of mistake! Did someone order the wrong title? Is a title missing?) Referential integrity is often enforced with triggers, discussed on Days 18 and 19.
At the beginning of the chapter, I started by telling you that Usually, the server knows nothing about the relationships between tables. The exception to this rule is when declarative referential integrity (DRI) is used. DRI can be defined when a table is created. You will learn more about exactly how to use it on Day 9. The reason I bring it up here is that the pubs database is using declarative referential integrity.
Note: Right now, dont worry too much about the details of referential integrity, declarative or otherwise. Ive skimmed over the basics, but havent discussed all the intricate details of implementing it on your own tables, dealing with it inside application programs, or managing it.Because declarative referential integrity is defined on pubs, you can use it to your advantage when formulating joins.
The real point of my bringing up this topic is so that you can use the defined relationships in pubs to help you join tables together. The sp_help output for a table in pubs will display the keys defined on a table, along with any foreign keys referencing this table. Here is (abbreviated) output for the publishers table; it appears at the bottom of the sp_help output if you want to run this yourself:
sp_help publishers
constraint_type constraint_name constraint_keys ---------------- --------- --------- PRIMARY KEY (clustered) UPKCL_pubind pub_id Table is referenced by ------------------------------------------------ pubs.dbo.employee: FK__employee__pub_id__337FAFDD pubs.dbo.pub_info: FK__pub_info__pub_id__2AEA69DC pubs.dbo.titles: FK__titles__pub_id__0E4E2B2E
The defined primary key for the publishers table, shown first, is pub_id. The server has generated names for each of the rules or constraints that are defined with declarative referential integrity.
The tables that reference publishers are listed in the Table is referenced by section. This section lists the full table name (composed of the database, pubs, the owner, dbo, and the table names themselves, employee, pub_info, and titles). (There is a detailed section on object naming on Day 11.) All you need is that last, familiar part. This is followed by the name of the constraint, which includes the letters FK (for foreign key), the name of the table containing the foreign key, the column name (pub_id), and a system-generated hexadecimal key included for uniqueness. Here, all you need is the column name, pub_id.
This output shows you where to go to find tables that reference publishers. Knowing this lets you directly join publishers to any of those tables. You can use this strategy to identify joins when declarative referential integrity is being used on your tables. If it is not, the server wont know anything about those relationships and cant help you out. Then, youll need either an ER diagram or someone helpful who is familiar with the database.
When two (or more) tables are joined, the server will consider the join keys between the tables, and formulate the most efficient way to generate the joined result set. To do this, it designates a join order for the tables.
The join order is the order in which tables are joined, where one table is designated as the inner table and the other is designated the outer table. Each row in the outer table generates a search for all joining rows in the inner table, so the choice the server makes is important to performance.
Technical Note: Indexes factor heavily into the performance of joins. There is a detailed discussion of indexes on Day 15. Indexes allow the server to find in a few milliseconds what would take many minutes or hours without indexes.Because the tables in pubs are so small, the indexes are rarely used to resolve joins; its faster to simply read all the rows and do the join in memory. If you connect to the bigpubs database on the metis website, though, you can get some indication of join performance.
Generally speaking (and this is oversimplifying a very complex process that the coders of SQL Server have spent years refining), the inner table is the smaller table that will be searched once for every row in the outer table.
Previous | Table of Contents | Next |