Previous | Table of Contents | Next

Page 35

Normalization Rule 2: All Columns Depend Only on
the Primary Key

For an entity to be in Second Normal Form (2NF), all of its columns must depend on the primary key only. Put simply, this rule means that a table must not contain extraneous information. For example, in addition to other information, the Class table identifies each course and the instructor for that course. The instructor is identified by the column Instructor ID. The primary key for the table is the Class ID. If the Class table also contained the instructor's position, the table would not be in 2NF; the instructor's position depends on Instructor ID only, not on the primary key.

Normalization Rule 3: All Columns Depend on the
Primary Key and Nothing But the Primary Key

To be in Third Normal Form (3NF), a table's columns must be entirely dependent on the primary key. The key word in that last sentence is entirely. Each column in the table must be dependent on the entire primary key, not just a portion of it.

For example, the Course table identifies each course and the department that offers it. The primary key consists of the Department ID and the Course ID; both columns are necessary to uniquely define a row in the table. If the Course table also contained a column for storing the department chairperson, the table would not be in 3NF; the department chairperson depends only on the Department ID, not on the entire primary key. So it can be said that Department Chairperson is dependent on part of the primary key, but it is not wholly dependent on the primary key.

You'll sometimes see this concept referred to as the derived column. In the previous example, Department Chairperson can be derived from Department ID. According to relational theory, a table should not contain any derived columns. In practice, tables frequently contain derived columns.

Applying Normalization to Database Design

Normalization theory discusses normal forms beyond the Third Normal Form (3NF), but I won't delve into that topic here. However, I would like to discuss the application of normalization theory. If you read articles about relational database technology, you will encounter what can be called "the great debate." Relational purists say that all tables must be in at least 3NF, although practitioners argue that to achieve acceptable performance, you must denormalize a database—in other words, reduce the database design from 3NF to 2NF. My position is somewhere in the middle. Here are my recommendations.

Page 36

DO DON'T
Do ensure that your design is in 3NF.

Don't make any assumptions about poor performance. Generate realistic test data, and characterize the performance of your database.

Do try to solve performance problems with hardware improvements rather than with database design compromises. Realize that, in the long run, better hardware is usually less expensive than a denormalized database design.

Don't denormalize your tables unless you fully understand the trade-offs that you are making—improved query performance for more data redundancy, more complicated update logic, and more difficulty in augmenting the database design during the application's life cycle.

A schema is the set of database objects—tables, columns, primary keys, foreign keys, and other objects—that implement a logical data model.

Entity Relationship Diagramming Tools

To assist you in the development of a logical data model, you should seriously consider obtaining an entity relationship (ER) modeling tool such as Designer/2000, Oracle Data Designer, LogicWorks ERwin, or Sybase S-Designer. All of these tools are available for Windows NT or Windows 95. Briefly, these tools enable you to construct an entity-relationship diagram by selecting objects from a toolbar and drawing relationships between them.

The advantages of these tools are as follows:

Page 37

Figure 3.1.
Using Oracle Data-
base Designer to
develop a data model.

The Sample Application Database

The example used throughout this book is a database for a small, fictitious college. In 1884, Reginald Flugle, a wealthy manufacturer of cuff links and tie pins in the northern high desert of Southern California provided a sizable endowment to a small, dust-ridden college. Until now, the administration of Flugle College has been a paper-based system. The college dean has awarded you a contract to develop a computerized information system based on a client/server architecture.

Requirements

The high-level requirements for the Flugle College Information System (or FCIS as it will be known), include the following:

Page 38

Using these high-level requirements, the following are the entities needed to support the information needs of Flugle College:

The next sections look at each table in greater detail.

The Student Table

The purpose of the Student table is to maintain information about each student at Flugle College. Each student is identified by a Student ID—the table is defined so that two students cannot have the same Student ID. In addition to the student's name, address, and other pertinent information, the Student table also contains an attribute named Year which indicates the year of a student—for example, freshman. In this respect, the sample database may be somewhat unrealistic; it's probably more important to keep track of a student's accumulated units than the year they are in. Another note: this table doesn't indicate the student's status—for example, active, leave of absence, inactive. There is no attribute to indicate if a student has received a degree. Nevertheless, the attributes shown in the following Student table list provide sufficient realism for our purposes.

Student

Student ID
Last Name
First Name
Middle Initial
Street Address
City
State
Zipcode

Previous | Table of Contents | Next

Используются технологии uCoz