Previous | Table of Contents | Next |
Although data marts have many benefits, they are not without problems. A data mart has the potential of becoming yet another isolated system. Companies sometimes allow the development of data marts without the proper coordination between them. This coordination is not easy to achieve. Data marts are designed in parallel, with each team making decisions that will later affect the data warehouse. For example, one team may decide on the date format yyyy/mm/dd. This data mart may standardize on that format to the detriment of the data warehouse project as a whole.
Before any data mart development begins, a base data warehouse design should be in place. This design does not need to delve into the details of code for extraction, but it needs to document the current systems. It must do this to establish the standards necessary for the data warehouse project to be successful. Once a data mart team understands the overall guidelines, they can design a data mart that can be easily tied into the data warehouse at a later date.
The data mart team must also decide on the type of data mart. If it is a data mart that will eventually merge directly into the data warehouse (Definition 1), it must use the same technology as the future data warehouse. If this is the case, the data mart team will design extraction processes to work directly with the data mart.
If the data mart will eventually pull data from the data warehouse (Definition 2), the data mart team must design extraction processes to extract the operational data to the data warehouse. Next, the data mart will pull the data from the data warehouse. This will allow the data warehouse to incorporate the data used in the data mart without rewriting extraction processes. This also gives the data mart designer the ability to choose the technology of the data mart, which may differ from the data warehouse technology. This type of data mart requires that a corporate-wide data warehouse team establish data standards and technologies before the data mart project can begin. The data warehouse team will also need to present a complete data model for all data that the data marts will be using.
Metadata is data about data. It is all the information a user or administrator needs to fully utilize the data. The metadata will reveal such things as where to find the data, what data exists on a certain topic, who created the data, where it came from, and who has permission to use it.
Metadata is one of the most neglected parts of the data warehouse. It is similar to documentation in that almost everyone realizes its value, but few actually find time to develop it. Without proper metadata, it can be difficult for users to work with the data warehouse. The three types of metadata in a data warehouse are system, management, and business metadata.
System metadata is data that will not be changed by the data warehouse team. The system data is maintained by the technologies used by the data warehouse. Inside the Oracle database there are virtual system tables such as dba_tables, dba_indexes, and dba_users. These tables describe what type of data is inside the database. This type of metadata is almost always used exclusively by system or database administrators. It is typically too raw to give directly to the users.
Management metadata explains all the details of data management. The following are some examples of management metadata:
One example of management metadata is recording the process of reading the redo logs for the data extraction process. If the team has chosen a vendor to perform the retrieval of data from the redo logs, they should document the functions of the tool that is being used. This will allow the data warehouse team to understand the impact of a new software release by the redo log vendor. If the team has chosen to design its own tool to extract information from the redo logs, it is important to document the format of the redo logs as well as the API of the custom tool.
A second example of management metadata is user activity. This includes recording the CPU and disk I/O of queries issued by the users. This form of metadata allows the data warehouse team to properly bill each department for its usage of the system. It also allows the team to better understand what types of queries are being issued. This will reveal the following:
The data warehouse team must decide if data is not being used because of a lack of interest or a lack of metadata that prevents users from finding or understanding the data. If the users are not interested, this data may be moved to a less expensive media or deleted.
Business metadata is directly pointed to the business users. It allows users to find and understand the data in the data warehouse. Typical business metadata models will contain a search engine. The search engine allows users to look up data sources based on key words. This search engine will return the hits on the specific key word as well as suggest similar terms that may be more appropriate. If the user determines that the key word he chose was incorrect, he can choose one of the words the metadata dictionary has suggested. The user is also given a detailed description of the true meaning of the data and how it should be interpreted.
The business metadata also contains a subset of the information found in the data warehouse management metadata.
Other business metadata includes connections to other data sources such as online documentation, Web sites, and reports on industry and economic trends.
Although metadata is easy to explain, it can be difficult to implement. One reason is a lack of metadata standards. A group called the Metadata Coalition is attempting to standardize this field. They are a group of vendors trying to enhance the sale of all of their products by bringing metadata into the mainstream. The Coalition is developing a metadata standard, but it is still in its infancy.
Previous | Table of Contents | Next |