Companies seek to use the information built up by daily operations for market research and marketing.

Companies that have been in business for a while realize they have accumulated huge amounts of data in various operational databases. Those databases work just fine for their intended purposes, but the companies want to "mine" that data for other purposes, particularly for sales, marketing and strategic planning.

Data Mining, then, is the process of extracting information from the company's various databases and reorganizing it for purposes other than those the databases were originally designed for. Because the data to be mined, its intended use, and the nature and organization of that data all vary radically from one company to another, there can be no such thing as a generic "data mining tool".

A Data Warehouse is a place where data can be stored for more convenient mining. This will generally be a fast computer system with very large data storage capacity. Data from all the company's systems is copied to the Data Warehouse, where it is scrubbed and reconciled to remove redundancy and conflicts. Complex queries can then be run against the data held in the Warehouse.
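
To make "copy, scrub and reconcile" concrete, here is a minimal Python sketch, assuming two operational databases that both hold customer records. The record layout (`CustomerRecord`), the helper names (`scrub`, `load_warehouse`, `customers_per_city`) and the "newest record wins" conflict rule, based on a hypothetical `updated` field, are all illustrative assumptions; a real warehouse would use a database and dedicated load tooling, with reconciliation rules specific to the company's data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    customer_id: str
    name: str
    city: str
    updated: int  # hypothetical "last modified" stamp used to settle conflicts


def scrub(rec: CustomerRecord) -> CustomerRecord:
    # Normalize formatting so the same customer looks identical in every source.
    return CustomerRecord(
        customer_id=rec.customer_id.strip().upper(),
        name=" ".join(rec.name.split()).title(),
        city=rec.city.strip().title(),
        updated=rec.updated,
    )


def load_warehouse(*sources):
    # Copy every source into one table keyed by customer_id.
    warehouse = {}
    for source in sources:
        for raw in source:
            rec = scrub(raw)
            current = warehouse.get(rec.customer_id)
            # Reconcile: duplicate copies collapse into one row, and when two
            # sources disagree, the more recently updated record is kept.
            if current is None or rec.updated > current.updated:
                warehouse[rec.customer_id] = rec
    return warehouse


def customers_per_city(wh):
    # A query against the consolidated, reconciled data.
    return Counter(rec.city for rec in wh.values())


# Two hypothetical operational databases with overlapping, inconsistent data.
sales_db = [
    CustomerRecord("c001", "ada  lovelace", "london ", 5),
    CustomerRecord("c002", "alan turing", "manchester", 3),
]
billing_db = [
    CustomerRecord("C001 ", "Ada Lovelace", "Cambridge", 9),  # newer address
    CustomerRecord("c003", "grace hopper", "new york", 4),
]

warehouse = load_warehouse(sales_db, billing_db)
print(customers_per_city(warehouse))
```

Customer `c001` appears in both sources with different spellings and addresses; after scrubbing they are recognized as the same customer, and the conflict is resolved in favor of the newer billing record. Any change to a source's layout would mean changing `scrub` as well.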

Of course, the data must be refreshed continuously, so the scrubbing and reconciliation process must be a permanent feature of the Warehouse, and it will have to be modified every time an existing database changes or a new database becomes available.

Creating and maintaining a Data Warehouse is a huge job even for the largest companies. It can take a long time and cost a great deal of money. In fact, it is such a major project that companies are turning to Data Mart solutions instead.

A Data Mart is an index and extraction system. Rather than bring all the company's data into a single warehouse, the data mart knows what data each database contains and how to extract information from multiple databases when asked.
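
The index-and-extract idea described above can be sketched in Python with the standard `sqlite3` module, using two separate in-memory databases as stand-ins for independent operational systems. The table layouts, the `CATALOG` mapping and the helpers `extract` and `sales_by_region` are hypothetical names chosen for illustration; the point is only that the mart stores a description of where each kind of information lives and queries the sources on request, rather than copying their data in advance.

```python
import sqlite3

# Two independent operational databases (in memory here, for illustration).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id TEXT, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("c001", 120.0), ("c002", 75.5), ("c001", 30.0)],
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id TEXT, region TEXT)")
crm_db.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("c001", "North"), ("c002", "South")],
)

# The mart's "index": which database, and which query, supplies each subject.
CATALOG = {
    "order_totals": (
        orders_db,
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
    ),
    "customer_region": (crm_db, "SELECT customer_id, region FROM customers"),
}


def extract(subject):
    # Look up where the subject lives and pull it from that database now.
    conn, sql = CATALOG[subject]
    return dict(conn.execute(sql).fetchall())


def sales_by_region():
    # Combine two sources at request time. Nothing is copied ahead of time
    # and nothing is scrubbed: IDs that don't match exactly land in "unknown".
    totals = extract("order_totals")
    regions = extract("customer_region")
    result = {}
    for cid, total in totals.items():
        region = regions.get(cid, "unknown")
        result[region] = result.get(region, 0.0) + total
    return result


print(sales_by_region())
```

Because the sources are never reconciled, the join works only as long as both databases happen to use the same customer IDs; if one wrote `C001` where the other wrote `c001`, those sales would fall into the `"unknown"` bucket.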

Creating a Data Mart can be considered the "quick and dirty" solution, because the data from different databases is not scrubbed or reconciled, but it may make the difference between having the information available and not having it at all.

