Companies that have been in business for a while realize that they have
accumulated huge amounts of data in various operational databases. Those
databases work just fine for their intended purposes, but the companies want
to "mine" that data for other purposes, particularly sales, marketing and
strategic planning.
Data Mining, then, is the process of extracting information from the
company's various databases and reorganizing it for purposes other than those
the databases were originally designed for. What data is to be mined, and for
what use, varies radically from one company to another, as do the nature and
organization of the data, so there can be no such thing as a generic
"data mining tool".
A Data Warehouse is a place where data can be stored for more
convenient mining. This is generally a fast computer system with very large
data storage capacity. Data from all the company's systems is copied to the
Data Warehouse, where it is scrubbed and reconciled to remove redundancy and
conflicts. Complex queries can then be made against the information stored in
the Warehouse.
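The copy, scrub, reconcile and query cycle can be sketched in a few lines. This is a minimal illustration, not a real warehouse product: an in-memory SQLite database stands in for the warehouse storage, the two source lists `BILLING` and `SALES` are hypothetical operational databases, and the rule that the most recently updated record wins a conflict is an assumed reconciliation policy chosen only for the example.

```python
import sqlite3

# Hypothetical operational databases: each keeps its own copy of customer
# data, with inconsistent formatting and overlapping records.
BILLING = [
    {"email": "Ann@Example.com ", "name": "Ann Lee", "updated": "2023-01-05"},
    {"email": "bob@example.com", "name": "Bob Roy", "updated": "2023-02-10"},
]
SALES = [
    {"email": "ann@example.com", "name": "Ann Lee-Smith", "updated": "2023-03-01"},
    {"email": "cy@example.com", "name": "Cy Tan", "updated": "2023-01-20"},
]


def scrub(record):
    """Normalize formatting so records from different systems can be compared."""
    return {
        "email": record["email"].strip().lower(),
        "name": record["name"].strip(),
        "updated": record["updated"],  # ISO dates compare correctly as strings
    }


def reconcile(sources):
    """Merge records by key; on conflict, keep the most recently updated one
    (an assumed policy, for illustration only)."""
    merged = {}
    for source in sources:
        for raw in source:
            rec = scrub(raw)
            current = merged.get(rec["email"])
            if current is None or rec["updated"] > current["updated"]:
                merged[rec["email"]] = rec
    return list(merged.values())


def load_warehouse(sources):
    """Copy the scrubbed, reconciled data into a single queryable store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (email TEXT PRIMARY KEY, name TEXT, updated TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer VALUES (:email, :name, :updated)",
        reconcile(sources),
    )
    return conn


warehouse = load_warehouse([BILLING, SALES])
rows = warehouse.execute("SELECT email, name FROM customer ORDER BY email").fetchall()
print(rows)
```

Here the two "Ann" records collapse into one row once the email addresses are normalized. Every rule in `scrub` and `reconcile` encodes knowledge about a specific source system, which is why this code has to be rerun on every refresh and revised whenever a source database changes.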
Of course, the data must be continuously refreshed, so the scrubbing and
reconciliation process must be a permanent feature of the Warehouse, and it
will have to be modified every time the source databases are modified or new
databases become available.
Creating and maintaining a Data Warehouse is a huge job, even for the
largest companies. It can take a long time and cost a lot of money.
In fact, it is such a major project that companies are turning to Data Mart
solutions instead.
A Data Mart is an index and extraction system. Rather than bringing
all the company's data into a single warehouse, the Data Mart knows what data
each database contains and how to extract information from multiple databases
when asked.
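The index-and-extraction idea can be sketched in the same spirit. In this minimal example, which rests on assumptions of its own, two in-memory SQLite databases stand in for operational systems that stay where they are; `CATALOG` is a hypothetical index recording which database holds each kind of data and the query that extracts it; and `ask` is a hypothetical function that answers a request by querying each source in turn. Nothing is copied ahead of time, and nothing is scrubbed or reconciled.

```python
import sqlite3

# Hypothetical operational databases, left exactly where they are.
orders_db = sqlite3.connect(":memory:")
orders_db.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('ann', 120.0), ('bob', 80.0), ('ann', 30.0);
""")
support_db = sqlite3.connect(":memory:")
support_db.executescript("""
    CREATE TABLE tickets (cust_id TEXT, status TEXT);
    INSERT INTO tickets VALUES ('ann', 'open'), ('Bob', 'open');
""")

# The mart's index: which database holds each kind of data, and how to extract it.
CATALOG = {
    "revenue_by_customer": (
        orders_db,
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer",
    ),
    "open_tickets": (
        support_db,
        "SELECT cust_id, COUNT(*) FROM tickets WHERE status = 'open' GROUP BY cust_id",
    ),
}


def ask(*items):
    """Extract each requested item from whichever database holds it.

    Results are returned side by side as-is: nothing is scrubbed or
    reconciled, so 'Bob' in one system and 'bob' in another stay distinct.
    """
    answer = {}
    for item in items:
        conn, sql = CATALOG[item]
        answer[item] = dict(conn.execute(sql).fetchall())
    return answer


print(ask("revenue_by_customer", "open_tickets"))
```

The catalog is cheap to build compared with a warehouse, but the answer shows the cost of skipping reconciliation: the same customer appears as `bob` in one result and `Bob` in the other, and it is left to whoever reads the answer to notice.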
Creating a Data Mart can be considered the "quick and dirty" solution
because the data from different databases is not scrubbed and reconciled.
Even so, it may be the difference between having information available and
not having it at all.