DATA WAREHOUSES, AD HOC QUERY TOOLS AND OTHER WAYS TO DESTROY YOUR
COMPANY
Dr. Paul Dorsey
Dulcian, Inc.
Introduction
Data
warehouses are threatening to become the most overrated hype in the history of
the computer industry. Not since the failed promises of the CASE tools of the
80’s have we seen so many products and so many experts
leap into a field with so few skills, so little training and so little
expertise.
Every
consulting firm in the business claims to have been involved in dozens of
successful data warehouse projects. They have all been creating data warehouses
since before the term was invented. If this is the case and there is so much
expertise, why are most organizations that undertake warehouse projects
dismally disappointed with the results?
The fact
is that data warehouse projects are much harder to complete successfully than
traditional systems development projects. There are several reasons why this is
the case:
For all of
these reasons, data warehouse projects fail much more frequently than
traditional systems development projects. Most presentations, papers and books
on data warehousing are overly optimistic. They tend to paint a rosy picture of
how, by following their method, you can build successful data warehouses. This
paper will describe the less flattering reality of data warehousing. It is a
world of unmet user expectations, wasted time and money, and ruined careers.
Some of the more dramatic reasons for failure will be outlined. The paper will
conclude with the critical success factors associated with data warehouse
projects in the hope that your data warehouse project will not be among the
failed projects discussed here.
Data Warehouse
Development
The
traditional impetus for a data warehouse project is often when a manager goes
to a conference and sees an ad hoc query tool that looks exciting. Next, the
organization sets some criteria (typically technical) and buys a tool. Notice
that the tool has been purchased without gathering any user requirements. If
the organization is reasonably savvy, they will undertake a small pilot
project. Subsequently, they will design and build a major portion of the
enterprise warehouse, usually allowing the DBA’s to
drive the project. The resulting effort is then rolled out to the user
community and management is left wondering why it cost so much and no one is
using it.
Of course,
one problem is that the project was completed without doing careful
requirements analysis. All of the collective wisdom of the last thirty years
about building traditional systems by doing careful requirements analysis is
somehow deemed unnecessary when doing a data warehouse project.
What is a Data Warehouse?
Part of
the root of the problem of failed data warehouse projects is that by calling it
a data warehouse, we have already decided on the solution to a problem that
hasn’t even been well defined. Therefore, I propose redefining what is meant by
a data warehouse.
Traditionally,
data warehousing has meant buying a data warehouse machine, installing a data
warehouse database on it (usually in a star or snowflake architecture) and then
accessing that information with an ad hoc query tool.
Instead,
we should define a data warehouse as being "end user access to
information." Conceptually, a data warehouse is simply the user view of
the data. Physically, a data warehouse can be implemented in any number of
ways:
Until you
have carefully collected user requirements, you can’t intelligently decide
which solution is best.
Why Do We Want a Data
Warehouse?
What does
the concept of the user view of data give us? What it buys us is built around
the notion of iterative querying. Traditional querying involves a user faced
with a decision who determines the information
necessary to make the decision and requests or crafts a report to retrieve that
information. With a data warehouse, we give users the ability to ask for
information from the database. Then, based upon what is returned, more
information is requested. This process can be repeated as many times as
necessary until the user has gathered adequate information to act intelligently
on behalf of the organization. This means that users really can make better
decisions in a carefully thought-out data warehousing environment than in a
traditional systems environment. However, for us as database designers and
developers to deliver this incredible functionality, we need to be able to
understand the types of decisions users make and how they make them.
Tied to
this is the idea that, for the first time, users have the ability to feel as if
they have ownership of the data. This can help users participate, not only in
the way that applications are developed, but also in the way that information
is structured and gathered within the organization. Although subtle, this
concept can have a huge impact on the quality of our databases. When users began taking ownership of applications, user
participation in application development as well as the quality of the
applications created rose dramatically. Imagine the improvement in
database design if users, along with developers and designers, can be
thoughtfully engaged in the development process.
What Can a Data
Warehouse Do?
There are
four different aspects of a data warehouse project, any or all of which can be
included:
These four
aspects are what should drive the definition of the data warehouse. A data
warehouse project should not be defined by its implementation solution. It
should be defined by what is trying to be accomplished.
Why Do Ad Hoc Query
Tools Fail?
Ad hoc query
tools are often regarded as a panacea for satisfying user reporting
requirements. We create objects, give users access to those objects and still
experience project failure. This is due to the fact that users need to be very
well educated in the productive use of an ad hoc query tool. Users need to have
some understanding of the following factors in order to successfully utilize an
ad hoc query tool:
A) Users need to
understand Boolean logic.
A
computer’s perception of "and/or" is fundamentally different from
human perception. One of my favorite examples is to
try to ask a managerial user to craft a correct logical statement to support
the information request: "Return a list of all employees who are neither
from
Having
performed this experiment numerous times with executives at all levels, the
best results I have had are indistinguishable from chance. The first time I ran
the experiment with nine managers. They were asked to select one answer. The
results were that only one of nine selected one of the correct answers. It took
him ten minutes to figure it out and that person had been a
systems professional for eight years. If an experienced systems
professional needs ten minutes to figure out a WHERE clause for a simple query
(and we have evidence that few managers will be able to do this), why do we
expect that the reports that managers are creating using an ad hoc query tool
are the reports they think they are creating?
B) Users need to know
basic SQL and relational theory.
With very
few exceptions, these tools are all obviously front-ends with SQL. If users do
not understand the concept of a table and a foreign key, they cannot be
expected to intelligently use tools with "group by" options on their
main screen.
C) Users need to
understand the ad hoc query tool itself.
We have
done a fairly good job of training users to push the right buttons. I am,
however, reminded of experiments with monkeys in space and piano-playing
chickens.
D) Users need to understand
the structure and nuances of the database being queried.
For
example, in one data warehouse, the sales table had three years of information
in it. The user wanted to find out which department had the best performance
for the last year. The user was not aware that this table needed to be filtered
to get the relevant information. The user assumed that the system would
automatically choose only the current year’s information because that makes
sense.
E) Users need to
understand iterative querying.
The true
impact of a data warehouse on a user is evident if iterative querying is used.
Users don’t do this naturally. They must be trained to think in those terms.
Meta-Layer Set-Up
The
meta-layer is what the user interacts with. In general, the user does not
directly access the tables even in a data warehouse. Either views are created;
or, if an ad hoc query tool is used, the complexities of the data structure are
hidden from the user in the tool itself.
Ad Hoc Query Tool Types
The world
of ad hoc query tools is an order of magnitude more complex than is generally perceived. Not only are there
many tools, but many types of tools. Normally, a warehouse team will only be
familiar with one or two ad hoc query tools, let alone one or two types. A full
discussion of each of these types is beyond the scope of this paper. Each type
of tool has its own appropriate use. Usually, in order to support all user
requirements requires more than one tool from more than one category.
CATEGORY |
TOOLS |
||
Ad Hoc Query |
Oracle Data Query |
Cognos Impromptu |
Business Objects |
End User Reporting |
|
R & R |
|
Power User Reporting |
IQ |
ProReports |
|
OLAP – Server side |
Holos |
Oracle Express |
Micro Strategies |
OLAP – Client side |
Cognos PowerPlay |
Business Objects |
|
Production Reporting |
Developer/2000 |
SQR |
|
EIS |
Forest & Trees |
|
|
DSS |
SAS |
|
|
Implementing a Data
Warehouse Project
The back-end
of a data warehouse project can be implemented various ways:
There is a
spectrum of cost associated with these implementations from the lowest which
involves simply placing views on the production system to the most expensive
involving creating data marts. Paradoxically, the most frequently selected
alternatives are the most expensive. This leads one to wonder whether what most
projects are trying to maximize are development dollars.
The
front-end can be implemented in any one of four ways:
These front-ends follow a spectrum of both risk of failure and from the
safest and cheapest flexible reporting system to the riskiest and most
expensive using a full ad hoc query tool. It is now possible to build a
flexible reporting system quickly and easily. For more detailed information on
this topic, see Joseph P. Strano’s paper on Flexible
Reporting Systems in these proceedings. Such a flexible reporting system can
often replace the need for an expensive ad hoc query tool while providing users
with a safe and friendly environment to generate their own reports where they
are protected from runaway illogical and mis-specified
reports.
Data Migration: The
Hidden Nightmare
Getting
information from a production system is always a complex task. Unfortunately,
the way this is usually done is by using the production system software
(usually COBOL) to build flat file images of the warehouse tables. Even most
existing data migration utilities are built on this model. Performing data migration
this way can be disastrous. In one case I encountered, the population utility
for a 10-table pilot project required 2,000 pages of COBOL code. Often, a far
better solution is to create all of the complex migration scripts within Oracle
on the warehouse side rather than the legacy system side. Very simple legacy
tables can be extracted into Oracle where more complex migration is easier to
accomplish. See Figure 1 below.
Figure 1: Data Migration Strategy
The 2,000
page COBOL script mentioned above done in a similar Oracle system required
approximately 100 pages of code. In addition, we wrote a migration script
generator using Oracle Forms. This generated over 95% of the PL/SQL migration
script automatically.
Top Ten List of Reasons
Why Data Warehouse Projects Fail
All of the
reasons and situations mentioned are real-world examples.
Critical Success Factors
for Building a Data Warehouse
The
following are of primary importance in assuring that a data warehouse project
is successful:
Conclusions
Keeping
all of these factors in mind, the likelihood of creating a successful data
warehouse will be greatly increased.