BIg questions: Where did all that bad data come from? How do we fix it?
April 21, 2006
Posted by Cyril Brookes in BI metadata documentation, General, Metadata management.
We all have bad data, and there’s no such thing as good bad data, but we don’t have to fix it all. Bad data is like curved roads: it’s no more in the nature of the human race to create perfect data than it is to build straight roads. But not every road needs to be straight; some never get travelled.
Where should we start fixing bad data? Are we wasting our time on frivolous pursuits correcting data we don’t need? Are we ignoring the data we should be getting right?
If you’ve dipped into my earlier posts, you’ll know I’ve a fixation on requirements definition, or rather the lack of same, when starting out building BI systems. Surely, though, knowing what we want to report, and the manner in which we want to report it, is an essential prerequisite for any data cleansing operation.
It’s pretty clear now that the IT corporate standard approach to BI systems development is driven by the weight of marketing effort from key suppliers, playing the “Keep up with the CIO Jones” syndrome to absolute perfection. If you don’t have DW, ETL, Scorecarding, Dashboards, blah, you’re not in the game. Data accuracy isn’t an issue at this purchasing point. It’s like buying several packets of blueberries at the supermarket, then finding out they’re all soft and mouldy when you get home.
The resulting evolution of the BI environment goes something like:
Buy data warehousing software
Buy ETL software
Try and figure out what the hell to do with the above
Buy, or implement “included with server”, software to do BI reporting, data mining, data analysis.
Try and figure out what the hell to do with the above [especially since a lot of the blueberries are crook]
Analyze lots of existing reports
Specify and then create a set of corporate standard KPIs by aggregating everything everybody gets now [regardless of data quality]
Realize how bad the data is. Start a data integrity project.
Create the ETL and computation transforms, cubes and other magic things that should have integrity.
Build new BI systems that replicate the existing reports, with flashy presentation, but poor data.
Wonder why no one is excited after spending all that money – just like no one really wants a jazzed up version of Ben Hur – it was bad enough first time.
Or, more accurately, as Peter Drucker opined: Efficiency is doing things right, effectiveness is doing the right things.
What is wrong? I reckon that the problem can be summed up by the age-old IT adage: We’re building solutions looking for problems.
Information delivery is not matched to the business process.
Worse, information presentation is not matched to executive mental processes.
Data collection, and integrity expenditure, is not directed at the real, important, useful information needs.
Much effort is wasted on cubes, ETL, etc. that will never be needed.
No effort is spent to identify new information metrics not yet collected, or not yet present in a report.
Get the specifications right first, then find out what data we really need and how to present the information, then collect clean versions of the same. Simple really.