Green-field to Brown-field BI Environments – metadata analysis holds the key September 28, 2006
Posted by Cyril Brookes in BI metadata documentation, General, Metadata management.
Our BI systems are getting on in years. What was an appealing virgin territory to the BI enthusiast has become a difficult pre-teen. Maybe bringing up BI systems is like raising children – every stage is the worst. Client executives are getting more aware of what’s possible as well and, of course, the data is still messy.
But the biggest challenge could well be the need to renovate existing systems, rather than have the luxury of starting afresh. Welcome to the brown-field.
I outlined the metadata complexity issue for BI analysts in my September 21 post. Earlier, in my June 30 post, I described the stages and tasks involved in a bottom-up approach to BI system design. I opined that the bottom-up, or rapid development, methods will remain dominant in the BI systems space for the foreseeable future. Now I am attempting to bring these two issues together.
To recap, with some adjustments to fit my current convictions and practice, the bottom-up approach to BI system development requires answers to the following queries:
1. Where are we now? E.g.
- What is the current, synchronized, state of the data warehouse and BI tables: the SQL database documentation, ETL (covering all the space from source transaction system to data warehouse), cubes and dimensions?
- What reports and query specifications are out there, what do they contain?
- What are the outstanding BI specification requests and user complaints/suggestions?
2. What is stopping us from performing better in our BI reporting? E.g.
- Bad source data content?
- Inadequate ETL packages corrupting data warehouse content?
- Under-utilized data?
- Unused, wasted, data collection?
- Unavailable data to satisfy specified requirements?
3. What can we do to perform better? E.g.
- Repair the data sources
- Find and fix the ETL errors and omissions, starting with the most dysfunctional
- Check with the client executive base to see if the under-utilized data should be reported more, or better
- Review the available data with client executives to improve satisfaction
- Design new data warehouse content and associated reporting where possible and desirable (Of course, in this business everything is desirable, only a few things are essential!)
4. Prioritize the work schedule and get going – probably in a series of iterations with highest/fastest ROI first.
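The "where are we now?" stage above depends on comparing what the documentation says against what the warehouse actually contains. As a minimal sketch (not any particular tool's API — the table names and snapshot structure are purely illustrative), suppose each metadata snapshot is a mapping of table name to its set of columns; diffing two snapshots then surfaces the drift:

```python
# Hypothetical sketch: detecting drift between two metadata snapshots,
# each modeled as {table_name: set_of_column_names}. All names illustrative.

def diff_snapshots(old, new):
    """Return tables and columns added or dropped between two snapshots."""
    changes = {
        "added_tables": sorted(set(new) - set(old)),
        "dropped_tables": sorted(set(old) - set(new)),
        "column_changes": {},
    }
    for table in set(old) & set(new):
        added = sorted(set(new[table]) - set(old[table]))
        dropped = sorted(set(old[table]) - set(new[table]))
        if added or dropped:
            changes["column_changes"][table] = {"added": added, "dropped": dropped}
    return changes

# Example: a column appeared in FactSales and a staging table vanished.
last_week = {"FactSales": {"SaleID", "Amount"}, "StgCustomer": {"CustID"}}
today = {"FactSales": {"SaleID", "Amount", "Currency"}}
print(diff_snapshots(last_week, today))
```

In practice the snapshots would be populated from the database's own catalog (e.g. `INFORMATION_SCHEMA` queries) on a schedule; the point is only that "synchronized" means the diff is empty.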
I know this is obvious, motherhood-and-flag stuff to the experienced BI analyst. However, it is often not understood – I believe – that we are completely dependent on stage 1 being done right. We must have comprehensive, accurate, up-to-date access to the synchronized metadata. Further, we need to be able to explore this metadata, its relationships and sensitivities.
Iterative BI design is impossible if we cannot document the metadata adequately. Green-field sites are all migrating to the evolutionary mode as they age, and metadata is king of the brown-field BI environment. Once we have this documentation and analysis capability the rest of the project is fairly straightforward.
Documenting the relatively narrow BI software context isn’t enough; we must be able to reach out to the source data locations wherever possible.
Plus, passive documentation by itself is short of the full whack. It’s like being in the supermarket without the meal plan and recipes to tell us what we need to buy.
I reckon we need to have a “what-if” capability to allow us to explore the metadata world as documented, checking on data existence, link existence and integrity; and reviewing the drill-down and drill-through capabilities.
Of course, this capability is very useful in the virgin, green-field site as well, but we usually have more flexibility to maneuver there.
There are many software documentation tools on the market, and I leave you, or others, to evaluate them for this purpose. However, it is my view that a more fully featured tool is required than those currently available.
To provide an evaluation yardstick, and to act as a design guide for my specific software design purposes, I’ve put together my checklist for a metadata repository tool that serves BI analysts in both their support and design roles. It also avoids the problems, e.g. the dog’s breakfast syndrome, raised in my last post and has the required “what-if” exploring capability.
My repository checklist is based on the Microsoft products, but others would be similar:
- Take regular snapshots of the relational database contents relevant to BI – Tables, Columns, etc.
- Also, snapshot the Integration Services packages: Data sources, Data Flows, Data destinations, Tasks
- Ditto the Analysis Services tables: Cubes, Dimensions, Report Specifications
- Ditto the Reporting Services content: Data sources, Datasets, Parameters, drill-through reports
- Facilitate HTML and diagrammatic documentation for each of these snapshots as needed, with extended diagramming capability to capture design ideas and extensions.
- Provide a server-based analysis tool that:
- Tracks data lineage from SSRS, through SSAS and SSIS to the relational, legacy or other data source, and displays the transformations that occur on the way.
- Provides cross-component impact analysis capabilities, i.e. what is affected if I change the definition of this field – in the database, in the cube, in the report?
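Those last two items – lineage and impact analysis – are really the same traversal run in opposite directions over one dependency graph. Here is a minimal sketch under that assumption; the SSIS/SSAS/SSRS object names are invented placeholders, not real repository identifiers:

```python
# Hypothetical sketch: lineage and impact analysis as opposite walks over
# a dependency graph of documented metadata (edges run source -> consumer).
from collections import defaultdict

edges = [("legacy.CUST", "ssis.LoadCustomer"),
         ("ssis.LoadCustomer", "dw.DimCustomer"),
         ("dw.DimCustomer", "ssas.SalesCube"),
         ("ssas.SalesCube", "ssrs.RegionReport")]

downstream = defaultdict(set)  # object -> things that consume it
upstream = defaultdict(set)    # object -> things it was built from
for src, dst in edges:
    downstream[src].add(dst)
    upstream[dst].add(src)

def closure(start, graph):
    """All objects reachable from `start` by following `graph` edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Lineage: walk upstream from a report back to its ultimate sources.
print(sorted(closure("ssrs.RegionReport", upstream)))
# Impact: walk downstream from a changed object to everything it feeds.
print(sorted(closure("dw.DimCustomer", downstream)))
```

A real repository would attach the transformations to the edges as well, so the lineage walk can display what happens to the data at each hop – but the reachability logic is the heart of both features.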
It’s taking a while to build this, but even the current status should be useful to the designer. If you’re interested you can monitor the progress and try it out yourself at www.bidocumenter.com. There is a new release due next week.
Hi Cyril, interesting post (and nice BI blog). Yesterday I joined an in-company presentation of Informatica Data Explorer. This product offers a set of tools to discover and define the content and structure of data structures. It looked quite impressive.
At the moment I am preparing for Microsoft BI certification, because I’m very impressed by the Microsoft BI Suite. I will try your BI Documenter software for sure!
[…] We’ve explored why CRM and Competitive Intelligence isn’t shared in the previous blog post of October 9. A major contributing factor is that few people understand that soft information has metadata, just like the hard stuff. If you come here often you’ll know I’m a fan of BI oriented metadata repositories (to the extent of creating a new one!). […]
I need to know: how is this going to affect the community? And if so, what effects are happening now?