Green-field to Brown-field BI Environments – metadata analysis holds the key September 28, 2006
Posted by Cyril Brookes in BI metadata documentation, General, Metadata management.3 comments
Our BI systems are getting on in years. What was an appealing virgin territory to the BI enthusiast has become a difficult pre-teen. Maybe bringing up BI systems is like raising children – every stage is the worst. Client executives are getting more aware of what’s possible as well and, of course, the data is still messy.
But the biggest challenge could well be the need to renovate existing systems, rather than have the luxury of starting afresh. Welcome to the brown-field.
I outlined the metadata complexity issue for BI analysts in my September 21 post. Earlier, in my June 30 post, I described the stages and tasks involved in a bottom-up approach to BI system design. I opined that the bottom-up, or rapid development, methods will remain dominant in the BI systems space for the foreseeable future. Now I am attempting to bring these two issues together.
To recap, with some adjustments to fit my current convictions and practice, the bottom-up approach to BI system development requires answers to the following queries:
1. Where are we now? E.g.
- What is the current, synchronized, state of the data warehouse and BI tables: the SQL database documentation, ETL (covering all the space from source transaction system to data warehouse), cubes and dimensions?
- What reports and query specifications are out there, what do they contain?
- What are the outstanding BI specification requests and user complaints/suggestions?
2. What is stopping us perform better in our BI reporting? E.g.
- Bad source data content?
- Inadequate ETL packages corrupting data warehouse content?
- Under-utilized data?
- Unused, wasted, data collection?
- Unavailable data to satisfy specified requirements?
3. What can we do to perform better? E.g.
- Repair the data sources
- Find and fix the ETL errors and omissions, starting with the most dysfunctional
- Check with the client executive base to see if the under utilized data should be reported more or better
- Review the available data with client executives to improve satisfaction
- Design new data warehouse content and associated reporting where possible and desirable (Of course, in this business everything is desirable, only a few things are essential!)
4. Prioritize the work schedule and get going – probably in a series of iterations with highest/fastest ROI first.
I know this is obvious, motherhood and flag, stuff to the experienced BI analyst. However, it is often not understood – I believe – that we are completely dependent on the stage 1 being done right. We must have comprehensive, accurate, up-to-date, access to the synchronized metadata. Further, we need to be able to explore this metadata, its relationships and sensitivities.
Iterative BI design is impossible if we cannot document the metadata adequately. Green-field sites are all migrating to the evolutionary mode as they age, and metadata is king of the brown-field BI environment. Once we have this documentation and analysis capability the rest of the project is fairly straightforward.
Documenting the relatively narrow BI software context isn’t enough; we must be able to reach out to the source data locations wherever possible.
Plus, passive documentation itself is also short of the full whack. It’s like being in the supermarket without the meal plan and recipes to tell us what we need to buy.
I reckon we need to have a “what-if” capability to allow us to explore the metadata world as documented, checking on data existence, link existence and integrity; and reviewing the Drill-down and Drill-through capabilities.
Of course, this capability is also very useful in the virgin, green-field site as well, but we usually have more flexibility to maneuver then.
There are many software documentation tools on the market, and I leave you, or others, to evaluate them for this purpose. However, it is my view that a more fully featured tool is required than those currently available.
To provide an evaluation yardstick, and to act as a design guide for my specific software design purposes, I’ve put together my checklist for a metadata repository tool that serves BI analysts in both their support and design roles. It also avoids the problems, e.g. the dog’s breakfast syndrome, raised in my last post and has the required “what-if” exploring capability.
My repository checklist is based on the Microsoft products, but others would be similar:
- Take regular snapshots of the relational database contents relevant to BI – Tables, Columns, etc.
- Also, snapshot the Integration Services packages: Data sources, Data Flows, Data destinations, Tasks
- Ditto the Analysis Services tables: Cubes, Dimensions, Report Specifications
- Ditto the Report Services content; Datasources, Datasets, parameters, drill-through reports
- Facilitate HTML and diagrammatic documentation for each of these snapshots as needed, with extended diagramming capability to capture design ideas and extensions.
- Provide a server application based analysis tool that:
- Tracks data lineage from SSRS, through SSAS and SSIS to the relational, legacy or other data source, and displays the transformations that occur on the way.
-
Provides cross component impact analysis capabilities, i.e. what is affected if I change the definition of this field – in the database, in the cube, in the report?
It’s taking a while to build this, but even the current status should be useful to the designer. If you’re interested you can monitor the progress and try it out yourself at www.bidocumenter.com. There is a new release due next week.
The BI Metadata Enigma – where did these data come from; and where have they been? September 21, 2006
Posted by Cyril Brookes in BI metadata documentation, General, Metadata management.1 comment so far
These are the kinds of questions we ask our children – where were you, what happened last night? And to the putative, hopeful, BI support analyst and developer, the DW content, the cubes, the dimensions; they are our children. Strangely, the answers are just as hard to get on metadata as with the kids, or at least the right up-to-date answer.
The issue is going to get much worse. Why? Because we are rapidly moving from BI “Greenfield” design situations to “Brownfield” support and re-design environments. We’re stuck with the treadmill of history as well as client executives who want better BI.
There are two different sets of metadata related problems here:
- Support of operational BI systems, and
- Design of new or improved BI
But, they both have the same solution.
Everything is related to the BI metadata, sql database documentation and associated BI development system tables, cubes, etc. This is a problem to us all because:
- Metadata is not in one place – it’s a real dog’s breakfast; even when all the BI tools come from the same vendor
- It is dynamic, so we need to be able to monitor the sql documentation to see what the DW architects have done since last time we looked
- Comprehensive commercial repositories, at least those available today, are cumbersome and not kept current
- Synchronization of the metadata component documentation is super-important to the analyst – without it your project must surely fail.
- BI design is not concerned with much of the metadata available for a database, the focus is on the business related items and not the process type – schedules, update counts, etc. Often, the unrelated stuff dilutes that which is important.
- Metadata standards are too complex, and not a useful categorization tool for BI analysts at present
- Essentially, the issues for BI analysts are: What is in the current DW, the cubes, KPIs, tables, columns, dimensions, reports, etc.; and where did they come from. Most database and BI software analysis tools are inadequate for this purpose.
Support analysts find it difficult to track down the sources of bad data. Clients complain about inaccurate reports, but why is it so? We need, indeed must have, better BI related metadata repositories. We don’t want all the crap, pardon my expletive, just the business related stuff, so we can find out …..Where it came from and where has it been – to make such lousy, and inscrutable, content.
That’s probably enough stating the issue for support analysts.
Next post I’ll detail how these same issues impact the BI developer, especially the person trying to improve on an existing BI system.
I’ll then outline my proposed solution, a specialist BI metadata repository tool.
Knowledge Management vocabularies for tacit information processing; creation issues, scalability and auto-categorization September 14, 2006
Posted by Cyril Brookes in General, Tacit (soft) information for BI, Taxonomies, Tags, Corporate Vocabularies.add a comment
My August 5 post introduced the topic of KM vocabularies and their essential role in building a balanced BI reporting environment – one that delivers both hard and tacit (soft) information This post offers more detail on vocabulary construction and practical use. These guidelines and suggestions are based on my experience building hundreds of vocabularies in KM implementations.
The vocabulary is important since it controls the categorization, retrieval and dissemination of documents. Without it there is little prospect of meaningful collaboration on important issues in the enterprise.
Creation Issues:
Automatic creation of a prototype vocabulary is a common starting point; using widely available text analysis systems. These systems process the content of many documents and compile a categorization list based on keywords, or more sophisticated contextual analysis.
These automatic systems have varying degrees of success in creating the hierarchies a KM vocabulary requires, identifying synonyms and especially determining the context (for example differentiating between alternative meanings of “heat” -
- A batch of steel being produced;
- The agent increasing temperature; and
- Something female dogs exhibit.
Orphan topics also introduce indexation and classification and subsequent retrieval. Orphans are those terms that are unrelated to others in a hierarchy, that is no parents, or children. They ought be avoided.As highlighted in my earlier post, synonym processing is not desirable. All terms in the vocabulary should be “preferred” terms, and should become the universal identifiers for subject matter of documents, messages, etc. Synonyms should be handled in the auto-classification stage, where various common usage topics are converted to the preferred term for use in retrieval, etc.Automatic analysis of newly arrived documents (after the initial vocabulary is created), without reprocessing and reorganizing the entire collection of documents, may be difficult. So, the automatic procedure for building a vocabulary will often be useful only once, at the start of the exercise. Thereafter, the update most probably have to be manual.To facilitate vocabulary navigation it is necessary to embed higher level parent terms in the hierarchy. Manual editing is required here since the automatic process will not do this, and, in any case, the vernacular appropriate to the enterprise ought be used, not some standard industry or linguistic term.
A general purpose vocabulary may be useful testing platform, particularly if the information sources are news and other external sources. Internal documents tend to require a more enterprise and industry specific topic list. Similarly, a test platform can be built easily from an industry oriented set of KPIs, metrics or measures. These will be a subset of a complete vocabulary.
This is often the most effective starting point for creating an enterprise specific vocabulary and the associated rules for auto-classification.
All business oriented KM vocabularies are organic and will, therefore, evolve as the business interests and issues change. Normally, in a well designed vocabulary, these changes only involve the third and fourth levels of the hierarchy (new customers, competitors, mergers with same, new products, technologies, etc.), occasionally the first and second (a takeover creates a new business segment, a new class of customers is created to assist vocabulary navigation, etc.). Evolution is normally best achieved by manual adjustment, with suggestions being made by users as they encounter inadequate terminology or rules.Templates:Notwithstanding the industry focus on automatic construction of vocabularies by analysis of a pool of documents, my preference is to build a set of industy template vocabularies, and to create new versions by modifying earlier ones. This is because the terminology relevant to KM is very similar across organizations in the same industry, and similar, especially at the higher levels, across all businesses with similar operations, irrespective of industry.
Therefore, banks will have almost identical vocabularies, except at the third and fourth levels of detail, and an insurance company vocabulary will be quite similar to that of a bank.
Issues of scalability:
400 to 2000 topics is the common size range for useful BI vocabularies. Smaller than 400 is unlikely to provide sufficient granularity in categorization to satisfy inquiries. More than 2,000 terms will compromise navigation for complexity reasons.
Multiple vocabularies, one for each different community of interest – e.g. marketing, research, executives, etc. – are often required for large businesses. Multiple vocabularies covering similar subjects, but in different languages, are common in large corporations. If multiple vocabularies are used, the BI system needs to support cross-community browsing and alerting, with exchange of relevant documents and collaboration.
Auto-classification:
Vocabularies of a useful scale almost certainly require auto-classification, since it is not practical to allocate categories manually for an individual corporation.
Auto-classification means the mechanical assignment of vocabulary terms to documents, messages, news items, reports, etc. when they become accessible to the enterprise network. It involves matching new documents to the appropriate preferred topics in the vocabulary using classification rules or inference techniques.
Rules tend to be set for the narrower topics. Therefore, the selected terms are narrow concepts – such as a customer name, supplier, competitor, or a product or service. The higher level, parent terms, such as customer, problem customer, etc. are then added using inheritance provisions. Normally this is done as soon as a document comes within the scope of the KM system. Reclassification is required whenever the vocabulary changes significantly.
If you are interested in more detail on merging hard and tacit information, you can see some examples of KPI templates, a subset of a complete business vocabulary, in the download for my BI Pathfinder project www.bipathfinder.com
BI for CRM – Much hyped as easy, but hard to build effectively September 10, 2006
Posted by Cyril Brookes in BI Requirements Definition, BI metadata documentation, General, Tacit (soft) information for BI.add a comment
We all know that every enterprise needs automated CRM, and BI is a big part of that. All the IT gurus on or off payola, marketing white papers, and newsletters tell us so. As the saying goes: “It must be true, it’s in all the papers”.
Most of this advice avalanche accepts as axiomatic:
-
Successful CRM, like Minerva, springs full-panoplied from the brow of Jove , or
-
Implement any CRM software package and, bingo, we have instant lift-off, etc.
But, of course, CRM success isn’t automatic, there’s a lot of hard work. As I have posted before, my mantra in these situations is drawn from the late Peter Drucker:
Efficiency is doing things right.
Effectiveness is doing the right things
This is most important when considering how to design or, more likely in today’s enterprise, make more effective, the CRM environment. There are very many options, and they can often be implemented quickly, so they are efficient – but they don’t do the job as it needs to be done. Effective CRM is complex, elusive and requires careful thought, not rushed procurement.
This post is rather lengthy, but I strongly believe that there is too much written on the net about this topic that is a series of common knowledge checklists, without the accompanying detail that suggests, or occasionally prescribes, how the essential components can/should/must be implemented.
Of course, I’m banging on again about the need to define the requirements before you build BI solutions. Here, however, it is especially true, for defining CRM is essentially about defining its BI component. Why? Let’s consider what is involved.
CRM‘s context and solutions involve much more than BI reporting. We all know, truly this time, that for good CRM we potentially need some or all:
- Data Warehouse content detailing some or all transactions and static profiles with:
- Customer “touches” including SFA data; the statistics plus the sales representative contact reports
- Sales transactions and associated accounts receivable
- Customer intelligence; the statistics, order patterns, etc. plus what the customers themselves and the industry is saying about our customers
- Service calls and results; not just numbers, but also the comments from service personnel on issues as they arise
- Customer complaints and resolutions; not just the numbers, but also the content and especially the trends
- Call-center and telemarketing activity and intelligence; what the operators are learning from contacts, and the relevant statistics
- Marketing promotion performance; how our marketing and promotional efforts are paying off,
- Competitor intelligence; what competitors are doing and saying and what the industry is saying about them
Note that these data, in full scope, are drawn from both hard and soft (tacit) sources. Regular readers will know my conviction, and emphasis, that tacit information is a vital part of an effective BI environment, and CRM is no exception – in fact it is a prime example of a field where collection and dissemination of tacit information plus collaboration on its significance, Knowledge Management if you will, is absolutely key.
I’ll deal with all the above categories of data collection in more detail in later posts.
It is obvious that only the largest corporation can afford to collect all the above material in its finest detail. So the “some or all” filter is critical. Even with all the data, the “can’t see the wood for the trees” syndrome often leads to failure.
CRM for each enterprise, large or small, needs access to some or all of these databases, BUT it is the executive interface to them for reporting, inquiry, alerting and analysis that determines success or otherwise – effectiveness or not. Make no mistake, the executive interface determines what must be collected and stored in the data warehouse.
THEREFORE, the BI specification for on-demand reports, pre-formatted inquiry responses, alerting messages, collaboration between sales, marketing, service and product professionals, decision support environment, etc. must drive the specification for the data warehouse content and the CRM application software.
BI requirements definition is not everything in CRM planning, but when it comes to ensuring success, it is truly the only thing that’s important. All the rest is relatively straightforward; just connect the dots and you’re done, courtesy of the plethora of software that’s just waiting to be turned on.
So how do we go about defining the CRM BI requirements? We have the two basic options I’ve discussed in earlier posts, but I repeat some of it here – see the June 30 and some earlier contributions.
Top-down, or “waterfall” is the logical process for the green-field site with time and resource for planning activity. Bottom-up is for everyone else.
Both approaches have the same objectives. They require that we interview executives and carry out research into data availability to establish:
-
What information needs to be reported routinely, or available to be reported on-demand?
-
How can potential problems or opportunities be detected and executives alerted to them?
-
What reporting and models are required for Drilldown – especially diagnosis analysis and decision support (see post of Aug 27)?
-
What data items are required in the data warehouse cubes to satisfy the above reporting needs?
-
What degree of detail, i.e. the data dimensions, is required for the above?
-
What additional dimensions are required for Drilldown?
The comprehensive nature of top-down analysis should produce the best result, but it will take time and may unacceptably delay that Holy Grail – the quick ROI. Bottom-up will almost always give a sub-optimal result, but quicker. I’ll compare them briefly after presenting the general principles.
The differences are in the effort involved and the starting point. CRM BI reporting specifications include the following:
-
Pre-formatted reporting for routine and on-demand inquires: This is intended to give the client executives comfort that they are fully informed about the customer relationship status.
-
Pre-specified alerting: This is intended keep executives up-to-date with any unusual existing or forecast situations that could imply problems.
Prespecified CRM BI therefore involves some or all of:
Summarization of KPIs and other metrics dealing with customer relations
Comparitive analysis of KPIs and metrics with CRM pre-specified benchmarks
Forecasts of key variables
Trend analysis of relevant KPIs and metrics
Alerting to unusual situations
Determination of the data items and dimensions required for the above
In all cases, it is vital to ensure that the tacit information – collected from sales people from calls and industry gossip, meeting minutes, rumors, opinions, etc. is assembled, assessed, communicated and escalated in a structured context. It is the early warning system for the enterprise, and especially important when the topics are customer related.
-
Drilldown reporting is the other CRM BI specification component: modeling and forecasting desirably (IMHO) follows the principles discussed in my August 27 post. It involves some or all of:
Scenario building to predict potential problem situations
Assessment of the inquiry, statistical analysis and data mining capabilities required for resolving situations in those scenarios
Assessment of the additional data and data dimensions required by these capabilities
Assessment of the additional access to tacit information and collaboration capability; particularly the identification of subject experts who can comment on emerging problem situations.
I return now to the methodologies involved. Top-down methods for requirements definition involve significant interaction with executives, to understand the business processes and how the executives use information to manage them adequately. The data resource and dimension design for the data warehouse cubes follow naturally from the information requirements as they are determined. Of course, the overall design is iterative, as some of the required data and its dimensions will not be readily available at the outset.
For those interested in the detail of this approach, the processes and documentation I use in CRM and other BI projects is described at www.bipathfinder.com.
Bottom-up methods focus on what is available in the data warehouse cubes for reporting and the existing reports. Although the process is abbreviated, compared with top-down, it follows the same general steps, and has the same objective – that is to prepare a requirements definition to guide the CRM environment development.
Basically, Bottom-up methods involve answering the questions:
-
Where are we with CRM reporting now?
-
What data is available, in what detail?
-
Where does it come from, how good is it; who uses it?
-
What better use can be made of the existing resources?
-
How should we structure the project to obtain maximum ROI in the shortest time?
The first step in Bottom-up approaches to CRM BI must be documentation of the existing data warehouse meta data, including the links to source data resources – such as legacy transactions processing systems and their databases.
There are several products that do this. However, recognizing this imperative, I have helped specify, and now use, a new product – it works at this time only for Microsoft environments – called BI Documenter to document the SQL meta data and the links to cubes, reports and sources. If you want more detail it is given at www.bidocumenter.com. There is a free version that documents SQL database content, and an Enterprise version that includes all the BI metadata content for Analysis Services and Integration Services.
I’ll amplify these concepts, especially the data requirements and pre-specificied reporting content in a later post.