
Collaboration and Knowledge History Creation in BI – The Twin Pyramid Model October 2, 2007

Posted by Cyril Brookes in General, Issues in building BI reporting systems, Taxonomies, Tags, Corporate Vocabularies.

Pyramids and BI deficiencies are a popular blog topic. Rising to the challenge of Andy Bailey in his “Where has BI fallen short” paper, I have some comments on the Collaboration and Knowledge/History categories of shortcomings. Other examples include James Taylor’s observation here and Neil Raden’s paper of a few months ago.

First the Collaboration bit. Regular readers will know that this is a big issue with me. I believe most businesses do it badly, for the reasons I’ve already given. But to explain how it needs to be “operationalized” we need to look at pyramids, one regular and one inverted. They’re different from Neil Raden’s but are pyramids nonetheless.

The basic problem of managing knowledge creation, collecting history and making valuable stuff rise to the top of the “action” pyramid stems from abundance.

Herbert Simon got it right when he said “The impact of information is obvious. It consumes the attention of its readers. Therefore, a wealth of information creates a poverty of attention.” The totality of information available, both internally and externally, is overwhelming. It follows that filtering and other controls on information delivery are necessary if benefits from information resources are to be achieved.

Hence the pyramid pair as depicted below. Most documents are of interest to only a few, perhaps only one person, in a business. They can be said to have an importance at Level 1. But a few documents are of Level 4 import, and are of interest to many people. Obviously, the job of any Collaboration and Knowledge Creation facility is to cause the important items to rise to the top of the pyramid.
The Collaboration Pyramids

Knowing this makes the specification of the application relatively straightforward. It needs a web-crawler, a document trawling feature, a categorization capability, a subject expert and escalation of importance sub-system, and the usual alerting, search and browse features. Simple; just like the picture below!




Collaboration Process

If you, Dear Reader, are going to overcome shortcomings in your BI context, this is a great place to start.

Unstructured Information in BI – Implementation Practicalities with Tacit Data August 30, 2007

Posted by Cyril Brookes in General, Tacit (soft) information for BI, Taxonomies, Tags, Corporate Vocabularies, Unstructured Information.

Designing an unstructured information based BI system must take account of the explicit and tacit distinction. There is consensus for this; in blog-speak that means “me and my mate in the next office agree”! Feedback on my last two posts does, however, unanimously support this contention. The question remains: what do we do about it? Here’s what I propose.

Most businesses have a reasonably adequate process for collecting explicit unstructured information: the documents, news, emails, reports, etc. And if yours doesn’t, the corporate portal experience awaits your attention. For heavy hitters the UIMA approach with its multi-vendor retinue is available, and willing, for a substantial sum.

I have opined in the last posts here and here that explicit unstructured information is not where BI relevance is at. It can be a start, but the real value lies in the qualification that the executive and professional mind-space can give to seeds of BI, both explicit and tacit. The tacit realm is the goldmine; it is where the current, relevant, actionable, validated business intelligence lies.

How, then, Dear Reader, do you capitalize on your tacit resources?

It’s a 9 step process, as I see it:

  1. Encourage contributions from everyone, everywhere, based on credible rumor, opinion, assessment, etc.
  2. Scour the corporate world for knowledge building seeds, explicit and tacit – web crawlers, internal and external portals, news feeds, etc.
  3. Selectively disseminate raw data seeds to subject specialists – formally appointed for preference
  4. Encourage comments on those seeds by the specialists – facts, sources, cross-references, importance, time criticality – with discussion threads escalating in importance as appropriate
  5. Selectively disseminate comments – dynamic audience creation, so that more people, and more senior executives, are aware of more important issues
  6. Encourage issue identification by executives and professionals – implications, assessments, importance value adjustments, criticality adjustments
  7. Selectively disseminate the discussion – dynamic audience modification as business significance becomes clearer, possibly creating closed group discussions if the issue becomes strategic
  8. Propagate decisions made to the appropriate staff
  9. Store knowledge created – with time stamp, sunset clause if appropriate, to help avoid multiple solutions to the same problem

Obviously this must be an explicit process, where the tacit input is first encouraged, then amplified, assessed, amplified again until either the issue dies, is resolved, or mutates into another issue. But make no mistake; it’s the tacit input that drives the successful implementation.

Essentially we are making explicit that which was tacit; but on a selective basis, right time, right people, right place.
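
Steps 3 to 7 amount to a loop in which each assessment can raise an item's importance and thereby widen, or at the strategic end narrow, its audience. Here is a minimal sketch in Python. The four-level importance scale, the audience names and the `Seed` class are all assumptions made for illustration; the process above does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from importance level to audience. The levels and
# group names are illustrative assumptions, not part of the process itself.
AUDIENCE_BY_LEVEL = {
    1: {"subject specialists"},
    2: {"subject specialists", "line managers"},
    3: {"subject specialists", "line managers", "senior executives"},
    4: {"senior executives"},  # strategic issue: closed group discussion
}


@dataclass
class Seed:
    """A raw item of explicit or tacit intelligence (step 2)."""
    text: str
    source: str
    importance: int = 1
    comments: list = field(default_factory=list)

    def comment(self, author: str, note: str,
                importance: Optional[int] = None) -> None:
        """Record an assessment and optionally adjust importance (steps 4 and 6)."""
        self.comments.append((author, note))
        if importance is not None:
            # Clamp to the assumed 1..4 scale.
            self.importance = max(1, min(4, importance))

    def audience(self) -> set:
        """Recompute who should see the item now (steps 3, 5 and 7)."""
        return AUDIENCE_BY_LEVEL[self.importance]


seed = Seed("Rumor: competitor may recall product X", source="field sales")
seed.comment("product specialist", "Matches two distributor reports", importance=3)
print(sorted(seed.audience()))
```

The point of the design is that the audience is never fixed when the item arrives; it is recomputed every time someone qualified adds an assessment.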

There is a downside, however. Creating a workable tacit unstructured information BI system with the above features is non-trivial. I have done it many times, and it was never easy.

Caveats and Dependencies

Cultural Crevasses

A culture of collaboration is the all-important enabler. If the people-related barriers to sharing the knowledge creation process are not addressed, the venture will fail. No question about it. I have made an earlier post on the cultural issues and how they can be managed, but, briefly, the most critical barriers are, in my experience:

  • There’s no reward mechanism for contributing intelligence, and it’s a lot of work for no personal benefit
  • You don’t know who to tell, and it’s a lot of effort to find out
  • You don’t know if this BI snippet you have come across is accurate, you don’t want to bother someone else unnecessarily and someone else must know it anyway
  • There’s no important person around to hear what you have to say; so keep this intelligence to yourself until there is the right audience – the more valuable it is, the longer you’ll wait.
  • Tall poppies lose their heads, so keep your head down, and messengers get shot
  • You don’t want to embarrass your boss, or peer group, so keep it quiet

Source Validation

The source of intelligence is most people’s key to determining apparent accuracy of any tacit input. If you get a stock market tip, you will always check where it came from before acting. It’s the same for a rumor on a competitor’s product recall.

Audience Creation

Dissemination is completely dependent on adequate categorization. If a document, email, news item, etc. is not classified it cannot be circulated to the right audience. And everyone in the business must use the same terms for categorization, or they will miss relevant documents.

Crucial Taxonomies

This implies a standard comprehensive corporate vocabulary or taxonomy. Setting this up is not trivial either.

Automatic Categorization is Oversold

It’s not sufficient to classify documents by internal content references. The real, useful keywords for a document that is relevant to BI may not even appear in the text. In spite of the tremendous advances in text analysis, the personal categorization by a subject expert still wins the classification stakes, in my opinion. By all means use the automated technique to get the item to a subject expert, but he/she will always be the best determinant of cross-references, importance and time criticality.


I believe that an important principle BI analysts need to fully understand is “the strategic and most valuable information in your business is in the minds of the managers and professionals” as first enunciated by Henry Mintzberg. Turning this tacit unstructured information into explicit useful stuff is universally a high priority task. Done well, it creates the difference between learning and non-learning enterprises.

Social Bookmarks and Tagging in BI Fail the Just-in-Time Test February 20, 2007

Posted by Cyril Brookes in General, Issues in building BI reporting systems, Tacit (soft) information for BI, Taxonomies, Tags, Corporate Vocabularies.

Tagging and Social Book-marking for BI applications is a hot topic. See, for example, Bill Ives’ comment. But I think there are barriers to its success in the corporate context. It doesn’t lend itself easily to the dynamics that are, or should be, key aspects of BI system design.

Sure, I am completely in agreement that information, particularly soft information, needs to be tagged, or classified, before it can be useful. I’ve talked about this several times in this blog. Social book-marking is better than no tagging at all.

If information isn’t categorized then it cannot be selectively disseminated or easily searched for.

The social book-marking ethos implies that people create their own tags. But, of course, no one else knows (at least knows in a short time frame) that this tag is being applied for this purpose.

Until the tag’s existence and meaning are widely known, no item of, say, competitive intelligence with this tag can be subject to targeted personalization to relevant decision makers. More importantly, if the tag describes a concept that is identical, or nearly identical, to one already linked to one or more other tags, then confusion is likely.

It follows that social book-marking can be effective in information retrieval, if the tags are managed, moderated and disseminated. However, this approach is not likely to be valuable for alerting purposes, especially in dynamic business environments. This is because those being alerted will not know of the tags’ existence, and will be frustrated by multiple tags with the same meaning.
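
The alerting problem can be shown in a few lines of Python. Everything in this sketch is invented for illustration (the tags, the items and the profile); the only point is that an alert profile can match tags its owner already knows about, and nothing else.

```python
# An executive's alert profile can only list tags she already knows exist.
profile_tags = {"gross-margin"}

# Three items about the same concept, tagged independently by three people.
# All tags here are hypothetical examples.
items = [
    ("Q3 margin squeeze on product X", {"gross-margin"}),
    ("Supplier price rise hits margins", {"gm-issues"}),
    ("Discounting is eroding profitability", {"margin_erosion"}),
]

# An item triggers an alert only if it shares at least one tag with the profile.
alerted = [title for title, tags in items if tags & profile_tags]
missed = [title for title, tags in items if not tags & profile_tags]

print("alerted:", alerted)
print("missed: ", missed)
```

Two of the three relevant items never reach the executive, not because they are unimportant but because their tags are unknown to her profile.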

In any case, corporate wide management of social bookmark tags is always going to be a big ask.

Knowledge in a business is often created via group collaboration. The smart corporation enables such new knowledge to be disseminated rapidly to those who should know it, and can take requisite action. There is no time to create new tags that may be redundant anyway, and to disseminate their existence and meaning widely.

Business intelligence has two basic purposes:

1. Helping executives and professionals assess status and find problems

2. Supporting problem solving, usually by less senior staff

For the corporate BI context the alerting and problem finding objectives are usually more valuable than problem solving. Knowing an issue exists will often be absolutely critical; resolving it is usually less difficult and less important. We cannot solve problems we don’t know exist.

As I opined recently, it is the combination of subject matter and assessed importance that is the key to effective alerting, or selective dissemination. And if an executive is to have a personalization profile it must use tags that are pre-specified and whose meaning is understood widely. Social book-marking does not usually imply assessing importance. Often importance can only be determined by people outside the group that creates the information, and the tag.

In the BI context a corporate vocabulary of preferred terms will be more useful than various sets of personally created, and probably redundant, social bookmarks. This is because the standard terms are widely known. Further, they are usually grouped in hierarchies of broader and narrower concepts and this facilitates retrieval and alerting.


Executives can seek items of high importance that are classified by a broader term (say, overall gross margin issues), or those about a narrower term (say, product X gross margin) that are of lower importance. In either case, they will not be inundated with large numbers of items.
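
A personalization profile of this kind can be sketched as a list of (term, minimum importance) pairs evaluated against a hierarchy of broader and narrower terms. The Python below is only an illustration: the vocabulary fragment, the 1 to 4 importance scale and the function names are assumptions, not a description of any particular product.

```python
# Hypothetical fragment of a vocabulary: each term maps to its broader term.
BROADER = {
    "product x gross margin": "gross margin issues",
    "product y gross margin": "gross margin issues",
}


def is_within(term, ancestor):
    """True if `term` equals `ancestor` or sits beneath it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = BROADER.get(term)  # None once we pass the top of the branch
    return False


def alerts(profile, items):
    """Titles of items matching any (term, minimum importance) pair in the profile."""
    return [
        title
        for title, term, importance in items
        if any(is_within(term, wanted) and importance >= floor
               for wanted, floor in profile)
    ]


# Items are (title, assigned term, assessed importance on an assumed 1..4 scale).
items = [
    ("Product X margin down 0.2 points", "product x gross margin", 1),
    ("Product X margin collapse in west region", "product x gross margin", 4),
    ("Product Y discounting", "product y gross margin", 2),
]

# A senior executive follows the broad term, but only at high importance.
executive = [("gross margin issues", 3)]
# A product manager follows one narrow term at any importance.
product_manager = [("product x gross margin", 1)]

print(alerts(executive, items))
print(alerts(product_manager, items))
```

With these sample items the executive receives only the importance 4 item, while the product manager receives both Product X items and nothing about Product Y.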

Of course, inside a project team and other tightly knit groups social bookmarks may be suitable ways to tag documents and other material for retrieval.

However, I don’t believe that the wider corporate environment will benefit to the same extent. It’s a case where more formality and discipline brings better results.

Knowledge Management vocabularies for tacit information processing; creation issues, scalability and auto-categorization September 14, 2006

Posted by Cyril Brookes in General, Tacit (soft) information for BI, Taxonomies, Tags, Corporate Vocabularies.

My August 5 post introduced the topic of KM vocabularies and their essential role in building a balanced BI reporting environment – one that delivers both hard and tacit (soft) information. This post offers more detail on vocabulary construction and practical use. These guidelines and suggestions are based on my experience building hundreds of vocabularies in KM implementations.

The vocabulary is important since it controls the categorization, retrieval and dissemination of documents. Without it there is little prospect of meaningful collaboration on important issues in the enterprise.

Creation Issues:

Automatic creation of a prototype vocabulary is a common starting point; using widely available text analysis systems. These systems process the content of many documents and compile a categorization list based on keywords, or more sophisticated contextual analysis.

These automatic systems have varying degrees of success in creating the hierarchies a KM vocabulary requires, identifying synonyms and especially determining the context; for example, differentiating between alternative meanings of “heat”:

  • A batch of steel being produced;
  • The agent increasing temperature; and
  • Something female dogs exhibit.

Orphan topics also introduce indexation, classification and subsequent retrieval difficulties. Orphans are those terms that are unrelated to others in a hierarchy; that is, they have no parents or children. They ought be avoided.

As highlighted in my earlier post, synonym processing is not desirable. All terms in the vocabulary should be “preferred” terms, and should become the universal identifiers for subject matter of documents, messages, etc. Synonyms should be handled in the auto-classification stage, where various common usage topics are converted to the preferred term for use in retrieval, etc.

Automatic analysis of newly arrived documents (after the initial vocabulary is created), without reprocessing and reorganizing the entire collection of documents, may be difficult. So, the automatic procedure for building a vocabulary will often be useful only once, at the start of the exercise. Thereafter, updates will most probably have to be manual.

To facilitate vocabulary navigation it is necessary to embed higher level parent terms in the hierarchy. Manual editing is required here since the automatic process will not do this, and, in any case, the vernacular appropriate to the enterprise ought be used, not some standard industry or linguistic term.
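
Checking for orphans and resolving synonyms to preferred terms are both easy to mechanize. The Python sketch below assumes the vocabulary is held as a term-to-parents mapping with a separate synonym table; the terms, the table and the function names are made up for the example.

```python
# Hypothetical vocabulary: each preferred term maps to its parent terms.
# Top-level branches have an empty parent list.
PARENTS = {
    "customers": [],
    "finance industry customers": ["customers"],
    "abc enterprises": ["finance industry customers"],
    "heat (steel batch)": [],  # no parents and no children: an orphan
}

# Synonyms live outside the vocabulary and are resolved at classification time.
SYNONYMS = {
    "clients": "customers",
    "abc ent.": "abc enterprises",
}


def orphans(parents):
    """Terms that have neither parents nor children."""
    has_children = {p for ps in parents.values() for p in ps}
    return sorted(t for t, ps in parents.items()
                  if not ps and t not in has_children)


def preferred(term):
    """Map a common-usage term to its preferred term, if one is known."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)


print(orphans(PARENTS))      # ['heat (steel batch)']
print(preferred("Clients"))  # customers
```

Keeping the synonym table separate means the vocabulary itself contains only preferred terms, while authors remain free to write in whatever vernacular they use.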

A general purpose vocabulary may be a useful testing platform, particularly if the information sources are news and other external sources. Internal documents tend to require a more enterprise and industry specific topic list. Similarly, a test platform can be built easily from an industry oriented set of KPIs, metrics or measures. These will be a subset of a complete vocabulary.

This is often the most effective starting point for creating an enterprise specific vocabulary and the associated rules for auto-classification.

All business oriented KM vocabularies are organic and will, therefore, evolve as the business interests and issues change. Normally, in a well designed vocabulary, these changes only involve the third and fourth levels of the hierarchy (new customers, competitors, mergers with same, new products, technologies, etc.), occasionally the first and second (a takeover creates a new business segment, a new class of customers is created to assist vocabulary navigation, etc.). Evolution is normally best achieved by manual adjustment, with suggestions being made by users as they encounter inadequate terminology or rules.

Templates:

Notwithstanding the industry focus on automatic construction of vocabularies by analysis of a pool of documents, my preference is to build a set of industry template vocabularies, and to create new versions by modifying earlier ones. This is because the terminology relevant to KM is very similar across organizations in the same industry, and similar, especially at the higher levels, across all businesses with similar operations, irrespective of industry.

Therefore, banks will have almost identical vocabularies, except at the third and fourth levels of detail, and an insurance company vocabulary will be quite similar to that of a bank.

Issues of scalability:

400 to 2,000 topics is the common size range for useful BI vocabularies. Fewer than 400 terms is unlikely to provide sufficient granularity in categorization to satisfy inquiries. More than 2,000 terms will compromise navigation for complexity reasons.

Multiple vocabularies, one for each different community of interest – e.g. marketing, research, executives, etc. – are often required for large businesses. Multiple vocabularies covering similar subjects, but in different languages, are common in large corporations. If multiple vocabularies are used, the BI system needs to support cross-community browsing and alerting, with exchange of relevant documents and collaboration.


Vocabularies of a useful scale almost certainly require auto-classification, since it is not practical to allocate categories manually for an individual corporation.

Auto-classification means the mechanical assignment of vocabulary terms to documents, messages, news items, reports, etc. when they become accessible to the enterprise network. It involves matching new documents to the appropriate preferred topics in the vocabulary using classification rules or inference techniques.

Rules tend to be set for the narrower topics. Therefore, the selected terms are narrow concepts – such as a customer name, supplier, competitor, or a product or service. The higher level, parent terms, such as customer, problem customer, etc. are then added using inheritance provisions. Normally this is done as soon as a document comes within the scope of the KM system. Reclassification is required whenever the vocabulary changes significantly.
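
Rule matching on narrow terms followed by inheritance of the broader terms might look like the following Python sketch. The regular-expression rules, the sample terms and the `classify` function are assumptions for illustration; real systems may use inference techniques rather than simple patterns.

```python
import re

# Hypothetical classification rules: narrow preferred term -> text pattern.
RULES = {
    "xyz inc": re.compile(r"\bxyz\s+inc\b", re.IGNORECASE),
    "def corporation": re.compile(r"\bdef\s+corp(oration)?\b", re.IGNORECASE),
}

# Parents of each term; a term may have more than one.
PARENTS = {
    "xyz inc": ["finance industry customers", "problem customer accounts"],
    "def corporation": ["finance industry customers", "strategic alliances"],
    "finance industry customers": ["customers"],
    "problem customer accounts": ["customers"],
}


def classify(text):
    """Assign narrow terms by rule, then add every broader term by inheritance."""
    terms = {term for term, pattern in RULES.items() if pattern.search(text)}
    pending = list(terms)
    while pending:
        # Terms without an entry in PARENTS are treated as top-level.
        for parent in PARENTS.get(pending.pop(), []):
            if parent not in terms:
                terms.add(parent)
                pending.append(parent)
    return terms


print(sorted(classify("Payment from XYZ Inc is 90 days overdue.")))
```

Because the inherited terms are derived from the hierarchy rather than stored in the rules, a significant change to the vocabulary means the documents have to be reclassified, as noted above.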

If you are interested in more detail on merging hard and tacit information, you can see some examples of KPI templates, a subset of a complete business vocabulary, in the download for my BI Pathfinder project at www.bipathfinder.com.

Knowledge management in practice: Building a vocabulary for classifying BI related tacit information August 6, 2006

Posted by Cyril Brookes in General, Tacit (soft) information for BI, Taxonomies, Tags, Corporate Vocabularies.

In my July 27 post I outlined the reasons why an enterprise vocabulary of business terminology is a vital part of any BI environment that seeks to manage the knowledge available in the tacit (soft) information known to its professionals.

This post takes this topic a step further, looking at the form and function of such a vocabulary.

What does a BI oriented vocabulary look like?

Ideally, the BI oriented vocabulary has a basic hierarchical structure, allowing for broad, narrow, and narrower, (etc.) concepts to be nested.

BI vocabularies often have similar terms, even for different organizations. Branches of the hierarchy such as customer (or client), competitor, product, technology and sales operations are commonly used, though with very different topics at the narrower levels.

Vocabularies have a similar structure for different organizations in the same industry, and this is a basis for expediting their construction, using industry templates.

Some narrow topics naturally exist in multiple branches of the hierarchy (i.e. they have more than one parent), because they belong to more than one part of the business structure. For example, a corporation may be both a customer and a member of a strategic alliance, be temporarily a problem customer account, but permanently be a finance industry customer.

The KPIs, measures and metrics used in Business Intelligence reporting will be a subset of the vocabulary. For more detail, review my BI Pathfinder project at www.bipathfinder.com.

Knowledge management, the sharing and reporting of tacit information, is encouraged through the introduction of “soft” concepts, for example: problem customers, projects at risk, employment issues, customer comments, whistle blower issues, etc.

The quality of a vocabulary has particular bearing on the level of user satisfaction for browsing, searching and personalization. What distinguishes a high quality vocabulary? Essentially, vocabulary quality depends on:

  • Structure paralleling business processes
  • Selection of concise and familiar terminology
  • Lack of synonyms that confuse usage
  • Avoidance of orphan topics (those not part of the structure)
  • Routine updates to cater for new issues and entities

The stability of vocabulary structure is important, so that users get to “find their way around”. Broader level terms especially should remain constant, with all evolution at the narrowest level, except for major business structure changes.

The breadth and depth of the vocabulary determine the practical “balance” of BI reporting, as in the Balanced Scorecard concept.

Here is an example customer branch of a vocabulary, with both hard and soft concepts included.


finance industry customers
    abc enterprises
        abc enterprises east region
        abc enterprises west region
    def corporation
    xyz inc
transport industry customers
    national bus lines
    fast rail corporation
    slow boat to china shipping
problem customer accounts
    xyz inc
    difficult business inc
strategic alliances
    acme products
    def corporation
    mno industries
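
Because some topics have more than one parent, the example is a graph rather than a strict tree. One way to hold it is a child-to-parents mapping, sketched here in Python. The nesting is my reading of the example (companies sit under their category headings, and the two regions under abc enterprises), and the function names are hypothetical.

```python
# The example branch as a child -> parents mapping. A topic may have several
# parents, so this is a directed acyclic graph rather than a strict tree.
PARENTS = {
    "abc enterprises": ["finance industry customers"],
    "abc enterprises east region": ["abc enterprises"],
    "abc enterprises west region": ["abc enterprises"],
    "def corporation": ["finance industry customers", "strategic alliances"],
    "xyz inc": ["finance industry customers", "problem customer accounts"],
    "national bus lines": ["transport industry customers"],
    "fast rail corporation": ["transport industry customers"],
    "slow boat to china shipping": ["transport industry customers"],
    "difficult business inc": ["problem customer accounts"],
    "acme products": ["strategic alliances"],
    "mno industries": ["strategic alliances"],
}


def narrower(term):
    """Direct children of a term, in alphabetical order."""
    return sorted(child for child, parents in PARENTS.items() if term in parents)


def multi_parent_topics():
    """Topics that belong to more than one branch of the hierarchy."""
    return sorted(t for t, ps in PARENTS.items() if len(ps) > 1)


print(narrower("problem customer accounts"))  # ['difficult business inc', 'xyz inc']
print(multi_parent_topics())                  # ['def corporation', 'xyz inc']
```

Moving xyz inc out of problem customer accounts later is then a one-line change to its parent list, while its permanent place among finance industry customers is untouched.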

In the next post on this thread I will discuss issues of scalability and automatic categorization of documents.