Unstructured Information in BI – Implementation Practicalities with Tacit Data August 30, 2007

Posted by Cyril Brookes in General, Tacit (soft) information for BI, Taxonomies, Tags, Corporate Vocabularies, Unstructured Information.

Designing a BI system based on unstructured information must take account of the distinction between explicit and tacit information. There is consensus for this; in blog-speak that means “me and my mate in the next office agree”! Feedback on my last two posts does, however, unanimously support this contention. The question remains: what do we do about it? Here’s what I propose.

Most businesses have a reasonably adequate process for collecting explicit unstructured information: the documents, news, emails, reports, etc. And if yours doesn’t, the corporate portal experience awaits your attention. For heavy hitters the UIMA approach, with its multi-vendor retinue, is available, and willing, for a substantial sum.

I have opined in the last posts here and here that explicit unstructured information is not where BI relevance is at. It can be a start, but the real value lies in the qualification that the executive and professional mind-space can give to seeds of BI, both explicit and tacit. The tacit realm is the goldmine; it is where the current, relevant, actionable, validated business intelligence lies.

How then, Dear Reader, do you capitalize on your tacit resources?

It’s a nine-step process, as I see it:

  1. Encourage contributions from everyone, everywhere, based on credible rumor, opinion, assessment, etc.
  2. Scour the corporate world for knowledge building seeds, explicit and tacit – web crawlers, internal and external portals, news feeds, etc.
  3. Selectively disseminate raw data seeds to subject specialists – formally appointed for preference
  4. Encourage comments on those seeds by the specialists – facts, sources, cross-references, importance, time criticality – with discussion threads escalating in importance as appropriate
  5. Selectively disseminate comments – dynamic audience creation, so that more people, and more senior executives, are aware of more important issues
  6. Encourage issue identification by executives and professionals – implications, assessments, importance value adjustments, criticality adjustments
  7. Selectively disseminate the discussion – dynamic audience modification as business significance becomes clearer, possibly creating closed group discussions if the issue becomes strategic
  8. Propagate decisions made to the appropriate staff
  9. Store knowledge created – with time stamp, sunset clause if appropriate, to help avoid multiple solutions to the same problem

Obviously this must be an explicit process, where the tacit input is first encouraged, then amplified, assessed, amplified again until either the issue dies, is resolved, or mutates into another issue. But make no mistake; it’s the tacit input that drives the successful implementation.

Essentially we are making explicit that which was tacit; but on a selective basis, right time, right people, right place.
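To make the selective dissemination in steps 3, 5 and 7 a little more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the `Seed` and `Comment` classes, the 1 to 5 importance scale, the rule that a thread is as important as its highest-rated comment, and the invented subscriber list. It is not a description of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    importance: int  # the commenter's own 1-5 rating of the issue (assumed scale)


@dataclass
class Seed:
    """A raw intelligence item, explicit or tacit, plus its growing thread."""
    summary: str
    topics: set[str]
    comments: list[Comment] = field(default_factory=list)

    @property
    def importance(self) -> int:
        # Assumption: the thread is as important as its highest-rated comment.
        return max((c.importance for c in self.comments), default=1)


# Hypothetical directory: (person, topic followed, minimum importance to be told).
SUBSCRIBERS = [
    ("analyst.kim", "competitors", 1),
    ("sales.director", "competitors", 3),
    ("ceo", "competitors", 5),
]


def audience(seed: Seed) -> list[str]:
    """Recompute the audience each time the thread changes."""
    return [
        name
        for name, topic, threshold in SUBSCRIBERS
        if topic in seed.topics and seed.importance >= threshold
    ]


seed = Seed("Rumor: competitor product recall", {"competitors"})
print(audience(seed))  # ['analyst.kim']

seed.comments.append(Comment("analyst.kim", "Confirmed by two distributors", 4))
print(audience(seed))  # ['analyst.kim', 'sales.director']
```

The point of the sketch is only that the audience is recalculated, not fixed: as specialists raise the importance of a seed, more senior people are drawn in without anyone having to decide who to tell.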

There is a downside, however. Creating a workable tacit unstructured information BI system with the above features is non-trivial. I have done it many times, and it was never easy.

Caveats and Dependencies

Cultural Crevasses

A culture of collaboration is the all-important enabler. If the people-related barriers to sharing the knowledge creation process are not addressed, the venture will fail. No question about it. I have made an earlier post on the cultural issues and how they can be managed, but, briefly, the most critical barriers are, in my experience:

  • There’s no reward mechanism for contributing intelligence, and it’s a lot of work for no personal benefit
  • You don’t know who to tell, and it’s a lot of effort to find out
  • You don’t know if this BI snippet you have come across is accurate, you don’t want to bother someone else unnecessarily and someone else must know it anyway
  • There’s no important person around to hear what you have to say; so keep this intelligence to yourself until there is the right audience – the more valuable it is, the longer you’ll wait.
  • Tall poppies lose their heads, so keep your head down, and messengers get shot
  • You don’t want to embarrass your boss, or peer group, so keep it quiet

Source Validation

The source of intelligence is most people’s key to determining apparent accuracy of any tacit input. If you get a stock market tip, you will always check where it came from before acting. It’s the same for a rumor on a competitor’s product recall.

Audience Creation

Dissemination is completely dependent on adequate categorization. If a document, email, news item, etc. is not classified it cannot be circulated to the right audience. And everyone in the business must use the same terms for categorization, or they will miss relevant documents.

Crucial Taxonomies

This implies a standard comprehensive corporate vocabulary or taxonomy. Setting this up is not trivial either.
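As a rough illustration of why a shared vocabulary matters for audience creation, here is a small Python sketch. The vocabulary, the synonyms and the subscriptions are all invented for the example, and a real corporate taxonomy would be far larger and hierarchical; the only idea being shown is that free-form tags are mapped to preferred terms before anyone is matched to an item.

```python
# Hypothetical corporate vocabulary: each preferred term lists the synonyms
# that staff actually type. A real taxonomy would also carry a hierarchy.
VOCABULARY = {
    "product recall": {"recall", "withdrawal", "product withdrawal"},
    "competitor": {"rival", "competition", "the opposition"},
}

# Invert it once so any variant maps straight to its preferred term.
PREFERRED = {
    variant: term
    for term, synonyms in VOCABULARY.items()
    for variant in synonyms | {term}
}


def categorize(tags: list[str]) -> set[str]:
    """Map free-form tags to preferred terms; unknown tags are dropped."""
    cleaned = (tag.strip().lower() for tag in tags)
    return {PREFERRED[tag] for tag in cleaned if tag in PREFERRED}


# Hypothetical subscriptions, expressed only in preferred terms.
SUBSCRIPTIONS = {
    "quality.manager": {"product recall"},
    "marketing.head": {"competitor"},
}


def recipients(tags: list[str]) -> list[str]:
    """Everyone whose subscribed terms overlap the item's categories."""
    categories = categorize(tags)
    return sorted(p for p, terms in SUBSCRIPTIONS.items() if terms & categories)


print(recipients(["Rival", "withdrawal"]))  # ['marketing.head', 'quality.manager']
print(recipients(["gossip"]))               # []
```

Note the second call: an item tagged with a term outside the vocabulary reaches nobody, which is exactly the failure described above.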

Automatic Categorization is Oversold

It’s not sufficient to classify documents by internal content references. The real, useful keywords for a document that is relevant to BI may not even appear in the text. In spite of the tremendous advances in text analysis, personal categorization by a subject expert still wins the classification stakes, in my opinion. By all means use the automated technique to get the item to a subject expert, but he/she will always be the best determinant of cross-references, importance and time criticality.

Finally

I believe that an important principle BI analysts need to fully understand is “the strategic and most valuable information in your business is in the minds of the managers and professionals”, as first enunciated by Henry Mintzberg. Turning this tacit unstructured information into explicit, useful stuff is universally a high priority task. Done well, it creates the difference between learning and non-learning enterprises.

Unstructured Information – Tacit Versus Explicit for Profit and BI Best Practice August 9, 2007

Posted by Cyril Brookes in General, Issues in building BI reporting systems, Tacit (soft) information for BI, Unstructured Information.

A picture may be worth a thousand words; a news item also has about a thousand, and a marketing strategic plan may have around five thousand. OK, but a great idea for a new marketing message, an expert’s adverse comment on the marketing plan, or a chance airplane conversation about a competitor’s plans may each be worth a million dollars, for just a few hundred words. What do you want, words or dollars?

I believe there is far too much emphasis on the analysis of documented unstructured information as a BI resource. The basic important data just isn’t there for most businesses. You can search as long as you like; mine it, categorize it, summarize it, but to no avail, the well is dry.

This post follows on from my earlier, definitional, piece on this subject.

Of course, I recognize there is potential import in some written material, for example, recent emails, salesperson call reports, customer complaints and their ilk. But these are like seeds, rather than the fruit off the tree. They are the beginning of a BI story, not the whole enchilada.

At risk of making the discussion too deep, Dear Reader, I think we need to consider the basic concepts before coming to any conclusions about how a corporation should manage its unstructured data, and the tools required.

I find it valuable to characterize unstructured information with a 2 x 3 matrix.

The horizontal axis has the above two basic categories of unstructured information:

Explicit unstructured items are those that are basically unformatted, but have a physical, computable, presence; e.g. documents, pictures, emails, graphs, etc.

Tacit items are basically anything unformatted that is not explicit; they’re still in the minds of professionals and managers, but are nonetheless both real and vital; e.g. mental models, ideas, rumors, phone calls, opinions, verbal commentary, etc.

The vertical axis has the three categories of unstructured information (according to moi!): independent, qualification and reference items.

Independent items stand alone, being self-explanatory in the first instance and not requiring reference to other pieces of information, be they structured, unstructured, explicit or tacit.

Qualification items have an adjectival quality, since they add value to other items (structured or unstructured), and are therefore relatively useless without reference to the appropriate one or more Independent or other Qualification items (note there may be one or more threads to a discussion based on an Independent item).

Reference items are pointers to subject experts who can provide details or opinions, and other sources of information, structured or unstructured, together with quality assessments of the value, reliability and timeliness of those sources. As Samuel Johnson said “Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information on it”.
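For readers who think in data structures, the 2 x 3 matrix can be sketched as two small enumerations plus a link from each Qualification item to the item it comments on. This is only my illustration in Python; the class names, the `qualifies` field and the sample items are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Form(Enum):
    EXPLICIT = "explicit"
    TACIT = "tacit"


class Role(Enum):
    INDEPENDENT = "independent"
    QUALIFICATION = "qualification"
    REFERENCE = "reference"


@dataclass
class Item:
    description: str
    form: Form
    role: Role
    qualifies: Optional["Item"] = None  # the item a Qualification item comments on

    def __post_init__(self) -> None:
        # A Qualification item is of little use without the item it refers to.
        if self.role is Role.QUALIFICATION and self.qualifies is None:
            raise ValueError("a qualification item must point at another item")

    @property
    def cell(self) -> tuple[str, str]:
        """The (row, column) of the matrix this item falls in."""
        return (self.role.value, self.form.value)


rumor = Item("Rumor of a competitor's product recall", Form.TACIT, Role.INDEPENDENT)
comment = Item("Verbal comment on the rumor", Form.TACIT, Role.QUALIFICATION, rumor)

print(rumor.cell)    # ('independent', 'tacit')
print(comment.cell)  # ('qualification', 'tacit')
```

Because a Qualification item may itself be qualified, following the `qualifies` links from any comment leads back to the Independent item that started the thread.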

Here’s a descriptive tabulation.

Independent, stand alone items

  Explicit:
  • Meeting minutes
  • News items
  • Analyst reports
  • Marketing call reports
  • Legal judgments
  • Proposals
  • Government regulations
  • Suggestion box items
  • Customer complaints
  • Strategic plans
  • Manuals of best practice
  • Emails about new issues or competitive intelligence

  Tacit:
  • Unrecorded meeting discussions
  • Ideas
  • Suggestions (undocumented)
  • Potential problems
  • Know-how
  • Competitive intelligence from informal customer/industry contacts
  • Stock market (racehorse) tip
  • Rumors
  • Intuitions
  • Off-the-record talks with government officers

Qualification, commentary items

  Explicit:
  • Written comments on a report/news/analyst item
  • Documented opinion on problem or situation
  • Formal assessments of status implications

  Tacit:
  • Verbal comments on a report/news/analyst item
  • Verbal comments on emails
  • Verbal opinions on problems
  • Verbal assessments of issues
  • Possible solution options
  • Comments on a rumor

Reference, source quality items

  Explicit:
  • Lists of subject experts
  • Ratings of experts
  • Document sources, catalogs
  • Written reviews of document sources

  Tacit:
  • Unrecorded subject expert identity
  • Opinions on expert quality
  • People who “know-how”
  • Informal unrecorded information source documents
  • Assessments of document source utility

Ask yourself, Dear Reader, which of these cells contain high value information, likely to help your corporate executives find problems and make decisions? If they’re only on the explicit side, then you’re in the sights of UIMA and lots of enthusiastic vendors; good luck. If some are on the tacit side, please read on.

I’ve covered several of the relevant aspects of managing tacit information in earlier posts, e.g. here and here. However, there are some additional relevant observations to be made in the tacit versus explicit context.

  • The first, possibly most important, observation is apparently self-defeating to my thesis. All important, currently relevant, items of tacit unstructured information should be made explicit as soon as practicable.
  • It is not possible to identify, collect, store, disseminate, and facilitate collaboration on purely tacit items; it will happen in a “same time” meeting, of course, but wider ramifications demand that the prelude and/or outcome be made explicit.
  • Independent intelligence items, be they initially explicit (e.g. a recent email) or tacit, are very rarely complete as regards background to the issue, its importance to the business, its time criticality, and assessments of potential impact. If you will, the knowledge has not yet been created, only its seed.
  • The information required to complete the knowledge building that starts with an Independent Item is rarely in one location or person’s mind.
  • The knowledge building is based mostly on tacit information.
  • The knowledge building process is most effective if performed via collaboration between the people who have, or know where to find, the necessary Qualification Items of information.
  • Some process for collaboration audience selection is required, one based on issue content, criticality and importance. It shouldn’t be left to pure chance.
  • Desirably, the collaboration process, but certainly the end result, should be made explicit, to avoid resolving the same issue many times over.
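The last observation, making the end result explicit so the same issue is not resolved many times over, can be pictured as a small store of outcomes that is consulted before a new discussion starts. The Python sketch below is a minimal one under my own assumptions: the `Resolution` and `KnowledgeStore` names are hypothetical, each outcome carries a time stamp and an optional sunset date after which it should no longer be relied on, and issues are matched by exact title, which a real system would replace with taxonomy-based matching.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Resolution:
    issue: str
    outcome: str
    recorded: date
    sunset: Optional[date] = None  # after this date the outcome needs review

    def is_current(self, today: date) -> bool:
        return self.sunset is None or today <= self.sunset


class KnowledgeStore:
    def __init__(self) -> None:
        self._by_issue: dict[str, Resolution] = {}

    def record(self, resolution: Resolution) -> None:
        # Assumption: issues are keyed by their lower-cased title.
        self._by_issue[resolution.issue.lower()] = resolution

    def lookup(self, issue: str, today: date) -> Optional[Resolution]:
        """Return an earlier outcome only if it has not passed its sunset date."""
        found = self._by_issue.get(issue.lower())
        return found if found and found.is_current(today) else None


store = KnowledgeStore()
recorded = date(2007, 8, 9)
store.record(Resolution(
    "Distributor pricing dispute",
    "Matched the competitor's discount",
    recorded,
    sunset=recorded + timedelta(days=90),
))

print(store.lookup("distributor pricing dispute", date(2007, 9, 1)) is not None)  # True
print(store.lookup("distributor pricing dispute", date(2008, 1, 1)) is not None)  # False
```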

In my previous post I offered some questions that might provoke your curiosity, Dear Reader:

  1. What are the most useful sources of unstructured information in our business? Are they Explicit or Tacit?
  2. If Explicit, how do we best marshal the information and report it?
  3. If Tacit, ditto?
  4. Is the information we get from our unstructured sources complete, and ready for promulgation, or do we need to amplify or build on it before it’s useful?

I expect that you will be able to answer 1 and 4 for your business; I’ve outlined the issues as best I can.

I’ll defer offering pointers you might consider for 2 and 3 to the next post, because I believe we still need to revisit the processes and constraints that inhabit the strange corporate world of collaborative knowledge building.

Unstructured Information in BI – Only for Spooks? What about Business Analysts? July 30, 2007

Posted by Cyril Brookes in General, Tacit (soft) information for BI, Unstructured Information.

There’s a big marketing and consultant push on UIMA, unstructured information management architecture. But I think it is largely missing the point for the real world corporate BI people, i.e. those not spooks or librarians. The critical concept, ignored by many, is that unstructured information is of two kinds, explicit and tacit. Even Wikipedia gets it wrong, ignoring the latter.

I believe BI gains most from tacit intelligence, but that’s not where the product marketing thrust lies. The importance of tacit unstructured information in BI is summed up well by Timo Elliott in a cartoon and by James Taylor’s recent blog post.

The heavy hitters in BI software, as evidenced by the recent takeover activity, e.g. Business Objects and Inxight for one, are pressing home the apparent advantage to be gained by corporations from analysis of masses of emails, news, documents, etc. See, for example, the description of UIMA.

Well and good. But you can’t make a silk purse out of a sow’s ear, as Jonathan Swift said. And you can’t create relevant action oriented information for executives out of data that has no embedded useful information in it. The ocean of documents, with some exceptions, is a BI desert for most companies. But I mix my metaphors.

Basically, I believe that a corporation’s vast compendium of historical documents has little BI relevance. It may be useful to track or assess a person’s background, or to isolate the cause of a problem. But history rarely contains the up-to-date information that’s relevant to managing a business, assessing current performance and finding problems. The real value lies in exploiting the tacit stuff.

I’ve quoted Henry Mintzberg often before, but it bears repeating, as the message hasn’t yet been fully understood in the mainstream of BI: “The strategic database of an organization is in the minds of its managers, not in the databases of its computers”. This is as true today as it was in 1974. Today, one can add: “Or in the morass of historical documents and emails”.

Of course recent emails and documents often contain important information that can, and should, be part of a BI context; but usually only as the seed for a collaborative knowledge building process. This is the nub of the issue; it’s hard to identify, collate, disseminate and collaborate on tacit unstructured information. Perhaps this is why most authors steer clear of the issue. But we need to address it if we are to be effective. More on this in my next post.

I wrote last year detailing some of my research in the 90s on the subject of “hard” and “soft” information: how valuable it is in many BI contexts, particularly CRM, but also how difficult it is to exploit. In this context hard information refers to the structured, numeric, formatted BI reports. Soft information is the unformatted, unarticulated information in managers’ and professionals’ minds.

An interesting article by Rick Taylor that deals with unstructured information is relevant here. It says, in part:

The key to defining knowledge management is to make sure you are separating “explicit” knowledge from “tacit” knowledge. Explicit knowledge is anything easy to quantify, write down, document or explain. Tacit knowledge is everything else. The knowledge based on ones experiences, and often times, at a subconscious level. It is information that you don’t necessarily know you know until you are reminded of it. If you were asked to write down everything you know, could you do it?

The explicit and tacit labels were used first in this context, I believe, by Nonaka and Takeuchi in The Knowledge-Creating Company.

The BI key questions that arise from this discussion are, I believe:

  1. What are the most useful sources of unstructured information in our business? Explicit or Tacit?
  2. If Explicit, how do we best marshal the information and report it?
  3. If Tacit, ditto?
  4. Is the information we get from our unstructured sources complete, and ready for promulgation, or do we need to amplify or build on it before it’s useful?

I believe that the above analysis outlines the problem of utilizing tacit unstructured information reasonably well. I’ll offer my answers to these issues in the next post.