Bridging the Gap Between Unstructured and Structured Data

Discover the power of integrating unstructured data with structured systems for enhanced business intelligence. Learn how to leverage text analysis for valuable insights across corporate operations.

bibarra

Presentation Transcript


  1. BRIDGING THE GAP BETWEEN UNSTRUCTURED DATA AND STRUCTURED DATA. A presentation by W H Inmon

  2. The informal systems of the corporation (unstructured data):
      - .doc files
      - .txt files
      - .xls files
      - email
      - transcribed telephone conversations
     The formal systems of a corporation (structured systems, structured data):
      - corporate transactions
      - corporate reports
      - corporate databases
      - customer files
      - audit reports

  3. It is estimated that less than 20% of corporate systems are structured. (Diagram: roughly 80% unstructured, 20% structured.)

  4. Imagine what would happen if the two worlds could be integrated: the world of DBMS, analytics, and other processing opens up. (Diagram labels: legal discovery, search engines, applications, email, DBMS, compliance, web content, business intelligence, ontology, taxonomy, ERP, OLTP, email archive, document management, transactions.)

  5. Tight integration between the two types of data. (Diagram labels: legal discovery, search engines, applications, email, DBMS, compliance, web content, business intelligence, ontology, taxonomy, ERP, OLTP, email archive, document management, transactions.)

  6. There is a gulf between the two worlds: technology, business practice, organizational, and historical.

  7. Think of the possibilities!

  8. Imagine this: reports and visualization show a lot. Have you ever wondered why you can’t hook up your Business Objects to email? Or to telephone conversations?

  9. There is a fundamental disconnect between unstructured data (text) and business intelligence (numbers). So what would happen if we had powerful visualization for text?

  10. Correlative information becomes very easy to spot. (Terms shown: liver cancer, skin cancer, diabetes, blood pressure, thirst.)
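
One simple way to surface this kind of correlation is to count how often pairs of terms appear in the same document. The sketch below is only an illustration: the terms come from the slide, but the sample notes and the `cooccurrence` helper are invented here and are not part of the presentation.

```python
from collections import Counter
from itertools import combinations

# Terms shown on the slide.
TERMS = ["liver cancer", "skin cancer", "diabetes", "blood pressure", "thirst"]

# Hypothetical sample notes, invented purely for illustration.
NOTES = [
    "patient reports thirst and high blood pressure, history of diabetes",
    "diabetes follow-up, patient complains of constant thirst",
    "skin cancer screening, blood pressure normal",
    "liver cancer diagnosis, patient also has diabetes",
]


def cooccurrence(notes, terms):
    """Count how many notes mention each pair of terms together."""
    counts = Counter()
    for note in notes:
        text = note.lower()
        # Sort so that each pair is always stored in the same order.
        present = sorted(t for t in terms if t in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


pair_counts = cooccurrence(NOTES, TERMS)
for pair, n in pair_counts.most_common(3):
    print(pair, n)
```

Pairs with high counts are candidates for the kind of correlation a text visualization would highlight. A real system would also normalize by how common each term is on its own.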

  11. Doing analysis on subpopulations of women: for the general population; for women; for women who smoke; for women who smoke over the age of 50.

  12. The contrast between the different correlations of different populations (for example, the general population versus women who smoke over the age of 50) leads to great insight.

  13. What about looking at customer feedback, such as complaints? Now you can see the broader picture of what is happening. (Terms shown: broken, wait too long, late service, did not fit, installation, salesman attitude, delivery.)

  14. But there are plenty of other places where the technology applies:
      - manufacturing warranties (what patterns of defects are there?)
      - weblogs (marketing: who is saying what?)
      - customer complaints (what are the problem products?)
      - general email (what’s the buzz? what is on people’s minds?)
      - insurance claims (what are the circumstances of accidents?)

  15. Another possibility is the monitoring of email and the transport of email to the structured environment.

  16. Monitoring emails and other corporate conversations for compliance: making sure that email is being used properly, and a corporate standard for language. Regulations shown: Sarbanes-Oxley, HIPAA, Basel II.

  17. A bunch of emails and conversations: what do you do with them?
      - Jan 3, vp to vp: “This is going to be a real barn burner of a quarter…”
      - Jan 5, finance to vp: “It looks like we are going to do $9,000,000 this quarter…”
      - Jan 5, president to analyst: “This quarter looks like we are going to break new records…”
      - Feb 1, employee to employee: “Did you see the stock market? Everything is going down…”
      - Feb 3, president to vp: “What is happening to sales in the midwest? We didn’t expect this…”
      - Feb 3, vp to vp: “The sales cycle looks like it is extending. The economy is tanking…”
      - Feb 4, sales manager to vp: “It looks like we are going to be a little short this quarter…”
      - Feb 6, president to vp: “What are we going to do to get sales up? Do we need to do some discounting?”
      - Mar 2, sales person to vp: “Demand has dried up. We aren’t going to close as many sales this quarter as we thought…”

  18. Examining the emails from slide 17 (“combing” them) for important corporate information. External categories (Sarbanes-Oxley): quarter, stock, sales, discount, demand, sales cycle.
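
As a rough illustration of what “combing” could look like in code, the sketch below scans message text for the category terms listed on the slide. The `comb` function and its matching rules are assumptions made for this example (longest term first, match on word prefix so that “discounting” counts as “discount”). The presentation does not describe the actual mechanism.

```python
import re

# Category terms from the slide. Longer terms come first so that
# "sales cycle" is recognized before the shorter "sales".
CATEGORY_TERMS = ["sales cycle", "quarter", "stock", "sales", "discount", "demand"]

# Three of the messages from the slide, as (date, parties, text).
EMAILS = [
    ("Jan 5", "finance to vp",
     "It looks like we are going to do $9,000,000 this quarter..."),
    ("Feb 3", "vp to vp",
     "The sales cycle looks like it is extending. The economy is tanking..."),
    ("Feb 6", "president to vp",
     "What are we going to do to get sales up? Do we need to do some discounting?"),
]


def comb(text, terms=CATEGORY_TERMS):
    """Return the category terms found in text (hypothetical helper)."""
    found = []
    remaining = text.lower()
    for term in terms:
        # Match at a word boundary; allow suffixes such as "discounting".
        pattern = r"\b" + re.escape(term)
        if re.search(pattern, remaining):
            found.append(term)
            # Blank out the match so a shorter term is not counted again.
            remaining = re.sub(pattern, " ", remaining)
    return found


combed = [(date, comb(text)) for date, parties, text in EMAILS]
for date, terms in combed:
    print(date, terms)
```

Each message is reduced to a date plus a handful of category terms, which is a shape that fits naturally into rows of a table.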

  19. The “combed” information is brought over to the structured environment. Now you can use standard tools, such as Cognos, Business Objects, Crystal Reports, and MicroStrategy, to do analysis.
      - sales cycle: email (Feb 24), phone conversation (Mar 14), meeting notes (Mar 18), …
      - sales: email (Feb 2), email (Mar 5), phone (Mar 8), …
      - quarter: email (Jan 2), email (Jan 4), email (Feb 5), …
      - discount: phone conversation (Jan 6), email (Jan 12), email (Jan 14), …
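
To make the “structured environment” concrete, the sketch below loads the term index from the slide into a relational table and runs an ordinary SQL aggregate over it. SQLite stands in for whatever DBMS would actually be used, and the table and column names (`combed`, `term`, `source`, `sent_on`) are hypothetical. The slide gives no year, so dates are stored as plain text.

```python
import sqlite3

# (term, source, date) rows as listed on the slide.
ROWS = [
    ("sales cycle", "email", "Feb 24"),
    ("sales cycle", "phone conversation", "Mar 14"),
    ("sales cycle", "meeting notes", "Mar 18"),
    ("sales", "email", "Feb 2"),
    ("sales", "email", "Mar 5"),
    ("sales", "phone", "Mar 8"),
    ("quarter", "email", "Jan 2"),
    ("quarter", "email", "Jan 4"),
    ("quarter", "email", "Feb 5"),
    ("discount", "phone conversation", "Jan 6"),
    ("discount", "email", "Jan 12"),
    ("discount", "email", "Jan 14"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE combed (term TEXT, source TEXT, sent_on TEXT)")
conn.executemany("INSERT INTO combed VALUES (?, ?, ?)", ROWS)

# Once the text is in rows, standard SQL (and any reporting tool that
# speaks SQL) can summarize it.
mentions_per_term = dict(
    conn.execute("SELECT term, COUNT(*) FROM combed GROUP BY term")
)
email_mentions = conn.execute(
    "SELECT COUNT(*) FROM combed WHERE source = 'email'"
).fetchone()[0]

print(mentions_per_term)
print(email_mentions)
```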

  20. But there are other ways that communications can be used. Emails and telephone conversations can be linked to CDI/CRM customer data through a probabilistic match.
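
The presentation does not say how the probabilistic match is computed. As a minimal sketch of the general idea, the example below scores each customer record against an incoming message using exact email agreement plus fuzzy name similarity, and accepts the best candidate only above a threshold. The customer records, the 0.6/0.4 weights, and the 0.3 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical CDI/CRM records.
CUSTOMERS = [
    {"id": 101, "name": "Mary Jones", "email": "mjones@example.com"},
    {"id": 102, "name": "Mark Johnson", "email": "mark.johnson@example.com"},
]


def similarity(a, b):
    """Case-insensitive string similarity in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(message, customer):
    """Weighted evidence that a message belongs to a customer (assumed weights)."""
    same_email = message["from_email"].lower() == customer["email"].lower()
    email_score = 1.0 if same_email else 0.0
    name_score = similarity(message["from_name"], customer["name"])
    return 0.6 * email_score + 0.4 * name_score


def best_match(message, customers, threshold=0.3):
    """Return (customer, score), or (None, score) if no candidate is convincing."""
    scored = [(match_score(message, c), c) for c in customers]
    score, customer = max(scored, key=lambda pair: pair[0])
    return (customer, score) if score >= threshold else (None, score)


message = {"from_name": "Mrs. M. Jones", "from_email": "MJones@example.com"}
customer, score = best_match(message, CUSTOMERS)
print(customer["id"], round(score, 2))
```

A match is never certain, which is why the function returns the score along with the record. Downstream reports can then filter on confidence.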

  21. A true 360 degree view of the customer can be formed. “I placed an order last week and when it arrived it was the wrong size. And then your company would not take it back. I’m mad.” How easy is it going to be to engage Mrs. Jones until she has satisfaction about her order?

  22. A true 360 degree view of the customer can be formed from communications and demographics, delivering on the promise of CDI.

  23. Can’t I just use a search engine to link the two worlds? Search engines do not integrate textual information.

  24. Text doesn’t need to be searched, it needs to be integrated.

  25. “head ache”, “heart attack”, “Hepatitis A”: which one does “ha” mean?
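
One plausible way to resolve an ambiguous abbreviation like “ha” is to use context outside the text itself, such as the department that wrote the note. The sketch below assumes that approach. The lookup table, the department names, and the `resolve_homographs` function are hypothetical and are not taken from the presentation.

```python
import re

# Hypothetical lookup: (abbreviation, context) -> intended term.
HOMOGRAPHS = {
    ("ha", "cardiology"): "heart attack",
    ("ha", "neurology"): "head ache",
    ("ha", "hepatology"): "Hepatitis A",
}


def resolve_homographs(text, context):
    """Expand ambiguous abbreviations using the context the text came from."""
    def expand(match):
        token = match.group(0)
        # Leave the token unchanged when no rule applies in this context.
        return HOMOGRAPHS.get((token.lower(), context), token)

    return re.sub(r"\b\w+\b", expand, text)


print(resolve_homographs("patient admitted with ha", "cardiology"))
print(resolve_homographs("patient admitted with ha", "neurology"))
```

A plain search for “ha” cannot make this distinction, which is the point of the slide: the text has to be edited into a consistent vocabulary before it can be integrated.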

  26. “oblique fractured ulna”, “oblique fractured tibia”, “obliq fractured tarsi”: each of these is a “broken bone”.

  27. What is meant by editing, integrating text?
      1. stop word editing
      2. stemming
      3. synonym replacement
      4. synonym concatenation
      5. homograph resolution
      6. alternate spelling resolution
      7. external category classification
      8. theming
      9. probabilistic matching
      10. negation exclusion
      11. concept clustering
      12. mid process editing
      13. change sensitivity
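
The first three steps in this list can be sketched in a few lines. The example below is deliberately crude and rests on assumptions: the stop word list, the synonym table, and the suffix-stripping `stem` function are all invented for illustration, and a production system would use a proper stemmer and curated vocabularies.

```python
import re

# Hypothetical vocabularies, invented for this example.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "to", "of", "it", "when"}
SYNONYMS = {"broken": "damaged", "cracked": "damaged"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Very crude suffix stripping that keeps at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def edit_text(text):
    """Apply stop word editing, synonym replacement, then stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop word editing
    tokens = [SYNONYMS.get(t, t) for t in tokens]        # synonym replacement
    return [stem(t) for t in tokens]                     # stemming


print(edit_text("The order arrived and it was cracked"))
```

Synonyms are replaced before stemming here so that the synonym table can be keyed on whole words. Other orderings are possible, and the presentation does not prescribe one.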

  28. For a detailed description of how the unstructured environment should be linked to the structured environment, go to www.inmoncif.com and look for DW 2.0 (TM), or go to www.inmondatasystems.com.

  29. (Diagram labels: unstructured data; probabilistic match; structured environment; DB2; query; visualization; Business Objects, Cognos, MicroStrategy, Crystal Reports.)
