Text Analytics World Future Directions of Text Analytics

Tom Reamy, Chief Knowledge Architect, KAPS Group (Knowledge Architecture Professional Services), http://www.kapsgroup.com

Presentation Transcript


  1. Text Analytics World: Future Directions of Text Analytics • Tom Reamy, Chief Knowledge Architect • KAPS Group, Knowledge Architecture Professional Services • http://www.kapsgroup.com

  2. Agenda • Introduction: • Current State of Text Analytics • Survey • Roadblocks for Text Analytics • Complexity and Customization • Fast and Slow (Thinking) Text Analytics • Building Text Analytics Brains • New Methods for Text Analytics • Lessons from Watson • Some Wild New Ideas and Approaches • Questions

  3. Introduction: KAPS Group • Knowledge Architecture Professional Services – Network of Consultants • Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies • Services: • Strategy – IM & KM - Text Analytics, Social Media, Integration • Taxonomy/Text Analytics development, consulting, customization • Text Analytics Quick Start – Audit, Evaluation, Pilot • Social Media: Text based applications – design & development • Partners – SAS, Smart Logic, Expert Systems, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics • Projects – Portals, taxonomy, Text analytics – news, expertise location, information strategy, text analytics evaluation, Quick Start in Text A. • Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc. • Presentations, Articles, White Papers – www.kapsgroup.com

  4. Introduction: What is Text Analytics? • Text Mining – NLP, statistical, predictive, machine learning • Semantic Technology – ontology, fact extraction • Extraction – entities – known and unknown, concepts, events • Catalogs with variants, rule based • Sentiment Analysis • Objects and phrases – statistics & rules – Positive and Negative • Auto-categorization • Training sets, Terms, Semantic Networks • Rules: Boolean – AND, OR, NOT • Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE • Disambiguation – Identification of objects, events, context • Build rule-based models, not simply a Bag of Individual Words
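The Boolean and proximity operators named on this slide can be illustrated with a minimal sketch. The function names mirror the slide's DIST, ORDDIST and SENTENCE operators, but their exact semantics here (token distance, naive sentence splitting) are assumptions for illustration and may differ from any particular product's rule language. The `is_billing_complaint` rule and its vocabulary are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, term):
    """Indexes at which `term` occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def dist(tokens, a, b, n):
    """Assumed DIST semantics: a and b within n tokens, in either order."""
    return any(abs(i - j) <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def orddist(tokens, a, b, n):
    """Assumed ORDDIST semantics: b follows a within n tokens."""
    return any(0 < j - i <= n
               for i in positions(tokens, a)
               for j in positions(tokens, b))


def sentence(text, a, b):
    """Assumed SENTENCE semantics: a and b share a sentence (naive split on . ! ?)."""
    for sent in re.split(r"[.!?]+", text):
        toks = tokenize(sent)
        if a in toks and b in toks:
            return True
    return False


def is_billing_complaint(text):
    """Hypothetical category rule: (DIST OR ORDDIST) AND NOT 'thanks'."""
    toks = tokenize(text)
    hit = dist(toks, "bill", "wrong", 5) or orddist(toks, "error", "bill", 5)
    return hit and "thanks" not in toks
```

Unlike a bag-of-words match, the rule above does not fire on a document that merely contains both "bill" and "wrong" somewhere; the terms have to appear close together, which is the point of building rules rather than counting individual words.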

  5. Current State of Text Analytics • History – academic research, focus on NLP • Inxight – out of Xerox PARC • Moved TA from academic and NLP to auto-categorization, entity extraction, and Search-Meta Data • Explosion of companies – many based on Inxight extraction with some analytical-visualization front ends • Half from 2008 are gone – Lucky ones got bought • Early applications – News aggregation and Enterprise Search • Second Wave = shift to sentiment analysis • Enterprise search down, taxonomy up – need for metadata – not great results from either – 10 years of effort for what? • Text Analytics is growing – But

  6. Current State of Text Analytics • Current Market: 2012 – exceed $1 billion for text analytics (10% of total Analytics) • Growing 20% a year • Search is 33% of total market • Other major areas: • Sentiment and Social Media Analysis, Customer Intelligence • Business Intelligence, Range of text based applications • Fragmented market place – full platform, low level, specialty • Embedded in content management, search • No clear leader

  7. Current State of Text Analytics: Vendor Space • Taxonomy Management – SchemaLogic, PoolParty • From Taxonomy to Text Analytics • Data Harmony, MultiTes • Extraction and Analytics • Linguamatics (Pharma), Temis, whole range of companies • Business Intelligence – ClearForest, Inxight • Sentiment Analysis – Attensity, Lexalytics, Clarabridge • Open Source – GATE • Stand alone text analytics platforms – IBM, SAS, SAP, Smart Logic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching • Embedded in Content Management, Search • Autonomy, FAST, Endeca, Exalead, etc.

  8. Future Directions: Survey Results • 28% just getting started, 11% not yet • What factors are holding back adoption of TA? • Lack of clarity about value of TA – 23.4% • Lack of knowledge about TA – 17.0% • Lack of senior management buy-in – 8.5% • Don’t believe TA has enough business value – 6.4% • Other factors • Financial Constraints – 14.9% • Other priorities more important – 12.8% • Lack of articulated strategic vision – by vendors, consultants, advocates, etc.

  9. Primary Obstacle: Complexity • Usability of software is one element • More important is difficulty of models: • Conceptual and document models • General need – more structure but also more flexible kinds of structure and interactions • More modules and more ways of combining or interacting – IBM – select best answer but others • Competitive – learn and evolve – Feedback! • Cooperative – join together to form higher level structures

  10. Primary Obstacle: Complexity: Partial Solutions • Build complex semantic networks – basic concepts – good for demo, gets a start, but very complex to build on • Library of taxonomies – but all need major customization and often are not a good starting point – different types of taxonomies – index vs. categorization • Customization – Text Analytics – heavily context dependent • Content, Questions, Taxonomy-Ontology • Level of specificity – Telecommunications • Specialized vocabularies, acronyms • Specialized relationships – conceptual and organizational • How to overcome?

  11. Thinking Fast and Slow – Daniel Kahneman • System 1 and System 2 • System 1 – fast and automatic – little conscious control • Represents categories as prototypes – stereotypes • Norms for immediate detection of anomalies – distinguish the surprising from the normal • Fast detection of simple differences, detect hostility in a voice, find best chess move (if a master) • Priming / Anchoring – susceptible to systemic errors • Temperature Example • Biased to believe and confirm • Focuses on existing evidence (ignores missing – WYSIATI)

  12. Thinking Fast and Slow • System 2 – Complex, effortful judgments and calculations • System 2 is the only one that can follow rules, compare objects on several attributes, and make deliberate choices • Understand complex sentences • Check the validity of a complex logical argument • Focus attention – can make people blind to all else – Invisible Gorilla • Similar to traditional dichotomies – Tacit – Explicit, etc. • Basic Design – System 1 is basic to most experiences, and System 2 takes over when things get difficult – conscious control • Text Analysis and Text Mining / Auto-Cat and TA Cat

  13. System 1 & 2 – and Text Analytics Approaches • “Automatic Categorization” – System 1 prototypes • Limited value – only works in simple environments • Shallow categories with large differences • Not open to conscious control • System 2 – categories – complex, minute differences, deep categories • Together: • Choose one or other for some contexts • Combine both – need to develop new kinds of categories and/or new ways to combine?

  14. Text Mining and Text Analytics • Text Analytics and Big Data enrich each other • Data tells you what people did, TA tells you why • Text Analytics – pre-processing for TM • Discover additional structure in unstructured text • Behavior Prediction – adding depth in individual documents • New variables for Predictive Analytics, Social Media Analytics • New dimensions – 90% of information, 50% using Twitter analysis • Text Mining for TA – Semi-automated taxonomy development • Apply data methods, predictive analytics to unstructured text • New Models – Watson ensemble methods, reasoning apps • Extraction – smarter extraction – sections of documents, Boolean, advanced rules – drug names, adverse events – major mention

  15. Integration of Text and Data Analytics • Expertise Location: Case Study: Data and Text • Data Sources: • HR Information: Geography, Title-Grade, years of experience, education, projects worked on, hours logged, etc. • Text Sources: • Documents authored (major and minor authors) – data and/or text • Documents associated (teams, themes) – categorized to a taxonomy • Experience description – extract concepts, entities • Self-reported expertise – requires normalization, quality control • Complex judgments: • Faceted application • Ensemble methods – combine evaluations
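One way to read "ensemble methods – combine evaluations" is rank aggregation across the data and text sources listed on this slide. The sketch below uses a Borda count, which is an assumption for illustration; the slide does not say which ensemble method was used in the case study. The people, metric names and values are invented.

```python
def rank_by(metric):
    """Rank names from highest to lowest metric value (ties broken by name)."""
    return sorted(metric, key=lambda name: (-metric[name], name))


def borda(rankings):
    """Combine ranked lists: each list awards n-1, n-2, ..., 0 points."""
    points = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            points[name] = points.get(name, 0) + (n - 1 - position)
    return sorted(points, key=lambda name: (-points[name], name))


# Hypothetical inputs: one structured HR signal and two text-derived signals.
hr_years_experience = {"Ana": 12, "Ben": 4, "Cy": 8}      # data source
docs_on_topic = {"Ana": 3, "Ben": 9, "Cy": 5}             # authored docs categorized to the topic
concepts_in_description = {"Ana": 6, "Ben": 7, "Cy": 2}   # concepts extracted from experience text

combined = borda([
    rank_by(hr_years_experience),
    rank_by(docs_on_topic),
    rank_by(concepts_in_description),
])
print(combined)
```

Rank aggregation sidesteps the problem of putting years, document counts and concept counts on a common scale, at the cost of ignoring how large the gaps between people are.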

  16. Building on the Platform: Expertise Analysis • Expertise Characterization for individuals, communities, documents, and sets of documents • Experts prefer lower, subordinate levels • Novice & General – high and basic level • Experts’ language structure is different • Focus on procedures over content • Applications: • Business & Customer intelligence – add expertise to sentiment • Deeper research into communities, customers • Expertise location – Generate automatic expertise characterization based on documents

  17. New Approaches – Applied Watson • Key concept is that multiple approaches are required – and a way to combine them – confidence score • Aim = 85% accuracy of 50% of questions (Ken Jennings – 92% of 62%) • Used a combination of structure and text search • Massive parallelism, many experts, pervasive confidence estimation, integration of shallow and deep knowledge • Key step – fast filtering to get to top 100 (System 1) • Then – intense analysis to evaluate (System 2) – multiple scoring

  18. New Approaches – Applied Watson • Multiple sources – taxonomies, ontologies, etc. • Special modules – temporal and spatial reasoning – anomalies • Taxonomic, Geospatial, Temporal, Source Reliability, Gender, Name Consistency, Relational, Passage Support, Theory Consistency, etc. • Merge answer scores before ranking • 3 Years, 20 researchers of all types • Got to 70% of 70% – in two hours • More difficult answers / more complete questions
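The two-stage pattern described on the Watson slides (fast filtering to a shortlist, then several scorers whose results are merged into one confidence before ranking) can be sketched in a few lines. This is not IBM's implementation: the two scorers, the fixed weights, the threshold and the sample data are all hypothetical stand-ins, and a real system would learn or tune how scores are merged.

```python
def words(text):
    return set(text.lower().split())


def fast_filter(question, candidates, k):
    """Stage 1 (System 1): cheap keyword overlap keeps only the top-k candidates."""
    q = words(question)
    return sorted(candidates,
                  key=lambda c: len(q & words(c["passage"])),
                  reverse=True)[:k]


def passage_support(question, cand):
    """Hypothetical scorer: fraction of question words found in the passage."""
    q = words(question)
    return len(q & words(cand["passage"])) / len(q)


def source_reliability(question, cand):
    """Hypothetical scorer: a reliability value stored with the source."""
    return cand["reliability"]


SCORERS = {"passage_support": passage_support,
           "source_reliability": source_reliability}
WEIGHTS = {"passage_support": 0.7, "source_reliability": 0.3}  # assumed, not learned


def merge(scores):
    """Merge per-scorer results into one confidence via a weighted average."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * value for name, value in scores.items()) / total


def answer(question, candidates, k=100, threshold=0.5):
    """Stage 2 (System 2): score the shortlist, rank by merged confidence,
    and abstain (return None) when the best confidence is below threshold."""
    shortlist = fast_filter(question, candidates, k)
    ranked = sorted(
        ((merge({n: f(question, c) for n, f in SCORERS.items()}), c["answer"])
         for c in shortlist),
        reverse=True)
    if not ranked or ranked[0][0] < threshold:
        return None
    confidence, best = ranked[0]
    return best, confidence


question = "what is the capital of France"
candidates = [
    {"answer": "Paris", "passage": "Paris is the capital of France", "reliability": 0.9},
    {"answer": "Lyon", "passage": "Lyon is a large city in France", "reliability": 0.9},
    {"answer": "Berlin", "passage": "Berlin is the capital of Germany", "reliability": 0.9},
    {"answer": "Tokyo", "passage": "Tokyo hosts many tourists", "reliability": 0.4},
]
```

The threshold is what links the confidence score to the "accuracy of N% of questions" framing on the slide: raising it means fewer questions are attempted, but those that are attempted are answered with higher confidence.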

  19. New Approaches: Adding Structure to Content • Contexts – whole range of types of context • Document types-purpose, Textual complexity, formats • Categorization by page, sections (text markers) or even sentence or phrase – Key – remember what the last page was • [Key – documents are not unstructured – they have a variety of structures] • Use generic components – like the level of generality of terms or concepts (general and context specific)

  20. New Approaches • Idea – build a higher level language – like tutoring systems • More complex primitives • Idea – Crowd sourcing – to evolve better structures – how to design to avoid design by committee – other side of wisdom of crowds • Design TA Game – 1,000’s to play and evolve • Partner with MOOC – example – better essay evaluation – avoid gaming the system – lots of multi-syllabic words – nonsense • Also to enhance software / modules

  21. New Directions in Text Analytics: Conclusions • Text Analytics is growing – but • Big obstacles remain • Strategic Vision of text analytics in the enterprise, applications • Concrete and quick application to drive acceptance • Software still too complex, un-integrated • New models are being developed • Cognitive science – System 1 and 2 • AI – brains that learn • Watson-like integrated approaches • Overcome complexity – modules (System 1 / Standard) with new ways of integrating (System 2 / Customized) – smarter and easier

  22. Questions? • Tom Reamy, tomr@kapsgroup.com • KAPS Group, http://www.kapsgroup.com • Upcoming: • Taxonomy Boot Camp – KMWorld – DC, Nov 3-6 • Workshop on Text Analytics • Text Analytics World – San Francisco, March 17-19

  23. Future Directions for Text Analytics: Social Media: Beyond Simple Sentiment • Analysis of Conversations – Higher level context • Techniques: self-revelation, humor, sharing of secrets, establishment of informal agreements, private language • Detect relationships among speakers and changes over time • Strength of social ties, informal hierarchies • Combination with other techniques • Expertise Analysis – plus Influencers • Quality of communication (strength of social ties, extent of private language, amount and nature of epistemic emotions – confusion+) • Experiments – Pronoun Analysis – personality types • Analysis of phrases, multiple contexts – conditionals, oblique

  24. Introduction: Personal • Deep Background: History of Ideas – dissertation – Models of Historical Knowledge • Artificial Intelligence research at Stanford AI Lab • Programming – designed two computer games, educational software • Started an Education Software company, CTO • Height of California recession • Information Architect – Chiron/Novartis, Schwab Intranet • Importance of metadata, taxonomy, search – Verity • From technology to semantics, usability • From library science to cognitive science • 2002 – started consulting company