
Template-based Authoring




  1. Template-based Authoring Knowledge Systems Laboratory Stanford

  2. Project Goals • Assist analyst in everyday work • Knowledge Authoring Tools to assist in: • Research for reports • Produce reports • Consume reports • Share reports • Our solution: Semantic Web Templates

  3. Semantic Web Templates • Knowledge Representation, Semantics are key for information exchange • Creation, maintenance of knowledge must be transparent • Automate extraction of knowledge • Enhance knowledge retrieval methods

  4. Semantic Web Templates • Similar to MS Word Templates • Different templates for different tasks • Word templates can have restrictions on text • Very primitive, such as length of text • Simplistic patterns such as “phone number” • No concepts such as “color” or “country” • One template, many documents • HTML templates are very common today • Many web sites use SQL database as back end, template + SQL → HTML
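
The "one template, many documents" idea, with a SQL back end filling an HTML template, can be sketched as follows. This is a minimal illustration using Python's standard `sqlite3` and `string.Template` modules; the `country` table, its columns, and the page layout are assumptions made for the example and do not come from the presentation.

```python
import sqlite3
from html import escape
from string import Template

# Hypothetical one-table database standing in for a site's SQL back end.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT, capital TEXT)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?)",
    [("France", "Paris"), ("Japan", "Tokyo")],
)

# One template, many documents: every row fills the same HTML skeleton.
PAGE = Template("<html><body><h1>$name</h1><p>Capital: $capital</p></body></html>")


def render_pages(connection):
    """Return one HTML page per row of the (assumed) country table."""
    rows = connection.execute("SELECT name, capital FROM country ORDER BY name")
    return [
        PAGE.substitute(name=escape(name), capital=escape(capital))
        for name, capital in rows
    ]


pages = render_pages(conn)
```

Such a template only controls layout. It has no notion that `name` denotes a country, which is the gap the semantic templates described next are meant to fill.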

  5. Semantic Web Templates • An HTML file with additional tags • Tags specify: • Where particular knowledge is stated • What kind of knowledge it is • Where it came from, if applicable • References to an entity or relation • Repetitive regions of text
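
As a rough sketch of what "an HTML file with additional tags" might look like and how a tool could read it: the `kb-*` attribute names below are invented for illustration, since the slides do not show the actual tag syntax. The parser uses only Python's standard `html.parser` module and assumes annotated elements are not nested.

```python
from html.parser import HTMLParser

# Hypothetical annotated template. The kb-* attributes mark which entity is
# referenced, what kind of knowledge is stated, and where it came from.
TEMPLATE = """
<p>The capital of <span kb-entity="France" kb-type="Country">France</span> is
<span kb-property="hasCapital" kb-subject="France" kb-source="example-source">Paris</span>.</p>
"""


class AnnotationParser(HTMLParser):
    """Collect the text and kb-* attributes of each annotated element."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        kb_attrs = {key[3:]: value for key, value in attrs if key.startswith("kb-")}
        if kb_attrs:
            self._current = {"attrs": kb_attrs, "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Assumes annotated elements contain no nested tags.
        if self._current is not None:
            self.annotations.append(self._current)
            self._current = None


parser = AnnotationParser()
parser.feed(TEMPLATE)
annotations = parser.annotations
```

Each entry in `annotations` pairs the visible text with its semantic attributes, so the same file can be displayed as ordinary HTML and mined for knowledge.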

  6. Goal: Assist Research • Unstructured Extraction • Sort through buckets of data to find gold • Entity recognition • Relation recognition • Semistructured Extraction • Utilize repetitive patterns within a page • Use similar pages to extract more data • Robust despite changing pages, data

  7. Unstructured Extraction • Natural language processing • News feeds • Indexing, storage, retrieval • Plugin architecture • Web Services • Our system, collaboration with IBM via NIMD • Rover news crawler • Political news articles from Yahoo! • 22,000 articles, ~8500 concepts, ~1000 relations • Used in authoring tools

  8. Unstructured Extraction • Pattern based system • Leverage “hints” for the reader in news articles • British Prime Minister Tony Blair • <type Country><subClassOf Politician> <unknown name> • “Tony Blair” is a Prime Minister who represents the Country “England”. • System runs daily on Yahoo political news • Highlights known terms in green • Highlights new terms in red • Used to create search index, maintain KB • Demo
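
The pattern on this slide, a known country term followed by a known politician title followed by an unknown name, can be approximated with a regular expression. The sketch below is an assumption about how such a pattern might be applied, not the project's actual system; the two small lookup tables stand in for the knowledge base, and the "British" to "England" mapping simply follows the slide's own example.

```python
import re

# Tiny hypothetical knowledge base; the real KB is not described in the slides.
COUNTRY_ADJECTIVES = {"British": "England"}  # mapping as given in the slide's example
POLITICIAN_TITLES = {"Prime Minister", "President"}


def extract(text):
    """Find <type Country><subClassOf Politician><unknown name> sequences."""
    adjectives = "|".join(map(re.escape, COUNTRY_ADJECTIVES))
    titles = "|".join(map(re.escape, sorted(POLITICIAN_TITLES)))
    # The unknown name is taken to be a run of capitalized words.
    pattern = re.compile(
        rf"\b({adjectives})\s+({titles})\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)"
    )
    return [
        {
            "name": match.group(3),
            "role": match.group(2),
            "country": COUNTRY_ADJECTIVES[match.group(1)],
        }
        for match in pattern.finditer(text)
    ]


facts = extract("British Prime Minister Tony Blair met reporters on Monday.")
```

Here the two known terms act as the "hints" that license treating the following capitalized words as a new entity, which could then be highlighted as a new term and proposed for the KB.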

  9. Semi-structured Extraction • Extract, produce knowledge • Initial model is Domain Authorities • Enhance KB with ground facts • Strong for relations and breadth of data • Leverages work of others • Makes use of SQL databases • Future work is wide-scale web of trust

  10. Semi-structured Extraction • Site Registry • By description and property • CIA World Fact Book has data about items which are of type <Country> • CIA World Fact Book has properties <population>, <hasNeighbor>, <hasMembership>, etc. • Demo
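
A site registry indexed "by description and property" could be modeled as a list of sources, each declaring the entity types it describes and the properties it offers. The sketch below is one possible shape under that assumption; the `Site` class and `sites_for` function are hypothetical names, and only the CIA World Fact Book entry comes from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    describes: frozenset   # entity types the site has data about
    properties: frozenset  # properties the site can supply


REGISTRY = [
    Site(
        "CIA World Fact Book",
        frozenset({"Country"}),
        frozenset({"population", "hasNeighbor", "hasMembership"}),
    ),
]


def sites_for(entity_type, prop, registry=REGISTRY):
    """Return the names of sites that can supply `prop` for `entity_type`."""
    return [
        site.name
        for site in registry
        if entity_type in site.describes and prop in site.properties
    ]
```

For example, `sites_for("Country", "population")` selects the Fact Book, while a property no registered site declares yields an empty list.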

  11. Semi-structured Extraction • Publishing • Human editing good for high-level concepts • Automated techniques good for relations, ground level facts, and massive repetition • Rover web crawler • Template construction is currently manual • With critical mass of data, templates could be discovered.

  12. Enhanced Document Retrieval • Enhanced document retrieval • Search based on concept • Find articles about… • Membership: Scottie Pippen → Trailblazers • Membership: Osama bin Laden → al-Qaeda • Subgroups: • Ramadan Shallah → Islamic Jihad → al-Qaeda • Semantic search
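
Searching by concept over membership and subgroup relations amounts to following the relation transitively. A minimal sketch, assuming the relations are stored as a plain dictionary (the storage format and function names are illustrative; the relation instances are the ones listed on the slide):

```python
# Membership and subgroup edges as listed on the slide.
MEMBER_OF = {
    "Scottie Pippen": {"Trailblazers"},
    "Osama bin Laden": {"al-Qaeda"},
    "Ramadan Shallah": {"Islamic Jihad"},
    "Islamic Jihad": {"al-Qaeda"},
}


def groups_of(entity, relation=MEMBER_OF):
    """All groups reachable from `entity` by following membership edges."""
    seen = set()
    stack = [entity]
    while stack:
        for group in relation.get(stack.pop(), ()):
            if group not in seen:
                seen.add(group)
                stack.append(group)
    return seen


def members_of(group, relation=MEMBER_OF):
    """All entities that belong to `group`, directly or through a subgroup."""
    return {entity for entity in relation if group in groups_of(entity, relation)}
```

A query for articles about a group could then be expanded with `members_of(group)`, so that a document mentioning only a subgroup member is still retrieved.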

  13. Enhanced Document Retrieval • Document Augmentation • Sidebar acts as glossary as you read • Pre-fetch data user is likely to want • Adapt to user preferences, activities • Deeper understanding for user, gets answers to questions raised while reading

  14. Enhanced Document Retrieval

  15. Search Augmentation • Google assumes users only want documents • Provide answers along with documents • Use query term denotation to more closely target results • “Browns Ferry” is a garden park • “Browns Ferry” is a nuclear power plant • Automates what people do with IR systems • Append hints about the type of term being sought
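
Appending a hint about the type of term being sought can be sketched as a simple query rewrite. The `DENOTATIONS` table and `augment_query` function are assumptions for illustration; only the two senses of "Browns Ferry" come from the slide.

```python
# Known denotations per term; only this entry appears in the slides.
DENOTATIONS = {
    "Browns Ferry": ["garden park", "nuclear power plant"],
}


def augment_query(term, chosen_type=None, denotations=DENOTATIONS):
    """Return search queries with the term's type appended as a hint.

    With no type chosen, one candidate query per known sense is returned so
    the user (or the system) can pick among them.
    """
    senses = denotations.get(term, [])
    if chosen_type is None:
        return [f'"{term}" {sense}' for sense in senses] or [f'"{term}"']
    if chosen_type not in senses:
        raise ValueError(f"{chosen_type!r} is not a known sense of {term!r}")
    return [f'"{term}" {chosen_type}']
```

This mirrors what people already do by hand with IR systems: adding a word such as the entity's type to steer the results toward the intended sense.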

  16. Search Augmentation

  17. Search Augmentation • Demo: Basic Search • Demo: Followup Data • Demo: Disambiguation • Demo: Relations

  18. Basic Question Answering • Automated techniques for ground facts • Use reasoners for higher-level facts • Tie in with KSL AQUAINT work • Feedback, direction from user • Structure of knowledge allows simple form of question answering

  19. Basic Question Answering • Multiple views into data • Browse interface • Ugly, but complete view • Activity-based knowledge presentation • Search, document augmentation • Future work: accept user feedback, customization, preferred sources

  20. Basic Question Answering • Query by example • Users create many similar documents • These are targeted to an activity • Use past work to speed present work • User creates templates that present data they find interesting in a way they find convenient

  21. Query by Example

  22. Query by Example

  23. Query by Example

  24. Goal: Produce Reports • Most reports are made with Office • Word processor, spreadsheet • Enhance with semantic awareness • Provide seamless access to knowledge • Transparent maintenance, creation • Low overhead of operation • Avoid centralized approach • Contrast with relational database

  25. Word Processing • Creation of new data • Semantic scan • Like spell check or grammar check • Automatically identifies referenced entities • Learns new entities, relations between entities • Annotation of text • User manually adjusts system • User adds new data • System gets smarter over time

  26. Word Processing • Create data via entry into templates • Create new templates • For others • For personal use • Extend templates with new entry areas • Enhance analyst’s view • Semantic Search, Document Augmentation • Sidebar boxes are templates too

  27. Word Processing • Demo: Semantic Scan • Demo: Annotation • Demo: Knowledge Creation

  28. Spreadsheets • Spreadsheets are key tools in analysis • Tabular format, UI are both intuitive • Sorting, basic math functions • We add semantics: • New formula type: “Get Data” • New formula type: “Put Data” • Summarization, new views

  29. Spreadsheets • Example scenario • Suppose SARS was found to affect Asian-Americans more than others? • Analyst wants to determine, based on that, which states are most at risk • Knowledge from Census tells us Asian-American population as a percentage
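
The scenario can be sketched as a "Get Data" lookup feeding an ordinary sort. Everything concrete below is a placeholder: the state names, the percentages, and the property name `asianAmericanPopulationPct` are invented for illustration and are not Census figures, and `get_data` is only a stand-in for the formula type named in the slides.

```python
# Placeholder knowledge base. State names and percentages are invented for
# illustration and are NOT Census figures.
KB = {
    ("State A", "asianAmericanPopulationPct"): 12.0,
    ("State B", "asianAmericanPopulationPct"): 3.5,
    ("State C", "asianAmericanPopulationPct"): 7.25,
}


def get_data(entity, prop, kb=KB):
    """Stand-in for the "Get Data" formula: one property of one entity."""
    return kb[(entity, prop)]


def rank_by_risk(states, prop="asianAmericanPopulationPct"):
    """Order states from highest to lowest value of the chosen property."""
    return sorted(states, key=lambda state: get_data(state, prop), reverse=True)


ranking = rank_by_risk(["State A", "State B", "State C"])
```

In a spreadsheet, each `get_data` call would correspond to one cell formula, and the ranking to the built-in sort the analyst already knows.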

  30. Spreadsheets

  31. Spreadsheets

  32. Spreadsheets

  33. Spreadsheets

  34. Spreadsheets

  35. Spreadsheets

  36. Goal: Consume Reports • Verify others’ data against yours • Incorporate others’ results into your knowledge base, track sources • Maintain data • Change notification • Document updates with new data • Versioning of documents, data

  37. Goal: Share Reports • Easily exchangeable via e-mail • Truth maintenance techniques • Multiple views into data • Leverage domain expertise • The missile guy has a KB, … • Collaboration, trust levels • Colleagues disagree, sources are unreliable

  38. Conclusion • KD-D effort is focused on authoring, analysis tasks • Leverage automated techniques to complement manual techniques • System gets smarter as it’s used • Tie in with commonly used applications
