L.A.S.I.

L.A.S.I.: Linguistic Analysis for Subject Identification. Feasibility presentation by CS410 Red Group, November 12, 2012.

Presentation Transcript


  1. L.A.S.I.: Linguistic Analysis for Subject Identification. Feasibility Presentation. Presented by: CS410 Red Group, November 12, 2012

  2. Outline
     • Team Red Staff Chart
     • Introduction
     • Societal Problem
     • Case Study
     • Proposed Solution
     • Major Component Diagram
     • Algorithm
     • The Competition
     • Risk
     • Conclusion

  3. Team Red Staff Chart
     • Scott Minter: Project Co-Leader, Software Specialist
     • Brittany Johnson: Project Co-Leader, Documentation Specialist
     • Dustin Patrick: Algorithm Specialist, Expert Liaison
     • Richard Owens: Documentation Specialist, Communication Specialist
     • Erik Rogers: Marketing Specialist, GUI Developer
     • Aluan Haddad: Algorithm Specialist, Software Specialist

  4. What is a theme?

  5. “A specific and distinctive quality, characteristic, or concern.” (“Theme,” Merriam-Webster)

  6. What are you looking for when you are identifying a theme?

  7. The 5 W’s & 1 H
     • Who
     • What
     • When
     • Where
     • Why
     • How

  8. Bill’s stove was broken. He has been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.

  9.

  10. The Theme from the 5 W’s & 1 H: Bill drove to the store yesterday to buy a new stove because his broke.
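The way the 5 W's & 1 H answers combine into the theme sentence can be sketched as a small record type. This is a minimal Python sketch. The class name `FiveWsOneH`, its `theme()` method, and the slot assignments (for example, treating "drove" as the How) are our own illustrative assumptions, not part of LASI's design.

```python
from dataclasses import dataclass


@dataclass
class FiveWsOneH:
    """Answers to the 5 W's & 1 H for one passage (hypothetical structure)."""
    who: str
    what: str
    when: str
    where: str
    why: str
    how: str

    def theme(self) -> str:
        # Assemble the answers in the order Who, How, Where, When, What, Why,
        # which reproduces the theme sentence on slide 10.
        return f"{self.who} {self.how} {self.where} {self.when} {self.what} {self.why}."


bill = FiveWsOneH(
    who="Bill",
    what="to buy a new stove",
    when="yesterday",
    where="to the store",
    why="because his broke",
    how="drove",
)
print(bill.theme())
```

The hard part, which this sketch skips, is filling the six slots automatically from raw text. That extraction is what the algorithm on slide 30 is meant to support.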

  11. Why are themes important?
     • Comprehension
     • Summarization
     • Communication between people

  12. Societal Problem: It is difficult for people to identify a common theme across a large set of documents in a timely, consistent, and objective manner.

  13. How long does it take?
     • Finding a theme across multiple documents is a time-consuming process.
     • The average reading speed of an adult is 250 words per minute (Thomas, "What Is the Average Reading Speed and the Best Rate of Reading?").
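To put the 250 words-per-minute figure in perspective, the reading time for a corpus is the total word count divided by the reading speed. This is a minimal Python sketch. The function name `reading_hours` and the example corpus size (50 documents of 4,000 words each) are hypothetical.

```python
WORDS_PER_MINUTE = 250  # average adult reading speed cited on slide 13


def reading_hours(num_documents: int, words_per_document: int,
                  wpm: float = WORDS_PER_MINUTE) -> float:
    """Hours needed to read a corpus once at a constant reading speed."""
    total_words = num_documents * words_per_document
    minutes = total_words / wpm
    return minutes / 60


# Hypothetical corpus: 50 documents of 4,000 words each.
hours = reading_hours(50, 4_000)
print(f"{hours:.1f} hours")  # 200,000 words / 250 wpm = 800 minutes
```

At that pace, one reader needs about 13.3 hours just to read the example corpus once, before any assessment or cross-referencing begins.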

  14. Consistency and Objectivity
     • The criteria for evaluation may vary from person to person.
     • Large quantities of documents must be mentally digested, assessed, and interrelated.

  15. Dr. Patrick Hester
     “My research interests include multi-objective decision making under uncertainty, probabilistic and non probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation.” (Dr. Hester; source: Patrick Hester website)
     • Ph.D., Vanderbilt University, 2007
     • Major: Risk and Reliability Engineering and Management

  16. Dr. Hester is a systems analyst and researcher. He must:
     • Conduct extensive research
     • Quickly become familiar with client systems
     • Formulate concise, objective assessments
     LASI will help with all of these tasks.

  17. Assessment Improvement Design (A.I.D.)
     • A preliminary problem statement is identified from the documents.
     • The problem statement is then used to find Critical Operational Issues (COIs).
     • The COIs are used to find Measures of Effectiveness (MOEs).
     • The MOEs are used to find Measures of Performance (MOPs).

  18. Current Method [flowchart]: Customer Contact → Situational Awareness Meeting → Will NCSOSE be needed? (no: Client Goes Elsewhere; yes: Document Gathering Process) → Document Analysis → Problem Statement Presentation → Is the customer satisfied? (yes: Continue on to the rest of the A.I.D. process)

  19. LASI: Linguistic Analysis for Subject Identification [diagram labeled "THEMES" and "LASI"]

  20. Our Proposed Solution: LASI is a linguistic analysis decision support tool that helps determine a common theme across multiple documents. Our goals for LASI are to:
     • Find themes accurately
     • Use system resources efficiently
     • Provide consistent results

  21. What do we mean by “linguistic analysis”? The contextual study of written works and of how their words combine to form an overall meaning.

  22. Linguistic analysis involves:
     Syntactic
        • Logical grammar
        • Statistical data
           • Alphabetical frequencies
           • Word counts
        • Parts of speech
        • Word dependencies
     Semantic
        • Relating syntactic structures to language-independent meanings
        • Extracting meaning and conceptual arguments
        • Summarization
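Word counts and alphabetical frequencies are the simplest of the syntactic statistics listed above. This is a minimal Python sketch. It assumes a naive regular-expression tokenizer (runs of lowercase letters and apostrophes), not any tokenizer the LASI team specified, and `syntactic_stats` is a hypothetical name. The sample text is the stove passage from slide 8, with a straight apostrophe.

```python
import re
from collections import Counter


def syntactic_stats(text: str) -> tuple[Counter, Counter]:
    """Return (word counts, letter frequencies) for a piece of text."""
    lowered = text.lower()
    # Naive tokenizer: runs of lowercase letters and apostrophes.
    words = Counter(re.findall(r"[a-z']+", lowered))
    # Alphabetical frequencies: how often each letter a-z occurs.
    letters = Counter(ch for ch in lowered if "a" <= ch <= "z")
    return words, letters


passage = (
    "Bill's stove was broken. He has been saying for months that he would "
    "go to the appliance store to buy a new one. He had some free time "
    "yesterday, so he drove to the store to buy a new stove."
)
words, letters = syntactic_stats(passage)
print(words.most_common(3))
print(letters.most_common(3))
```

Raw counts like these are noisy. Function words such as "he" and "to" dominate, which is why the statistics need the grammatical and semantic context listed on the slide.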

  23. The Wills and Will Nots of LASI
     What LASI will do:
        • Analyze multiple documents to find common themes
        • Provide statistical data to help a user make a decision
     What LASI will not do:
        • Provide a concise synopsis
        • Provide a single theme

  24. Who Would This Appeal To?
     • Researchers
     • Consultants
     • Academics
     • Students

  25. Benefits to the Customer
     • Time savings
     • Objective output
     • Consistent output
     • Cost savings

  26. How does LASI fit into our case study?

  27. Before LASI [flowchart]: Customer Contact → Situational Awareness Meeting → Will NCSOSE be needed? (no: Client Goes Elsewhere; yes: Document Gathering Process) → Document Analysis → Problem Statement Presentation → Is the customer satisfied? (yes: Continue on to the rest of the A.I.D. process)

  28. After LASI [flowchart]: Customer Contact → Situational Awareness Meeting → Will NCSOSE be needed? (no: Client Goes Elsewhere; yes: Document Gathering Process) → LASI-Aided Document Analysis → Problem Statement Presentation → Is the customer satisfied? (yes: Continue on to the rest of the A.I.D. process)

  29. Major Functional Components
     Hardware: high-end notebook PC (~$1,500 USD)
        • Computation: quad-core CPU
        • Primary memory: 8.0 GB DDR3 RAM
        • Document storage: solid-state storage
     Software
        • Algorithm: identifies the most likely congruence of themes and ideas across all documents in the input set
        • User interface: multi-level views, weighted phrase list, detailed breakdown, step-by-step justification

  30. Linguistic Analysis Algorithm [diagram]
     Stages:
        • Primary analysis: word count and syntactic assessment
        • Secondary analysis: associative identification
        • Tertiary analysis: semantic relationship assessment
     Steps:
        • Traverse the document word by word
        • Identify corresponding parts of speech
        • Identify potential noun phrases
        • Determine frequency by grammatical role
        • Bind pronouns to nouns, updating frequency
        • Bind adjectives to nouns
        • Identify potential synonyms
        • Assess potential subject-object-verb relationships
        • Output a list of weighted themes
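Several of the steps above can be combined into a toy scorer that outputs a list of weighted themes. This is a minimal Python sketch under stated assumptions, not the LASI algorithm. The input is assumed to be already tagged with part of speech and grammatical role; a real system would need a tagger and parser for that. The role weights, the synonym table, and the pronoun rule (personal pronouns bind to the last proper noun, and all other pronouns bind to the last common noun) are hypothetical. Subject-object-verb assessment is not implemented.

```python
from collections import defaultdict

# Hypothetical weights for "frequency by grammatical role".
ROLE_WEIGHTS = {"subj": 3.0, "obj": 2.0}
DEFAULT_WEIGHT = 1.0

# Hypothetical synonym table: maps variants onto one canonical theme word.
SYNONYMS = {"range": "stove", "oven": "stove", "shop": "store"}

# Personal pronouns bind to the last proper noun; other pronouns to the last common noun.
PERSONAL_PRONOUNS = {"he", "him", "his", "she", "her", "hers"}


def weighted_themes(tokens):
    """Score candidate themes from pre-tagged (word, pos, role) tokens.

    pos: "NOUN", "PROPN", "PRON", "ADJ", or any other tag (ignored).
    role: "subj", "obj", or None.
    Returns (theme, weight, modifiers) triples, highest weight first.
    """
    weights = defaultdict(float)
    modifiers = defaultdict(set)
    last_proper = last_common = None
    pending_adjs = []  # adjectives seen since the previous noun

    for word, pos, role in tokens:  # traverse word by word
        weight = ROLE_WEIGHTS.get(role, DEFAULT_WEIGHT)
        key = SYNONYMS.get(word.lower(), word.lower())  # merge potential synonyms
        if pos == "ADJ":
            pending_adjs.append(key)
        elif pos in ("NOUN", "PROPN"):
            weights[key] += weight
            modifiers[key].update(pending_adjs)  # bind adjectives to this noun
            pending_adjs.clear()
            if pos == "PROPN":
                last_proper = key
            else:
                last_common = key
        elif pos == "PRON":
            # Bind the pronoun to an antecedent and credit that noun's frequency.
            antecedent = last_proper if key in PERSONAL_PRONOUNS else last_common
            if antecedent is not None:
                weights[antecedent] += weight

    ranked = [(t, w, sorted(modifiers[t])) for t, w in weights.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Slide 10's theme sentence, tagged by hand for this example.
sentence = [
    ("Bill", "PROPN", "subj"), ("drove", "VERB", None), ("to", "ADP", None),
    ("the", "DET", None), ("store", "NOUN", None), ("yesterday", "ADV", None),
    ("to", "PART", None), ("buy", "VERB", None), ("a", "DET", None),
    ("new", "ADJ", None), ("stove", "NOUN", "obj"), ("because", "SCONJ", None),
    ("his", "PRON", None), ("broke", "VERB", None),
]
for theme in weighted_themes(sentence):
    print(theme)
```

Here "Bill" ranks first because he is both the subject and the antecedent of "his." The stove ranks second, carrying its bound adjective "new." The recency-based pronoun rule is deliberately naive. On slide 8's passage, for instance, it would only work because "Bill" is tagged as a proper noun.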

  31. Milestone Diagram

  32. The Competition

  33. The Competition

  34. WordStat

  35. Stanford CoreNLP

  36. ReadMe

  37. AutoMap

  38. Risk Matrix
     Customer risks
        • C1: Product Interest
        • C2: Maintenance
        • C3: Trust
     Technical risks
        • T1: System Limitations
        • T2: Scanned Text Recognition
        • T3: Jargon Recognition
        • T4: Illegal Character Handling

  39. Customer Risks
     • C1. Product Interest (probability 2, impact 4). Mitigation: LASI offers unique functionality and user-friendliness.
     • C2. Maintenance (probability 3, impact 2). Mitigation: LASI will be a free, open-source application, allowing the community to maintain and extend it over time.
     • C3. Trust (probability 3, impact 3). Mitigation: LASI will provide a step-by-step breakdown of its output analysis and algorithm reasoning.

  40. Technical Risks
     • T1. System Limitations (probability 4, impact 2). Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.
     • T2. Scanned Text Recognition (probability 4, impact 3). Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.

  41. Technical Risks
     • T3. Jargon Recognition (probability 3, impact 2). Mitigation: LASI will have domain-specific dictionaries and will feature intuitive contextual inference.
     • T4. Illegal Character Handling (probability 4, impact 2). Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
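The risk slides give each risk a probability score and an impact score but do not say how the two combine. One common convention is to rank risks by exposure = probability × impact. That rule is our assumption; the slides do not state it. This is a minimal Python sketch: `RISKS` and `rank_by_exposure` are hypothetical names, and the scores are the ones listed on slides 39 to 41.

```python
# (id, name, probability, impact) as listed on the risk slides.
RISKS = [
    ("C1", "Product Interest", 2, 4),
    ("C2", "Maintenance", 3, 2),
    ("C3", "Trust", 3, 3),
    ("T1", "System Limitations", 4, 2),
    ("T2", "Scanned Text Recognition", 4, 3),
    ("T3", "Jargon Recognition", 3, 2),
    ("T4", "Illegal Character Handling", 4, 2),
]


def rank_by_exposure(risks):
    """Rank risks by exposure = probability x impact (assumed scoring rule)."""
    scored = [(rid, name, p * i) for rid, name, p, i in risks]
    # sorted() is stable, so ties keep their slide order.
    return sorted(scored, key=lambda r: r[2], reverse=True)


for rid, name, score in rank_by_exposure(RISKS):
    print(f"{rid}  {name:<28} {score:>2}")
```

Under this assumption, scanned text recognition (T2, 4 × 3 = 12) is the highest-exposure risk, followed by trust (C3, 3 × 3 = 9).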

  42. Conclusion
     • LASI is feasible.
     • LASI is a decision support tool, not a decision-making tool.
     • If LASI succeeds, it could benefit a wide range of fields of study and professions.
     • For LASI to succeed, its output must be immediately usable and its interface user-friendly.

  43. References
     • "Theme." Def. 1b. Merriam-Webster. Web. 19 Oct. 2012. <http://www.merriam-webster.com/dictionary/theme>.
     • Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" Web. 19 Oct. 2012. <http://www.healthguidance.org/entry/13263/1/What-Is-the-Average-Reading-Speed-and-the-Best-Rate-of-Reading.html>.
     • "Patrick Hester." Old Dominion University. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
     • Osinski, Stanislaw, and Dawid Weiss. Carrot2. 13 Aug. 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
     • "WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
     • "ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
     • "AlchemyAPI Overview." AlchemyAPI. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
     • "AutoMap." Project. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
     • "CL Research Home Page." CL Research. Web. 19 Oct. 2012. <http://www.clres.com/>.