
Watson Systems



Presentation Transcript


  1. Watson Systems By Team 7: Pallav Dhobley (09005012), Vihang Gosavi (09005016), Ashish Yadav (09005018)

  2. Motivation: • Deep Blue’s triumph over Kasparov in 1997. • IBM in search of a new challenge.

  3. Jeopardy! • 2004 – The search ends! • One of the most popular quiz shows in the U.S.A. • Broad/open domain. • Complex language. • High speed. • High precision. • Accurate confidence.

  4. Jeopardy! • 2004 – The search ends! • One of the most popular quiz shows in the U.S.A. • Broad/open domain. • Complex language. • High speed. • High precision. • Accurate confidence. *le IBM

  5. Easier than playing Chess? • Chess: finite moves and states; mathematically well-defined search space; symbols have mathematical meaning. • Natural Language: implicit; highly contextual; ambiguous; imprecise.

  6. Easier than playing Chess? NO!! • Chess: finite moves and states; mathematically well-defined search space; symbols have mathematical meaning. • Natural Language: implicit; highly contextual; ambiguous; imprecise.

  7. Easy Question (LN(1,25,46,798*π))^3 / 34,600.47 = ?

  8. Easy Question: (LN(1,25,46,798*π))^3 / 34,600.47 = 0.155
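The slide’s point can be checked directly: this is “easy” precisely because a machine evaluates it instantly. A minimal sketch (note that 1,25,46,798 is Indian digit grouping for 12,546,798):

```python
import math

# The slide's "easy question": trivial for a computer,
# tedious for a human.
value = math.log(12_546_798 * math.pi) ** 3 / 34_600.47
print(round(value, 3))  # → 0.155, matching the slide
```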

  9. Hard Question: • Where was our “Father of the Nation” born? - contextual. - imprecise. • Easy for us Indians to relate the term “Father of the Nation” to M.K. Gandhi. • Not the same for computers. • Hence the need to learn from as-is content.

  10. Learning the As-Is text (NLP):

  11. What is Watson? • Advanced Search Engine? × • Some fancy Database Retrieval System? × • Beginning of Sky-Net? × • Science behind an Answer? √

  12. DeepQA

  13. Principles of DeepQA: • Massive Parallelism - Each hypothesis and interpretation is analyzed independently in parallel to generate candidate answers. • Many experts - Facilitate the integration and contextual evaluation of a wide range of analytics generated by several algorithms running in parallel.
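The first two principles can be sketched in miniature. Here `keyword_scorer` and `length_scorer` are hypothetical stand-ins for the “many experts”, and a thread pool stands in for massive parallelism: each candidate hypothesis is analyzed independently by the full battery of scorers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "experts": each judges a candidate independently.
def keyword_scorer(candidate, question):
    return 0.5 if candidate.lower() in question.lower() else 0.1

def length_scorer(candidate, question):
    # Prefer answers of a plausible length (toy heuristic).
    return 1.0 / (1 + abs(len(candidate.split()) - 3))

SCORERS = [keyword_scorer, length_scorer]

def analyze(candidate, question):
    # Every hypothesis gets every analytic, with no cross-talk,
    # so the work parallelizes trivially.
    return candidate, [s(candidate, question) for s in SCORERS]

question = "Who is the antagonist of Stevenson's Treasure Island?"
candidates = ["Long John Silver", "Jim Hawkins", "Robinson Crusoe"]

with ThreadPoolExecutor() as pool:
    results = list(pool.map(lambda c: analyze(c, question), candidates))
```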

  14. Principles of DeepQA (ctd.) • Pervasive Confidence Estimation - No single component commits to an answer; each attaches a confidence to what it produces. • Integrate shallow and deep knowledge - Using shallow and deep semantics together for better precision. e.g. Shallow semantics: keyword matching. Deep semantics: logical relationships.

  15. Shallow Semantics:

  16. Deep Semantics:
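The shallow/deep contrast from the previous slides can be sketched as two toy matchers. Both functions, the triple representation, and the variable `"X"` are illustrative assumptions, not Watson’s actual data structures: shallow matching counts word overlap, while deep matching grounds a logical relation by substituting the candidate for the question’s unknown.

```python
def shallow_score(question, passage):
    """Keyword matching: fraction of question words found in the passage."""
    q, p = set(question.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q)

def deep_match(question_rel, passage_rels, candidate):
    """Logical-relation matching: substitute the candidate for the
    question's variable X and look for the grounded relation."""
    grounded = tuple(candidate if t == "X" else t for t in question_rel)
    return grounded in passage_rels

question_rel = ("antagonist_of", "X", "Treasure Island")
passage_rels = {("antagonist_of", "Long John Silver", "Treasure Island")}
print(deep_match(question_rel, passage_rels, "Long John Silver"))  # → True
```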

  17. How does Watson Learn?

  18. Step 0 : Content Acquisition • Identifying and gathering the content to be used for answering questions and supporting evidence. • Involves analyzing example questions from the problem space, which consist of Q&A pairs from previous games. • Encyclopedias, dictionaries, wiki pages, etc. are used to make up the evidence sources. • Extract, verify and merge the most informative nuggets as part of content acquisition.

  19. Step 1 : Question Analysis The initial analysis that determines how the question will be processed by the rest of the system. • Question Classification e.g. puzzle/math • Focus and Lexical Answer Type (LAT) e.g. “On this day” → LAT: date/day • Relation Detection e.g. sea(India, x, west) • Decomposition - divide and conquer.
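A greatly simplified sketch of one piece of question analysis. The regex and the function name are assumptions for illustration only: in Jeopardy! clues the lexical answer type is often signalled by a demonstrative phrase such as “this ___”, which a crude pattern can pick out.

```python
import re

# Hypothetical, heavily simplified LAT detector: grab the noun that
# follows "this"/"these" in the clue.
LAT_PATTERN = re.compile(r"\b(?:this|these)\s+(\w+)", re.IGNORECASE)

def lexical_answer_type(clue):
    m = LAT_PATTERN.search(clue)
    return m.group(1).lower() if m else None

print(lexical_answer_type("On this day in 1947, India gained independence"))
# → "day", so candidate answers should be dates/days
print(lexical_answer_type("This author wrote Treasure Island"))
# → "author"
```

Real question analysis uses full parsing and relation detection; this only shows where the LAT signal lives in the clue text.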

  20. Step 2 : Hypothesis Generation • Primary search: • Keyword-based search. • The top 250 results are considered for Candidate Answer (CA) generation. • Empirical statistics: 85% of the time the answer is within the top 250 results. • CA generation: the above results are processed further to generate CAs. • Soft Filtering • Reduces the set of candidate answers using superficial analysis (machine learning). • Cuts the number of CAs to approx. 100. • Answers are not fully discarded; they may be reconsidered at the final stage.
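The soft-filtering idea above can be sketched as follows. The shape of the filter (sort by a cheap score, keep the best ~100, shelve the rest) follows the slide; the particular cheap score here is an arbitrary placeholder.

```python
# Hypothetical soft filter: keep the ~100 best of 250 candidates by a
# cheap surface score, instead of running every expensive analytic on all.
def soft_filter(candidates, cheap_score, keep=100):
    ranked = sorted(candidates, key=cheap_score, reverse=True)
    kept, shelved = ranked[:keep], ranked[keep:]
    # Shelved answers are not discarded for good; they can be
    # reconsidered at the final ranking stage.
    return kept, shelved

candidates = [f"candidate-{i}" for i in range(250)]
kept, shelved = soft_filter(candidates, cheap_score=len)
print(len(kept), len(shelved))  # → 100 150
```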

  21. Step 2: Hypothesis Generation (ctd.) 4. Each CA plugged back into the question is considered a hypothesis, which the system has to prove correct with some threshold of confidence. 5. If the correct answer is lost at this stage, the system has no hope of answering the question whatsoever. • Noise tolerance.

  22. Step 3 : Hypothesis & Evidence Scoring • Evidence retrieval: • Further evidence is gathered to support the hypotheses formed in the last step. e.g. Passage search: gathering passages by adding the CA to the primary search query. • Scoring: • Deep content analysis. • Determines the degree of certainty that the retrieved evidence supports the CA.

  23. Step 4 : Final Merging and Ranking • Merging: • Merging all the hypotheses that yield the same answer. • Using an ensemble of matching, normalization and co-reference resolution algorithms, Watson identifies equivalent and related hypotheses. • Ranking and confidence estimation: • The final set of hypotheses after merging is run over a set of training questions with known answers.
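The merging step can be sketched with a toy normalizer. The alias table below stands in for the ensemble of matching and co-reference algorithms; equivalent surface forms collapse into one hypothesis, pooling their scores before ranking.

```python
# Hypothetical normalization table standing in for matching /
# co-reference resolution.
ALIASES = {"l. j. silver": "long john silver", "silver": "long john silver"}

def merge(hypotheses):
    merged = {}
    for answer, score in hypotheses:
        canonical = ALIASES.get(answer.lower(), answer.lower())
        # Equivalent answers pool their evidence scores.
        merged[canonical] = merged.get(canonical, 0.0) + score
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

ranked = merge([("Long John Silver", 0.5), ("L. J. Silver", 0.25),
                ("Jim Hawkins", 0.4)])
print(ranked[0])  # → ('long john silver', 0.75)
```

In Watson itself the pooled features feed a trained model that produces a calibrated confidence, rather than a raw sum.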

  24. Example : • Q : “Who is the antagonist of Stevenson's Treasure Island?” • Step 1 : Parse and generate a logical structure describing the question. - antagonist(X) - antagonist_of(X, Stevenson’s TI) - adj_possessive(Stevenson, TI)

  25. Example (ctd.): • Step 2: Generate semantic assumptions - island(TI) - book(TI) - movie(TI) - author(Stevenson) - director(Stevenson) • Step 3: Build different semantic queries based on phrases, keywords and semantic assumptions. • Step 4 : Generate hundreds of answers based on the passages, documents and facts returned from step 3. Long John Silver is likely to be one of them.

  26. Example (ctd.): • Step 5: Formulate evidence in support or refutation. (+ve) evidence: 1. Long John Silver is the main character in TI. 2. The antagonist in Treasure Island is Long John Silver. 3. Treasure Island, by Stevenson, was a great book. (-ve) evidence: Stevenson = Richard Lewis Stevenson; antagonist = Wolverine.

  27. Example (ctd.): • Step 6: - Combine all the evidence and their scores. - Analyze the evidence to compute confidence and return the most confident answer: Long John Silver in this case!
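Step 6 can be sketched end-to-end with assumed weights. The per-passage scores below are invented for illustration; the point is only that positive and negative evidence combine into one confidence per candidate, and the most confident answer wins.

```python
# Assumed evidence scores per candidate: positive passages push the
# confidence up, refuting evidence (wrong Stevenson, wrong-domain
# antagonist) pushes it down.
evidence = {
    "Long John Silver": [+0.8, +0.9, +0.3],  # the three (+ve) passages
    "Wolverine":        [+0.4, -0.9],        # wrong-domain antagonist
}

def confidence(scores):
    # Toy combiner: mean of the evidence scores. Watson instead uses a
    # model trained on questions with known answers.
    return sum(scores) / len(scores)

best = max(evidence, key=lambda a: confidence(evidence[a]))
print(best)  # → Long John Silver
```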

  28. Watson- Performance:

  29. Watson’s Brain (Software): • Languages used: Java, C++, Prolog. • Apache Hadoop framework for distributed computing. • Apache UIMA framework: • Helps meet DeepQA’s demand for massive parallelism. • Facilitated rapid component integration, testing and evaluation. • SUSE Linux Enterprise Server 11.

  30. Watson’s Brain (Hardware): • One Jeopardy! question takes 2 hours on a normal desktop computer! • The real task - confidence determination before buzzing. • Hence the pressing need for faster hardware.

  31. Watson’s Brain (ctd.) • Ninety POWER 750 servers in total. • 2,880 POWER7 processor cores in total. • 16 terabytes of RAM in total. • Each POWER 750 server uses 3.5 GHz eight-core POWER7 processors, with 4 threads per core. • The size of 8 refrigerators in total. • Can process data at up to 500 GB/s.
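The figures above are mutually consistent if each server carries four of the eight-core POWER7 chips (an assumption made here to reconcile the counts):

```python
# Sanity-checking the slide's hardware figures, assuming 4 POWER7
# chips per POWER 750 server.
servers, chips_per_server, cores_per_chip = 90, 4, 8
total_cores = servers * chips_per_server * cores_per_chip
print(total_cores)  # → 2880, matching the slide

# 16 TB of RAM spread over 90 servers:
print(16_000 // servers, "GB of RAM per server (approx.)")
```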

  32. Watson’s Brain: (ctd.)

  33. Watson – Runtime Stack

  34. The Final Blow! • 3 rounds of Jeopardy! between Watson, Rutter and Jennings. • Watson comprehensively defeated its competitors with a net score of $77,147. • Jennings managed $24,000. • Rutter ended third with $21,600.

  35. The Final Blow! (ctd.) “I for one welcome our new computer overlords” - Jennings

  36. Conclusion: • High performance analytics • Non-cognitive • Smart Learner • Not invincible

  37. Watson & Suits • Tech support • Knowledge management • Business Intelligence • Improved information sharing

  38. Watson for society - Health Care • Inputs: symptoms, patient records, tests, medications, notes/hypotheses, texts, journals. • Diagnosis models: finding the appropriate disease as indicated by the accompanying symptoms and records.

  39. References: • Watson Systems: http://www-03.ibm.com/innovation/us/watson/ • Wiki Page http://en.wikipedia.org/wiki/Watson_%28computer%2 • Research Papers: http://researcher.ibm.com/researcher/view_page.php?id=2121

  40. References: • Jeopardy! IBM Watson Day 1 (Feb 14, 2011) http://www.youtube.com/watch?v=seNkjYyG3gI&feature=related • Science Behind an Answer - http://www-03.ibm.com/innovation/us/watson/what-is-watson/science-behind-an-answer.html • The AI Magazine http://www.aaai.org/ojs/index.php/aimagazine/article/view/2303

  41. References: • Philip Resnik. 1999. Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language. Journal of Artificial Intelligence Research. • Tom M. Mitchell. 1997. Machine Learning. Computer Science Series. McGraw-Hill.
