
Pushpak Bhattacharyya CSE Dept. IIT Bombay 19 May, 2014



Presentation Transcript


  1. Next Generation Information Extraction (NGIE) in Multilingual Environment (collaborative project with TCS) Pushpak Bhattacharyya, CSE Dept., IIT Bombay, 19 May, 2014

  2. NGIE Project Overview • The goal of the project is to develop Next Generation Information Extraction technology • The IE environment will be multilingual • Involves Machine Translation and Cross-Lingual Search • The IE focus is on relation extraction, named entity extraction, multiword extraction, semantic role labeling, and corpus management • Relation extraction and named entity extraction are to be done jointly since they are synergistic (for example, CEO_of is a relation between a Person and an Organization) • The fruits of this research are to be carried over to the TCS IE environment called INX • High-quality publications in IE, covering all the above tasks and combinations thereof

  3. Principal Investigators and other members 1. Mr. Girish Palshikar, Principal Scientist, Systems Research Lab, Tata Consultancy Services Limited 2. Dr. Pushpak Bhattacharyya, Professor, Department of Computer Science & Engineering, IIT Bombay 3. Other members: Rohit Bangera, Sachin Pawar, Rudra Murthy, Girish Ponkia, Ravi Soni, Manish Shrivastava, Diptesh Kanojia, Gajanan Rane

  4. NGIE Project accomplishments (1/3) 1. Relation Extraction: A relation extraction system has been built which can extract entities from a natural language sentence and identify relationships among them. The following papers have been written: • Sachin Pawar, Pushpak Bhattacharyya and Girish Keshav Palshikar, Semi-supervised Relation Extraction using EM Algorithm, International Conference on NLP (ICON 2013), Noida, India, 18-20 December, 2013 • Sachin Pawar, Pushpak Bhattacharyya and Girish Keshav Palshikar, Improving Relation Extraction Using A Joint Model of Entities and Relations, under revision.

  5. Relation Extraction: Joint Model of Entities and Relations • E1, E2: Types of the first and second entity mentions • R: Type of the relation between the two mentions • F: Feature vector capturing characteristics of the entity mentions and how they occur in the sentence • The model can be used in two modes: • Semi-supervised mode: F, E1 and E2 are known, R is unknown; the EM algorithm is used for learning the model parameters • Supervised mode: F, E1, E2 and R are all known while learning
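
The semi-supervised mode on slide 5 can be illustrated with a small EM loop. This is only a sketch under strong assumptions, not the model from the cited papers: it assumes a naive-Bayes style factorisation P(R) · P(E1|R) · P(E2|R) · P(F|R), collapses the feature vector F into a single discrete value, and pins R for the few instances where a label is given. The function name `em_relation_model`, the relation names `EMPLOYMENT` and `LOCATED_IN`, and the toy instances are all hypothetical.

```python
import math
import random


def em_relation_model(instances, labels, relations, n_iter=20, alpha=0.1, seed=0):
    """EM for a naive-Bayes model with relation type R as a (mostly) hidden class.

    instances: list of (e1_type, e2_type, feature) tuples (all observed).
    labels:    relation name, or None when R is unknown for that instance.
    Assumed factorisation (not from the slides): P(R) * P(E1|R) * P(E2|R) * P(F|R).
    """
    rng = random.Random(seed)
    n_rel = len(relations)

    def one_hot(label):
        return [1.0 if r == label else 0.0 for r in relations]

    def random_soft():
        w = [rng.random() + 0.1 for _ in relations]
        total = sum(w)
        return [x / total for x in w]

    # Posterior P(R | instance): fixed for labelled data, random for the rest.
    post = [one_hot(lab) if lab is not None else random_soft() for lab in labels]
    n_slots = len(instances[0])
    vocab = [sorted({inst[k] for inst in instances}) for k in range(n_slots)]

    for _ in range(n_iter):
        # M-step: re-estimate parameters from soft counts (alpha = smoothing).
        prior = [alpha + sum(p[r] for p in post) for r in range(n_rel)]
        z = sum(prior)
        prior = [x / z for x in prior]
        cond = []  # cond[k][r][value] = P(slot k takes value | relation r)
        for k in range(n_slots):
            per_rel = []
            for r in range(n_rel):
                counts = {v: alpha for v in vocab[k]}
                for inst, p in zip(instances, post):
                    counts[inst[k]] += p[r]
                total = sum(counts.values())
                per_rel.append({v: c / total for v, c in counts.items()})
            cond.append(per_rel)

        # E-step: recompute P(R | E1, E2, F) for the unlabelled instances.
        new_post = []
        for inst, lab in zip(instances, labels):
            if lab is not None:
                new_post.append(one_hot(lab))
                continue
            joint = [
                prior[r] * math.prod(cond[k][r][inst[k]] for k in range(n_slots))
                for r in range(n_rel)
            ]
            total = sum(joint)
            new_post.append([j / total for j in joint])
        post = new_post

    return prior, cond, post


# Toy data: entity types follow slide 6; relation names and features are invented.
relations = ["EMPLOYMENT", "LOCATED_IN"]
instances = [
    ("PERSON", "ORGANIZATION", "works_at"),
    ("PERSON", "ORGANIZATION", "works_at"),
    ("PERSON", "GPE", "lives_in"),
    ("PERSON", "GPE", "lives_in"),
]
labels = ["EMPLOYMENT", None, "LOCATED_IN", None]

prior, cond, post = em_relation_model(instances, labels, relations)
predicted = [relations[max(range(len(relations)), key=p.__getitem__)] for p in post]
```

In supervised mode the same M-step would run once over fully labelled data, with no E-step needed. Without any labelled instances the hidden classes would still separate, but their mapping to relation names would be arbitrary.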

  6. Relation Extraction: Example • Input sentence: Patricia Newell, an organizer for Nader at the University of Florida in Gainesville, said that Nader had won far fewer votes in Florida than his supporters had expected. • Entity Mentions Extracted: • PERSON - Patricia Newell, organizer, Nader, Nader, his, supporters • ORGANIZATION - University of Florida • GPE (Geo-Political Entity) - Gainesville, Florida • Relations Extracted :

  7. NGIE Project accomplishments (2/3) 2. Multiword Extraction: Identifying and extracting multiwords using deep learning (multilayered neural networks) • Paper submitted to COLING 2014 (Ireland): Rahul Sharnagat, Rudra Murthy V, Dhirendra Singh, Pushpak Bhattacharyya, Identification of Multiword Named Entities using Co-occurrence Statistics and Distributed Word Representation.

  8. MWE situations • Machine Translation • यूक्रेन की सेना ने क्रीमिया के सीमावर्ती इलाकों में अपना डेरा डाल दिया है। (Hindi: "The Ukrainian army has set up camp in the border areas of Crimea.") • Natural Language Generation • Good said or Well said? • Baby changing room (what is changed?)

  9. Challenges of MWE • ಈ ಕೆಲಸವು ಕಬ್ಬಿಣದ ಕಡಲೆಯೇ ಸರಿ • Transliteration: I kelasavu kabbiNada kaDaleyE sari • Gloss: this job iron nut correct • Translation: This job is a hard nut to crack • Google: This work is strong meat • ಯಾರ ಹತ್ತಿರವೂ ಕೈ ಚಾಚಬೇಡ • Transliteration: yAra hattiravU kai chAchabEDa • Gloss: which near hand no extend • Translation: Do not ask help from anybody • Google: Whose ever hand cacabeda

  10. MWE Extraction Taxonomy • MWE Extraction Techniques • Rule Based • Empirical • Statistical Measures Based • Similarity Based • Thesaurus Based • Distributional Word Representation
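
As an illustration of the "statistical measures based" branch of the taxonomy, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), one common co-occurrence statistic. The slides do not say which measure the project uses, so PMI is an assumption here, and the function name `pmi_bigrams`, the `min_count` threshold and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_bigrams(sentences, min_count=2):
    """Rank adjacent word pairs by PMI = log2( P(x, y) / (P(x) * P(y)) )."""
    unigrams = Counter()
    bigrams = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for very rare pairs
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))

    # Highest PMI first: these pairs co-occur far more often than chance.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Invented toy corpus, already tokenised and lower-cased.
corpus = [
    ["he", "moved", "to", "new", "york", "last", "year"],
    ["new", "york", "is", "a", "big", "city"],
    ["he", "bought", "a", "new", "car"],
    ["the", "city", "is", "big"],
]
ranked = pmi_bigrams(corpus)
```

On this corpus only the pair ("new", "york") reaches the count threshold, and it receives a positive score. A real extractor would run on a large corpus and typically combine such a score with other evidence.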

  11. Multiword Extraction Process • Artificial Neural Networks (ANNs) have been successfully applied to various Natural Language Processing tasks • ANNs are able to capture the semantics of words • Use ANNs to extract MWEs from text: Deep Learning

  12. Sample Word Representation (for Hindi)

  13. MWE Extraction Engine Screenshots

  14. MWE Extraction Engine Screenshots

  15. NGIE: additional outcomes

  16. Multilingual POS Projection: HMM Results with Hindi as Helper • HMM trained on Hindi • Tested on Hindi words aligned with source-language words
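
Slide 16 gives no detail beyond the use of an HMM. As background, the sketch below shows Viterbi decoding, the standard way an HMM tagger picks the most probable tag sequence for a sentence. It is not the project's implementation: the function name `viterbi`, the two-tag tagset, the romanised Hindi words and all probabilities are invented for illustration.

```python
def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for `words` under an HMM.

    start_p[t]     : P(first tag is t)
    trans_p[p][t]  : P(tag t | previous tag p)
    emit_p[t][w]   : P(word w | tag t); unseen words get the `floor` value
    """
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (start_p[t] * emit_p[t].get(words[0], floor), None) for t in tags}]

    for word in words[1:]:
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[-1][p][0] * trans_p[p][t]) for p in tags),
                key=lambda item: item[1],
            )
            column[t] = (score * emit_p[t].get(word, floor), prev)
        best.append(column)

    # Backtrack from the best final tag through the stored back-pointers.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Invented toy parameters; a real tagger would estimate them from a tagged corpus.
tags = ["NOUN", "VERB"]
start_p = {"NOUN": 0.8, "VERB": 0.2}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emit_p = {
    "NOUN": {"ladka": 0.5, "khel": 0.1, "ghar": 0.4},
    "VERB": {"khel": 0.3, "gaya": 0.7},
}

# "ladka ghar gaya" (romanised Hindi for "the boy went home")
tagged = viterbi(["ladka", "ghar", "gaya"], tags, start_p, trans_p, emit_p)
```

For longer sentences the products should be replaced by sums of log-probabilities to avoid numerical underflow.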

  17. NGIE outcome: Parallel Corpora Management Workbench tool: Screenshot

  18. Summary • Project goal: Advanced IE in a multilingual setting • Involves Machine Translation and Search too • Sophisticated machine learning techniques such as Markov Logic Networks and Deep Learning are to be used for NLP • The incumbent will go deep into ML and NLP, with active support for existing project work • Expectations: day-to-day project work, attending research evaluation meetings around the country, publishing, and creating downloadable resources and tools

  19. Thank you Lab URL: http://www.cfilt.iitb.ac.in My URL: http://www.cse.iitb.ac.in/~pb
