
LING 581: Advanced Computational Linguistics

LING 581: Advanced Computational Linguistics. Lecture Notes January 26th. Penn Treebank. Bracketing guidelines. Ungraded Homework Exercise. Search for NP trace relative clauses as defined below. Be ready to compare search pattern and number found next time in class.

aderes

Presentation Transcript


  1. LING 581: Advanced Computational Linguistics Lecture Notes January 26th

  2. Penn Treebank Bracketing guidelines

  3. Ungraded Homework Exercise • Search for NP trace relative clauses as defined below • Be ready to compare your search pattern and the number found next time in class

  4. Ungraded Homework Exercise • @NP < @NP < @SBAR • 12038 found

  5. Ungraded Homework Exercise • @NP < @NP < @SBAR plus WH indices • 10956 found, down from 12038

  6. Ungraded Homework Exercise • @NP < @NP < (@SBAR < /^-NONE-/) • 529 found • Note: -NONE- < *ICH*

  7. Ungraded Homework Exercise

  8. Ungraded Homework Exercise Not all @NP < @NP < (@SBAR < /^-NONE-/) are relative clauses

  9. Ungraded Homework Exercise • @NP < @NP < (@SBAR < /^-NONE-/) plus *ICH* • count drops from 529 to 166

  10. Ungraded Homework Exercise • @NP < @NP < (@SBAR < /^-NONE-/) plus *ICH* • Is 166 too low? • How about other -NONE- nodes?

  11. Ungraded Homework Exercise

  12. Ungraded Homework Exercise • Final tally

  13. Homework Exercise • Use the bracketing guidelines and choose three “interesting” constructions • Find all occurrences in the WSJ PTB

  14. Homework Exercise • 581 Homework rules • Due next lecture • Present your findings in class (slides)

  15. Parsing … from Treebank search to stochastic parsers trained on the WSJ Penn Treebank

  16. Bikel Collins • Java re-implementation of Collins’ parser • Paper • Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511. • http://www.cis.upenn.edu/~dbikel/papers/collins-intricacies.pdf • Software • http://www.cis.upenn.edu/~dbikel/

  17. Bikel Collins • Download and install Dan Bikel’s parser • File: install.sh • Java code, but at this point I think it won’t work on Windows because of the shell script (.sh) • maybe once the files are extracted?

  18. Bikel Collins • Download and install the POS tagger MXPOST • the parser doesn’t actually need a separate tagger…

  19. Bikel Collins • Training the parser with the WSJ PTB • See guide • http://www.cis.upenn.edu/~dbikel/download/dbparser/guide.pdf • Directory: TREEBANK_3/parsed/mrg/wsj • Chapters 02-21: create one single .mrg file • Events: wsj-02-21.obj.gz

  20. Bikel Collins • Settings:

  21. Bikel Collins • Parsing • Command • Input file format (sentences)

  22. Bikel Collins • Verify the trainer and parser work on your machine

  23. Bikel Collins • File: bin/parse is a shell script that sets up program parameters and calls java

  24. Bikel Collins

  25. Bikel Collins • File: bin/train is another shell script

  26. Bikel Collins • Relevant WSJ PTB files

  27. Bikel Collins • If you have Tcl/Tk installed: I use a wrapper to call Dan Bikel’s code • it makes it easy to run the parser without memorizing the command-line options

  28. Bikel Collins • For tree viewing, you can use Tregex • For demos, I use my own viewer

  29. Bikel Collins
    • POS tagging (MXPOST, in directory jmx)
        tagger_input
        $prefix/jmx/mxpost $prefix/jmx/tagger.project < /tmp/test.txt 2> /tmp/err.txt
    • Parsing
        set ddf "wsj-02-21.obj.gz"
        set properties "collins.properties"
        parser_input
        $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
    • Training
        set mrg "wsj-02-21.mrg"
        set properties "collins.properties"
        $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
    • Unix file descriptors
        0  standard input (stdin)
        1  standard output (stdout)
        2  standard error (stderr)
    • GUI components
        frame .input
        text .input.t -height 4 -yscrollcommand {.input.s set}
        scrollbar .input.s -command {.input.t yview}
        frame .tagged
        text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
        scrollbar .tagged.s -command {.tagged.t yview}
    • Code
        # Save the raw sentences from the input text widget as the tagger's input file
        proc tagger_input {} {
            set lines [.input.t get 1.0 end]
            set infile [open "/tmp/test.txt" w]
            puts -nonewline $infile [string trimright $lines]
            close $infile
        }
        # Save the tagged sentences from the second text widget as the parser's input file
        proc parser_input {} {
            set lines [.tagged.t get 1.0 end]
            set infile [open "/tmp/test2.txt" w]
            puts -nonewline $infile [string trimright $lines]
            close $infile
        }
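The wrapper on slide 29 relies on Unix stream redirection: the tagger reads /tmp/test.txt on standard input (descriptor 0), and `2> /tmp/err.txt` sends whatever is written to standard error (descriptor 2) to a file, leaving standard output (descriptor 1) for the tagged result. The same redirections can be tried in plain shell. This is an illustrative sketch only: `fake_tagger` is a hypothetical stand-in for a real filter such as mxpost, and all it does is upper-case its input and print a status line on stderr.

```shell
# Hypothetical stand-in for a filter such as mxpost:
# reads lines from stdin (fd 0), writes results to stdout (fd 1)
# and a status message to stderr (fd 2).
fake_tagger() {
  tr '[:lower:]' '[:upper:]'
  echo "fake_tagger: done" >&2
}

workdir=$(mktemp -d)
printf 'the cat sat\n' > "$workdir/test.txt"

# fd 0 comes from test.txt, fd 1 goes to out.txt, fd 2 goes to err.txt
fake_tagger < "$workdir/test.txt" > "$workdir/out.txt" 2> "$workdir/err.txt"

cat "$workdir/out.txt"   # prints: THE CAT SAT
```

Because stderr is sent to its own file, diagnostics never get mixed into the output that the next stage (here, the parser) reads.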

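Slide 19 says that chapters 02-21 of the WSJ treebank should be combined into one single .mrg file before training. Below is a minimal shell sketch of that step. It assumes the LDC layout, in which TREEBANK_3/parsed/mrg/wsj contains one two-digit subdirectory per section (00 to 24) holding the .mrg files. The function name `make_training_file` is hypothetical and is not part of Dan Bikel's distribution.

```shell
# Hypothetical helper: concatenate WSJ sections 02-21 into one training file.
# Assumes <wsj_dir>/<NN>/*.mrg, where NN is a two-digit section number.
make_training_file() {
  local wsj_dir=$1 out=$2
  : > "$out"                        # start from an empty output file
  for sec in $(seq -w 2 21); do     # seq -w pads to 02, 03, ..., 21
    cat "$wsj_dir/$sec"/*.mrg >> "$out"
  done
}
```

For example, `make_training_file TREEBANK_3/parsed/mrg/wsj wsj-02-21.mrg` would produce the file named by `set mrg "wsj-02-21.mrg"` on slide 29. Within each section, files are appended in the shell's sorted glob order.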
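Slide 10 asks about other -NONE- nodes. Before refining the Tregex pattern, it can help to see which empty-element types occur in the corpus at all. The sketch below tallies them using standard Unix tools. It assumes each empty element appears on a single line in the form (-NONE- *T*-1), and it drops the trailing index so that *ICH*-1 and *ICH*-2 count as one type. It counts every empty element in the trees, not only those under an SBAR, so its totals are not comparable to the Tregex counts on slides 4 to 12. The function name `none_types` is hypothetical.

```shell
# Hypothetical helper: tally empty-element types in Penn Treebank .mrg files.
# (-NONE- *ICH*-2) -> *ICH*, (-NONE- *T*-1) -> *T*, (-NONE- 0) -> 0
none_types() {
  cat "$@" \
    | grep -o '(-NONE- [^()]*)' \
    | sed -e 's/^(-NONE- //' -e 's/)$//' -e 's/-[0-9][0-9]*$//' \
    | sort | uniq -c | sort -rn
}
```

For example, `none_types TREEBANK_3/parsed/mrg/wsj/*/*.mrg` prints one line per empty-element type, most frequent first.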