
Java Development for HLT

Java Development for HLT. Lars Degerstedt Linköping university, IDA larde@ida.liu.se Towards available and useful NLP software. This Lecture. 1st hour - Course introduction purpose motivation course overview relevance for NLP 2nd hour – experiences of NLPLAB. Aim of the Course.


Presentation Transcript


  1. Java Development for HLT • Lars Degerstedt, Linköping University, IDA, larde@ida.liu.se • Towards available and useful NLP software

  2. This Lecture • 1st hour - Course introduction • purpose • motivation • course overview • relevance for NLP • 2nd hour – experiences of NLPLAB

  3. Aim of the Course • Use of the Java platform for NLP • Experience from software design • Experience of mainstream techniques How can basic NLP research lead to products?

  4. Trad. NLP(LAB) Results • Larger projects slower growth/person • Conflict between results: • paper or code? • Subexpert vs. holistic view ”the GUI is not important” • Code is (at best) stable but not mature. Even less ”useful”.

  5. Weak points of NLP Software • Closed architecture • Weak on software methodology • No differentiation of users • Difficult to use • Difficult to integrate • Unclear in functionality • No reuse • Weak maintenance • Imposes new formalisms • A lot of bugs...

  6. Weak points of NLP Development • Waterfall methods • Little real usage during development • think-a-year then code-a-week • prolog, lisp, java… • ”No research value” • Large projects • Subspecialists • Lack of programmers

  7. So, What are the Solutions? • Use commercially available techniques: • what can we learn from industry? • Global cooperation on code-level: • join mainstream technology? • Adjust our working methods: • how do we better interact with society/real usage?

  8. Selected Course Topics • Lecture 2 - Java: • Language and platform • Lecture 3 – object-oriented design: • Basic concepts/techniques • Lecture 4 – design patterns: • Extremely useful architectural techniques

  9. History of OO-related Concepts (My View) • Figure (this is just a sketch!) with axis labels ”Time of creation” (60’s, 70’s, 80’s, 90’s), ”Prog. in the small” and ”Prog. in the large” • Concepts shown: Component Systems; Iterative Development/software evolution; System architecture; Operating system design/scripts; Components; Subsystems/modules; OO Frameworks; Design patterns; Objects; Object-oriented design; Web-centered Development/Open Source; Interfaces/APIs; Protocols; Contracts; Code-level design; Formal specification; Idioms; High-level languages; Declarative languages; Extensive free libraries

  10. What is Java? • In short: C++ syntax, byte code, platform-neutral… • High-level platform: • Unix/C is more low level… • Easy access beyond the desktop... • Sub-platforms: J2SE, J2EE, J2ME, JINI… • Buzz: security, connectivity, heterogeneous, multimedia, Swing, XML, beans, distributed…

  11. Why use Java for NLP? • HL-quality information available. • Mature community: free code! • Rewrite for Java - not C… • Utilities: sound, 3d graphics, xml… • Integration with industry. • Joining the OO-movement.

  12. What is ”software development”? • One Project View (”Development in the Small”): Analysis, Specification, Design, Implementation, Evaluation, Testing • Evolutionary Process View (”Development in the Large”): a series of projects

  13. What is ”Design”? (Part 1: The Ws) • What: software units, ui, interaction, language • When: role in dev. model, time constraints • Who (by/for): product-design, linguistics, hackers • Where: organization, legacy, single/multiple project • Why: internal/external readers/publication

  14. What is ”Design”? (Part 2: Definition) • Theory of something (not everything) • Design is ”sold” (not ”proven”) • ”Defines” the system, rather than realizes it: • Partitioning of ”the system” • Contracts for the interaction between the parts • Design phase result: a specification. E.g. • Interfaces/APIs (with comments) • documents • Conceptual prototype • Intertwined concepts: architecture, development, requirements analysis/capture, implementation
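
As a hedged illustration of a design-phase result in the form of a commented interface that acts as a contract between parts, here is a minimal Java sketch. The names `Tagger`, `TaggedToken` and `SuffixTagger`, and the toy tagging rule, are assumptions made for this example; they are not taken from the course material.

```java
import java.util.ArrayList;
import java.util.List;

/** A token paired with a tag; hypothetical value type for this sketch. */
final class TaggedToken {
    final String token;
    final String tag;

    TaggedToken(String token, String tag) {
        this.token = token;
        this.tag = tag;
    }
}

/**
 * Contract between a tagging component and its callers (hypothetical).
 * The interface defines the part; it does not realize it.
 */
interface Tagger {
    /**
     * Tags every token.
     * Precondition: tokens is non-null and contains no null elements.
     * Postcondition: the result has the same length and order as tokens.
     */
    List<TaggedToken> tag(List<String> tokens);
}

/** One possible realization; the rule below is a toy assumption. */
final class SuffixTagger implements Tagger {
    @Override
    public List<TaggedToken> tag(List<String> tokens) {
        List<TaggedToken> result = new ArrayList<>();
        for (String t : tokens) {
            // Toy rule: words ending in "s" get N-PL, everything else gets X.
            result.add(new TaggedToken(t, t.endsWith("s") ? "N-PL" : "X"));
        }
        return result;
    }
}
```

A caller that depends only on `Tagger` can be written and reviewed before any real tagger exists, which is one reading of ”defines the system, rather than realizes it”.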

  15. What is ”Object-Oriented Design”? Design in the Small • Object-Oriented Modeling • Finding ”the objects” • Domain and artefact models: problem vs domain • Taxonomy and aggregation • Real-life mapping/customer satisfaction: ui prototypes and scenarios • Object Interface Design: • Abstract Data Types (ADTs): data+op, information-hiding, ... • Object Roles: information, system, passive, active, ... • Objects as Machines: state+method, orthogonal methods,...
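
The ADT idea (data plus operations, with the representation hidden) might be sketched in Java as follows. The class `Lexicon` and its operations are hypothetical names chosen for this example; the point is only that callers see operations, never the underlying map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical ADT: a lexicon mapping word forms to sets of tags. */
final class Lexicon {
    // Hidden representation: callers never see the map itself.
    private final Map<String, Set<String>> entries = new HashMap<>();

    /** Operation: record that a word form may carry a tag. */
    void add(String form, String tag) {
        entries.computeIfAbsent(form, k -> new HashSet<>()).add(tag);
    }

    /** Operation: look up tags; the result is read-only and may be empty. */
    Set<String> tagsOf(String form) {
        Set<String> tags = entries.get(form);
        return tags == null
                ? Collections.<String>emptySet()
                : Collections.unmodifiableSet(tags);
    }

    /** Operation: number of distinct word forms. */
    int size() {
        return entries.size();
    }
}
```

Because `tagsOf` hands out a read-only view, the internal state can only change through `add`, so the representation could later be swapped (for instance for a trie) without touching callers.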

  16. What are ”Design Patterns”? Design in the Middle • Micro architecture - abstract designs. • not a concept - a catalogue! • Useful: reuse of successful design. • Used: abstracts from experience. • Usable: includes coding details.

  17. Why Design Patterns for NLP? Design patterns are truly useful!! • Fill a gap between library modules and system architectures. Patterns are open-ended, not straitjackets. • Codify the (oo) design expertise. • Open question: What do NLP design patterns look like?
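
To make the idea of a pattern as a reusable micro-architecture concrete, the following is a small sketch of the Strategy pattern from the Gamma et al. catalogue, applied to tokenization. It is not a proposed answer to the open question above; the names `TokenizerStrategy`, `WhitespaceTokenizer`, `CharacterTokenizer` and `TokenCounter` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Strategy interface (hypothetical): one interchangeable tokenization algorithm. */
interface TokenizerStrategy {
    List<String> tokenize(String text);
}

/** Concrete strategy: split on runs of whitespace. */
final class WhitespaceTokenizer implements TokenizerStrategy {
    @Override
    public List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }
}

/** Concrete strategy: one token per character, skipping whitespace. */
final class CharacterTokenizer implements TokenizerStrategy {
    @Override
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        return tokens;
    }
}

/** Context: counts tokens without knowing which strategy it was given. */
final class TokenCounter {
    private final TokenizerStrategy strategy;

    TokenCounter(TokenizerStrategy strategy) {
        this.strategy = strategy;
    }

    int count(String text) {
        return strategy.tokenize(text).size();
    }
}
```

The context class stays unchanged when a new tokenizer is added, which is the kind of open-ended reuse the slide attributes to patterns.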

  18. This is a Project Course! • Use your own code - • write code you would want to use. • Not a basic programming course. • Creative ideas but concrete results. • Write ”reusable” (=generic enough) code – try to reuse when possible

  19. Course Examination • Individual examination: • cooperation is encouraged. • Two parts: term paper and project • Term paper (1/3): 75 hours • Project (2/3): 125 hours • Metrics for finished project: • Two iterations (with deliverables) • Well-designed code (document how/why)

  20. Course Literature • Recommended readings: See the course pm at GSLT course page • Recommended book to buy: Erich Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley 1994 • Further readings: Stefan Sigfried, Understanding OO Software Engineering, IEEE Press 1996; Clemens Szyperski, Component Software: Beyond Object-Oriented Programming, Addison-Wesley 2001; www.javaworld.com

  21. Related NLP activities • nlpFarm and openNLP: NLP OSS development and platform • GATE 2: tool-box for NLP processing • SVENSK: Swedish NLP platform • NLSR: NLP software registry (DFKI)

  22. nlpFarm • OSS Java software at SourceForge. • Farmstead mission: ”A place where early research prototypes evolve into robust and useful open source.” • practical work towards useful things • Global/Scandinavian cooperation? • Will nlpFarm work? It is an OpenNLP experiment sponsored by Vinnova…

  23. SVENSK • Language processing tool-box for Swedish. • Reuse of existing NLP components. • Based on the GATE architecture. • Its successor Kaba: for information access and refinement only.

  24. GATE 2 • GATE: document manager, gui, components. • Installed at > 250 sites. • GATE 2: rewritten in Java • A platform for Language engineering. • Broad range of packages: • gate.sgml, gate.swing, gate.email…

  25. This Lecture (2nd Hour) • 1st hour - Course introduction • 2nd hour – Experiences of NLPLAB • Evolutionary Process Model • Iterative Method • BirdQuest • TvGuide • nlpFarm

  26. NLPLAB Projects of Today TvGuide BirdQuest TvGuide App: Quaks, JavaChart, PGP,... QUAC, DM, FS, TGEN, Guidia,... MOLINC, FUNs, JavaChart, ... Facility: 2 persons 5 persons 4 persons Iterative, incremental with free evolution (mixing bottom-up and top-down design)

  27. Evolutionary Process Model for NLP/LE • Figure labels: Application Artefacts, Artefact Construction, Theory, Language Modeling, Facility Artefacts; arrows are marked p (= possibilities) and n (= needs) • Multi-dimensional approach to NLP/LE development – avoid one-sided approaches!

  28. Issues in Evolutionary Design • Iterative and Incremental Design • Robust to change; formal revisions • Refactorings • Respect of Legacy – both theory and code • ...but don’t be a slave to it! • Free evolution of design • Mixed bottom-up and top-down design • Multiple-project approach • Use feedback (both pos. and neg.) seriously! • It is a bumpy ride! • ...sometimes improvements make it worse!

  29. Application-Driven Dialogue System Development • Figure labels: DS theory, DS req. specification, Conceptual design, DS design, DS framework, Other modules, Framework customisation, DS module

  30. Two Problems… • Too much time is devoted to discussions on features of the system that are interesting but often rare and hard to realise • It is not easy to subdivide the work with design and implementation into manageable pieces when developing a dialogue system. What does the incremental evolution path of a dialogue application look like?

  31. Development space for DM • DM Framework Customisation: Tools, Framework templates, Code patterns • DM Capabilities: Sub-dialogue control, History, Atomic request handling • DM Design: Knowledge representation, Modularisation, Interfaces

  32. BirdQuest = Two GUIs + Phase-Based Design • Bird encyclopaedia • Corpus with user questions • Dialogue systems framework

  33. Client-Server Design for BirdQuest • Figure labels: Application, UI Layout, UI & Feature Code, Browser, Web Server, Web Servlet (UI & Feature Code), RMI Server, Server Code, Bird Database • Connection labels: HTTP, RMI, JDBC

  34. Phase-based NLP of BirdQuest

  35. TvGuide = Evolutionary Re-design • Figure labels: Language Modeling; System Development (round 1); Application Artefact; Facility Software; Evaluation + Dialogue Model Re-design; System Development (round 2); Application Artefact; Facility Software

  36. Encapsulation for TvGuide • (non-strict) Layering • Figure labels: Subsystems, Application, Components, ?, Framelets/Tools, Libraries, KR?

  37. Summing up: Two Basic Design Dimensions • Problem Division, ”Horizontal Design”: splits the problem! E.g. Parsing, Access, Generate • Abstraction/Layering, ”Vertical Design”: creates a language! E.g. Agenda, List, Array • Sign of Success: ”High cohesion and low coupling between modules” • Figure labels: Peer-to-peer Model (Peer A, Peer B); TCP/IP Protocol Stack (Application, Transport, Internet, Host-to-Net)
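
The Agenda, List, Array example of vertical design could look roughly like the sketch below, in which each layer offers a small vocabulary to the layer above. The class `Agenda` and its method names are assumptions for illustration; `java.util.ArrayList` supplies the List-on-array layer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical agenda of pending items, layered on a List,
 * which java.util.ArrayList in turn layers on an array.
 */
final class Agenda<T> {
    private final List<T> items = new ArrayList<>();

    /** Schedule an item for later processing. */
    void schedule(T item) {
        items.add(item);
    }

    /** True when nothing is left to process. */
    boolean isEmpty() {
        return items.isEmpty();
    }

    /** Remove and return the most recently scheduled item (LIFO order). */
    T next() {
        if (items.isEmpty()) {
            throw new IllegalStateException("agenda is empty");
        }
        return items.remove(items.size() - 1);
    }
}
```

Code written against `schedule`, `next` and `isEmpty` speaks the ”language” of the agenda layer and does not depend on indices or array growth, so the processing order or storage can change behind the same three operations.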

  38. http://nlpfarm.sourceforge.net • Public web resource with open source • ”A place to work” • Cooperation over time and place • Development support • Mostly facility software • Formal release system • Towards robust and useful code • Link between research and industrial products

  39. Experiences from nlpFarm - Method • Separate application from facility • Different structure and methods • Interdependent artefacts – needs and possibilities • Variation of evolutionary approach, e.g. • Bottom-up vs top-down • Theory vs code • Background of personnel / type of result • Discriminate beginners from experts • Newbies have creative ”eyes of a child” • Experts should focus on ”hidden” continuity work • Software experts should make the overall design • Don’t work alone – find feedback

  40. Experiences from nlpFarm - Design • Figure labels: Construct Interface, Implement and use the module, Information Hiding, Refactoring • Facility Software (Library modules, Framelets): 1. Non-strict Layering 2. Work bottom-up with real applications 3. Add code only 4. Design patterns in kernel 5. Inheritance/taxonomy in external layer • Layer labels: Kernel, External API package, 2nd Layer, Kernel Packages, ...? • Application Artefact (Old Applications, New Application, Facility Software): 1. Method important 2. Focus on ”the possible” 3. Look at ”the whole” 4. Avoid duplication 5. Reuse from Legacy

  41. Experience from nlpFarm - Implementation • Inter-project conventions are hard to follow • Code conventions important for continuity • Project build support saves time and improves results • Version management is hard with beginners’ code • Automatic testing is important • Context-independent unit-tests for facility software • System-tests for applications with support for incremental evolution • Code quality is generally low and programming is time-consuming • Stay focused and make existing solutions ”a little better” • There is no ”script-layer” where everything becomes easy • Software construction is inherently creative where every problem is unique – don’t kid yourself!
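
One way to read ”context-independent unit-tests for facility software” is a test that needs no files, databases, or application objects. The sketch below assumes a hypothetical facility routine `TextNormalizer.normalize` and a hand-rolled checker, `TextNormalizerCheck`, so that it runs without any test framework.

```java
import java.util.Locale;

/** Hypothetical facility routine: a pure function with no external context. */
final class TextNormalizer {
    private TextNormalizer() {}

    /** Lower-cases the text and collapses runs of whitespace to one space. */
    static String normalize(String text) {
        return text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}

/** Context-independent checks: no files, databases, or application objects. */
final class TextNormalizerCheck {
    private TextNormalizerCheck() {}

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    /** Runs all checks; throws AssertionError on the first failure. */
    static void runAll() {
        check(TextNormalizer.normalize("  Which   BIRDS sing? ")
                .equals("which birds sing?"), "collapses and lower-cases");
        check(TextNormalizer.normalize("").equals(""), "empty input stays empty");
        check(TextNormalizer.normalize("a\tb\nc").equals("a b c"), "tabs and newlines");
    }
}
```

In a real project a test framework such as JUnit would normally discover and run such checks from the build; the hand-rolled `runAll` is used here only to keep the sketch self-contained.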

  42. Experiences from nlpFarm - Community • Too early? • Not all can be ”users” or ”script fillers” • Kernel of developers must exist (>= 3?) • Projects/community are not important, but results are • Are linguists like programmers? • Will the Open Source/free software manifesto work outside Hackerdale? • Willingness to engage in the e-society for its own sake • What is the modern (90s) evolutionary society vision of NLP? • OpenNLP needs a vision like GNU, but still lacks one... • A talking thinking computer? Hm,...?

  43. Summing Up • The Java language for NLP? • It has kept its promise so far! • Java 1.5 is coming... • Higher-orderness/meta-programming is still a problem • The Java platform for NLP? • Better than promised in many ways • Example of well-handled software evolution • Many elegant designs • Still Open: Knowledge Representation and Mainstream Technology? • XML in Java shows both possibilities and problems • XML is a format at a low layer in the formalism stack! • XML as a script-language, e.g. the build-tool Ant shows the way? • W3C is an example of evolution of representation formats...
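
As a small, hedged illustration of XML sitting at a low layer, the sketch below uses the standard `javax.xml.parsers` DOM API to read tokens from a tiny document. The element name `token`, the attribute `pos`, and the class `TokenXmlReader` are assumptions invented for this example, not a format used in the course.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader for a made-up token format. */
final class TokenXmlReader {
    private TokenXmlReader() {}

    /** Returns one "form/pos" string per token element in the XML text. */
    static List<String> read(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("token");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // The mapping from markup to a domain notion happens here,
                // in application code, not in the XML layer itself.
                result.add(e.getTextContent() + "/" + e.getAttribute("pos"));
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }
}
```

The parser delivers only elements and attributes; everything that makes a `token` a token has to be supplied by the surrounding code, which is one way to see both the possibilities and the problems mentioned above.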
