Semi-Automated Extension of a Specialized Medical Lexicon for French

Bruno Cartoni & Pierre Zweigenbaum, LIMSI-CNRS, France

Presentation Transcript


  1. Bruno Cartoni & Pierre Zweigenbaum, LIMSI-CNRS, France. Semi-Automated Extension of a Specialized Medical Lexicon for French

  2. Outline • Context: UMLF for French • The desired coverage • The target lexical information • The organisation of a specialised lexicon • Acquiring lexical information • Initial coverage • Obtaining lexical entries from a general lexicon • Guessing technique • Results • Consensus guessing • Acquisition of the full paradigm • General improvement • Conclusion and further work

  3. Context: the InterSTIS project • InterSTIS: development of a terminology server for French medical terminologies • Sub-project: improving the lexical coverage of a French medical lexicon (UMLF: Unified Medical Lexicon for French) • Use: supporting the indexing of medical texts • Issues: What is the desired lexical knowledge? How can it be acquired?

  4. The desired coverage • Reference: “Term-Union”, the union of 10 terminologies (CIM-10, SNOMED, MeSH, CISMeF, …) of French medical domains, organised around the concept identifiers (CUI) of the UMLS • 311,518 terms • 203,300 unique concepts (CUI) • 94,964 word-forms

  5. Term-Union: example
     C0000936 | MSHFRE | … | Accommodation de l'oeil
     C0000936 | MSHFRE | … | Accommodation des yeux
     C0000936 | MSHFRE | … | Accommodation oculaire
     C0000936 | SNMIGIPFRE | … | accommodation visuelle
     ...
     C00001558 | MSHF | … | Voie cutanée
     C00001558 | MSHF | … | Voie intradermique
     C00001558 | MSHF | … | Voie percutanée
     C00001558 | MSHF | … | Voie transcutanée
     → Observation of term variation
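
As a rough illustration of how term variation can be observed in Term-Union, the Python sketch below parses rows like those on slide 5, groups terms by CUI and counts terms, concepts and word-forms (the three figures given on slide 4). The tab-separated three-column layout and the tokenisation rule are assumptions: the slide elides the middle columns, and the slides do not say how word-forms were counted.

```python
import re
from collections import defaultdict

# Hypothetical row layout: CUI <TAB> source <TAB> term. The real Term-Union
# rows carry further columns, elided as "…" on the slide.
ROWS = [
    "C0000936\tMSHFRE\tAccommodation de l'oeil",
    "C0000936\tMSHFRE\tAccommodation des yeux",
    "C0000936\tMSHFRE\tAccommodation oculaire",
    "C0000936\tSNMIGIPFRE\taccommodation visuelle",
    "C00001558\tMSHF\tVoie cutanée",
    "C00001558\tMSHF\tVoie intradermique",
]

# Assumed tokenisation: split on whitespace and apostrophes (' and ’),
# keeping hyphenated words whole; matching is case-sensitive.
TOKEN_SPLIT = re.compile(r"[\s'’]+")


def group_by_cui(rows):
    """Map each CUI to the list of its terms, i.e. its term variants."""
    variants = defaultdict(list)
    for row in rows:
        cui, _source, term = row.split("\t")
        variants[cui].append(term)
    return dict(variants)


def word_forms(terms):
    """Return the set of distinct word-forms occurring in the terms."""
    forms = set()
    for term in terms:
        forms.update(tok for tok in TOKEN_SPLIT.split(term) if tok)
    return forms


variants = group_by_cui(ROWS)
all_terms = [term for terms in variants.values() for term in terms]
print(f"terms={len(all_terms)} concepts={len(variants)} "
      f"word-forms={len(word_forms(all_terms))}")
```

On these six rows it prints `terms=6 concepts=2 word-forms=12`; each value of `variants` is one group of variant terms for a single concept.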

  6. Target lexical information: term variation within Term-Union • Graphemic: équilibre acido-basique / équilibre acidobasique [EN: acid-base balance] • Morphosyntactic: adaptation de l'oeil / adaptation des yeux [EN: eye adaptation] • Morphosemantic: intoxication à l’alcool / intoxication alcoolique [EN: alcohol intoxication] • Others ...
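
Graphemic variants such as acido-basique / acidobasique differ only in hyphenation, so candidate pairs can be found by grouping forms on a normalised key. The sketch below uses "lowercase, hyphens removed" as that key; this key function is an assumption for illustration, not the authors' procedure, and it cannot detect morphosyntactic or morphosemantic variants, which need inflectional and derivational knowledge.

```python
from collections import defaultdict


def graphemic_key(form):
    """Assumed normalisation key: lowercase, hyphens removed."""
    return form.lower().replace("-", "")


def graphemic_variants(forms):
    """Group forms sharing a key; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for form in forms:
        groups[graphemic_key(form)].add(form)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


forms = ["acido-basique", "acidobasique", "inter-maxillaire",
         "intermaxillaire", "alcoolique", "alcool"]
for key, group in sorted(graphemic_variants(forms).items()):
    print(key, group)
```

Only the two hyphenation pairs are grouped; alcool / alcoolique share no key, which is why morphosemantic variants need a derivation table.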

  7. Organisation of the specialised lexicon • 3 types of relational tables for the 3 levels of representation (graphemic, inflection, derivation) • A full-entry lexicon (LMF compliant) that gathers all lexical information
     Graphemic table (excerpt):
       …
       inter-maxillaire | intermaxillaire
       insulino-sécrétantes | insulinosécrétantes
       scléro-cornéenne | sclérocornéenne
       …
     Derivation table (excerpt):
       ...
       abdominal | abdomen
       aplasique | aplasie
       arachnoïdien | arachnoïde
       argentique | argent
       …
     Inflection table (excerpt):
       …
       sérofibrineux | sérofibrineux | Afpms
       sérofibrineuse | sérofibrineux | Afpfs
       sérofibrineux | sérofibrineux | Afpmp
       sérofibrineuses | sérofibrineux | Afpfp
       …
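
A minimal in-memory sketch of the three relational tables on slide 7, filled with rows taken from the slide, plus a hypothetical `analyse` function that resolves a spelling variant to its canonical form and then looks up its inflectional readings and derivational base. Plain Python dicts stand in for the relational tables; the one `abdominal` inflection row is illustrative and not on the slide, and the real UMLF tables and LMF-compliant full-entry lexicon hold more information.

```python
# Graphemic table: variant spelling -> canonical spelling (rows from slide 7).
GRAPHEMIC = {
    "inter-maxillaire": "intermaxillaire",
    "insulino-sécrétantes": "insulinosécrétantes",
    "scléro-cornéenne": "sclérocornéenne",
}

# Inflection table: word form -> list of (lemma, tag) readings.
# "sérofibrineux" is ambiguous between singular (Afpms) and plural (Afpmp).
INFLECTION = {
    "sérofibrineux": [("sérofibrineux", "Afpms"), ("sérofibrineux", "Afpmp")],
    "sérofibrineuse": [("sérofibrineux", "Afpfs")],
    "sérofibrineuses": [("sérofibrineux", "Afpfp")],
    "abdominal": [("abdominal", "Afpms")],  # illustrative row, not on the slide
}

# Derivation table: derived lemma -> base lemma (rows from slide 7).
DERIVATION = {
    "abdominal": "abdomen",
    "aplasique": "aplasie",
    "arachnoïdien": "arachnoïde",
    "argentique": "argent",
}


def analyse(form):
    """Hypothetical lookup: graphemic -> inflection -> derivation tables."""
    canonical = GRAPHEMIC.get(form, form)
    readings = INFLECTION.get(canonical, [])
    bases = {lemma: DERIVATION[lemma] for lemma, _tag in readings if lemma in DERIVATION}
    return {"canonical": canonical, "readings": readings, "bases": bases}


print(analyse("sérofibrineux"))
print(analyse("abdominal"))
```

`analyse("abdominal")` reaches the base `abdomen` through the derivation table, while `analyse("inter-maxillaire")` first maps the hyphenated spelling to `intermaxillaire`.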

  8. Outline • Context: UMLF for French • The desired coverage • The target lexical information • The organisation of a specialised lexicon • Acquiring lexical information • Initial coverage • Obtaining lexical entries from a general lexicon • Guessing technique • Results • Consensus guessing • Acquisition of the full paradigm • General improvement • Conclusion and further work

  9. Acquiring the lexical information • Initial coverage of UMLF (from the previous UMLF project, based on Baud et al. 1998) • 17,192 lexical units • 5,353 adjectives • 11,799 nouns • 36,211 word forms

  10. Acquiring the lexical information • From a general lexicon: an existing French general lexicon (Morphalou) • With a guessing technique
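
Acquisition from a general lexicon can be pictured as set operations: take the Term-Union word-forms that UMLF does not yet describe, look them up in Morphalou, and import the matching entries; whatever is still unknown goes to the guessing step. The sketch below uses toy dictionaries with hypothetical data, not the actual UMLF or Morphalou formats; tags follow the style of slide 7, and the noun tag `Ncfs` is assumed to follow the same scheme.

```python
# Toy stand-ins (hypothetical data): word-forms to describe, the
# specialised lexicon (UMLF) and the general lexicon (Morphalou),
# each lexicon mapping a form to its (lemma, tag) analyses.
TERM_FORMS = {"intoxication", "alcoolique", "sérofibrineux", "acidobasique"}
UMLF = {"intoxication": [("intoxication", "Ncfs")]}
MORPHALOU = {"alcoolique": [("alcoolique", "Afpms"), ("alcoolique", "Afpfs")]}


def import_from_general_lexicon(term_forms, specialised, general):
    """Copy analyses of forms unknown to `specialised` from `general`.

    Returns the imported entries and the forms that remain unknown,
    which are left for the guessing step.
    """
    unknown = term_forms - set(specialised)
    imported = {form: general[form] for form in unknown if form in general}
    still_unknown = unknown - set(imported)
    return imported, still_unknown


imported, still_unknown = import_from_general_lexicon(TERM_FORMS, UMLF, MORPHALOU)
print(sorted(imported), sorted(still_unknown))
```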

  11. Acquiring the lexical information • With a guessing technique (Tanguy & Hathout 2007) • 3 steps: • Learning phase: computing the most frequent tag for each word ending in 2 existing lexicons • Guessing phase: assigning the possible tag(s) to unknown forms • Cross-validation: comparing the 2 guesses based on the 2 lexicons
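
The three steps can be sketched with a simplified suffix-based guesser: the learning phase records, for every word ending of up to `MAX_ENDING` characters, the most frequent tag in a training lexicon; the guessing phase gives an unknown form the tag of its longest known ending; cross-validation keeps only the forms on which guessers trained on two lexicons agree. This is a simplification under our own assumptions (ending length, longest-match back-off, a single tag per form, toy lexicons, noun tags such as `Ncfs` modelled on the `Afpfs` style of slide 7); it is not Tanguy & Hathout's exact method, which may assign several tags.

```python
from collections import Counter, defaultdict

MAX_ENDING = 5  # assumed maximum ending length


def learn(lexicon):
    """Learning phase: most frequent tag for each ending of up to MAX_ENDING chars.

    lexicon: iterable of (form, tag) pairs.
    """
    counts = defaultdict(Counter)
    for form, tag in lexicon:
        for n in range(1, min(MAX_ENDING, len(form)) + 1):
            counts[form[-n:]][tag] += 1
    return {ending: c.most_common(1)[0][0] for ending, c in counts.items()}


def guess(model, form):
    """Guessing phase: tag of the longest ending known to the model, or None."""
    for n in range(min(MAX_ENDING, len(form)), 0, -1):
        tag = model.get(form[-n:])
        if tag is not None:
            return tag
    return None


def consensus(model_a, model_b, forms):
    """Cross-validation: keep the forms on which both guessers agree."""
    agreed = {}
    for form in forms:
        tag_a, tag_b = guess(model_a, form), guess(model_b, form)
        if tag_a is not None and tag_a == tag_b:
            agreed[form] = tag_a
    return agreed


# Toy training lexicons (hypothetical): A stands in for Morphalou, B for UMLF.
LEX_A = [("heureux", "Afpms"), ("heureuse", "Afpfs"),
         ("nation", "Ncfs"), ("nations", "Ncfp")]
LEX_B = [("fibreux", "Afpms"), ("fibreuse", "Afpfs"),
         ("infection", "Ncfs"), ("lésion", "Ncfs"), ("gras", "Afpms")]
UNKNOWN = ["sérofibrineuse", "inflammation", "crises", "rénal"]

model_a, model_b = learn(LEX_A), learn(LEX_B)
print(consensus(model_a, model_b, UNKNOWN))
```

Here `crises` is dropped because the two guessers disagree (`Ncfp` vs `Afpms`), and `rénal` because neither model knows any of its endings; the printed consensus is `{'sérofibrineuse': 'Afpfs', 'inflammation': 'Ncfs'}`.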

  12. Acquiring the lexical information • Acquiring the full paradigm: all the inflectional forms and the lemma • Based on “productive” inflectional paradigms: 9 for adjectives, 3 for nouns • Algorithm based on lexical tries to cluster the forms of the same paradigm
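
The slide mentions an algorithm based on lexical tries to cluster forms of the same paradigm. The sketch below is one simple way to do this, not the authors' algorithm: forms are stored in a character trie, each trie node is treated as a candidate stem, and the endings attested below it are matched against a small set of paradigms. Candidates are then kept greedily (most forms first, shorter stems on ties) so that no form joins two clusters. The three paradigms, the two-form minimum and the greedy selection are all assumptions for illustration; each cluster's missing forms are what an automatic full-paradigm extension would generate.

```python
# Hypothetical paradigms, each a set of endings added to a stem (the 9
# adjective and 3 noun paradigms used by the authors are not listed).
PARADIGMS = {
    "adj_regular": {"", "e", "s", "es"},     # cutané, cutanée, cutanés, cutanées
    "adj_eux": {"eux", "euse", "euses"},     # sérofibrineux, -euse, -euses
    "adj_al": {"al", "ale", "aux", "ales"},  # abdominal, -ale, -aux, -ales
}
END = "$"  # key marking the end of a word in the trie


def build_trie(forms):
    """Character trie as nested dicts; END marks a complete word."""
    root = {}
    for form in forms:
        node = root
        for ch in form:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def suffixes(node, prefix=""):
    """Yield every suffix that completes a word below this trie node."""
    for key, child in node.items():
        if key == END:
            yield prefix
        else:
            yield from suffixes(child, prefix + key)


def candidates(root):
    """Yield (stem, paradigm, attested endings) for each trie node (= stem)."""
    stack = [("", root)]
    while stack:
        stem, node = stack.pop()
        attested = set(suffixes(node))
        for name, endings in PARADIGMS.items():
            found = attested & endings
            if len(found) >= 2:  # require at least two forms per cluster
                yield stem, name, found
        stack.extend((stem + ch, child) for ch, child in node.items() if ch != END)


def cluster(forms):
    """Greedily keep the candidates grouping the most forms, without overlap.

    Returns (stem, paradigm, attested forms, missing forms) tuples.
    """
    ranked = sorted(candidates(build_trie(forms)),
                    key=lambda c: (-len(c[2]), len(c[0]), c[0], c[1]))
    used, clusters = set(), []
    for stem, name, found in ranked:
        members = {stem + ending for ending in found}
        if members & used:
            continue
        used |= members
        missing = sorted(stem + ending for ending in PARADIGMS[name] - found)
        clusters.append((stem, name, sorted(members), missing))
    return clusters


FORMS = ["sérofibrineux", "sérofibrineuse", "sérofibrineuses",
         "abdominal", "abdominale", "abdominaux", "cutanée", "cutanés"]
for c in cluster(FORMS):
    print(c)
```

With these forms it finds three clusters: `abdomin` + adj_al (missing `abdominales`), `sérofibrin` + adj_eux (complete) and `cutané` + adj_regular (missing `cutané`, `cutanées`). Competing parses such as `sérofibrineus` + `e`/`es` are discarded because their forms are already clustered.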

  13. Outline • Context: UMLF for French • The desired coverage • The target lexical information • The organisation of a specialised lexicon • Acquiring lexical information • Initial coverage • Obtaining lexical entries from a general lexicon • Guessing technique • Results • Consensus guessing • Acquisition of the full paradigm • General improvement • Conclusion and further work

  14. Acquisition from general lexicon: results
     Source | Known word entries | Remaining words to describe
     Term-Union | - | 94,964
     Initial UMLF | 19,599 | 81,595
     Morphalou | 6,617 | 74,978

  15. Acquisition with guessing techniques: results • 74,978 unknown forms • 44,515 analyses from the Morphalou-based program • 35,438 analyses from the UMLF-based program • Cross-validation: 30,137 in common

  16. Acquisition with guessing techniques: evaluation
     Error type | Count
     Wrong label | 12
     Proper names | 49
     Latin words | 5
     English words | 1
     Spelling/segmentation | 10
     Other | 5
     Total | 82
     • Errors: 82 out of 1,000 (8.2%)

  17. Acquisition of the full paradigm: results • 4,453 paradigms captured (complete or incomplete), grouping 9,352 word forms • 3,308 adjectives • 514 nouns • Automatic extension for the full paradigms (with canonical forms only) • Manual checking for the others

  18. General improvement
     Source | Forms added | Still unknown in Term-Union | Coverage
     UMLF-v1 | 36,211 | 81,595 | 14.1%
     Morphalou | 17,828 | 74,978 | 21.0%
     Acquisition | 8,088 | 70,602 | 25.7%
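
The coverage column follows from the other figures: coverage is the share of Term-Union's 94,964 word-forms that are no longer unknown, that is (94,964 − still unknown) / 94,964. A quick check in Python, using only the numbers reported on the slides:

```python
TERM_UNION_FORMS = 94_964  # word-forms in Term-Union (slide 4)

# Still-unknown counts after each step, as reported on the slide.
STILL_UNKNOWN = {
    "UMLF-v1": 81_595,
    "Morphalou": 74_978,
    "Acquisition": 70_602,
}


def coverage(still_unknown, total=TERM_UNION_FORMS):
    """Percentage of Term-Union word-forms covered by the lexicon."""
    return 100 * (total - still_unknown) / total


for source, unknown in STILL_UNKNOWN.items():
    print(f"{source}: {coverage(unknown):.1f}%")
```

This prints 14.1%, 21.0% and 25.7%, matching the slide; the acquisition step alone reduces the unknown forms by 74,978 − 70,602 = 4,376.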

  19. Outline • Context: UMLF for French • The desired coverage • The target lexical information • The organisation of a specialised lexicon • Acquiring lexical information • Initial coverage • Obtaining lexical entries from a general lexicon • Guessing technique • Results • Consensus guessing • Acquisition of the full paradigm • General improvement • Conclusion and further work

  20. Discussion and conclusion • The acquisition and evaluation of specialised lexical resources require a specific reference: Term-Union • Extract (full) lexical information • Assess lexical needs and targets • Other acquisition techniques (CRF for inflectional information, rule-based techniques for derivational information)

  21. Acknowledgment • This work was partially funded by the InterSTIS project (ANR-07-TECSAN-010) • InterSTIS project: www.interstis.org
