
Patent Claim Processing for Readability - Structure Analysis and Term Explanation -

ACL2003 WS on Patent Corpus Processing. Patent Claim Processing for Readability - Structure Analysis and Term Explanation -. July 12, 2003. Akihiro Shinmori†, Manabu Okumura‡, Yuzo Marukawa‡, Makoto Iwayama*


Presentation Transcript


  1. ACL2003 WS on Patent Corpus Processing • Patent Claim Processing for Readability - Structure Analysis and Term Explanation - • July 12, 2003 • Akihiro Shinmori†, Manabu Okumura‡, Yuzo Marukawa‡, Makoto Iwayama* • † Tokyo Institute of Technology & INTEC Web and Genome Informatics • ‡ Japan Science and Technology & National Institute of Informatics • * Tokyo Institute of Technology & Hitachi

  2. Problem & Approach • Problem: improve patent claim readability • Structural difficulty • Term difficulty • Approach • Analyze the structure and present it visually • Apply RST and utilize tools for RST • Cue-phrase-based approach • Give explanations for terms • Utilize the “detailed explanation” part of the specification

  3. Structure of Patent Document • Patent Specification • Invention Title • Claim • Detailed Explanation • Brief Explanation of Drawings • Drawings • Summary • “The claims specify the boundaries of the legal monopoly created by the patent.” (Burgunder 1995)

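  4. Sample Japanese Patent Claim • 操作手段によりアクチュエータを駆動して所望の作業を行なう作業機において、前記作業機の作業機構に作用する負荷を検出する負荷検出手段と、この負荷検出手段の検出値に応じた周波数の信号を出力する第1 の周波数変換器と、当該負荷検出手段の検出値に応じた周波数のパルスを出力する第2 の周波数変換器と、前記第1 の周波数変換器から出力される信号を前記第2 の周波数変換器からのパルスの出力期間だけ間欠的に出力する変調手段と、この変調手段の出力信号に応じて振動を発生する振動発生手段とを設けたことを特徴とする作業機の操作用仮想振動生成装置。 • (Publication Number=10-011111, a patent on a virtual oscillation generator for construction) • English gloss: In a working machine that drives an actuator by operating means to perform desired work, a virtual vibration generating device for operating the working machine, characterized by comprising: load detecting means for detecting a load acting on a working mechanism of the working machine; a first frequency converter that outputs a signal with a frequency corresponding to the detected value of the load detecting means; a second frequency converter that outputs pulses with a frequency corresponding to the detected value of the load detecting means; modulating means for intermittently outputting the signal from the first frequency converter only during the output period of the pulses from the second frequency converter; and vibration generating means for generating vibration according to the output signal of the modulating means. • The Japanese original is one sentence (a single noun phrase) of 259 characters!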

  5. Characteristics of Patent Claim Description • Sentences are long. • The average is 242 chars. (cf. 55.4 chars for newspaper articles) • The structure is complex. • Even native speakers cannot understand them on the first reading! • Difficult terms are often used. • Abstract terms are preferred. • Description styles are established. • Patent specifications are usually written by professionals (such as patent attorneys and IP specialists)

  6. Description Styles of Japanese Patent Claims [Kasai 1999] • Process Sequence Style • “・・・し[shi](does)、・・・し[shi](does)、・・・した[shita](and does)、・・・” • Element Enumeration Style • “・・・と[to](and)、・・・と[to](and)、・・・とからなる[to karanaru](comprising)・・・” • Jepson-like Style • “・・・において[ni-oite](in)、・・・を特徴とする[wo-tokuchou tosuru](be characterized by)、・・・” • First describe the known or precondition part, and next describe the new or main part. .
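The three styles above can be told apart, to a first approximation, by counting their characteristic endings. The following is only a minimal sketch under that assumption: the function name `classify_claim`, the regular expressions, and the tie-breaking rule are illustrative choices, not the lexical analyzer used in the presentation.

```python
import re

# Jepson-like: "...において、...を特徴とする..." (cue phrases taken from the slide).
JEPSON_RE = re.compile(r"に(?:お|於)いて.*を特徴と(?:する|した)", re.DOTALL)
ENUM_RE = re.compile(r"と[、,]")   # "...と、...と、" (element enumeration)
PROC_RE = re.compile(r"し[、,]")   # "...し、...し、" (process sequence)


def classify_claim(claim):
    """Return (is_jepson_like, body_style) for one claim string.

    Assumption: whichever ending occurs more often decides the body style,
    with ties going to element enumeration.
    """
    is_jepson = JEPSON_RE.search(claim) is not None
    n_enum = len(ENUM_RE.findall(claim))
    n_proc = len(PROC_RE.findall(claim))
    if n_enum == 0 and n_proc == 0:
        body = "unknown"
    elif n_enum >= n_proc:
        body = "element_enumeration"
    else:
        body = "process_sequence"
    return is_jepson, body


# Schematic claims written for this illustration (not taken from the corpus).
print(classify_claim("AにおいてBと、Cと、Dとを備えたことを特徴とする装置。"))
print(classify_claim("原料を混合し、加熱し、冷却した製品。"))
```

Plain character matching like this can misfire (for example on “こと、”), which is one reason to run morphological analysis first.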

  7. Structure Analysis of Patent Claims • Our Position: • To improve the readability of Japanese patent claims, the structure of the description needs to be presented in a readable way • Japanese patent claims: • Are composed of multiple clauses which have some relationship with each other • Have cue phrases around clause boundaries • Apply RST (Rhetorical Structure Theory). • Use a cue-phrase-based approach.

  8. Result of Structure Analysis of Japanese Patent Claim • Graphical view by RSTTool [O'Donnell 1997]

  9. Relations for Patent Claim

  10. Collection of Cue Phrases • From description pattern analysis • に(お|於)いて(in), であって(in), ... • を特徴とした(be characterized by) • From the description patterns of the claims which contain explicitly-inserted newlines .
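As a rough illustration of how such cue phrases can be located in a claim, the sketch below scans for the patterns listed on this slide, plus the “を特徴とする” variant from the Jepson-like style. The labels `precondition` and `characterized` and the helper `find_cues` are names invented for this example; the full cue-phrase inventory of the presentation is not reproduced here.

```python
import re

# Cue phrases from the slides; the label names are hypothetical.
CUE_PATTERNS = {
    "precondition": re.compile(r"に(?:お|於)いて|であって"),
    "characterized": re.compile(r"を特徴と(?:した|する)"),
}


def find_cues(claim):
    """Return (start_offset, label, matched_text) tuples sorted by position."""
    hits = []
    for label, pattern in CUE_PATTERNS.items():
        for m in pattern.finditer(claim):
            hits.append((m.start(), label, m.group()))
    return sorted(hits)


# Shortened variant of the image-reading-device claim, for illustration only.
claim = "画像読取装置において、前記照明手段は回動自在に取付けられることを特徴とする画像読取装置。"
for start, label, text in find_cues(claim):
    print(start, label, text)
```

The offsets mark candidate clause boundaries that a later parsing step could use.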

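  11. Example of claims in which newlines are explicitly inserted • 原稿が載置される原稿台と、<NL>この原稿台に対して主走査方向に移動する走査光学手段と、<NL>この走査光学手段上に配置され原稿を副走査方向に照明する照明手段と、を備えた画像読取装置において、<NL>前記照明手段は、前記走査光学手段に対して走査移動平面に略平行に回動自在に取付けられることを特徴とする画像読取装置。 • (Publication Number=8-182670, an image reading device) • English gloss: In an image reading device comprising a document table on which a document is placed, scanning optical means that moves in the main scanning direction relative to the document table, and illuminating means that is arranged on the scanning optical means and illuminates the document in the sub-scanning direction, the image reading device is characterized in that the illuminating means is attached to the scanning optical means so as to be rotatable substantially parallel to the scanning movement plane.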

  12. Description pattern just before the newlines of newline-inserted claims .

  13. Cue phrases which can be used to analyze patent claims

  14. Cue phrases which can be used to analyze patent claims

  15. Algorithm 1. Morphological Analysis • Using ChaSen (with the -j option, specifying the sentence delimiters as “。:;”) 2. Lexical Analysis • Context-dependent output token and string value • Judge whether Jepson-like style or not • Judge whether process sequence style or element enumeration style

  16. Algorithm (cont.) 3. Syntax Analysis (= Structure Analysis) • Parser generated from a context-free grammar (CFG) • Using a BISON-compatible parser generator • CFG: 57 rules, 11 terminals, 19 non-terminals • Actions • Build up the RS-tree • Newline insertion and indentation • Paraphrase
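To make the “build a tree, then insert newlines and indentation” idea concrete, here is a heavily simplified sketch. It is not the 57-rule CFG parser described above: it uses two regular expressions instead of a generated parser, skips morphological analysis, and the names `Node`, `parse_claim`, and `render` are invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    text: str = ""
    children: list = field(default_factory=list)


PRECONDITION_RE = re.compile(r"に(?:お|於)いて[、,]?")  # Jepson-like cue phrase
ELEMENT_END_RE = re.compile(r"(?<=と[、,])")            # split point after "と、"


def split_elements(text):
    """Split a clause after every enumerated element ending in 'と、'."""
    return [seg for seg in ELEMENT_END_RE.split(text) if seg]


def parse_claim(claim):
    """Build a shallow tree: claim -> (precondition, body) -> elements."""
    root = Node("claim")
    m = PRECONDITION_RE.search(claim)
    if m:
        parts = [("precondition", claim[:m.end()]), ("body", claim[m.end():])]
    else:
        parts = [("body", claim)]
    for label, text in parts:
        part = Node(label)
        part.children = [Node("element", seg) for seg in split_elements(text)]
        root.children.append(part)
    return root


def render(node, depth=0):
    """Return indented lines: one line per node, children one level deeper."""
    line = node.text if node.text else f"[{node.label}]"
    lines = ["  " * depth + line]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Schematic claim written for this illustration.
claim = "Aと、Bと、を備えた装置において、前記AはCであることを特徴とする装置。"
print("\n".join(render(parse_claim(claim))))
```

A grammar-based parser can assign relation labels and handle nesting that this flat split cannot, so the sketch only conveys the shape of the output.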

  17. Evaluation Data for Structure Analysis • 59,956 claims (in 1999) extracted from the “NTCIR3 patent data collection” • Analysis was done by using “Sample data” (59,968 claims in 1998) • The IPC (International Patent Classification) code distribution was almost the same as the total data in 1999 published by the Japan Patent Office.

  18. Evaluation and Result • Accept Ratio • Ratio of the claims accepted by the CFG grammar • 99.77% • Processing Speed • 0.30 sec/claim (on Linux PC with Pentium 1GHz and 512MB Memory) .
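For a sense of scale, the two figures above can be turned into absolute numbers. The sketch assumes that both were measured on the 59,956-claim evaluation set from the previous slide, which the slides do not state explicitly; the function names are illustrative.

```python
def rejected_claims(total, accept_ratio):
    """Approximate number of claims the grammar did not accept."""
    return round(total * (1.0 - accept_ratio))


def total_hours(total, sec_per_claim):
    """Wall-clock hours needed to process all claims sequentially."""
    return total * sec_per_claim / 3600.0


TOTAL_CLAIMS = 59_956  # assumed denominator: the 1999 evaluation set

print(rejected_claims(TOTAL_CLAIMS, 0.9977))       # rejected claims, approx.
print(round(total_hours(TOTAL_CLAIMS, 0.30), 1))   # hours for a full pass
```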

  19. Accuracy Evaluation • Indirect Evaluation • Newline insertion by using the result of RS analysis • Baseline: • Mechanically insert newlines at the end of every sequence of “(NOUN|SYMBOL)(、|,)” and “(Verb-Renyoukei|Adverb-Renyoukei) (、|,)”. • Direct Evaluation • Evaluation of the results for 100 randomly selected claims
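The baseline rule can be sketched over a list of already-tagged tokens. The part-of-speech labels below (`NOUN`, `SYMBOL`, `VERB_RENYOUKEI`, `ADVERB_RENYOUKEI`) are stand-in names chosen for this example rather than actual ChaSen tags, and the token list is hand-tagged for illustration.

```python
# POS labels after which a following comma triggers a line break (baseline rule).
BREAK_POS = {"NOUN", "SYMBOL", "VERB_RENYOUKEI", "ADVERB_RENYOUKEI"}
COMMAS = {"、", ","}


def baseline_newlines(tokens):
    """Insert a newline after every comma that directly follows a BREAK_POS token.

    `tokens` is a list of (surface, pos) pairs.
    """
    out = []
    for i, (surface, _pos) in enumerate(tokens):
        out.append(surface)
        if surface in COMMAS and i > 0 and tokens[i - 1][1] in BREAK_POS:
            out.append("\n")
    return "".join(out)


# Hand-tagged toy input; the tags are assumptions, not tagger output.
tokens = [
    ("原料", "NOUN"), ("を", "PARTICLE"), ("混合", "NOUN"),
    ("し", "VERB_RENYOUKEI"), ("、", "COMMA"),
    ("加熱", "NOUN"), ("する", "VERB"), ("装置", "NOUN"),
    ("と", "PARTICLE"), ("、", "COMMA"),
    ("容器", "NOUN"), ("、", "COMMA"), ("蓋", "NOUN"),
]
print(baseline_newlines(tokens))
```

Because the rule looks only at the token immediately before the comma, it ignores the claim's overall structure, which is what the structure-based insertion is compared against.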

  20. Accuracy Evaluation Result Indirect Evaluation .

  21. Accuracy Evaluation Result Direct Evaluation .

  22. Term Explanation • Difficult terms used in patent claims: • Terms specific to the invention • Terms specific to the domain • Approach • Use the result of structure analysis • Give explanations for terms by utilizing the “detailed explanation” part, because what is claimed must be explained in detail there.

  23. Structure of Patent Document • Patent Specification • Invention Title • Claim • Detailed Explanation • Technical field • Prior art • Problem to be resolved by the invention • Means of solving the problems • Embodiments of the invention • Effects of the invention .

  24. Preliminary Survey • For the Jepson-like claims, the words used in the first part (the known or precondition part) appear more often in the technical field and the prior art than the words used in the last part. • 76.3% (cf. 55.5% for the words in the last part) • “Terms specific to the domain” are often explained in the prior art by using the following cue phrases. • “so-called”, “or”, and parentheses “()”
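A percentage of this kind can be computed as the share of claim-part words that also occur in the background sections. The sketch below counts word tokens and uses toy word lists; whether the survey counted tokens or distinct words is not stated on the slide, so that choice, like the function name `share_in_background`, is an assumption.

```python
def share_in_background(claim_words, background_words):
    """Fraction of claim-part words that also occur in the background text."""
    claim_words = list(claim_words)
    if not claim_words:
        return 0.0
    background = set(background_words)
    return sum(word in background for word in claim_words) / len(claim_words)


# Toy word lists for illustration only (not taken from the surveyed patents).
first_part = ["作業機", "アクチュエータ", "操作手段", "駆動"]
last_part = ["周波数変換器", "変調手段", "負荷", "振動"]
background = {"作業機", "アクチュエータ", "操作手段", "負荷"}

print(share_in_background(first_part, background))
print(share_in_background(last_part, background))
```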

  25. Word usage in Jepson-like claims • Patent Specification • Invention Title • Claim (Jepson-like type) • First part (known things or the precondition): 76.3% of its words appear in the technical field and the prior art • Last part (new things or the body): 55.5% • Detailed Explanation • Technical field • Prior art • Problem to be resolved by the invention • Means of solving the problems • Embodiments of the invention • Effects of the invention

  26. “Terms specific to the domain” that can be extracted from “prior art” • For the 132 patent specifications in the field of ink-jet printers: • 29 terms can be extracted by the cue phrase “いわゆる” (“so-called”) from the “prior art” part. • 9 of 27 terms are used in the claim description. • For 3 terms, a useful explanation can be extracted from the “prior art” part.
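Extraction by the cue phrase “いわゆる” can be approximated with a single regular expression that captures the katakana or kanji run following the cue. This is a sketch under that assumption: the character ranges, the optional opening bracket, and the helper name `extract_so_called_terms` are choices made for this example, and the sample sentences are invented rather than taken from the 132 specifications.

```python
import re

# "いわゆる", an optional opening bracket, then a run of katakana, kanji, or
# alphanumerics. U+30A1-U+30F6: katakana, U+30FC: long-vowel mark,
# U+4E00-U+9FFF: common kanji.
SO_CALLED_RE = re.compile(
    r"いわゆる[「『]?([\u30A1-\u30F6\u30FC\u4E00-\u9FFFA-Za-z0-9]+)"
)


def extract_so_called_terms(text):
    """Return the terms that directly follow the cue phrase, in text order."""
    return SO_CALLED_RE.findall(text)


# Invented "prior art" sentences for illustration.
prior_art = (
    "従来、いわゆるオンデマンド型のインクジェットプリンタが知られている。"
    "この方式では、いわゆる「サテライト」が発生することがある。"
)
print(extract_so_called_terms(prior_art))
```

The captured term stops at the first hiragana or punctuation character, so multi-part terms joined by hiragana would be truncated.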

  27. Conclusion • NLP technologies can contribute toward improving the readability of patent claims. • Structure can be analyzed by a cue-phrase-based approach and CFG-based parsing. • Explanations for some terms can be given by utilizing the expressions in the detailed explanation. • This can be a step toward the more challenging task of automatic “patent map” generation.
