
Kernel Methods for Relation Extraction


Presentation Transcript


  1. Kernel Methods for Relation Extraction. Dmitry Zelenko, Chinatsu Aone and Anthony Richardella. Journal of Machine Learning Research 3 (2003) 1083-1106. Presented by 杨振东.

  2. Extract What Relation • person-affiliation (organization) • John Smith is the chief scientist of the Hardcom Corporation • organization-location • IBM is an American multinational technology and consulting corporation, with headquarters in New York, United States.

  3. Prerequisites • shallow parsing • Machine Learning (Andrew Ng)

  4. shallow parsing • We believe that shallow parsing (Abney, 1990) is an important prerequisite for information extraction. “John Smith is the chief scientist of the Hardcom Corporation” The types “PNP”, “Det”, “Adj”, and “Prep” denote “Personal Noun Phrase”, “Determiner”, “Adjective”, and “Preposition”, respectively.
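A minimal sketch of what a shallow parse looks like in practice, using NLTK's regular-expression chunker rather than the (unnamed) parser the authors used; the chunk grammar below is an illustrative assumption, not the paper's.

```python
# Toy shallow parse of the slide's example sentence with NLTK's regexp
# chunker. NOT the authors' parser; the grammar is an assumption.
# Requires the NLTK tokenizer and POS-tagger data to be downloaded.
import nltk

grammar = r"NP: {<DT>?<JJ>*<NNP|NN>+}"   # determiner? adjectives* nouns+
sentence = "John Smith is the chief scientist of the Hardcom Corporation"
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
tree = nltk.RegexpParser(grammar).parse(tagged)
print(tree)  # noun chunks roughly corresponding to the slide's PNP/Det/Adj nodes
```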

  5. shallow parsing contd. The person and organization under consideration receive the member and affiliation roles, respectively. The rest of the nodes receive the role none, reflecting that they do not participate in the relation.

  6. Definition • A kernel function defines the similarity of objects X and Y, denoted K(X, Y). John Smith is the chief scientist of the Hardcom Corporation James Brown was a scientist at the University of Illinois
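As a toy illustration (not from the paper), the simplest kernel one could write for these two sentences is a bag-of-words dot product; the function name bow_kernel is ours.

```python
# Toy kernel (not from the paper): bag-of-words dot product between two
# sentences. K(X, Y) grows with the number of shared words.
from collections import Counter

def bow_kernel(x: str, y: str) -> int:
    cx, cy = Counter(x.lower().split()), Counter(y.lower().split())
    return sum(cx[w] * cy[w] for w in cx)

print(bow_kernel(
    "John Smith is the chief scientist of the Hardcom Corporation",
    "James Brown was a scientist at the University of Illinois"))  # -> 4
```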

  7. Definition contd. Definition 1: Each node may have a different number of attributes. The attributes are named, and each node necessarily has attributes with names “Type” and “Role”.

  8. Definition contd. • Definition 2: An (unlabeled) relation example is defined inductively as follows: • Let p be a node; then the pair P = (p, []) is a relation example, where by [] we denote an empty sequence. • Let p be a node, and [P1, P2, …, Pl] be a sequence of relation examples. Then, the pair P = (p, [P1, P2, …, Pl]) is a relation example.
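A minimal Python sketch of Definitions 1 and 2, assuming each node is a dictionary of named attributes; the type and role names mirror the slides' person-affiliation example, and the "Text" attribute is our own addition for illustration.

```python
# Sketch of Definitions 1-2: a relation example is a pair
# (node, [children]); every node carries named attributes, always
# including "Type" and "Role".
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Example:
    attrs: Dict[str, str]                      # must contain "Type" and "Role"
    children: List["Example"] = field(default_factory=list)

# "John Smith is the chief scientist of the Hardcom Corporation",
# labeled for person-affiliation (member / affiliation roles).
P1 = Example({"Type": "Sentence", "Role": "none"}, [
    Example({"Type": "Person", "Role": "member", "Text": "John Smith"}),
    Example({"Type": "Verb", "Role": "none", "Text": "is"}),
    Example({"Type": "PNP", "Role": "affiliation",
             "Text": "the chief scientist of the Hardcom Corporation"}),
])
```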

  9. Definition contd. • We denote by P.p the first element of the example pair, and by P.c the second element. • We first define a matching function t(P1.p, P2.p) ∈ {0, 1} and a similarity function k(P1.p, P2.p) ≥ 0 on nodes: t determines whether two nodes are comparable at all (for example, whether their types and roles match), and k measures how similar their attributes are.

  10. Example: the matching function on two corresponding nodes equals 1, t(P1.p, P2.p) = 1, for the aligned shallow parses of John Smith is the chief scientist of the Hardcom Corporation and James Brown was a scientist at the University of Illinois

  11. Example: the node similarity function likewise equals 1, k(P1.p, P2.p) = 1, for the same pair of nodes.
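Building on slides 10-11, a hedged sketch of one possible t and k for the node structure above. The paper leaves both functions domain-specific, so treating t as type-and-role equality and k as word overlap of the remaining text is our assumption.

```python
# One possible choice of matching/similarity functions; the paper
# leaves t and k domain-specific.
def t(n1: dict, n2: dict) -> int:
    # 1 if the nodes are comparable: same "Type" and "Role".
    return int(n1["Type"] == n2["Type"] and n1["Role"] == n2["Role"])

def k(n1: dict, n2: dict) -> float:
    # Similarity of two comparable nodes, here the number of words
    # shared by their (optional) "Text" attributes.
    w1 = set(n1.get("Text", "").lower().split())
    w2 = set(n2.get("Text", "").lower().split())
    return float(len(w1 & w2))
```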

  12. Definition contd. • Then, for two relation examples P1, P2, we define the similarity function K(P1, P2) in terms of the similarity function k of the parent nodes and the similarity function Kc of the children: (1) K(P1, P2) = 0 if t(P1.p, P2.p) = 0, and K(P1, P2) = k(P1.p, P2.p) + Kc(P1.c, P2.c) otherwise. • We now give a general definition of Kc in terms of similarities of children subsequences. We first introduce some helpful notation: for an index sequence i = (i1, …, in) into a children sequence, l(i) = n denotes its length and d(i) = in - i1 + 1 its spread.
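A direct transcription of equation (1) into Python, reusing the Example, t, and k sketches above; the children similarity Kc is sketched after the next slide.

```python
# Equation (1): K is zero unless the parent nodes match; otherwise it is
# the parent similarity plus the children-sequence similarity Kc
# (defined in the next sketch).
def K(P1: Example, P2: Example, lam: float = 0.5) -> float:
    if not t(P1.attrs, P2.attrs):
        return 0.0
    return k(P1.attrs, P2.attrs) + Kc(P1.children, P2.children, lam)
```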

  13. Definition contd. • (2) Kc(P1.c, P2.c) = Σ over index sequences i, j with l(i) = l(j) of λ^(d(i)+d(j)) · [ Σs K(P1.c[is], P2.c[js]) ] · Πs t(P1.c[is].p, P2.c[js].p). • Example with children sequences A1 A2 A3 A4 and B1 B2 B3, enumerating subsequences of length l(·) = 2: (A1,A2 -> B1,B2), (A1,A3 -> B1,B2), (A2,A4 -> B1,B3), … • The formula enumerates all subsequences of relation example children with matching parents, accumulates the similarity for each subsequence by adding the corresponding child examples’ similarities, and decreases the similarity by the factor λ^(d(i)+d(j)), reflecting how spread out the subsequences are within the children sequences. Finally, the similarity of two children sequences is the sum of all matching subsequence similarities.
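A brute-force transcription of equation (2). The paper computes Kc with a fast dynamic program; the exponential enumeration here is purely for clarity.

```python
# Equation (2), sparse kernel, by brute-force enumeration (the paper
# uses a dynamic program; this exponential version is for clarity).
from itertools import combinations

def Kc(c1, c2, lam: float = 0.5) -> float:
    total = 0.0
    for n in range(1, min(len(c1), len(c2)) + 1):
        for i in combinations(range(len(c1)), n):       # index sequence i
            for j in combinations(range(len(c2)), n):   # index sequence j
                # every paired parent must match ...
                if all(t(c1[a].attrs, c2[b].attrs) for a, b in zip(i, j)):
                    # ... damp by lam**(d(i)+d(j)), where d is the spread,
                    # and add the paired child similarities
                    d = (i[-1] - i[0] + 1) + (j[-1] - j[0] + 1)
                    total += lam ** d * sum(
                        K(c1[a], c2[b], lam) for a, b in zip(i, j))
    return total
```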

  14. Contiguous Subtree Kernels vs. Sparse Subtree Kernels • A contiguous subtree kernel matches only contiguous children subsequences, so d(i) = l(i). With the same children sequences A1 A2 A3 A4 and B1 B2 B3 and l(·) = 2: (A1,A2 -> B1,B2), (A2,A3 -> B1,B2), (A2,A3 -> B2,B3), … • The sparse subtree kernel of the previous slide also admits non-contiguous subsequences such as (A1,A3 -> B1,B2).
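The contiguous variant restricts the enumeration to unbroken runs. Following the arithmetic of the worked example on slide 16, a length-n match is damped by λ^n; that discount is our reading of the slides (the sparse formula above would give λ^(2n) for a contiguous pair).

```python
# Contiguous subtree kernel: only unbroken runs of children match, so a
# subsequence is fixed by its start index. Slide 16's arithmetic suggests
# a lam**n discount per length-n match (an assumption; see lead-in).
def Kc_contiguous(c1, c2, lam: float = 0.5) -> float:
    total = 0.0
    for n in range(1, min(len(c1), len(c2)) + 1):
        for a in range(len(c1) - n + 1):
            for b in range(len(c2) - n + 1):
                pairs = list(zip(c1[a:a + n], c2[b:b + n]))
                if all(t(x.attrs, y.attrs) for x, y in pairs):
                    total += lam ** n * sum(K(x, y, lam) for x, y in pairs)
    return total
```

To experiment with the contiguous kernel, call Kc_contiguous in place of Kc inside K.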

  15. assume that λ = 0.5

  16. K(P1,P2) = k(P1.Sentence.p, P2.Sentence.p) + Kc([P1.Person, P1.Verb, P1.PNP], [P2.Person, P2.Verb, P2.PNP])
  = 0 + 0.5 ( K(P1.Person,P2.Person) + K(P1.Verb,P2.Verb) + K(P1.PNP,P2.PNP) )
  + 0.5^2 ( K(P1.Person,P2.Person) + K(P1.Verb,P2.Verb) + K(P1.Verb,P2.Verb) + K(P1.PNP,P2.PNP) )
  + 0.5^3 ( K(P1.Person,P2.Person) + K(P1.Verb,P2.Verb) + K(P1.PNP,P2.PNP) )
  = 0.5 ( k(P1.Person,P2.Person) + k(P1.Verb,P2.Verb) + K(P1.PNP,P2.PNP) )
  + 0.5^2 ( k(P1.Person,P2.Person) + 2 k(P1.Verb,P2.Verb) + K(P1.PNP,P2.PNP) )
  + 0.5^3 ( k(P1.Person,P2.Person) + k(P1.Verb,P2.Verb) + K(P1.PNP,P2.PNP) )
  …
  = 2.765625
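A usage sketch that rebuilds the second example sentence and scores the pair with the toy kernel above. Because our t and k differ from the slides' (unstated) attribute values, the result will not reproduce 2.765625.

```python
# Score the two example sentences with the toy kernel. The slides'
# value of 2.765625 depends on their particular k and node attributes,
# so this sketch will generally print a different number.
P2 = Example({"Type": "Sentence", "Role": "none"}, [
    Example({"Type": "Person", "Role": "member", "Text": "James Brown"}),
    Example({"Type": "Verb", "Role": "none", "Text": "was"}),
    Example({"Type": "PNP", "Role": "affiliation",
             "Text": "a scientist at the University of Illinois"}),
])
print(K(P1, P2, lam=0.5))
```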

  17. Experiments • The (text) corpus for our experiments comprises 200 news articles from different news agencies and publications (Associated Press, Wall Street Journal, Washington Post, Los Angeles Times, Philadelphia Inquirer). • Learning curves (of F-measure) for the person-affiliation relation (on the left) and the org-location relation (on the right), comparing feature-based learning algorithms with kernel-based learning algorithms.

  18. Experiments contd. • Learning curves (of F-measure) for the person-affiliation relation (on the left) and the org-location relation (on the right), comparing kernel-based learning algorithms with different kernels.
