Named Entity Tagging with Conditional Random Fields

Ryan McDonald, Fernando Pereira and Fei Sha • Computer and Information Science, University of Pennsylvania

Goals • Improve on the results of the current NE tagger used by UPenn ACE • Accomplish through Conditional Random Field Model (Lafferty et al. 2001) • Compare MaxEnt and CRFs in a controlled environment

ACE Definition • Find entities and classify them as Person, GPE, Organization, Location and/or Facility • “Bush took over the White House from the Clinton Administration” • Bush: Person • White House: Facility, GPE • The Clinton Administration: Organization • Clinton: Person
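The example sentence shows that ACE mentions can nest ("Clinton" inside "The Clinton Administration") and can carry more than one type ("White House"). A minimal sketch of one way to represent this in Python follows; the `Mention` type, the token offsets and the whitespace tokenization are assumptions made for illustration, not part of the ACE specification or of the UPenn tagger.

```python
from typing import FrozenSet, List, NamedTuple


class Mention(NamedTuple):
    """A token span [start, end) with one or more entity types (hypothetical type)."""
    start: int
    end: int
    types: FrozenSet[str]


# Whitespace tokenization is a simplification for this example.
tokens: List[str] = (
    "Bush took over the White House from the Clinton Administration".split()
)

mentions = [
    Mention(0, 1, frozenset({"Person"})),               # Bush
    Mention(4, 6, frozenset({"Facility", "GPE"})),      # White House
    Mention(7, 10, frozenset({"Organization"})),        # the Clinton Administration
    Mention(8, 9, frozenset({"Person"})),               # Clinton (nested)
]


def mention_text(tokens: List[str], mention: Mention) -> str:
    """Return the surface string covered by a mention."""
    return " ".join(tokens[mention.start:mention.end])
```

A span-based representation like this keeps nested and multi-typed mentions explicit; a single tag per token cannot express both at once.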

MaxEnt vs. CRFs • Ran an MEMM tagger and a CRF tagger with: • The exact same features • Exact same training algorithm (limited memory quasi-Newton) • Exact same training data and test data • Have not used Sept. test data yet since more improvements are on the way
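For reference, the two models being compared can be written side by side in their standard forms (following Lafferty et al. 2001). The notation is chosen here and does not come from the slides: x is the input sentence of length n, y the tag sequence, f_k the feature functions and λ_k their weights. With the same features and weights parameterization, the only difference is where normalization happens: per position for the MEMM, over whole tag sequences for the CRF.

```latex
% MEMM: locally normalized, one distribution per position
p_{\mathrm{MEMM}}(y \mid x)
  = \prod_{i=1}^{n}
    \frac{\exp\bigl(\sum_k \lambda_k f_k(y_{i-1}, y_i, x, i)\bigr)}
         {\sum_{y'} \exp\bigl(\sum_k \lambda_k f_k(y_{i-1}, y', x, i)\bigr)}

% CRF: globally normalized over all tag sequences
p_{\mathrm{CRF}}(y \mid x)
  = \frac{1}{Z(x)}
    \exp\Bigl(\sum_{i=1}^{n} \sum_k \lambda_k f_k(y_{i-1}, y_i, x, i)\Bigr),
\qquad
Z(x) = \sum_{y'} \exp\Bigl(\sum_{i=1}^{n} \sum_k \lambda_k f_k(y'_{i-1}, y'_i, x, i)\Bigr)
```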

Features • Word: Unigram* • 1-suffix, 2-suffix, 3-suffix and 4-suffix: Unigram and Bigram • Word length bins: Unigram and bigram • Word features defined by Tom's script: Caps, Numeric, etc.* * used in original ACE system
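The observation features listed above could be extracted along the following lines. This is a sketch under stated assumptions: the length-bin edges are invented, the capitalization and digit tests are stand-ins for the features produced by Tom's script (which the slides do not specify), and the unigram/bigram distinction on the slide is not modeled.

```python
def length_bin(word, edges=(2, 4, 6, 10)):
    """Map a word length to a coarse bin label. The bin edges are hypothetical."""
    for edge in edges:
        if len(word) <= edge:
            return "len<=%d" % edge
    return "len>%d" % edges[-1]


def token_features(word):
    """Return the set of active feature names for a single token."""
    feats = {"word=" + word}
    # Suffixes of 1 to 4 characters, skipped when the word is too short.
    for n in range(1, 5):
        if len(word) >= n:
            feats.add("suffix%d=%s" % (n, word[-n:]))
    feats.add(length_bin(word))
    # Stand-ins for the "Caps, Numeric, etc." word-shape features.
    if word[:1].isupper():
        feats.add("init_caps")
    if word.isupper():
        feats.add("all_caps")
    if any(ch.isdigit() for ch in word):
        feats.add("has_digit")
    return feats
```

Because each feature is just a string name, the same extractor can feed both the MEMM and the CRF, which is what keeps a comparison between them controlled.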

MEMM vs. CRF • Same feature set • Same training algorithm

ACE vs. CRF • Different feature sets (CRF is richer)

Summary • These results and (Sha 2002) show that CRFs perform slightly better than MEMMs • Richer feature set leads to larger improvement • Portable CRF, MEMM code • Conjugate Gradient, Limited Memory Quasi-Newton, Perceptron

Future and Current Work • “Person” and “Organization” recall • Multilayer taggers • Name lists • Document class information

Multilayer Taggers • If entity information is known, it can lead to a 10-20% increase in F-Score • First layer of tagger attempts to find generic entities • Can achieve an F-Score of around 0.87 • Second layer uses entity information as a feature for each category classifier • Leads to about a 2-5% increase in F-Score
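The hand-off between the two layers can be sketched as follows: the first layer's generic entity tags become extra features for the second layer's category classifiers. The function name, the `layer1=` feature prefix and the `ENT`/`O` tag names are hypothetical, and the first-layer tagger itself is assumed to exist elsewhere.

```python
def add_layer_one_tags(token_feats, generic_tags):
    """Attach each token's first-layer generic tag to its feature set.

    token_feats:  list of sets of feature names, one set per token
    generic_tags: list of first-layer tags (e.g. "ENT" or "O"), one per token
    """
    if len(token_feats) != len(generic_tags):
        raise ValueError("need exactly one generic tag per token")
    return [
        set(feats) | {"layer1=" + tag}
        for feats, tag in zip(token_feats, generic_tags)
    ]
```

For example, `add_layer_one_tags([{"word=Bush"}, {"word=took"}], ["ENT", "O"])` gives the second layer a direct signal about which tokens the first layer considers part of an entity.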

Name Lists • Aim is to increase Recall results for person and organization categories • Name list size: 80,000 • Organization list size: 30,000 • Binary feature: is token in name list? • Increase Person F-Score to 0.793 (From 0.755) • Binary feature: is token in organization list? • Increase Organization F-Score to 0.601 (From 0.569)
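The two binary list features described above amount to a set-membership test per token. A minimal sketch, assuming the lists are loaded as Python sets of lowercased single tokens; the case-insensitive matching and the feature names are assumptions, not details from the slides.

```python
def list_features(tokens, name_list, org_list):
    """Return one dict of binary list-membership features per token."""
    feats = []
    for tok in tokens:
        key = tok.lower()  # case-insensitive lookup is an assumption
        feats.append({
            "in_name_list": key in name_list,
            "in_org_list": key in org_list,
        })
    return feats


# Toy lists for illustration only; the lists on the slide hold
# 80,000 names and 30,000 organizations.
toy_names = {"bush", "clinton"}
toy_orgs = {"ibm"}
example = list_features(["Bush", "visited", "IBM"], toy_names, toy_orgs)
```

Storing the lists as sets keeps each lookup constant-time, so list size has little effect on feature-extraction cost.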

Name Lists • Small name lists can lead to a substantial improvement in F-Score • Even though the features were simplistic • Investigating better name lists • MT name list of 500,000 names and 50,000 orgs • Investigating more sophisticated features • frequency

Document Class Features • “Atlanta defeated Florida in extra innings ...” • Atlanta and Florida should be tagged as organizations • Mistakenly tagged as GPE • If the document is classified as SPORTS, the NE classifier may recognize that entities normally tagged GPE should be organizations • Currently beginning to look at state-of-the-art document classification algorithms • Could provide a richer source of knowledge
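One way a document class could reach the tagger is as a feature conjoined with each token-level feature, so that a model can weight "Atlanta in a SPORTS document" differently from "Atlanta" alone. This is a sketch of that idea only; the slides describe the work as just beginning, and the function name, feature naming scheme and the upstream document classifier are all assumptions.

```python
def add_doc_class(token_feats, doc_class):
    """Add the document class, alone and conjoined with each existing feature.

    token_feats: list of sets of feature names, one set per token
    doc_class:   label from a separate document classifier, e.g. "SPORTS"
    """
    out = []
    for feats in token_feats:
        conjoined = {"%s&doc=%s" % (f, doc_class) for f in feats}
        out.append(set(feats) | conjoined | {"doc=" + doc_class})
    return out
```

Conjoining doubles the number of active features per token, so in practice the conjunction might be restricted to a few feature types such as the word identity.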
