FA XIAN: Finding, Ascertaining, eXtracting, And Identifying Names • Yuan Ding and John Blitzer
Preliminaries • 250K of hand-annotated Chinese newswire: 2586 person names, 1422 unique • BBN IdentiFinder: trained on 650K of English text • NYU MENE: trained on 270K of English text
Modeling Overview • SVM (Support Vector Machine) • Common Feature Set • Common Training/Evaluation Data
Features • Character-based feature set • Chinese character (GB2312), English letter • Ci-2 Ci-1 Ci Ci+1 Ci+2 • Word boundary • Begin_of_Word (BOW), End_of_Word (EOW) • Previous tag (B, I, O) • [O] Outside PN, [I] Inside PN, [B] Beginning of PN (PN = person name)
Data • Characters: 392795(training) + 34740(testing) • Unique Characters: 3252 • 行销 全球 的 <ENAMEX TYPE="ORGANIZATION:corporation">宜进 实业</ENAMEX> <ENAMEX TYPE="PER_DESC">董事长</ENAMEX> <ENAMEX TYPE="PERSON">詹正田</ENAMEX>忿忿 指陈 : 原先 <NUMEX TYPE="QUANTITY:weight">一 公斤</NUMEX> <NUMEX TYPE="MONEY">一块八 美金</NUMEX> 的 加工 丝 , 目前 滑落 至 <NUMEX TYPE="MONEY">一块三</NUMEX>
Example: ju shuo yuehan zai bin da shangxue
• 据说 约翰 在 宾大 上学。 (It is said that John studies at Penn.)
• Character (C0 C1 C2 C3 C4 ……): 据 说 约 翰 在 宾 大 上 学 。
• Tag: O O B I O O O O O O
• Word boundary: B E B E BE B E B E BE
• Previous tag: O O O B I O O O O O
• Previous char: (none) 据 说 约 翰 在 宾 大 上 学
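The per-character rows in the example can be generated mechanically from a segmented sentence. The following is a minimal sketch, not the authors' code: the function name `char_features`, the padding symbol `PAD`, the feature keys, and the choice of "O" as the previous tag of the first character are all assumptions made for illustration.

```python
PAD = "<PAD>"  # hypothetical filler for window positions outside the sentence


def char_features(words, tags, window=2):
    """Build one feature dict per character of a segmented sentence.

    words: list of word strings (the segmentation)
    tags:  sequence of B/I/O tags, one per character
    """
    chars, bow, eow = [], [], []
    for word in words:
        for k, ch in enumerate(word):
            chars.append(ch)
            bow.append(k == 0)              # Begin_of_Word
            eow.append(k == len(word) - 1)  # End_of_Word
    if len(tags) != len(chars):
        raise ValueError("need exactly one tag per character")

    rows = []
    for i in range(len(chars)):
        feats = {"BOW": bow[i], "EOW": eow[i]}
        # Character window C-2 .. C+2 (the key "C+0" is the current character).
        for off in range(-window, window + 1):
            j = i + off
            feats["C%+d" % off] = chars[j] if 0 <= j < len(chars) else PAD
        # Previous tag; assumed to be "O" at the start of the sentence.
        feats["prev_tag"] = tags[i - 1] if i > 0 else "O"
        rows.append(feats)
    return rows


if __name__ == "__main__":
    words = ["据说", "约翰", "在", "宾大", "上学", "。"]
    for row in char_features(words, "OOBIOOOOOO"):
        print(row)
```

A single-character word such as 在 gets both BOW and EOW set, which corresponds to the "BE" entries in the word-boundary row.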
Model 1: SVM (Support Vector Machine) • Searches for the optimal separating hyperplane, the one that maximizes the margin [V. Vapnik 1995]
SVM - Properties • Two strong properties: (1) high generalization performance independent of feature dimension; (2) training with combinations of multiple features by using a kernel function • Maximal margin strategy • Separate positive and negative (binary) examples with a linear hyperplane: $w \cdot x + b = 0$, with $w, x \in \mathbb{R}^n$ and $b \in \mathbb{R}$ • Find the optimal hyperplane (parameters $w$, $b$) with the maximal margin
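For reference, the maximal margin strategy is usually written as the following optimization problem. This is the standard hard-margin form; the training pairs $(x_i, y_i)$ with labels $y_i \in \{+1, -1\}$ are notation introduced here, and the slides do not say whether a soft-margin variant was used in the experiments.

```latex
\[
\min_{w,\, b} \ \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ge 1 \quad \text{for all } i .
\]
% The margin between the two classes is 2 / \lVert w \rVert,
% so minimizing \lVert w \rVert maximizes the margin.
```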
SVM – Kernel Function • Kernel function: $K(x, y) = \Phi(x) \cdot \Phi(y)$ • $x$, $y$ are vectors in input space • $\Phi(x)$, $\Phi(y)$ are vectors in feature space • $\dim(\text{feature space}) \gg \dim(\text{input space})$ • No need to compute $\Phi(x)$ explicitly • $d$-th polynomial kernel: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$ • Considers combinations of up to $d$ features
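The claim that the feature-space vectors never need to be computed can be checked numerically for d = 2. The sketch below is illustrative only; the names `poly_kernel` and `phi_degree2` are hypothetical, and the explicit map shown is one standard choice for the degree-2 polynomial kernel.

```python
from itertools import combinations
from math import isclose, sqrt


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, d=2):
    """d-th polynomial kernel: (x . y + 1)^d, computed in input space."""
    return (dot(x, y) + 1) ** d


def phi_degree2(x):
    """Explicit feature map whose dot product equals (x . y + 1)^2."""
    n = len(x)
    feats = [1.0]                              # constant term
    feats += [sqrt(2) * a for a in x]          # single features
    feats += [a * a for a in x]                # squared features
    feats += [sqrt(2) * x[i] * x[j]            # all pairs of features
              for i, j in combinations(range(n), 2)]
    return feats


if __name__ == "__main__":
    x = [1, 0, 1, 1]
    y = [1, 1, 0, 1]
    implicit = poly_kernel(x, y, d=2)
    explicit = dot(phi_degree2(x), phi_degree2(y))
    print(implicit, explicit, len(phi_degree2(x)))
    assert isclose(implicit, explicit)
```

With 4 input features the explicit map already has 15 dimensions, and the pair terms are where the "combinations of up to d features" come from; the kernel reaches the same value with a single 4-dimensional dot product.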
SVM – Polynomial Kernel • So, the larger the d, the better? Not necessarily! • Larger d • Effectively considers all d-grams • Higher precision, lower recall • Can amount to overfitting • Smaller d • The trained model is more general
Evaluation Setup • Features • [ES1]: { Ci-1 Ci Ci+1 } + { BOW EOW } • [ES2]: [ES1] + { Prev_Tag } • [ES3]: Ci-2 Ci-1 Ci Ci+1 Ci+2 (Ci+2 only for SVM) • Estimated input-space size: 5 * 3200 = 16000 binary features (5 window positions, roughly 3200 unique characters)
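The 5 * 3200 estimate corresponds to one binary feature per (window position, character) pair. A minimal sketch of that encoding follows; the index layout `position * vocabulary_size + character_id` and the function names are assumptions for illustration, not a description of the actual feature files used.

```python
def input_dimension(window_size, vocab_size):
    """Number of binary features: one per (window position, character) pair."""
    return window_size * vocab_size


def window_to_sparse(window_chars, vocab):
    """Return the indices of the active binary features for one character window.

    window_chars: the characters Ci-2 .. Ci+2, in order
    vocab:        dict mapping each known character to an integer id
    Characters missing from vocab simply activate no feature.
    """
    size = len(vocab)
    return [pos * size + vocab[ch]
            for pos, ch in enumerate(window_chars)
            if ch in vocab]


if __name__ == "__main__":
    print(input_dimension(5, 3200))
    toy_vocab = {ch: i for i, ch in enumerate("据说约翰在")}
    print(window_to_sparse("据说约翰在", toy_vocab))
```

Each example activates at most five of the features, so the vectors are very sparse even though the input space is large.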
Target of Decision • SVM Model • Binary classifier • I and O only • Any word containing an "I" is treated as a person name.
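The word-level decision rule can be sketched in a few lines. The function name `person_words` is hypothetical, and the sketch assumes the per-character I/O decisions and the word segmentation are already available.

```python
def person_words(words, char_tags):
    """Return the words that contain at least one character tagged "I".

    words:     list of word strings (the segmentation)
    char_tags: sequence of "I"/"O" decisions, one per character
    """
    if sum(len(w) for w in words) != len(char_tags):
        raise ValueError("need exactly one tag per character")
    names, start = [], 0
    for word in words:
        end = start + len(word)
        if "I" in char_tags[start:end]:
            names.append(word)
        start = end
    return names


if __name__ == "__main__":
    words = ["据说", "约翰", "在", "宾大", "上学"]
    print(person_words(words, "OOIIOOOOO"))
```

Note that a single "I" inside a longer word marks the whole word as a name, so a stray character-level error can turn an ordinary word into a false positive.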
Experiment Setup
• Words: ju-shuo | yue-han | zai | bin-da | shang-xue
• Characters: 据 说 约 翰 在 宾 大 上 学
• Gold: O O B I O O O O O
• Model1: O B I I I O O O O
• Model2: O O O O O O O O O
• Use perfect word boundary in evaluation
• Model1: positive +3, gold +1, true positive +1
• Model2: positive +0, gold +1, true positive +0
• Use automatic segmenter
• Model1: positive +1, gold +1, true positive +0
• Model2: positive +0, gold +1, true positive +0
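The counts above can be reproduced with a short script. The perfect-word-boundary mode follows the rule that any word containing a non-O tag counts as a name. For the automatic-segmenter mode the slides give only the counts, so the sketch assumes one possible reading: each maximal run of non-O characters is one predicted name and must match the gold span exactly. All function names are hypothetical.

```python
def word_spans(words):
    """Character (start, end) offsets of each word."""
    spans, start = [], 0
    for word in words:
        spans.append((start, start + len(word)))
        start += len(word)
    return spans


def tagged_runs(tags):
    """Maximal runs of non-O tags, as (start, end) spans."""
    runs, start = [], None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a final run
        if tag != "O" and start is None:
            start = i
        elif tag == "O" and start is not None:
            runs.append((start, i))
            start = None
    return runs


def count_word_level(words, gold, pred):
    """(positive, gold, true positive) with perfect word boundaries."""
    def names(tags):
        return {s for s in word_spans(words)
                if any(t != "O" for t in tags[s[0]:s[1]])}
    g, p = names(gold), names(pred)
    return len(p), len(g), len(p & g)


def count_span_level(gold, pred):
    """Same counts under the ASSUMED reading of the segmenter mode."""
    g, p = set(tagged_runs(gold)), set(tagged_runs(pred))
    return len(p), len(g), len(p & g)


def f_score(positive, gold, true_positive):
    """Balanced F-score from the three counts (0.0 when undefined)."""
    if true_positive == 0 or positive == 0 or gold == 0:
        return 0.0
    precision = true_positive / positive
    recall = true_positive / gold
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    words = ["据说", "约翰", "在", "宾大", "上学"]
    gold, model1, model2 = "OOBIOOOOO", "OBIIIOOOO", "OOOOOOOOO"
    for name, pred in [("Model1", model1), ("Model2", model2)]:
        print(name, count_word_level(words, gold, pred),
              count_span_level(gold, pred))
```

Summing the three counts over all test sentences and passing the totals to `f_score` gives a corpus-level score.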
Results Current English Person Name Finder: F-score around 75%
Future Work • Dynamic decoding • Integrate with word segmenter • Other named entities
Thank you! Special Thanks To BBN M. Palmer E. Loper S. Kulick T. Morton T. Joachims (Author of SVMLight)