This presentation describes FA XIAN, a person-name finder for Chinese text built on 250K of hand-annotated Chinese newswire containing 2586 person names. BBN Identifinder and NYU MENE, both trained on English text, are listed as background systems. The model is a Support Vector Machine (SVM) over a character-based feature set: a window of neighboring characters, word-boundary flags, and the previous tag. The evaluation setups combine these features over an input space estimated at 16,000 binary features. For comparison, the slides cite an F-score of around 75% for a current English person name finder.
FA XIAN: Finding, Ascertaining, eXtracting, And Identifying Names • Yuan Ding and John Blitzer
Preliminaries • 250K of hand-annotated Chinese newswire • 2586 Person Names • 1422 Unique • BBN Identifinder • Trained on 650K of English Text • NYU MENE • Trained on 270K of English Text
Modeling Overview • SVM (Support Vector Machine) • Common Feature Set • Common Training/Evaluation Data
Features • Character based feature set • Chinese character (GB2312), English letter • Ci-2 Ci-1 Ci Ci+1 Ci+2 • Word boundary • Begin_of_Word (BOW), End_of_Word (EOW) • Previous tag (B,I,O) • [O]Outside PN, [I]Inside PN, [B]Beginning of PN
Data • Characters: 392795(training) + 34740(testing) • Unique Characters: 3252 • 行销 全球 的 <ENAMEX TYPE="ORGANIZATION:corporation">宜进 实业</ENAMEX> <ENAMEX TYPE="PER_DESC">董事长</ENAMEX> <ENAMEX TYPE="PERSON">詹正田</ENAMEX>忿忿 指陈 : 原先 <NUMEX TYPE="QUANTITY:weight">一 公斤</NUMEX> <NUMEX TYPE="MONEY">一块八 美金</NUMEX> 的 加工 丝 , 目前 滑落 至 <NUMEX TYPE="MONEY">一块三</NUMEX>
Example • 据说 约翰 在 宾大 上学。 (ju shuo yuehan zai bin da shangxue)
Position: C0 C1 C2 C3 C4 ……
Character: 据 说 约 翰 在 宾 大 上 学 。
Tag: O O B I O O O O O O
Word boundary: B E B E BE B E B E BE
Previous tag: O O O B I O O O O O
Previous char: (none) 据 说 约 翰 在 宾 大 上 学
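The rows of this example can be derived mechanically from a segmented sentence. The following is a minimal sketch, not the authors' code: the function name `annotate`, the `name_indices` argument, and the "-" marker for word-internal characters are assumptions made for illustration, and each person name is assumed to be exactly one segmented word.

```python
def annotate(words, name_indices):
    """Return (char, tag, boundary, previous_tag) for every character.

    words        -- list of segmented words
    name_indices -- indices of words that are person names
                    (assumption: each name is exactly one word)
    """
    rows = []
    prev_tag = "O"  # assumption: the sentence-initial previous tag is O, as on the slide
    for w_idx, word in enumerate(words):
        for c_idx, ch in enumerate(word):
            if w_idx in name_indices:
                tag = "B" if c_idx == 0 else "I"
            else:
                tag = "O"
            bow = c_idx == 0              # Begin_of_Word
            eow = c_idx == len(word) - 1  # End_of_Word
            # "BE" for one-character words; "-" (hypothetical marker) for word-internal characters
            boundary = ("B" if bow else "") + ("E" if eow else "") or "-"
            rows.append((ch, tag, boundary, prev_tag))
            prev_tag = tag
    return rows


words = ["据说", "约翰", "在", "宾大", "上学", "。"]
rows = annotate(words, name_indices={1})
for row in rows:
    print(*row)
```

Running it on the sentence above yields the same Tag, Word boundary, and Previous tag rows as the slide.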
Model 1: SVM (Support Vector Machine) • Search for the Optimal Separating Hyperplane, the one that maximizes the margin [V. Vapnik 1995]
SVM - Properties • Two strong properties • – High generalization performance independent of feature dimension • – Training with combinations of multiple features by using a Kernel Function • Maximal Margin Strategy • Separate positive and negative (binary) examples with a linear hyperplane: w · x + b = 0, where w, x ∈ R^n and b ∈ R • Find an optimal hyperplane (parameters w, b) with the maximal margin
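For reference, the maximal margin strategy is usually written as the optimization problem below. This is the standard hard-margin textbook form, given here as an assumption: the slides do not say whether a hard or soft margin was used, and the labels y_i ∈ {-1, +1} and the N training pairs (x_i, y_i) are notation introduced only for this sketch.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, N
```

The distance between the two class boundaries is 2 / ||w||, so minimizing ||w|| is the same as maximizing the margin.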
SVM – Kernel Function • Kernel Function • K(x, y) = Φ(x) · Φ(y) • x, y are vectors in input space • Φ(x), Φ(y) are vectors in feature space • dim(feature space) >> dim(input space) • No need to compute Φ(x) explicitly • d-th polynomial kernel • K(xi, xj) = (xi · xj + 1)^d • considering combinations of up to d features
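The claim that the feature-space vectors never need to be computed can be checked numerically for d = 2. The sketch below is illustrative only: `poly_kernel` and `phi2` are hypothetical helper names, and `phi2` is one standard explicit feature map for the degree-2 polynomial kernel, not something taken from the slides.

```python
import math
from itertools import combinations


def poly_kernel(x, y, d):
    """d-th polynomial kernel: (x . y + 1)^d, computed in input space."""
    return (sum(a * b for a, b in zip(x, y)) + 1) ** d


def phi2(x):
    """Explicit feature map whose dot product equals poly_kernel(., ., 2)."""
    feats = [1.0]                                                    # constant term
    feats += [math.sqrt(2) * a for a in x]                           # single features
    feats += [a * a for a in x]                                      # squared features
    feats += [math.sqrt(2) * a * b for a, b in combinations(x, 2)]   # feature pairs
    return feats


# Two small binary vectors, as in a one-hot character encoding
x = [1, 0, 1, 1]
y = [1, 1, 0, 1]

k = poly_kernel(x, y, 2)
explicit = sum(a * b for a, b in zip(phi2(x), phi2(y)))
print(k, explicit)
```

The kernel works on 4 input dimensions while `phi2` produces 15, and the gap widens quickly as the input dimension and d grow, which is why the kernel form is preferred.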
SVM – Polynomial Kernel • So, the larger the d, the better? – Not necessarily! • Larger d • Virtually considers all d-grams • Higher precision, lower recall • Risks overfitting • Smaller d • The trained model is more general
Evaluation Setup • Features • [ES1]: { Ci-1 Ci Ci+1 } + { BOW EOW } • [ES2]: [ES1] + { Prev_Tag } • [ES3]: Ci-2 Ci-1 Ci Ci+1 Ci+2 (Ci+2 only for SVM) • Estimation of input space: 5 * 3200 = 16000 binary features
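The 5 * 3200 estimate follows from giving every (window position, character) pair its own binary feature. A minimal sketch of that encoding, under stated assumptions: `build_vocab` and `window_features` are hypothetical names, positions that fall outside the sentence simply activate no feature (the slides do not describe padding), and the word-boundary and previous-tag features are left out.

```python
def build_vocab(chars):
    """Map every distinct character to an integer id."""
    return {ch: i for i, ch in enumerate(sorted(set(chars)))}


def window_features(chars, i, vocab, offsets=(-2, -1, 0, 1, 2)):
    """Indices of the active binary features for the character at position i.

    Each window slot owns a block of len(vocab) features, so the
    total dimension is len(offsets) * len(vocab).
    """
    size = len(vocab)
    active = []
    for slot, off in enumerate(offsets):
        j = i + off
        # Assumption: out-of-range or unknown characters activate nothing
        if 0 <= j < len(chars) and chars[j] in vocab:
            active.append(slot * size + vocab[chars[j]])
    return active


text = "据说约翰在宾大上学。"
vocab = build_vocab(text)
active = window_features(text, 2, vocab)   # features for 约
dim = 5 * len(vocab)
print(active, dim)
```

With roughly 3200 distinct characters instead of the 10 in this toy sentence, the same layout gives the 16000-dimensional input space quoted on the slide, with at most 5 window features active per character.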
Target of Decision • SVM Model • Binary classifier • I and O only • Any word that contains an “I” is treated as a person name.
Experiment Setup • ju-shuo | yue-han | zai | bin-da | shang-xue
Chars: 据 说 约 翰 在 宾 大 上 学
Gold: O O B I O O O O O
Model1: O B I I I O O O O
Model2: O O O O O O O O O
• Use perfect word boundary in evaluation • Model1: positive<+3> gold<+1> true positive<+1> • Model2: positive<+0> gold<+1> true positive<+0>
• Use automatic segmenter • Model1: positive<+1> gold<+1> true positive<+0> • Model2: positive<+0> gold<+1> true positive<+0>
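The counts on this slide can be reproduced with a short scoring script. This is a sketch of one reading of the slide, not the authors' evaluation code: with perfect word boundaries a word counts as positive if any of its characters is tagged B or I, and for the automatic-segmenter case a predicted name is assumed to be a maximal run of non-O tags that must match a gold name exactly. All function names are hypothetical.

```python
def word_level_counts(words, gold, pred):
    """Perfect word boundaries: a word is positive if any character is non-O."""
    pos = gold_n = tp = 0
    start = 0
    for w in words:
        end = start + len(w)
        g = any(t != "O" for t in gold[start:end])
        p = any(t != "O" for t in pred[start:end])
        pos += p
        gold_n += g
        tp += p and g
        start = end
    return pos, gold_n, tp


def spans(tags):
    """Maximal runs of non-O tags as (start, end) pairs; a B opens a new run."""
    out, start = [], None
    for i, t in enumerate(tags):
        if start is not None and t in ("O", "B"):
            out.append((start, i))
            start = None
        if start is None and t != "O":
            start = i
    if start is not None:
        out.append((start, len(tags)))
    return out


def span_level_counts(gold, pred):
    """No gold boundaries: a predicted name must match a gold span exactly."""
    g, p = set(spans(gold)), set(spans(pred))
    return len(p), len(g), len(p & g)


def f_score(pos, gold_n, tp):
    """Balanced F-score from positive / gold / true-positive counts."""
    if pos == 0 or gold_n == 0 or tp == 0:
        return 0.0
    precision, recall = tp / pos, tp / gold_n
    return 2 * precision * recall / (precision + recall)


words = ["据说", "约翰", "在", "宾大", "上学"]
gold = "O O B I O O O O O".split()
model1 = "O B I I I O O O O".split()
model2 = "O O O O O O O O O".split()

print(word_level_counts(words, gold, model1), span_level_counts(gold, model1))
print(word_level_counts(words, gold, model2), span_level_counts(gold, model2))
```

Under these assumptions Model1 gets credit for 约翰 when gold word boundaries are supplied, but not when its over-long span 说约翰在 is compared directly against the gold name.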
Results Current English Person Name Finder: F-score around 75%
Future Work • Dynamic decoding • Integrate with word segmenter • Other named entities
Thank you! • Special thanks to: BBN • M. Palmer • E. Loper • S. Kulick • T. Morton • T. Joachims (Author of SVMLight)