CCF Academic Salon on the Applications and Development of Bayesian Networks in China
BN Theory Research and Applications at the Hong Kong University of Science and Technology
2012-05-22
Overview
Early Work (1992-2002)
• Inference: Variable Elimination
• Inference: Local Structures
• Others: Learning, Decision Making, Book
Latent Tree Models (2000 - )
• Theory and Algorithms
• Applications
  • Multidimensional Clustering, Density Estimation, Latent Structure
  • Survey Data, Documents, Business Data
  • Traditional Chinese Medicine (TCM)
• Extensions
Variable Elimination
Papers:
• N. L. Zhang and D. Poole (1994). A simple approach to Bayesian network computations. In Proc. of the 10th Canadian Conference on Artificial Intelligence, Banff, Alberta, Canada, May 16-22.
• N. L. Zhang and D. Poole (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5: 301-328.
The first description of BN inference as variable elimination:
• Russell & Norvig write on page 529: "The algorithm we describe is closest to that developed by Zhang and Poole (1994, 1996)."
• Koller and Friedman write: "… the variable elimination algorithm, as presented here, first described by Zhang and Poole (1994), …"
• The K&F book cites 7 of our papers.
Local Structures: Causal Independence
Papers:
• N. L. Zhang and D. Poole (1996). Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5: 301-328.
• N. L. Zhang and D. Poole (1994). Intercausal independence and heterogeneous factorization. In Proc. of the 10th Conference on Uncertainty in Artificial Intelligence, Seattle, USA, July 29-31.
Local Structures: Context-Specific Independence
Papers:
• N. L. Zhang and D. Poole (1999). On the role of context-specific independence in probabilistic reasoning. IJCAI-99, 1288-1293.
• D. Poole and N. L. Zhang (2003). Exploiting contextual independence in probabilistic inference. Journal of Artificial Intelligence Research, 18: 263-313.
Other Works
Parameter Learning
• N. L. Zhang (1996). Irrelevance and parameter learning in Bayesian networks. Artificial Intelligence, 88: 359-373.
Decision Making
• N. L. Zhang (1998). Probabilistic inference in influence diagrams. Computational Intelligence, 14(4): 475-497.
• N. L. Zhang, R. Qi and D. Poole (1994). A computational theory of decision networks. International Journal of Approximate Reasoning, 11(2): 83-158.
• PhD Thesis
Overview
Early Work (1992-2002)
• Inference: Variable Elimination
• Inference: Local Structures
• Others: Learning, Decision Making
Latent Tree Models (2000 - )
• Theory and Algorithms
• Applications
  • Multidimensional Clustering, Density Estimation, Latent Structure
  • Survey Data, Documents, Business Data
  • Traditional Chinese Medicine (TCM)
• Extensions
Latent Tree Models: Overview
• Concept first mentioned by Pearl (1988).
• We were the first to conduct systematic research on LTMs:
  • N. L. Zhang (2002). Hierarchical latent class models for cluster analysis. AAAI-02, 230-237.
  • N. L. Zhang (2004). Hierarchical latent class models for cluster analysis. Journal of Machine Learning Research, 5(6): 697-723.
• Early followers: Aalborg University (Denmark), Norwegian University of Science and Technology.
• Recent papers from: MIT, CMU, USC, Georgia Tech, Edinburgh.
Latent Tree Models
A recent survey by French researchers.
Latent Tree Models (LTM)
• Bayesian networks with:
  • Rooted tree structure
  • Discrete random variables
  • Leaves observed (manifest variables)
  • Internal nodes latent (latent variables)
• Also known as hierarchical latent class (HLC) models.
• Parameters: P(Y1), P(Y2|Y1), P(X1|Y2), P(X2|Y2), …
Example
• Manifest variables: Math Grade, Science Grade, Literature Grade, History Grade
• Latent variables: Analytic Skill, Literal Skill, Intelligence
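To make the example concrete, here is a minimal sketch of the grades model under the factorization on the previous slide. All CPTs are made up (random); only the structure comes from the slides. It computes the marginal over the four manifest variables by summing out the latent variables:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_cpt(child_card, parent_card=None):
        # Random conditional probability table; rows sum to 1.
        shape = (child_card,) if parent_card is None else (parent_card, child_card)
        t = rng.random(shape)
        return t / t.sum(axis=-1, keepdims=True)

    k = 3  # latent cardinality (low / medium / high)
    g = 3  # observed grade levels

    p_y1 = random_cpt(k)      # P(Intelligence)
    p_y2 = random_cpt(k, k)   # P(AnalyticSkill | Intelligence)
    p_y3 = random_cpt(k, k)   # P(LiteralSkill  | Intelligence)
    p_x1 = random_cpt(g, k)   # P(MathGrade       | AnalyticSkill)
    p_x2 = random_cpt(g, k)   # P(ScienceGrade    | AnalyticSkill)
    p_x3 = random_cpt(g, k)   # P(LiteratureGrade | LiteralSkill)
    p_x4 = random_cpt(g, k)   # P(HistoryGrade    | LiteralSkill)

    # Marginal over the four grades: sum out Y1, Y2, Y3.
    # Indices: a=Y1, b=Y2, c=Y3, i..l = X1..X4.
    joint = np.einsum('a,ab,ac,bi,bj,ck,cl->ijkl',
                      p_y1, p_y2, p_y3, p_x1, p_x2, p_x3, p_x4)
    assert np.isclose(joint.sum(), 1.0)
    print(joint[2, 2, 0, 0])  # P(Math=high, Science=high, Lit=low, Hist=low)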
Theory: Root Walking and Model Equivalence
• M1: root walks to X2; M2: root walks to X3.
• Root walking leads to models that are equivalent over the manifest variables.
• Implications:
  • Edge orientations cannot be determined from data.
  • We can only learn unrooted models.
Regularity
• Regular latent tree models: for any latent node Z with neighbors X1, X2, …, Xk, the cardinality of Z does not exceed the product of its neighbors' cardinalities divided by the largest of them, with strict inequality when Z has exactly two neighbors (Zhang, 2004).
• We can focus on regular models only:
  • Irregular models can be made regular.
  • Regularized models are better than the irregular originals.
  • The set of all regular models over given manifest variables is finite.
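The condition translates directly into a check. A small sketch, assuming the cardinality bound stated above (from Zhang, 2004):

    from math import prod

    def is_regular(card_z, neighbor_cards):
        # |Z| may not exceed the product of the neighbors' cardinalities
        # divided by the largest of them.
        bound = prod(neighbor_cards) / max(neighbor_cards)
        if len(neighbor_cards) == 2:
            return card_z < bound   # strict when Z has exactly two neighbors
        return card_z <= bound

    print(is_regular(3, [3, 3, 3]))   # True:  3 <= 27/3
    print(is_regular(4, [2, 2]))      # False: a 2-neighbor latent node is capped at < 2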
Effective Dimension
• Standard dimension: the number of free parameters.
• Effective dimension:
  • X1, X2, …, Xn: observed variables.
  • Each parameter value maps P(X1, X2, …, Xn) to a point in a high-dimensional space.
  • As the parameter value varies, these points span a manifold.
  • Effective dimension: the dimension of that manifold.
• Parsimonious model: standard dimension = effective dimension.
• Open question: how to test parsimony?
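The standard dimension follows directly from the tree and the variable cardinalities: the root contributes (card − 1) free parameters, and every other node contributes (parent card) × (card − 1). A minimal sketch, using the grades example (cardinalities are illustrative):

    def standard_dimension(cards, parent):
        # cards:  dict node -> cardinality
        # parent: dict node -> parent node (root maps to None)
        d = 0
        for node, card in cards.items():
            p = parent[node]
            d += (card - 1) if p is None else cards[p] * (card - 1)
        return d

    cards  = dict(Y1=3, Y2=3, Y3=3, X1=3, X2=3, X3=3, X4=3)
    parent = dict(Y1=None, Y2='Y1', Y3='Y1',
                  X1='Y2', X2='Y2', X3='Y3', X4='Y3')
    print(standard_dimension(cards, parent))  # 2 + 6*(3*2) = 38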
Effective Dimension
Paper:
• N. L. Zhang and T. Kocka (2004). Effective dimensions of hierarchical latent class models. Journal of Artificial Intelligence Research, 21: 1-17.
Open question: the effective dimension of an LTM with one latent variable.
Learning Latent Tree Models
Determine:
• The number of latent variables
• The cardinality of each latent variable
• The model structure
• The conditional probability distributions
Search-Based Learning: Model Selection
• Bayesian score: posterior probability P(m|D)
  P(m|D) = P(m) ∫ P(D|m, θ) dθ / P(D)
• BIC score: a large-sample approximation
  BIC(m|D) = log P(D|m, θ*) − (d/2) log N
• BICe score: uses the effective dimension de
  BICe(m|D) = log P(D|m, θ*) − (de/2) log N
• Effective dimensions are difficult to compute, so BICe is not practical.
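As a quick illustration, the BIC score is a one-liner given the maximized log-likelihood; the numbers below are hypothetical:

    from math import log

    def bic(loglik, d, n):
        # BIC(m|D) = log P(D | m, theta*) - (d/2) * log(N)
        return loglik - 0.5 * d * log(n)

    # e.g., maximized log-likelihood -10432.7, 38 free parameters, 1200 samples
    print(bic(loglik=-10432.7, d=38, n=1200))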
Search Algorithms
Papers:
• T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1): 2246-2269.
• N. L. Zhang and T. Kocka (2004). Efficient learning of hierarchical latent class models. ICTAI-2004.
Progression (by number of manifest variables handled):
• Double hill climbing (DHC), 2002: 7 manifest variables
• Single hill climbing (SHC), 2004: 12 manifest variables
• Heuristic SHC (HSHC), 2004: 50 manifest variables
• EAST, 2011: 100+ manifest variables
• Recent fast algorithms for specific applications.
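All of the above are elaborations of score-based search. A generic hill-climbing skeleton, not the actual DHC/SHC/HSHC/EAST procedures; `neighbors` and `score` are placeholders for the model-specific operators (e.g., adding a latent node, adjusting a cardinality) and a scoring function such as BIC:

    def hill_climb(initial_model, neighbors, score, data):
        # Repeatedly move to the best-scoring neighbor until none improves.
        model, best = initial_model, score(initial_model, data)
        while True:
            scored = [(score(m, data), m) for m in neighbors(model)]
            if not scored:
                return model
            top, candidate = max(scored, key=lambda t: t[0])
            if top <= best:
                return model
            model, best = candidate, top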
Algorithms by Others
Variable clustering method:
• S. Harmeling and C. K. I. Williams (2011). Greedy learning of binary latent trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(6): 1087-1097.
• R. Mourad, C. Sinoquet, P. Leray (2011). A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies. BMC Bioinformatics, 12:16. doi:10.1186/1471-2105-12-16.
• Fast, but model quality may be poor.
Adaptation of evolution tree algorithms:
• M. J. Choi, V. Y. F. Tan, A. Anandkumar, and A. S. Willsky (2011). Learning latent tree graphical models. Journal of Machine Learning Research, 12: 1771-1812.
• Fast, has a consistency proof, but applies to special LTMs only.
Overview
Early Work (1992-2002)
• Inference: Variable Elimination
• Inference: Local Structures
• Others: Learning, Decision Making
Latent Tree Models (2000 - )
• Theory and Algorithms
• Applications
  • Multidimensional Clustering, Density Estimation, Latent Structure
  • Survey Data, Documents, Business Data
  • Traditional Chinese Medicine (TCM)
• Extensions
Density Estimation
Characteristics of LTMs:
• Computationally very simple to work with.
• Can represent complex relationships among manifest variables.
• A useful tool for density estimation.
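"Computationally simple" here means that evaluating the probability of a data case takes a single upward pass over the tree, linear in the number of nodes. A minimal sketch; the data structures and numbers are illustrative, not from the papers:

    import numpy as np

    def ltm_likelihood(root, children, cpts, prior, evidence):
        # P(observed leaves = evidence) via one upward pass.
        # children: dict node -> list of children (leaves map to []).
        # cpts:     dict node -> P(node | parent), array [parent_card, node_card].
        # prior:    P(root) as a 1-D array.
        # evidence: dict leaf -> observed state index.
        def message(node):
            # Returns a vector over the *parent's* states.
            if not children[node]:                    # observed leaf
                return cpts[node][:, evidence[node]]
            inner = np.ones(cpts[node].shape[1])
            for c in children[node]:
                inner *= message(c)
            return cpts[node] @ inner                 # sum out this latent node

        inner = np.ones(prior.shape[0])
        for c in children[root]:
            inner *= message(c)
        return float(prior @ inner)

    # Chain Y1 -> Y2 -> X1 with binary variables (made-up numbers).
    children = {'Y1': ['Y2'], 'Y2': ['X1'], 'X1': []}
    cpts = {'Y2': np.array([[.9, .1], [.2, .8]]),
            'X1': np.array([[.7, .3], [.4, .6]])}
    print(ltm_likelihood('Y1', children, cpts, np.array([.6, .4]), {'X1': 1}))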
Density Estimation
• New approximate inference algorithm for Bayesian networks (Wang, Zhang and Chen, AAAI-08, Exceptional Paper).
[Figure: comparison of the sampling method and the LTAB algorithm on sparse and dense networks.]
Multidimensional Clustering
Paper:
• T. Chen, N. L. Zhang, T. F. Liu, Y. Wang, L. K. M. Poon (2011). Model-based multidimensional clustering of categorical data. Artificial Intelligence, 176(1): 2246-2269.
Cluster analysis: grouping objects into clusters so that objects in the same cluster are similar in some sense.
How to Cluster Those?
• By style of picture.
• By type of object in picture.
Multidimensional clustering / multi-clustering:
• How can we partition data in multiple ways?
• Latent tree models.
Latent Tree Models & Multidimensional Clustering
• Model the relationships between:
  • Observed/manifest variables: Math Grade, Science Grade, Literature Grade, History Grade
  • Latent variables: Analytic Skill, Literal Skill, Intelligence
• Each latent variable gives a partition:
  • Intelligence: low, medium, high
  • Analytic Skill: low, medium, high
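A simplified sketch of how a latent variable induces a partition: each data case is assigned to the state of the latent variable with the highest posterior probability. For brevity this conditions only on the latent variable's observed children rather than propagating over the full tree, and all CPTs are made up:

    import numpy as np

    def cluster_posterior(prior, cpts, x):
        # P(Y = y | x)  proportional to  P(Y=y) * prod_i P(X_i = x_i | Y=y).
        # prior: P(Y); cpts[i]: P(X_i | Y) as array [card_Y, card_Xi].
        post = prior.copy()
        for cpt, xi in zip(cpts, x):
            post *= cpt[:, xi]
        return post / post.sum()

    # Y = Analytic Skill (low/med/high) with children Math and Science grades.
    prior     = np.array([.3, .4, .3])
    p_math    = np.array([[.7, .2, .1], [.2, .6, .2], [.1, .2, .7]])
    p_science = np.array([[.6, .3, .1], [.3, .5, .2], [.1, .3, .6]])
    post = cluster_posterior(prior, [p_math, p_science], x=[2, 2])
    print(post, '-> cluster', post.argmax())   # high grades -> 'high skill'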
ICAC Data // 31 variables, 1200 samples
C_City: s0 s1 s2 s3 // very common, quite common, uncommon, ...
C_Gov: s0 s1 s2 s3
C_Bus: s0 s1 s2 s3
Tolerance_C_Gov: s0 s1 s2 s3 // totally intolerable, intolerable, tolerable, ...
Tolerance_C_Bus: s0 s1 s2 s3
WillingReport_C: s0 s1 s2 // yes, no, depends
LeaveContactInfo: s0 s1 // yes, no
I_EncourageReport: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
I_Effectiveness: s0 s1 s2 s3 s4 // very effective, effective, average, ineffective, very ineffective
I_Deterrence: s0 s1 s2 s3 s4 // very sufficient, sufficient, average, ...
…
-1 -1 -1 0 0 -1 -1 -1 -1 -1 -1 0 -1 -1 -1 0 1 1 -1 -1 2 0 2 2 1 3 1 1 4 1 0 1.0
-1 -1 -1 0 0 -1 -1 1 1 -1 -1 0 0 -1 1 -1 1 3 2 2 0 0 0 2 1 2 0 0 2 1 0 1.0
-1 -1 -1 0 0 -1 -1 2 1 2 0 0 0 2 -1 -1 1 1 1 0 2 0 1 2 -1 2 0 1 2 1 0 1.0
…
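A hypothetical reader for data in this layout, assuming (as the excerpt suggests, though the source does not confirm it) that -1 marks a missing value and the trailing 1.0 is a case weight:

    import numpy as np

    def read_cases(lines):
        cases, weights = [], []
        for line in lines:
            vals = line.split()
            cases.append([int(v) for v in vals[:-1]])   # state indices
            weights.append(float(vals[-1]))             # case weight
        data = np.array(cases)
        return np.ma.masked_equal(data, -1), np.array(weights)

    rows = ["-1 -1 0 1 1 3 1.0", "2 0 -1 1 -1 0 1.0"]   # toy rows, 6 variables
    data, w = read_cases(rows)
    print(data.mean(axis=0))   # per-variable means, ignoring missing entries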
Latent Structure Discovery
• Y2: demographic info; Y3: tolerance toward corruption
• Y4: ICAC performance; Y7: ICAC accountability
• Y5: change in level of corruption; Y6: level of corruption
Multidimensional Clustering
• Y2=s0: low-income youngsters
• Y2=s1: women with no/low income
• Y2=s2: people with good education and good income
• Y2=s3: people with poor education and average income
Multidimensional Clustering
• Y3=s0: people who find corruption totally intolerable (57%)
• Y3=s1: people who find corruption intolerable (27%)
• Y3=s2: people who find corruption tolerable (15%)
Interesting finding:
• Y3=s2: 29+19 = 48% find C-Gov totally intolerable or intolerable; 5% for C-Bus.
• Y3=s1: 54% find C-Gov totally intolerable; 2% for C-Bus.
• Y3=s0: same attitude toward C-Gov and C-Bus.
People who are tough on corruption are equally tough toward C-Gov and C-Bus; people who are relaxed about corruption are more relaxed toward C-Bus than C-Gov.
Multidimensional Clustering
Interesting finding: the relationship between background and tolerance toward corruption.
• Y2=s2 (good education and good income): the least tolerant; 4% find corruption tolerable.
• Y2=s3 (poor education and average income): the most tolerant; 32% find corruption tolerable.
• The other two classes are in between.
Latent Tree Analysis of Text Data
• The WebKB data set:
  • 1041 web pages collected from 4 CS departments in 1997
  • 336 words
Latent Tree Model for WebKB Data by the BI Algorithm
• 89 latent variables
LTM for Topic Detection
Topic:
• A latent state
• A collection of documents
A document can belong to multiple topics.
LTM vs LDA for Topic Detection
LTM:
• Topic: a latent state; a collection of documents.
• A document can belong to multiple topics.
LDA:
• Topic: a distribution over the entire vocabulary; the word probabilities add to one.
• Document: a distribution over topics; if a document contains more of one topic, it contains less of the others.
Latent Tree Analysis: Summary
• Finds meaningful facets of data.
• Identifies natural clusters along each facet.
• Gives a clear picture of what is in the data.