Adaptive Graph Construction and Dimensionality Reduction Songcan Chen, Lishan Qiao, Limei Zhang http://parnec.nuaa.edu.cn/ {s.chen, qiaolishan, zhanglimei}@nuaa.edu.cn 2009. 11. 06
Outline • Why construct a graph? • Typical graph construction • Review & Challenges • Our work • (I) Task-independent graph construction • (Related work: Sparsity Preserving Projections) • (II) Task-dependent graph construction • (Related work: Soft LPP and Entropy-regularized LPP) • Discussion and Next Work Adaptive Graph Construction and Dimensionality Reduction Songcan Chen 2009-11
Why construct a graph? A graph characterizes the geometry of data (e.g., a manifold) and thus plays an important role in data analysis, including machine learning, for example in dimensionality reduction, semi-supervised learning, spectral clustering, …
Why construct a graph? Dimensionality reduction [Figure: data points (Swiss roll), their 4-NN graph, and the 2D embedding result] • Nonlinear manifold learning • E.g., Laplacian Eigenmaps, LLE, ISOMAP • Linearized variants • E.g., LPP, NPE, and so on • (Semi-)supervised and/or tensorized extensions • Too numerous to mention one by one
Why construct a graph? Dimensionality reduction [Figure panels: PCA; LDA] • Many classical DR algorithms • E.g., PCA (unsupervised), LDA (supervised) According to [1], most current dimensionality reduction algorithms can be unified under a graph embedding framework. [1] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction, IEEE Trans. Pattern Anal. Mach. Intell. 29(1) (2007) 40–51.
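For reference, the core of the graph embedding framework in [1] can be summarized roughly as below, for a one-dimensional embedding y = (y_1, …, y_n). To match the later slides, S (symmetric, zero diagonal) denotes the graph's weight matrix, B is a constraint matrix (e.g., a degree matrix or the Laplacian of a penalty graph), and c is a constant; the factor 2 is written out explicitly and does not change the minimizer.

```latex
\begin{aligned}
y^{*} &= \arg\min_{y^\top B y = c} \sum_{i \ne j} (y_i - y_j)^2 S_{ij}
       = \arg\min_{y^\top B y = c} 2\, y^\top L y, \\
L &= D - S, \qquad D_{ii} = \sum_{j \ne i} S_{ij}.
\end{aligned}
```

Different choices of S and B give different algorithms, which is why the choice of graph matters so much for the resulting projection.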
Why construct a graph? Semi-supervised learning [Figure: data points with a 4-NN graph; transductive setting (e.g., label propagation) and inductive setting (e.g., manifold regularization)] • Typical graph-based semi-supervised algorithms • Local and global consistency • Label propagation • Manifold regularization • … “Graph is at the heart of the graph-based semi-supervised learning methods” [1]. [1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.
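To show concretely how the graph drives one of the listed methods, here is a minimal plain-Python sketch of the local-and-global-consistency iteration (Zhou et al., NIPS 2004). The function name, the default alpha = 0.99, and the fixed iteration count are illustrative assumptions, not settings taken from the slides.

```python
import math


def label_spreading(W, Y, alpha=0.99, iters=1000):
    """Sketch of the "local and global consistency" iteration (Zhou et al.).

    W: symmetric n x n affinity matrix (lists of floats, zero diagonal).
    Y: n x c initial labels; one-hot rows for labeled points, zero rows otherwise.
    Iterates F <- alpha * S F + (1 - alpha) * Y with S = D^{-1/2} W D^{-1/2},
    where D is the diagonal degree matrix. Returns a class index per point.
    """
    n, c = len(Y), len(Y[0])
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}
    S = [[inv_sqrt[i] * W[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        F = [[alpha * sum(S[i][k] * F[k][l] for k in range(n)) + (1 - alpha) * Y[i][l]
              for l in range(c)]
             for i in range(n)]
    # Each point takes the class with the largest propagated score
    return [max(range(c), key=lambda l: F[i][l]) for i in range(n)]


# Chain graph 0-1-2-3: point 0 is labeled class 0, point 3 is labeled class 1
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
Y = [[1, 0], [0, 0], [0, 0], [0, 1]]
print(label_spreading(W, Y, alpha=0.5))  # prints [0, 0, 1, 1]
```

Each unlabeled point inherits the label of the closer labeled end of the chain; change the edges of W and the predictions change with them, which is the sense in which the graph is "at the heart" of these methods.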
Why construct a graph? Spectral clustering [Figure panels: clustering structure; manifold structure] • Typical graph-based clustering algorithms • Graph cut • Normalized cut • … “Ncut on a kNN graph does something systematically different than Ncut on an ε-neighborhood graph! … shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to.” [1] [1] M. Maier, U. von Luxburg, Influence of graph construction on graph-based clustering measures. NIPS, 2008
Why construct a graph? Summary • Dimensionality reduction • Linear/nonlinear, local/nonlocal, parametric/nonparametric • Semi-supervised learning • Transductive/inductive • Spectral clustering • Clustering structure/manifold structure A well-designed graph tends to result in good performance [1]. How can we construct a good graph? What is the right graph for a given data set? [1] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data, ICML 2009
Why construct a graph? Summary Despite its importance, “graph construction has not been studied extensively” [1], and “the way to establish high-quality graphs is still an open problem” [2]. [1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008. [2] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009.
Why construct a graph? Summary • Fortunately, the graph construction problem has attracted increasing attention, especially this year (2009) • For example, graphs have been constructed by • sparse representation [1,2,3], i.e., the L1-graph; • minimizing the weighted sum of the squared distances from each vertex to the weighted average of its neighbors [4]; • b-matching [5]; • a symmetry-favored criterion, assuming that the graph is doubly stochastic [6]; • learning the projection transform and the graph weights simultaneously [7]. [1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009 (received 21 July 2008). [2] S. Yan, H. Wang, Semi-supervised Learning by Sparse Representation. SDM, 2009. [3] E. Elhamifar and R. Vidal, Sparse Subspace Clustering. CVPR, 2009. [4] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009. [5] T. Jebara, J. Wang, S.-F. Chang, Graph Construction and b-Matching for Semi-Supervised Learning. ICML, 2009. [6] W. Liu and S.-F. Chang, Robust Multi-class Transductive Learning with Graphs. CVPR, 2009. [7] L. Qiao, S. Chen, L. Zhang, A Simultaneous Learning Framework for Dimensionality Reduction and Graph Construction, submitted, 2009.
Outline • Why construct a graph? • Typical graph construction • Review & Challenges • Our work • (I) Task-independent graph construction • (Related work: Sparsity Preserving Projections) • (II) Task-dependent graph construction • (Related work: Soft LPP and Entropy-regularized LPP) • Discussion and Next Work
Typical graph construction Review [Figure: A basic flow for graph-based machine learning: graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, ……)] Two basic characteristics • Task-independent • Two steps • Graph construction • Edge weight assignment
Typical graph construction Review (step: graph construction) • Two basic criteria • k-nearest neighbor criterion (Left) • ε-ball neighborhood graph (Right)
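A minimal plain-Python sketch of the two criteria, assuming samples are lists of floats and Euclidean distance. The helper names are hypothetical, and the "or" symmetrization of the kNN graph is one common convention (a mutual "and" rule is also used).

```python
def pairwise_sq_dists(X):
    """Squared Euclidean distances between all pairs of samples (lists of floats)."""
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in X] for xi in X]


def knn_graph(X, k):
    """Symmetric kNN adjacency: connect i and j if either is among the other's k nearest."""
    D = pairwise_sq_dists(X)
    n = len(X)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
        for j in nbrs:
            A[i][j] = A[j][i] = 1  # "or" symmetrization (assumption; "and" is also used)
    return A


def eps_graph(X, eps):
    """epsilon-ball adjacency: connect i and j iff ||x_i - x_j|| <= eps."""
    D = pairwise_sq_dists(X)
    n = len(X)
    # Compare squared distances against eps^2 to avoid square roots
    return [[1 if i != j and D[i][j] <= eps * eps else 0 for j in range(n)]
            for i in range(n)]


X = [[0.0], [1.0], [3.0], [10.0]]  # the last point is an outlier
print(knn_graph(X, 1))    # the outlier is still linked to its nearest point
print(eps_graph(X, 2.5))  # the outlier ends up isolated
```

The toy data already show the two criteria behaving differently: kNN always gives every point k edges, however far away they reach, while the ε-ball rule can leave points disconnected. Both depend on a hand-picked parameter (k or ε).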
Typical graph construction Review (step: edge weight assignment) • Several basic ways • Gaussian function (heat kernel) • Inverse Euclidean distance • Local reconstructive relationship (used in LLE) • Non-negative local reconstruction [1] [1] F. Wang and C. S. Zhang, Label propagation through linear neighborhoods. ICML, 2006
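The first two weighting schemes can be sketched on top of a given adjacency matrix A (for example, one produced by a kNN or ε-ball rule). The bandwidth t and the eps guard are hypothetical parameter names. LLE-style and non-negative reconstruction weights need a small constrained solve per point and are not shown.

```python
import math


def heat_kernel_weights(X, A, t=1.0):
    """S_ij = exp(-||x_i - x_j||^2 / t) on edges of adjacency A, 0 elsewhere."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if A[i][j]:
                d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                S[i][j] = math.exp(-d2 / t)
    return S


def inverse_distance_weights(X, A, eps=1e-12):
    """S_ij = 1 / ||x_i - x_j|| on edges of A; eps guards against duplicate points."""
    n = len(X)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if A[i][j]:
                S[i][j] = 1.0 / (math.dist(X[i], X[j]) + eps)
    return S


X = [[0.0], [2.0], [3.0]]
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # e.g., a chain produced by a kNN rule
print(heat_kernel_weights(X, A, t=4.0))
print(inverse_distance_weights(X, A))
```

Both schemes only rescale edges that the construction step has already chosen, so they inherit any mistakes made by k or ε; the heat kernel additionally introduces its own parameter t.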
Typical graph construction Challenges Typical methods assume: • Few degrees of freedom • Little noise • Sufficient sampling (abundant samples) • Smoothness assumption or cluster assumption However, • they work well only when these conditions are strictly satisfied. • In practice, are they?
Typical graph construction Challenges ①~⑤ ① Tens or hundreds of degrees of freedom • Recent research [1] estimates that the face subspace has at least 100 dimensions. • More complex composite objects? ② Noise and other corruptions [Figure: Euclidean distances between face images: 0.84×10³, 0.92×10³, 1.90×10³] The locality-preserving criterion may not work well in this scenario, especially when only a few training samples are available. [1] M. Meytlis, L. Sirovich, On the dimensionality of face space. IEEE TPAMI, 2007, 29(7): 1262–1267.
Typical graph construction Challenges ①~⑤ ③ Insufficient samples [Figure: two examples of data points and their kNN graphs]
Typical graph construction Challenges ①~⑤ ④ The sensitivity to neighborhood size Also, this illustrates… In fact, there is no reliable method for assigning appropriate values to the parameters k and ε in the unsupervised scenario, or when only a few labeled samples are available [1]. Another example, on the Wine data set: [Figure panels: 15 samples per class for training; 5 samples per class for training] [1] D. Y. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency. NIPS, 2004
Typical graph construction Challenges ①~⑤ ⑤ Others, for example: • The lingering “curse of dimensionality” • Fixed neighborhood size • Independence from subsequent learning tasks Dimensionality reduction aims mainly at overcoming the “curse of dimensionality”, but locality-preserving algorithms construct their graphs with the nearest-neighbor criterion, which itself suffers from this curse. This seems to be a paradox. Let’s try to address these problems…
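The paradox can be made concrete with a small numerical experiment, sketched with the standard library only. For points drawn uniformly from the unit cube, the relative contrast (d_max − d_min)/d_min between a query's farthest and nearest neighbors shrinks as the dimension grows, so "nearest" carries less and less information. The function name, n = 200 points, and the seed are arbitrary choices for illustration.

```python
import math
import random


def relative_contrast(dim, n=200, seed=0):
    """(max - min) / min of the distances from one random query to n random points.

    Points and the query are drawn uniformly from the unit cube [0, 1]^dim.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n)]
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, p) for p in points]
    return (max(dists) - min(dists)) / min(dists)


for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In 2 dimensions the nearest neighbor is many times closer than the farthest point; in 1000 dimensions all distances lie within a narrow band, which is exactly the regime where a kNN or ε-ball graph becomes fragile.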
Outline • Why construct a graph? • Typical graph construction • Review & Challenges • Our work • (I) Task-independent graph construction • (Related work: Sparsity Preserving Projections) • (II) Task-dependent graph construction • (Related work: Soft LPP and Entropy-regularized LPP) • Discussion and Next Work
Our work (I) Task-independent graph construction [Figure: the basic flow for graph-based machine learning (graph construction → edge weight assignment → learning tasks: dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, ……), marking the parts addressed by Our work (I) and Our work (II)]
Our work (I) Motivation • PCA: simple, but ignores local structure • LLE: considers locality, but: fixed neighborhood size, artificial definition, difficult
Our work (I) From L0 to L1 [Figure: the solutions of the L2-minimization (left) and L1-minimization (right) problems] If the sought solution is sparse enough, the solution of the L0-minimization problem equals the solution of the L1-minimization problem [1]. [1] D. Donoho, Compressed sensing, IEEE Trans. Inform. Theory, 52(4) (2006) 1289–1306
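The two problems in the statement above can be written as follows, assuming x is the signal to be represented, X is a dictionary whose columns are training samples, and ‖s‖₀ counts the nonzero entries of s. In an SPP-style graph, x would be one sample x_i, with its own coefficient fixed to zero so that it is reconstructed from the other samples; this is our reading of the idea, not necessarily the exact model on the following slides.

```latex
(P_0):\quad \min_{s}\ \|s\|_0 \quad \text{s.t.}\quad x = X s,
\qquad\qquad
(P_1):\quad \min_{s}\ \|s\|_1 \quad \text{s.t.}\quad x = X s .
```

(P₀) is combinatorial, whereas (P₁) is convex and can be solved as a linear program, which is what makes the L1 relaxation practical for building a graph over every sample.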
Our work (I) Modeling & Algorithms ① Nonsmooth optimization: subgradient-based algorithms [1]; for p=2, it can also be recast as an SOCP. ② Quasi-LASSO: p=2: LASSO, for which many algorithms exist, e.g., LARS… [2]; p=1: linear programming (see next page). ③ L1-ball constrained optimization [3] (e.g., SLEP: Sparse Learning with Efficient Projections, http://www.public.asu.edu/~jye02/Software/SLEP/index.htm) [1] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2003. [2] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression. Annals of Statistics, 2004, 32(2): 407–451. [3] J. Liu, J. Ye, Efficient Euclidean Projections in Linear Time. ICML, 2009.
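As a minimal sketch of the p=2 (LASSO) case, the hypothetical helper below uses iterative soft-thresholding (ISTA), a simple proximal-gradient method swapped in for illustration; it is not LARS or SLEP. It assumes the penalized form min_s ½‖x_i − Σ_{j≠i} s_j x_j‖² + λ‖s‖₁ with s_i fixed to 0; the slides' exact model, including any sum-to-one constraint, may differ.

```python
def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_weights(X, i, lam=0.1, iters=500):
    """Hypothetical helper: LASSO-style sparse reconstruction of sample X[i].

    X is a list of n samples, each a list of d floats. Approximately solves
        min_s 0.5 * ||x_i - sum_{j != i} s_j x_j||^2 + lam * ||s||_1
    by ISTA, keeping s_i = 0 so that x_i never reconstructs itself.
    """
    n, d = len(X), len(X[0])
    xi = X[i]
    others = [j for j in range(n) if j != i]
    # Step size 1/L: the squared Frobenius norm of the dictionary upper-bounds
    # the Lipschitz constant of the smooth term's gradient (loose but safe).
    L = sum(X[j][k] ** 2 for j in others for k in range(d))
    step = 1.0 / L if L > 0 else 1.0
    s = [0.0] * n
    for _ in range(iters):
        # Residual of the current reconstruction, shared by all coordinates (plain ISTA)
        r = [xi[k] - sum(s[j] * X[j][k] for j in others) for k in range(d)]
        s_new = [0.0] * n
        for j in others:
            grad = -sum(X[j][k] * r[k] for k in range(d))  # d/ds_j of the smooth term
            s_new[j] = soft_threshold(s[j] - step * grad, step * lam)
        s = s_new
    return s


X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(sparse_weights(X, 0))
```

In this toy case x_0 coincides with x_1, so the solution puts weight 1 − λ = 0.9 on sample 1 and exactly zero on sample 2: the "neighborhood" of x_0 is selected by the sparsity penalty rather than by a fixed k. Stacking the vectors s_i as rows gives a weight matrix like the one on the next slide.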
Our work (I) Modeling (Example, p=1) [Figure: (left) a sub-block of the weight matrix constructed by the above model; (right) the optimal t for 3 different samples (YaleB)] Incorporate prior knowledge into the graph construction process!
Our work (I) Modeling (Example, p=2) [Figure: L1-norm neighborhood and its weights vs. conventional k-neighborhood and its weights] • L1-norm neighborhood: sparse, adaptive, discriminative, outlier-insensitive • Conventional k-neighborhood: puts samples from different classes into one patch [1] X. Tan, L. Qiao, W. Gao, J. Liu, Robust Faces Manifold Modeling: Most Expressive Vs. Most Sparse Criterion, Subspace 2009 Workshop in conjunction with ICCV 2009, Kyoto, Japan
Our work (I) SPP: Sparsity Preserving Projections • The optimal sparse weight vector describes the sparse reconstructive relationship among the samples. • So we expect to preserve this relationship in the low-dimensional space. • More specifically,
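One way to make "preserve the sparse reconstructive relationship" precise for a single projection direction w, following the published SPP formulation (Qiao, Chen, Tan, Pattern Recognition), is sketched below. Assumed notation: X = [x_1, …, x_n], s_i is the sparse weight vector of x_i (with s_ii = 0), and S is the matrix whose i-th row is s_iᵀ.

```latex
\begin{aligned}
\sum_{i=1}^{n} \bigl(w^\top x_i - w^\top X s_i\bigr)^2
  &= w^\top X (I - S^\top)(I - S) X^\top w
   = w^\top X X^\top w - w^\top X S_\beta X^\top w, \\
S_\beta &= S + S^\top - S^\top S, \\
\min_{w:\ w^\top X X^\top w = 1}\ \sum_{i}\bigl(w^\top x_i - w^\top X s_i\bigr)^2
  &\iff \max_{w}\ \frac{w^\top X S_\beta X^\top w}{w^\top X X^\top w}.
\end{aligned}
```

Under these assumptions, the projection directions are the generalized eigenvectors of X S_β Xᵀ w = λ X Xᵀ w with the largest eigenvalues, so no neighborhood size k or heat-kernel width enters the projection step.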
Our work (I) Experiments: Toy [Figure: the toy data and their 1D images under 4 different DR algorithms (PCA, LPP, NPE, SPP); panel notes: insufficient sampling, additional prior]
Our work (I) Experiments: Wine • Wine data set from UCI: 178 samples, 3 classes, 13 features [Table: the basic statistics of the Wine data set] [Figure: the 2D projections of the Wine data set by PCA, LPP, NPE, and SPP]
Our work (I) Experiments: Face [Figure panels: Yale; AR; Extended Yale B]
Our work (I) Experiments: Face [Figure panels: AR_Fixed; Yale; AR_Random; Extended YaleB]
Our work (I) Experiments: Face
Our work (I) Related works [Figure: the basic flow for graph-based machine learning (graph construction → edge weight assignment → learning tasks: dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, ……), marking Our work (I) and related works [1] [2] [3]] Other extensions? From graph to data-dependent regularization, … [1] L. Qiao, S. Chen, and X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009 (received 21 July 2008). [2] E. Elhamifar and R. Vidal, Sparse Subspace Clustering. CVPR, 2009. [3] S. Yan and H. Wang, Semi-supervised Learning by Sparse Representation. SDM, 2009.
Our work (I) Extensions • Semi-supervised classification • Semi-supervised dimensionality reduction
Our work (I) Extensions • Applied to face recognition with a single labeled sample • Compared with supervised LDA, unsupervised SPP, and semi-supervised SDA • SPDA: Sparsity Preserving Discriminant Analysis [Figure panels: E1: 1 labeled and 2 unlabeled samples; E2: 1 labeled and 30 unlabeled samples]
Our work (I) Summary • Adaptive “neighborhood” size • Simpler parameter selection • Fewer training samples needed • Easier incorporation of prior knowledge (not so insensitive to noise) • Stronger discriminating power • Higher computational cost
Outline • Why construct a graph? • Typical graph construction • Review & Challenges • Our work • (I) Task-independent graph construction • (Related work: Sparsity Preserving Projections) • (II) Task-dependent graph construction • (Related work: Soft LPP and Entropy-regularized LPP) • Discussion and Next Work
Our work (II) Task-dependent graph construction [Figure: the basic flow for graph-based machine learning (graph construction → edge weight assignment → learning tasks: dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, ……), marking Our work (I) and Our work (II)] • Task-independent graph construction • Advantage: applicable to any graph-based learning task • Disadvantage: does not necessarily help the subsequent learning task Can we unify graph construction and the learning task? If so, how?
Our work (II) Motivation (Cont’d) Furthermore, take LPP as an example: • Step 1: Graph construction (k-nearest neighbor criterion) • Step 2: Edge weight assignment • Step 3: Learning the projection directions
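For reference, Step 3 of standard LPP (He and Niyogi) can be written as follows, with X = [x_1, …, x_n], W the projection matrix, and S the symmetric edge weights from Steps 1–2. The symbols are chosen to match the later SLPP slides.

```latex
\begin{aligned}
&\min_{W}\ \sum_{i,j} S_{ij}\,\|W^\top x_i - W^\top x_j\|^2
  = 2\,\operatorname{tr}\!\bigl(W^\top X L X^\top W\bigr)
  \quad \text{s.t.}\quad W^\top X D X^\top W = I, \\
&D_{ii} = \sum_{j} S_{ij}, \qquad L = D - S, \qquad
X L X^\top w = \lambda\, X D X^\top w \ \ (\text{smallest } \lambda).
\end{aligned}
```

Written this way, S enters Step 3 only as a fixed input: once Steps 1–2 have chosen it, nothing in the projection step can correct a poor graph.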
Our work (II) Motivation (Cont’d) [Figure panels: 5 samples per class for training; 15 samples per class for training] • In LPP, “local geometry” is completely determined by the artificially pre-fixed neighborhood graph. • As a result, its performance may drop seriously given a “bad” graph. Unfortunately, it is generally hard to judge in advance whether a graph is good, especially in the unsupervised scenario. • So we want the graph to be adjustable. How should it be adjusted?
Our work (II) Motivation (Cont’d) A natural question: can we obtain more locality-preserving power or discriminating power by minimizing the objective function further? The key is how to characterize such power formally! • LPP seeks a low-dimensional representation that preserves the local geometry of the original data. • Locality-preserving power is potentially related to discriminating power [1]. • Locality-preserving power is described by minimizing LPP’s objective function. Our idea: optimize the graph and learn the projections simultaneously in a unified objective function. [1] D. Cai, X. F. He, J. W. Han, and H. J. Zhang, Orthogonal Laplacianfaces for face recognition. IEEE Transactions on Image Processing, 2006, 15(11): 3608–3614.
Our work (II) Modeling: SLPP [Equations: LPP vs. Soft LPP (SLPP or SLAPP), with terms annotated ①–④] ① Regard the graph S_ij as a new optimization variable, i.e., the graph is adjustable instead of pre-fixed. Note also that we do not constrain S_ij to be symmetric. ② m (>1) is a new parameter that controls the uncertainty of S_ij and helps us obtain a closed-form solution. Without it, we would get a singular solution in which only one element in each row of S is 1 and all other elements are 0. ③ New constraints, which avoid degenerate solutions and give the graph a natural probabilistic interpretation. ④ d_ii is removed from this constraint, mainly to make the optimization tractable.
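The slide's equation image is missing. One plausible reading of the SLPP model, reconstructed from annotations ①–④ and not copied from the slide, is given below: S is a free, possibly asymmetric variable (①), raised to the power m > 1 (②), with row-sum and nonnegativity constraints (③), and D dropped from LPP's constraint Wᵀ X D Xᵀ W = I (④). Excluding j = i is our assumption, since a self-loop has zero distance and would otherwise absorb all the weight.

```latex
\begin{aligned}
\min_{W,\,S}\ \ & \sum_{i}\sum_{j \ne i} S_{ij}^{\,m}\,\|W^\top x_i - W^\top x_j\|^2 \\
\text{s.t.}\ \ & W^\top X X^\top W = I, \qquad \sum_{j \ne i} S_{ij} = 1, \qquad S_{ij} \ge 0, \qquad m > 1 .
\end{aligned}
```

This reading is consistent with ②: with m = 1 the objective is linear in each row of S, so its minimum over the simplex puts all weight on the single nearest projected neighbor, which is the "singular solution" the slide mentions.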
Our work (II) Algorithm • Non-convex with respect to W and S jointly • Solved by an alternating iterative optimization technique • Fortunately, each step has a closed-form solution. • Step 1: Calculate W by solving a generalized eigenproblem • Step 2: Update the graph: normalized inverse Euclidean distance!!
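Under an assumed SLPP objective Σ_i Σ_{j≠i} S_ij^m d_ij with d_ij = ‖Wᵀx_i − Wᵀx_j‖² and rows of S on the simplex, fixing W and applying the Lagrangian condition m S_ij^{m−1} d_ij = λ_i gives S_ij = d_ij^{−1/(m−1)} / Σ_{k≠i} d_ik^{−1/(m−1)}, a normalized (powered) inverse Euclidean distance. The sketch below implements only this Step 2. Y holds the already-projected samples Wᵀx_i; the function name, excluding self-loops, and the small eps guard are our assumptions.

```python
def slpp_graph_update(Y, m=2.0, eps=1e-12):
    """Hypothetical sketch of the SLPP graph-update step (Step 2).

    Y: list of projected samples W^T x_i (each a list of floats), n >= 2.
    Returns S with S[i][i] = 0 and each row summing to 1, where
        S_ij is proportional to (1 / ||y_i - y_j||^2) ** (1 / (m - 1)).
    """
    if m <= 1:
        raise ValueError("m must be > 1")
    n = len(Y)
    S = []
    for i in range(n):
        row = []
        for j in range(n):
            if j == i:
                row.append(0.0)  # assumption: no self-loops
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
            # eps keeps coincident projections from dividing by zero
            row.append((1.0 / (d2 + eps)) ** (1.0 / (m - 1.0)))
        total = sum(row)
        S.append([v / total for v in row])  # normalize so the row sums to 1
    return S


Y = [[0.0], [1.0], [3.0]]
print(slpp_graph_update(Y, m=2.0))
```

A larger m flattens each row toward uniform weights, while m close to 1 concentrates the weight on the nearest projected neighbor. Alternating this update with the eigenproblem of Step 1 lets the graph adapt to the current projection.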
Our work (II) Algorithm
Our work (II) Modeling: ELPP ELPP: Entropy-regularized LPP Normalized heat kernel distance!!
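Suppose the ELPP objective is Σ_i Σ_{j≠i} S_ij ‖Wᵀx_i − Wᵀx_j‖² + γ Σ S_ij ln S_ij, with each row of S summing to 1. This is our reading of "entropy-regularized", and γ > 0 is a hypothetical name for the regularization weight. Setting the Lagrangian's gradient to zero then gives S_ij ∝ exp(−‖Wᵀx_i − Wᵀx_j‖²/γ), a heat kernel normalized over each row, matching the slide's "normalized heat kernel distance". Below is a minimal sketch of that graph update, with self-loops excluded as an assumption.

```python
import math


def elpp_graph_update(Y, gamma=1.0):
    """Hypothetical sketch: row-wise softmax of -||y_i - y_j||^2 / gamma over j != i.

    Y: list of projected samples W^T x_i (each a list of floats), n >= 2.
    Returns S with S[i][i] = 0 and each row summing to 1.
    """
    if gamma <= 0:
        raise ValueError("gamma must be > 0")
    n = len(Y)
    S = []
    for i in range(n):
        d2 = [sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])) for j in range(n)]
        others = [j for j in range(n) if j != i]
        # Subtract the smallest distance before exponentiating (numerical stability);
        # the shift cancels when the row is normalized.
        dmin = min(d2[j] for j in others)
        row = [0.0] * n
        for j in others:
            row[j] = math.exp(-(d2[j] - dmin) / gamma)
        total = sum(row)
        S.append([v / total for v in row])
    return S


Y = [[0.0], [1.0], [2.0]]
print(elpp_graph_update(Y, gamma=1.0))
```

Here γ plays the role of the heat-kernel width: a large γ spreads each row toward uniform weights, and a small γ approaches a hard nearest-neighbor assignment.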
Our work (II) ELPP: Algorithm
Our work (II) Convergence • Cauchy’s convergence criterion • Block-coordinate gradient descent!
Our work (II) Experiments: Wine [Figure panels: LPP; SLPP(1); SLPP(3); SLPP(5); SLPP(7); SLPP(9)]