Learning the Kernel Matrix in Discriminant Analysis via QCQP. Jieping Ye, Arizona State University. Joint work with Shuiwang Ji and Jianhui Chen.
Kernel Discriminant Analysis • Regularized kernel discriminant analysis (RKDA) performs linear discriminant analysis (LDA) in the feature space. • The classification performance of RKDA is comparable to that of support vector machines (SVM). • The performance of RKDA depends on the selection (learning) of the kernel. • Cross-validation is commonly applied for kernel selection. • Recent trend: multiple kernel learning (MKL).
Outline • Overview of kernel methods and multiple kernel learning • Binary-class multiple kernel learning • Multi-class multiple kernel learning • Experiments • Conclusions
Kernel-based Learning [diagram] Data → Embed data → Linear algorithm (SVM, PCA, CCA, LDA, …); kernel design and the kernel algorithm are treated as separate modules.
Kernel-based Learning [diagram] The data X (with labels y) are embedded implicitly: the kernel matrix K stores inner products, and the entry K_ij measures the similarity between examples i and j. Domain-specific knowledge can be added through the choice of similarity.
Learning with Multiple Kernels [diagram] The appropriate kernel matrix K is not given in advance; the kernel itself must be learned.
Multiple Kernel Learning • Given a set of p kernel matrices, the optimal kernel matrix G is restricted to be a convex linear combination of these kernel matrices (a sketch of the constraint is given below). • Learning criterion: the margin between the two classes in SVM (Lanckriet et al., 2004), or class discrimination in discriminant analysis.
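A minimal sketch of this restriction in standard MKL notation (the weight symbols \theta_i are my own, not necessarily the slide's):

G(\theta) = \sum_{i=1}^{p} \theta_i K_i, \qquad \theta_i \ge 0, \qquad \sum_{i=1}^{p} \theta_i = 1.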
Binary-Class Kernel Discriminant Analysis • RKDA finds the optimal linear hyperplane by maximizing the Fisher discriminant ratio (FDR), which is built from the centroids and the covariance matrices of the positive and negative classes in the feature space (a sketch is given below).
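A standard regularized form of the FDR, written in my own notation for the class centroids \mu_\pm and covariance matrices \Sigma_\pm in the feature space (the slide's exact regularization may differ):

F(w) = \frac{\big( w^\top (\mu_+ - \mu_-) \big)^2}{w^\top (\Sigma_+ + \Sigma_- + \lambda I)\, w}.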
SDP Formulation for Binary-Class Kernel Learning • Kim et al. (ICML 2006) formulated MKL for RKDA as the maximization of the optimal FDR over the kernel weights. • This leads to a semidefinite program (SDP).
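Read in the notation above (mine, not the original slide's), the outer problem can be sketched as

\max_{\theta \ge 0,\ \sum_i \theta_i = 1} \; \max_{w} \; \frac{\big( w^\top (\mu_+(\theta) - \mu_-(\theta)) \big)^2}{w^\top \big( \Sigma_+(\theta) + \Sigma_-(\theta) + \lambda I \big) w},

where the centroids and covariances depend on the combined kernel G(\theta); the SDP reformulation itself is given in Kim et al. (2006) and is not reproduced here.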
Proposed Criterion for Binary-Class Kernel Learning • Consider maximizing an objective in which the class covariances in the FDR are replaced by the (regularized) total scatter matrix in the feature space; the total scatter matrix and the resulting criterion are sketched below. • We show that this criterion leads to an efficient quadratically constrained quadratic programming (QCQP) formulation for multiple kernel learning in the binary-class case. • Most multiple kernel learning algorithms work for binary-class problems only. We show that this QCQP formulation can be naturally extended to the multi-class case.
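A sketch in my own notation, with \bar\phi the overall centroid of the mapped data (the slide's exact scaling may differ):

S_t = \sum_{i=1}^{n} \big( \phi(x_i) - \bar\phi \big)\big( \phi(x_i) - \bar\phi \big)^\top, \qquad \max_{w} \; \frac{\big( w^\top (\mu_+ - \mu_-) \big)^2}{w^\top (S_t + \lambda I)\, w}.

Without regularization this criterion is a monotone transformation of the FDR, since the total scatter is the sum of the between-class and within-class scatter.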
Least Squares Formulation • Consider a regularized least squares problem in the feature space (a sketch of the objective is given below). • We have an equivalence result relating its minimizer to the solution of the proposed criterion.
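A sketch of the standard regularized least squares objective used in such equivalences; the label encoding shown is the usual one for the binary LDA–least squares connection and is an assumption here, not a quote from the slide:

\min_{w,\, b} \; \sum_{i=1}^{n} \big( w^\top \phi(x_i) + b - y_i \big)^2 + \lambda \|w\|^2, \qquad y_i = n/n_+ \text{ for the positive class}, \quad y_i = -n/n_- \text{ for the negative class}.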
Benefits of Our QCQP Formulation • A QCQP can be solved more efficiently than an SDP, so the formulation is more scalable to large problems. • Similar ideas were used in Lanckriet et al. (JMLR 2004), where restricting the kernel to a nonnegative linear combination of given kernel matrices reduces SVM kernel learning to a QCQP. • Most kernel learning formulations are restricted to binary-class problems; ours extends naturally to the multi-class case.
Multi-Class Kernel Learning in Discriminant Analysis • The binary-class criterion is generalized to the multi-class case; the objective function that is maximized is sketched below.
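One standard multi-class discriminant criterion built from the regularized total scatter S_t and the between-class scatter S_b; this is my reconstruction in my own notation, and the paper's exact objective may differ:

\max \; \mathrm{trace}\big( (S_t + \lambda I)^{-1} S_b \big), \qquad S_b = \sum_{k=1}^{c} n_k \big( \mu_k - \bar\phi \big)\big( \mu_k - \bar\phi \big)^\top.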
Least Squares Formulation for the Multi-Class Case • The multi-class criterion is connected to a regularized least squares problem through the equivalence relationship of Ye et al. (2007); a sketch is given below.
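A sketch of the multi-class regularized least squares problem that appears in such equivalences (notation mine; the exact construction of the class-indicator target matrix \tilde{Y} is given in Ye et al., 2007, and is not reproduced here):

\min_{W} \; \|\Phi^\top W - \tilde{Y}\|_F^2 + \lambda \|W\|_F^2,

where \Phi = [\phi(x_1), \ldots, \phi(x_n)] and \tilde{Y} \in \mathbb{R}^{n \times c} is a centered, class-size-weighted indicator matrix.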
Experiments • We use MOSEK as the QCQP solver. • http://www.mosek.com • The reported experimental results are averaged over 30 random partitions of the data into training and test sets.
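For illustration only, a minimal sketch of how one might pose a kernel-weight learning problem as a convex program in Python with CVXPY (which can call MOSEK as the back-end solver). The objective below, a simple alignment of the combined kernel with an ideal target built from the labels, is a stand-in and is not the paper's criterion; the function name and regularization parameter are likewise my own.

# Sketch only: not the authors' code. Learns convex combination weights for
# a set of kernel matrices by aligning the combined kernel with y*y^T / n.
import numpy as np
import cvxpy as cp

def learn_kernel_weights(kernels, y, reg=1e-3):
    # kernels: list of p symmetric (n x n) kernel matrices; y: labels in {-1, +1}
    p, n = len(kernels), len(y)
    target = np.outer(y, y) / n                        # ideal kernel, up to scaling
    theta = cp.Variable(p, nonneg=True)                # combination weights
    G = sum(theta[i] * kernels[i] for i in range(p))   # combined kernel, affine in theta
    objective = cp.Minimize(cp.sum_squares(G - target) + reg * cp.sum_squares(theta))
    constraints = [cp.sum(theta) == 1]                 # convex combination
    cp.Problem(objective, constraints).solve()         # pass solver=cp.MOSEK to use MOSEK
    return theta.value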
Competing Algorithms • The proposed QCQP formulation is compared with the following algorithms:
Experimental Result on Sonar We can observe that cross-validated RKDA achieves the best performance on kernels corresponding to θ6 and θ7, while cross-validated SVM achieves the highest accuracy on θ6, θ7, and θ8. On the other hand, methods using a linear combination of kernels seem to favor the kernels corresponding to θ5, θ6, and θ7.
Running Time Comparison We can observe that the proposed QCQP formulation is much more efficient than the SDP formulation. Results also show that the QCQP formulation is much more efficient than doubly cross-validated RKDA.
Experiments Our QCQP formulation is very competitive with the other two methods, which are based on cross-validation. Compared with these methods, the proposed approach learns a convex linear combination of kernels and avoids cross-validation altogether.
Conclusions • We propose a QCQP formulation for RKDA kernel learning in the binary-class case, which extends naturally to the multi-class case. • The multi-class QCQP formulation is still expensive for problems with a large sample size and a large number of classes. • We are currently investigating semi-infinite linear programming to improve efficiency. • Applications to biological image analysis
Acknowledgments • This research is supported in part by: • Arizona State University • National Science Foundation Grant IIS-0612069