Properties of Kernels Presenter: Hongliang Fei Date: June 11, 2009
Overview • Inner product and Hilbert space • Characteristics of kernels • The kernel matrix • Kernel construction
Hilbert spaces • Linear function: Given a vector space X over the reals, a function f: X -> R is linear if f(ax) = af(x) and f(x + z) = f(x) + f(z) for all x, z ∈ X and a ∈ R. • Inner product space: A vector space X over the reals R is an inner product space if there exists a real-valued symmetric bilinear (linear in each argument) map ⟨·,·⟩ that satisfies ⟨x, x⟩ ≥ 0 for all x ∈ X. The inner product is strict if ⟨x, x⟩ = 0 implies x = 0.
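As a concrete illustration (not from the slides), the standard dot product on R^n is the canonical inner product; a minimal numpy sketch checking symmetry and linearity in the first argument:

```python
import numpy as np

# Standard inner product on R^n: <x, z> = sum_i x_i * z_i.
x, z, w = np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 4.0])
a = 2.5

assert np.isclose(np.dot(x, z), np.dot(z, x))  # symmetry
assert np.isclose(np.dot(a * x + w, z),
                  a * np.dot(x, z) + np.dot(w, z))  # linearity in the first argument
```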
Hilbert spaces • A Hilbert space F is an inner product space with the additional properties that it is separable and complete. • Completeness refers to the property that every Cauchy sequence (h_n)_{n≥1} of elements of F converges to an element h ∈ F. • A space F is separable if and only if it admits a countable orthonormal basis.
Cauchy–Schwarz inequality • In an inner product space, ⟨x, z⟩² ≤ ⟨x, x⟩⟨z, z⟩ = ||x||² ||z||², and the equality sign holds in a strict inner product space if and only if x and z are rescalings of the same vector.
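A quick numerical illustration with the standard inner product (random vectors; a spot check, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)

lhs = np.dot(x, z) ** 2
rhs = np.dot(x, x) * np.dot(z, z)
assert lhs <= rhs  # Cauchy-Schwarz: <x,z>^2 <= <x,x><z,z>

# Equality when z is a rescaling of x:
z2 = 3.0 * x
assert np.isclose(np.dot(x, z2) ** 2, np.dot(x, x) * np.dot(z2, z2))
```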
Positive semi-definite matrices • A symmetric matrix A is positive semi-definite iff vᵀAv ≥ 0 for all v, or equivalently iff its eigenvalues are all non-negative. • A symmetric matrix is positive definite iff vᵀAv > 0 for all v ≠ 0, or equivalently iff its eigenvalues are all positive. • Gram and kernel matrices are positive semi-definite.
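A minimal sketch (assuming numpy) verifying that a Gram matrix built from arbitrary data is positive semi-definite by inspecting its eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))       # 10 points in R^3
G = X @ X.T                        # Gram matrix: G_ij = <x_i, x_j>

eigvals = np.linalg.eigvalsh(G)    # eigvalsh is for symmetric matrices
assert np.all(eigvals >= -1e-10)   # non-negative up to floating-point round-off
```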
Finitely positive semi-definite functions • A function κ: X × X → R satisfies the finitely positive semi-definite property if it is a symmetric function for which the matrices formed by restriction to any finite subset of the space X are positive semi-definite.
Mercer Kernel Theorem • A function κ: X × X → R which is either continuous or has a finite domain can be decomposed as κ(x, z) = ⟨φ(x), φ(z)⟩, a feature map φ into a Hilbert space F applied to both arguments followed by the evaluation of the inner product in F, if and only if it satisfies the finitely positive semi-definite property.
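For a concrete instance of such a decomposition: the quadratic kernel κ(x, z) = ⟨x, z⟩² on R² admits the explicit feature map φ(x) = (x₁², x₂², √2·x₁x₂). A short numerical check (the names `kappa` and `phi` are illustrative, not from the slides):

```python
import numpy as np

def kappa(x, z):
    # Quadratic kernel on R^2: kappa(x, z) = <x, z>^2
    return np.dot(x, z) ** 2

def phi(x):
    # Explicit feature map into R^3 for the quadratic kernel
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
assert np.isclose(kappa(x, z), np.dot(phi(x), phi(z)))
```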
The kernel matrix • Implementation issues • Kernels and prior knowledge • Kernel Selection • Kernel Alignment
Kernel Selection • Ideally, select the optimal kernel based on our prior knowledge of the problem domain. • In practice, consider a family of kernels defined in a way that again reflects our prior expectations. • Simple way: require only a limited amount of additional information from the training data. • Elaborate way: combine label information.
Kernel Alignment • Measures similarity between two kernels. • The alignment A(K1, K2) between two kernel matrices K1 and K2 is given by A(K1, K2) = ⟨K1, K2⟩_F / √(⟨K1, K1⟩_F ⟨K2, K2⟩_F), where ⟨K1, K2⟩_F = Σ_{i,j} K1(i, j) K2(i, j) is the Frobenius inner product (see the sketch below).
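A minimal sketch of the alignment computation, assuming numpy; the helper name `alignment` and the choice of linear vs. degree-2 polynomial kernels are illustrative:

```python
import numpy as np

def alignment(K1, K2):
    # A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F)
    num = np.sum(K1 * K2)  # Frobenius inner product
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))
K1 = X @ X.T                  # linear kernel matrix
K2 = (1.0 + X @ X.T) ** 2     # degree-2 polynomial kernel matrix
print(alignment(K1, K2))      # in (0, 1] for PSD matrices; 1 means perfectly aligned
```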
Operations on kernel matrices • Simple transformations • Centering data (see the sketch below) • Subspace projection: chapter 6 • Whitening: set all eigenvalues to 1 (spherically symmetric)
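A sketch of centering, the simplest of these operations: shifting the (implicit) feature vectors to have zero mean corresponds to a double-sided multiplication of the kernel matrix. The helper name `center_kernel` is illustrative:

```python
import numpy as np

def center_kernel(K):
    # Center in feature space: K_c = (I - J/n) K (I - J/n), J the all-ones matrix.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

rng = np.random.default_rng(3)
X = rng.normal(size=(15, 3))
Kc = center_kernel(X @ X.T)
# After centering, every row/column of the kernel matrix sums to (numerically) zero.
assert np.allclose(Kc.sum(axis=0), 0.0)
```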
That’s all. Any questions?