Final Exam Review. CS479/679 Pattern Recognition Dr. George Bebis. Final Exam Material. Midterm Exam Material Dimensionality Reduction Feature Selection Linear Discriminant Functions Support Vector Machines Expectation-Maximization Algorithm. Dimensionality Reduction.
Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis
Final Exam Material • Midterm Exam Material • Dimensionality Reduction • Feature Selection • Linear Discriminant Functions • Support Vector Machines • Expectation-Maximization Algorithm
Dimensionality Reduction • What is the goal of dimensionality reduction and why is it useful? • Reduce the dimensionality of the data • Eliminate redundant and irrelevant features • Fewer training samples needed, faster classification • How is dimensionality reduction performed? • Map the data to a lower-dimensional space through a linear (or non-linear) transformation: $y = U^T x$, where $x \in \mathbb{R}^N$, $U$ is $N \times K$, and $y \in \mathbb{R}^K$ • Or, select a subset of features (feature selection)
Dimensionality Reduction • Give two examples of linear dimensionality reduction techniques. • Principal Component Analysis (PCA) • Linear Discriminant Analysis (LDA) • What is the difference between PCA and LDA? • PCA seeks a projection that preserves as much information in the data as possible. • LDA seeks a projection that best separates the data.
Dimensionality Reduction • What is the solution found by PCA? • “Largest” eigenvectors of the covariance matrix (i.e., corresponding to the largest eigenvalues - principal components) • You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.
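The PCA steps can be sketched roughly as follows. This is a minimal illustration, assuming NumPy is available and that the data matrix stores one sample per row; the function names `pca` and `components_for_variance` and the 90% variance threshold are illustrative choices, not taken from the slides.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (M samples x N features) onto the top-k principal components."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                              # step 1: center the data
    cov = Xc.T @ Xc / X.shape[0]               # step 2: N x N sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # step 3: eigen-decomposition (ascending order)
    order = np.argsort(eigvals)[::-1]          # step 4: sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    U = eigvecs[:, :k]                         # N x K matrix of "largest" eigenvectors
    Y = Xc @ U                                 # step 5: y = U^T (x - mean) for every sample
    return Y, U, eigvals, mean

def components_for_variance(eigvals, threshold=0.9):
    """Smallest K whose eigenvalues account for at least `threshold` of the total variance."""
    ratios = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(ratios, threshold) + 1)

# Toy data lying on the line x2 = x1: one component captures all the variance.
X_demo = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
Y_demo, U_demo, eigvals_demo, mean_demo = pca(X_demo, 1)
k_demo = components_for_variance(eigvals_demo, 0.9)
```

Geometrically, the columns of `U` are the directions of largest variance, which is why the single retained direction in the toy data is parallel to (1, 1).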
Dimensionality Reduction • You need to know how to apply PCA for face recognition and face detection. • What practical issue arises when applying PCA for face recognition? How do we deal with it? • The covariance matrix $AA^T$ is typically very large (i.e., $N^2 \times N^2$ for $N \times N$ images) • Consider the alternative matrix $A^T A$, which is only $M \times M$ ($M$ is the number of training face images)
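The reason the smaller matrix is enough: if v is an eigenvector of A^T A with eigenvalue μ, then Av is an eigenvector of AA^T with the same eigenvalue. A minimal sketch, assuming NumPy and a matrix `A` whose columns are the mean-subtracted training images; the function name and the random stand-in data are hypothetical.

```python
import numpy as np

def eigenvectors_via_small_matrix(A, k):
    """A is D x M (D pixels, M images, D >> M). Returns the top-k eigenpairs of A A^T."""
    L = A.T @ A                                # M x M instead of D x D
    mu, V = np.linalg.eigh(L)                  # eigenpairs of the small matrix
    order = np.argsort(mu)[::-1][:k]           # keep the k largest eigenvalues
    mu, V = mu[order], V[:, order]
    U = A @ V                                  # u_i = A v_i is an eigenvector of A A^T
    U = U / np.linalg.norm(U, axis=0)          # normalize each u_i to unit length
    return mu, U                               # assumes the k kept eigenvalues are non-zero

# Random stand-in for image data: 50 "pixels", 5 "images".
rng = np.random.default_rng(0)
A_demo = rng.normal(size=(50, 5))
mu_demo, U_demo = eigenvectors_via_small_matrix(A_demo, 3)
```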
Dimensionality Reduction • What is the solution found by LDA? • Maximize the between-class scatter $S_b$ while minimizing the within-class scatter $S_w$ • Solution is given by the eigenvectors of the following generalized eigenvalue problem: $S_b u_k = \lambda_k S_w u_k$
Dimensionality Reduction • What practical issue arises when applying LDA for face recognition? How do we deal with it? • If $S_w$ is non-singular, the solution can be obtained from a conventional eigenvalue problem: $S_w^{-1} S_b u_k = \lambda_k u_k$ • But $S_w$ is singular in practice due to the large dimensionality of the data; use PCA first to reduce dimensionality.
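A minimal sketch of LDA on low-dimensional data, where Sw is invertible and no PCA step is needed. It assumes NumPy; the function name `lda` and the toy data are illustrative, not from the slides.

```python
import numpy as np

def lda(X, labels, k):
    """Return the top-k LDA directions (as columns) and their eigenvalues."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((n, n))                      # within-class scatter
    Sb = np.zeros((n, n))                      # between-class scatter
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        d = (mc - mean_all).reshape(-1, 1)
        Sb += Xc.shape[0] * (d @ d.T)
    # Solve Sw^{-1} Sb u = lambda u (fails if Sw is singular, which is the practical issue)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:k]
    return eigvecs[:, order].real, eigvals[order].real

# Two classes with identical spread, separated along the first axis.
X_demo = [[-1, 0], [1, 0], [0, -1], [0, 1], [3, 0], [5, 0], [4, -1], [4, 1]]
labels_demo = [0, 0, 0, 0, 1, 1, 1, 1]
W_demo, vals_demo = lda(X_demo, labels_demo, 1)
```

For C classes, Sb has rank at most C-1, so at most C-1 eigenvalues are non-zero; in the toy data only one direction carries any class separation.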
Feature Selection • What is the goal of feature selection? • Select features having high discrimination power while ignoring or paying less attention to the rest. • What are the main steps in feature selection? • Search the space of possible feature subsets. • Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).
Feature Selection • What are the main search and evaluation strategies? • Search strategies: optimal, heuristic, randomized • Evaluation strategies: filter, wrapper • What is the difference between filter and wrapper methods? • In filter methods, evaluation is independent of the classification algorithm. • In wrapper methods, evaluation depends on the classification algorithm.
Feature Selection • You need to be familiar with: • Exhaustive and Naïve search • Sequential Forward/Backward Selection (SFS/SBS) • Plus-L Minus-R Selection • Bidirectional Search • Sequential Floating Selection (SFFS and SFBS) • Feature selection using GAs
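As an illustration of one of these strategies, Sequential Forward Selection can be sketched as below. The `criterion` argument stands for any evaluation function (filter or wrapper); the toy criterion, its relevance scores, and the redundancy penalty are made up for the example.

```python
def sequential_forward_selection(n_features, k, criterion):
    """Greedily grow a feature subset: at each step add the feature that maximizes the criterion."""
    selected = []
    remaining = list(range(n_features))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical criterion: sum of per-feature relevance, penalized when the
# redundant pair (0, 1) is selected together.
RELEVANCE = [0.9, 0.85, 0.3, 0.5]

def toy_criterion(subset):
    score = sum(RELEVANCE[f] for f in subset)
    if 0 in subset and 1 in subset:
        score -= 0.8
    return score

chosen = sequential_forward_selection(4, 2, toy_criterion)
```

SFS never removes a feature once it has been added, which is the limitation that the Plus-L Minus-R and floating variants address.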
Linear Discriminant Functions • General form of linear discriminant: $g(x) = w^T x + w_0$ • What is the form of the decision boundary? What is the meaning of w and w0? • The decision boundary $g(x) = 0$ is a hyperplane; its orientation is determined by w and its location by w0.
Linear Discriminant Functions • What does g(x) measure? • An algebraic (signed) measure of the distance of x from the decision boundary (hyperplane): the distance is $r = g(x) / \|w\|$
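A small numerical check of this interpretation, assuming the usual linear form g(x) = w·x + w0; the weights and points are arbitrary example values.

```python
import math

def g(x, w, w0):
    """Linear discriminant g(x) = w . x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def signed_distance(x, w, w0):
    """Signed distance of x from the hyperplane g(x) = 0; the sign tells the side."""
    return g(x, w, w0) / math.sqrt(sum(wi * wi for wi in w))

w_demo, w0_demo = (3.0, 4.0), -5.0
d_pos = signed_distance((3.0, 4.0), w_demo, w0_demo)   # on the positive side
d_neg = signed_distance((0.0, 0.0), w_demo, w0_demo)   # on the negative side
```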
Linear Discriminant Functions • How do we find w and w0? • Apply learning using a set of labeled training examples • What is the effect of each training example? • Places a constraint on the solution [Figure: feature space (y1, y2) and the corresponding solution space (α1, α2)]
Linear Discriminant Functions • Iterative optimization – what is the main idea? • Minimize some error function J(α) iteratively: $\alpha(k+1) = \alpha(k) + \eta(k)\, p_k$, where $p_k$ is the search direction and $\eta(k)$ is the learning rate
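When the search direction is taken to be the negative gradient of J, the iteration becomes gradient descent. A minimal sketch with a fixed learning rate; the quadratic error function used here is a made-up example with a known minimum at (3, -1).

```python
def gradient_descent(grad, alpha0, eta=0.1, n_steps=200):
    """Iterate alpha(k+1) = alpha(k) - eta * grad J(alpha(k)) with a constant learning rate."""
    alpha = list(alpha0)
    for _ in range(n_steps):
        direction = grad(alpha)
        alpha = [a - eta * d for a, d in zip(alpha, direction)]
    return alpha

def grad_J(alpha):
    """Gradient of the example J(alpha) = (alpha1 - 3)^2 + (alpha2 + 1)^2."""
    return [2 * (alpha[0] - 3.0), 2 * (alpha[1] + 1.0)]

alpha_min = gradient_descent(grad_J, [0.0, 0.0])
```

A learning rate that is too large makes the iteration diverge, while one that is too small makes convergence slow; the value 0.1 is simply one that works for this example.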
Linear Discriminant Functions • Gradient descent method • Newton method • Perceptron rule
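The perceptron rule can be sketched as below, in its fixed-increment, single-sample form. The sketch assumes labels in {-1, +1}, augments each sample with a leading 1, and negates the samples of the negative class so that a correct solution satisfies α·y > 0 for every y; the training set is a toy linearly separable example.

```python
def perceptron(samples, labels, eta=1.0, max_epochs=1000):
    """Return augmented weights alpha = [w0, w1, ...] found by the perceptron rule."""
    # Augment with a leading 1 and multiply by the label ("normalization").
    ys = [[z] + [z * xi for xi in x] for x, z in zip(samples, labels)]
    alpha = [0.0] * len(ys[0])
    for _ in range(max_epochs):
        errors = 0
        for y in ys:
            if sum(a * yi for a, yi in zip(alpha, y)) <= 0:     # misclassified
                alpha = [a + eta * yi for a, yi in zip(alpha, y)]
                errors += 1
        if errors == 0:                                         # all constraints satisfied
            break
    return alpha

# Logical AND with -1/+1 labels: linearly separable.
X_demo = [(0, 0), (0, 1), (1, 0), (1, 1)]
z_demo = [-1, -1, -1, 1]
alpha_demo = perceptron(X_demo, z_demo)
```

The loop is guaranteed to stop with zero errors only when the classes are linearly separable; otherwise it runs until `max_epochs`.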
Support Vector Machines • What is the capacity of a classifier? • What is the VC dimension of a classifier? • What is structural risk minimization? • Find solutions that (1) minimize the empirical risk and (2) have low VC dimension. • It can be shown that, with probability $(1-\delta)$, the true risk is bounded by the empirical risk plus a confidence term that increases with the VC dimension.
Support Vector Machines • What is the margin of separation? How is it defined? • What is the relationship between VC dimension and margin of separation? • VC dimension is minimized by maximizing the margin of separation. [Figure label: support vectors]
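Support Vector Machines • What is the criterion being optimized by SVMs? • Maximize the margin $2 / \|w\|$; equivalently, minimize $\frac{1}{2}\|w\|^2$ subject to $z_k (w^T x_k + w_0) \ge 1$ for every training sample $x_k$ with label $z_k \in \{-1, +1\}$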
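Support Vector Machines • SVM solution depends only on the support vectors: $g(x) = \sum_{k \in SV} \lambda_k z_k (x_k \cdot x) + w_0$, where $\lambda_k$ are the Lagrange multipliers and $z_k \in \{-1, +1\}$ the class labels • Soft margin classifier – tolerate “outliers”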
Support Vector Machines • Non-linear SVM – what is the main idea? • Map the data to a high-dimensional space (of dimensionality h)
Support Vector Machines • What is the kernel trick? • Compute dot products using a kernel function • Polynomial kernel: $K(x, y) = (x \cdot y)^d$
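A quick check of what the kernel trick buys, for the polynomial kernel with d = 2 on 2-D inputs. The explicit feature map `phi_degree2` is the standard one for this case and is written out here only for comparison; the kernel never needs it.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def poly_kernel(x, y, d=2):
    """Polynomial kernel K(x, y) = (x . y)^d, computed in the original space."""
    return dot(x, y) ** d

def phi_degree2(x):
    """Explicit map for d = 2 in 2-D: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x_demo, y_demo = (1.0, 2.0), (3.0, 4.0)
k_value = poly_kernel(x_demo, y_demo)                           # no mapping needed
explicit_value = dot(phi_degree2(x_demo), phi_degree2(y_demo))  # same number, more work
```

Both computations give the same dot product, but the kernel avoids ever forming the higher-dimensional vectors.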
Support Vector Machines • Important comments about SVMs • SVM is based on exact optimization (no local optima). • Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space. • Performance depends on the choice of the kernel and its parameters.
Expectation-Maximization (EM) • What is the EM algorithm? • An iterative method to perform ML estimation: $\max_\theta \, p(D \mid \theta)$ • When is EM useful? • Works best for problems where the data is incomplete or can be thought of as being incomplete.
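Expectation-Maximization (EM) • What are the steps of the EM algorithm? • Initialization: $\theta^0$ • Expectation Step: compute $Q(\theta; \theta^t)$, the expected complete-data log-likelihood given the observed data and the current estimate $\theta^t$ • Maximization Step: $\theta^{t+1} = \arg\max_\theta Q(\theta; \theta^t)$ • Test for convergence: stop when $\|\theta^{t+1} - \theta^t\| < \epsilon$ • Convergence properties of EM? • Solution depends on the initial estimate $\theta^0$ • No guarantee of finding the global maximum, but the iteration is stable (the likelihood never decreases)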
Expectation-Maximization (EM) • What is a mixture of Gaussians? • How are the parameters of MoGs estimated? • Using the EM algorithm • What is the main idea behind using EM for estimating the MoGs parameters? • Introduce “hidden” variables that indicate which Gaussian component generated each sample
Expectation-Maximization (EM) • Explain the EM steps for MoGs
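A minimal sketch of the E- and M-steps for a one-dimensional mixture of Gaussians. The fixed iteration count (instead of a convergence test), the variance floor of 1e-6, and the toy data are simplifying assumptions made for the example.

```python
import math

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_mog_1d(data, mus, variances, priors, n_iter=50):
    """Estimate means, variances and mixing priors of a 1-D mixture of Gaussians."""
    mus, variances, priors = list(mus), list(variances), list(priors)
    K = len(mus)
    for _ in range(n_iter):
        # E-step: expected value of the hidden membership of each sample in each component.
        resp = []
        for x in data:
            w = [priors[j] * gaussian_pdf(x, mus[j], variances[j]) for j in range(K)]
            total = sum(w)
            resp.append([wj / total for wj in w])
        # M-step: re-estimate the parameters using the memberships as soft weights.
        for j in range(K):
            nj = sum(r[j] for r in resp)
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)      # assumed floor to avoid a collapsing component
            priors[j] = nj / len(data)
    return mus, variances, priors

# Two well-separated clusters around 1 and 5.
data_demo = [0.9, 1.0, 1.1, 0.95, 1.05, 4.9, 5.0, 5.1, 4.95, 5.05]
mus_demo, vars_demo, priors_demo = em_mog_1d(data_demo, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
```

Starting from a different initial estimate can lead to a different solution, which is the dependence on θ0 noted for EM in general.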