
Final Exam Review


Presentation Transcript


  1. Final Exam Review CS479/679 Pattern Recognition, Dr. George Bebis

  2. Final Exam Material • Midterm Exam Material • Dimensionality Reduction • Feature Selection • Linear Discriminant Functions • Support Vector Machines • Expectation-Maximization Algorithm

  3. Dimensionality Reduction • What is the goal of dimensionality reduction and why is it useful? • Reduce the dimensionality of the data • Eliminate redundant and irrelevant features • Fewer training samples are needed and classification is faster • How is dimensionality reduction performed? • Map the data to a space of lower dimensionality through a linear (or non-linear) transformation y = U^T x, where x ∈ R^N, U is N×K, and y ∈ R^K • Or, select a subset of features (feature selection)
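
To make the mapping concrete, here is a minimal numpy sketch (not from the slides) of the projection y = U^T x applied to a batch of samples; the data and the orthonormal U are arbitrary and chosen only to show the shapes involved.

    import numpy as np

    # Illustrative sizes: M samples in R^N, projected down to R^K.
    M, N, K = 100, 50, 5
    rng = np.random.default_rng(0)
    X = rng.normal(size=(M, N))            # each row is one sample x in R^N

    # U is an N x K matrix whose K columns span the lower-dimensional subspace;
    # here we take arbitrary orthonormal directions purely for illustration.
    U, _ = np.linalg.qr(rng.normal(size=(N, K)))

    # y = U^T x for every sample at once: (M x N)(N x K) -> (M x K)
    Y = X @ U
    print(Y.shape)                         # (100, 5): each sample now lives in R^K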

  4. Dimensionality Reduction • Give two examples of linear dimensionality reduction techniques. • Principal Component Analysis (PCA) • Linear Discriminant Analysis (LDA) • What is the difference between PCA and LDA? • PCA seeks a projection that preserves as much information in the data as possible. • LDA seeks a projection that best separates the data.

  5. Dimensionality Reduction • What is the solution found by PCA? • The "largest" eigenvectors of the covariance matrix, i.e., those corresponding to the largest eigenvalues (the principal components) • You need to know the steps of PCA, its geometric interpretation, and how to choose the number of principal components.
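
The following numpy sketch walks through those PCA steps (center the data, form the covariance matrix, keep the eigenvectors with the largest eigenvalues); the pca helper and the 95%-of-variance rule for choosing the number of components are illustrative choices, not the only ones.

    import numpy as np

    def pca(X, var_fraction=0.95):
        """PCA sketch: X is (M samples) x (N features)."""
        mean = X.mean(axis=0)
        Xc = X - mean                           # 1. center the data
        cov = np.cov(Xc, rowvar=False)          # 2. covariance matrix (N x N)
        evals, evecs = np.linalg.eigh(cov)      # 3. eigen-decomposition (ascending)
        order = np.argsort(evals)[::-1]         #    sort eigenvalues descending
        evals, evecs = evals[order], evecs[:, order]
        # 4. choose K so the kept eigenvalues explain e.g. 95% of the variance
        K = int(np.searchsorted(np.cumsum(evals) / evals.sum(), var_fraction)) + 1
        U = evecs[:, :K]                        # the "largest" eigenvectors
        Y = Xc @ U                              # projected data, y = U^T (x - mean)
        return Y, U, mean

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))   # correlated toy data
    Y, U, mu = pca(X)
    print(U.shape, Y.shape)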

  6. Dimensionality Reduction • You need to know how to apply PCA for face recognition and face detection. • What practical issue arises when applying PCA for face recognition? How do we deal with it? • The covariance matrix AA^T is typically very large (i.e., N^2 × N^2 for N×N images) • Consider the alternative matrix A^T A, which is only M×M (M is the number of training face images)
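
Here is a small numpy sketch (with toy sizes) of that trick: if (A^T A) v = λ v, then (A A^T)(A v) = λ (A v), so the eigenvectors of the huge N^2 × N^2 matrix can be recovered from the small M × M one.

    import numpy as np

    # A holds M centered training face images as columns, each flattened to N^2
    # pixels; the sizes and random values here are toy stand-ins for real faces.
    rng = np.random.default_rng(2)
    Npix, M = 32 * 32, 20
    A = rng.normal(size=(Npix, M))
    A -= A.mean(axis=1, keepdims=True)      # subtract the mean face

    # Eigen-decompose the small M x M matrix A^T A instead of the N^2 x N^2 AA^T.
    lam, V = np.linalg.eigh(A.T @ A)
    keep = lam > 1e-10                      # drop near-zero eigenvalues
    U = A @ V[:, keep]                      # columns A v are eigenvectors of AA^T
    U /= np.linalg.norm(U, axis=0)          # normalize the resulting "eigenfaces"
    print(U.shape)                          # (1024, 19): at most M-1 informative ones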

  7. Dimensionality Reduction • What is the solution found by LDA? • Maximize the between-class scatter Sb while minimizing the within-class scatter Sw • The solution is given by the eigenvectors of the following generalized eigenvalue problem: Sb w = λ Sw w

  8. Dimensionality Reduction • What practical issue arises when applying LDA for face recognition? How do we deal with it? • The solution can be obtained from the "largest" eigenvectors of Sw^-1 Sb. • But Sw is singular in practice due to the large dimensionality of the data; apply PCA first to reduce the dimensionality.
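
A compact sketch of LDA along these lines, using scipy to solve the generalized eigenvalue problem Sb w = λ Sw w; the lda helper and the toy two-class data are illustrative, and Sw is assumed non-singular here (in face recognition one would run PCA first, as noted above).

    import numpy as np
    from scipy.linalg import eigh

    def lda(X, y, K):
        """LDA sketch: X is (M x N), y holds class labels, keep K directions."""
        N = X.shape[1]
        overall_mean = X.mean(axis=0)
        Sw = np.zeros((N, N))                   # within-class scatter
        Sb = np.zeros((N, N))                   # between-class scatter
        for c in np.unique(y):
            Xc = X[y == c]
            mc = Xc.mean(axis=0)
            Sw += (Xc - mc).T @ (Xc - mc)
            d = (mc - overall_mean).reshape(-1, 1)
            Sb += len(Xc) * (d @ d.T)
        evals, evecs = eigh(Sb, Sw)             # solves Sb w = lambda Sw w
        order = np.argsort(evals)[::-1]
        return evecs[:, order[:K]]              # "largest" generalized eigenvectors

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
    y = np.array([0] * 50 + [1] * 50)
    print(lda(X, y, K=1).shape)                 # (4, 1): one discriminant direction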

  9. Feature Selection • What is the goal of feature selection? • Select features having high discrimination power while ignoring or paying less attention to the rest. • What are the main steps in feature selection? • Search the space of possible feature subsets. • Pick the one that is optimal or near-optimal with respect to a certain criterion (evaluation).

  10. Feature Selection • What are the main search and evaluation strategies? • Search strategies: optimal, heuristic, randomized • Evaluation strategies: filter, wrapper • What is the difference between filter and wrapper methods? • In filter methods, evaluation is independent of the classification algorithm. • In wrapper methods, evaluation depends on the classification algorithm.

  11. Feature Selection • You need to be familiar with: • Exhaustive and Naïve search • Sequential Forward/Backward Selection (SFS/SBS) • Plus-L Minus-R Selection • Bidirectional Search • Sequential Floating Selection (SFFS and SFBS) • Feature selection using GAs
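
As one concrete example from this list, here is a short sketch of Sequential Forward Selection; the sfs and toy_score helpers are made up for the example, and the scoring function is a filter-style criterion (class-mean separation) chosen only to keep the code self-contained (a wrapper method would use classifier accuracy instead).

    import numpy as np

    def sfs(X, y, score_fn, k):
        """SFS sketch: greedily add the feature that most improves score_fn."""
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < k:
            best_f, best_score = None, -np.inf
            for f in remaining:
                s = score_fn(X[:, selected + [f]], y)   # evaluate candidate subset
                if s > best_score:
                    best_f, best_score = f, s
            selected.append(best_f)
            remaining.remove(best_f)
        return selected

    def toy_score(Xs, y):
        # Illustrative criterion: distance between the two class means.
        m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
        return float(np.linalg.norm(m0 - m1))

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 6))
    y = (X[:, 2] + X[:, 5] > 0).astype(int)     # only features 2 and 5 matter
    print(sfs(X, y, toy_score, k=2))            # typically picks features 2 and 5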

  12. Linear Discriminant Functions • General form of a linear discriminant: g(x) = w^T x + w0 • What is the form of the decision boundary? What is the meaning of w and w0? • The decision boundary is a hyperplane; its orientation is determined by w and its location by w0.

  13. Linear Discriminant Functions • What does g(x) measure? • A signed, algebraic measure of the distance of x from the decision boundary (hyperplane); the actual distance is g(x) / ||w||
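
A small numeric illustration of g(x) = w^T x + w0 and the associated distance; the parameter values below are arbitrary.

    import numpy as np

    w = np.array([2.0, -1.0])                # orientation of the hyperplane
    w0 = 0.5                                 # location (bias / threshold)

    def g(x):
        return w @ x + w0                    # linear discriminant g(x) = w^T x + w0

    x = np.array([1.0, 3.0])
    print(g(x))                              # its sign gives the predicted class
    print(g(x) / np.linalg.norm(w))          # signed distance of x from the hyperplane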

  14. Linear Discriminant Functions • How do we find w and w0? • Apply learning using a set of labeled training examples • What is the effect of each training example? • It places a constraint on the solution. [Figure: the solution region in the solution space (α1, α2) alongside the training samples in the feature space (y1, y2).]

  15. Linear Discriminant Functions • Iterative optimization – what is the main idea? • Minimize some error function J(α) iteratively: α(k+1) = α(k) + η(k) p(k), where p(k) is the search direction and η(k) is the learning rate.
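
For instance, with the gradient-descent choice of search direction p(k) = -∇J(α(k)), the update becomes α(k+1) = α(k) - η(k) ∇J(α(k)). The sketch below runs this update on an illustrative quadratic criterion; the criterion and its known minimizer are made up for the example.

    import numpy as np

    a_star = np.array([1.0, -2.0])           # minimizer of the toy criterion

    def grad_J(a):
        return 2 * (a - a_star)              # gradient of J(a) = ||a - a_star||^2

    a = np.zeros(2)                          # initial estimate a(0)
    eta = 0.1                                # learning rate
    for k in range(100):
        a = a - eta * grad_J(a)              # step along the negative gradient
    print(a)                                 # converges toward a_star = [1, -2]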

  16. Linear Discriminant Functions • Gradient descent method • Newton method • Perceptron rule
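
Of the three, the perceptron rule is easy to sketch directly: put the samples in augmented form, negate the class -1 samples, then cycle through the data and add η·y to the weight vector whenever a sample is misclassified. The perceptron helper and the toy data below are illustrative; the two classes are well separated, so the rule converges quickly.

    import numpy as np

    def perceptron(X, labels, eta=1.0, max_epochs=100):
        """Perceptron rule sketch: labels in {+1, -1}."""
        Y = np.hstack([X, np.ones((len(X), 1))])   # augment: y = [x, 1]
        Y = Y * labels[:, None]                    # "normalize": negate class -1 samples
        a = np.zeros(Y.shape[1])                   # weight vector a = [w, w0]
        for _ in range(max_epochs):
            errors = 0
            for y in Y:
                if a @ y <= 0:                     # misclassified sample
                    a = a + eta * y                # perceptron update
                    errors += 1
            if errors == 0:                        # all samples correct: done
                break
        return a

    rng = np.random.default_rng(5)
    X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
    labels = np.array([-1] * 20 + [1] * 20)
    print(perceptron(X, labels))                   # [w1, w2, w0]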

  17. Support Vector Machines • What is the capacity of a classifier? • What is the VC dimension of a classifier? • What is structural risk minimization? • Find solutions that (1) minimize the empirical risk and (2) have low VC dimension. • It can be shown that, with probability (1 - δ), the true risk is bounded by the empirical risk plus a confidence term sqrt([h(ln(2n/h) + 1) - ln(δ/4)] / n), where h is the VC dimension and n is the number of training samples.

  18. Support Vector Machines • What is the margin of separation? How is it defined? • It is the distance between the separating hyperplane and the closest training samples on either side; those closest samples are the support vectors. • What is the relationship between VC dimension and margin of separation? • The VC dimension is minimized by maximizing the margin of separation.

  19. Support Vector Machines • What is the criterion being optimized by SVMs? • Maximize the margin of separation 2/||w||, i.e., minimize (1/2)||w||^2 subject to y_i (w^T x_i + b) ≥ 1 for every training sample (x_i, y_i).

  20. Support Vector Machines • The SVM solution depends only on the support vectors: w = Σ_i α_i y_i x_i, where the sum runs over the support vectors (the samples with α_i > 0). • Soft margin classifier – tolerate "outliers" by allowing some samples to violate the margin.
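
To see this in practice, the following scikit-learn sketch (an external library, not part of the course slides) fits a soft-margin linear SVM on toy data and reconstructs w from the support vectors alone; dual_coef_ stores the products α_i y_i.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(6)
    X = np.vstack([rng.normal(-2, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
    y = np.array([-1] * 30 + [1] * 30)

    clf = SVC(kernel="linear", C=1.0).fit(X, y)    # soft-margin linear SVM

    # w = sum_i alpha_i y_i x_i, summed over the support vectors only.
    w = clf.dual_coef_ @ clf.support_vectors_
    print(clf.support_vectors_.shape[0], "support vectors")
    print(w, clf.intercept_)                       # matches clf.coef_ for a linear kernel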

  21. Support Vector Machines • Non-linear SVM – what is the main idea? • Map the data to a higher-dimensional space through a non-linear transformation Φ(x), then apply a linear SVM in that space.

  22. Support Vector Machines • What is the kernel trick? • Compute the dot products in the transformed space using a kernel function evaluated in the original space, e.g., the polynomial kernel K(x, y) = (x · y)^d.
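
A quick numeric check of the kernel trick for the polynomial kernel with d = 2 and two-dimensional inputs: the explicit mapping φ(x) = (x1^2, √2·x1·x2, x2^2) gives the same value as computing (x·y)^2 directly, without ever forming the transformed vectors.

    import numpy as np

    def phi(x):
        # Explicit feature map for K(x, y) = (x . y)^2 in 2-D.
        return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])

    lhs = (x @ y) ** 2            # kernel value, computed in the original space
    rhs = phi(x) @ phi(y)         # dot product in the 3-D transformed space
    print(lhs, rhs)               # both equal 1.0: (1*3 + 2*(-1))^2 = 1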

  23. Support Vector Machines • Important comments about SVMs • SVM is based on exact optimization (no local optima). • Its complexity depends on the number of support vectors, not on the dimensionality of the transformed space. • Performance depends on the choice of the kernel and its parameters.

  24. Expectation-Maximization (EM) • What is the EM algorithm? • An iterative method for performing ML estimation, i.e., maximizing p(D | θ) • When is EM useful? • It works best for problems where the data is incomplete or can be thought of as being incomplete.

  25. Expectation-Maximization (EM) • What are the steps of the EM algorithm? • Initialization: choose an initial estimate θ0 • Expectation step: compute the expected complete-data log-likelihood Q(θ; θt) given the observed data and the current estimate θt • Maximization step: set θt+1 = argmax_θ Q(θ; θt) • Test for convergence: stop when the change in θ (or in the likelihood) falls below a threshold • Convergence properties of EM? • The solution depends on the initial estimate θ0 • There is no guarantee of finding the global maximum, but convergence is stable (the likelihood never decreases).

  26. Expectation-Maximization (EM) • What is a mixture of Gaussians? • How are the parameters of MoGs estimated? • Using the EM algorithm • What is the main idea behind using EM for estimating the MoG parameters? • Introduce "hidden" variables indicating which Gaussian component generated each sample; treating them as missing data makes the problem a natural fit for EM.

  27. Expectation-Maximization (EM) • Explain the EM steps for MoGs

  28. Expectation-Maximization (EM) • Explain the EM steps for MoGs (continued)
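
As a concrete (if simplified) answer to the last two slides, here is a sketch of EM for a one-dimensional mixture of K Gaussians: the E-step computes the responsibilities (posterior probabilities of the hidden component labels), and the M-step re-estimates the mixing weights, means, and variances from the responsibility-weighted samples. The em_mog_1d helper, its initialization, the fixed iteration count, and the toy data are all illustrative choices.

    import numpy as np

    def em_mog_1d(x, K=2, iters=50):
        """EM sketch for a 1-D mixture of K Gaussians."""
        rng = np.random.default_rng(0)
        pi = np.full(K, 1.0 / K)                 # initialization theta_0
        mu = rng.choice(x, size=K, replace=False)
        var = np.full(K, x.var())
        for _ in range(iters):
            # E-step: responsibilities r[i, k] = P(component k | x_i, theta)
            dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
            r = pi * dens
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate the parameters from the weighted samples
            Nk = r.sum(axis=0)
            pi = Nk / len(x)
            mu = (r * x[:, None]).sum(axis=0) / Nk
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        return pi, mu, var

    rng = np.random.default_rng(7)
    x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(2, 0.5, 200)])
    print(em_mog_1d(x))          # weights, means, variances of the two components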
