
Review



Presentation Transcript


  1. Review Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005

  2. PatReco: Introduction Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005

  3. PatReco: Applications • Speech/audio/music/sounds • Speech recognition, Speaker verification/id • Image/video • OCR, AVASR, Face id, Fingerprint id, Video segmentation • Text/Language • Machine translation, Document classification, Language modeling, Text understanding • Medical/Biology • Disease diagnosis, DNA sequencing, Gene disease models • Other Data • User modeling (books/music), Linguistic analysis (web), Games

  4. Basic Concepts • Why statistical modeling? • Variability: differences between two examples of the same class in training • Mismatch: differences between two examples of the same class (one in training, one in testing) • Learning modes: • Supervised learning: class labels known • Unsupervised learning: class labels unknown • Reinforcement learning: only positive/negative feedback

  5. Basic Concepts • Feature selection • Separate classes, Low correlation • Model selection • Model type, Model order • Prior knowledge • E.g., a priori class probability • Missing features/observations • Modeling of time series • Correlation in time (model?), segmentation

  6. PatReco: Algorithms • Parametric vs Non-Parametric • Supervised vs Unsupervised • Basic Algorithms: • Bayesian • Non-parametric • Discriminant Functions • Non-Metric Methods

  7. PatReco: Algorithms • Bayesian methods • Formulation (describe class characteristics) • Bayes classifier • Maximum likelihood estimation • Bayesian learning • Estimation-Maximization • Markov models, hidden Markov models • Bayesian Nets • Non-parametric • Parzen windows • Nearest Neighbour
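The Bayes classifier and maximum likelihood estimation listed above can be sketched for 1-D Gaussian classes (a minimal illustration; the toy data, the class names w1/w2, and the equal priors are invented for the example):

```python
import math

def ml_estimate(samples):
    """Maximum likelihood estimates of mean and variance for a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n  # ML divides by n, not n-1
    return mean, var

def gaussian_pdf(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_classify(x, params, priors):
    """Bayes classifier: pick the class maximizing p(x|class) * P(class)."""
    return max(params, key=lambda c: gaussian_pdf(x, *params[c]) * priors[c])

# Invented toy data: two 1-D classes
class_data = {"w1": [1.0, 1.2, 0.8, 1.1], "w2": [3.0, 3.2, 2.9, 3.1]}
params = {c: ml_estimate(d) for c, d in class_data.items()}
priors = {"w1": 0.5, "w2": 0.5}
```

A point near 1.0 is then assigned to w1 and a point near 3.0 to w2; with equal priors this reduces to picking the class with the larger likelihood.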

  8. PatReco: Algorithms • Discriminant Functions • Formulation (describe boundary) • Learning: Gradient descent • Perceptron • MSE=minimum squared error • LMS=least mean squares • Neural Net generalizations • Support vector machines • Non-Metric Methods • Classification and Regression Trees • String Matching
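The perceptron listed under discriminant functions can be sketched as follows (a minimal error-driven version on invented, linearly separable 2-D toy data; labels are ±1):

```python
def perceptron_train(samples, labels, lr=1.0, epochs=20):
    """Perceptron rule: update w, b only on misclassified examples."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):  # y in {-1, +1}
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:  # misclassified
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def perceptron_predict(x, w, b):
    """Sign of the learned linear discriminant function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Invented linearly separable toy data
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (3.0, 2.0)]
y = [-1, -1, 1, 1]
w, b = perceptron_train(X, y)
```

On separable data the updates stop once every example lies on the correct side of the learned hyper-plane.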

  9. PatReco: Algorithms • Unsupervised Learning: • Mixture of Gaussians • K-means • Other not-covered • Multi-layered Neural Nets • Stochastic Learning (Simulated Annealing) • Genetic Algorithms • Fuzzy Algorithms • Etc…
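K-means, listed above under unsupervised learning, can be sketched in 1-D (Lloyd's alternation of assignment and mean update; the points and initial centers are invented):

```python
def kmeans(points, centers, iters=20):
    """K-means in 1-D: assign each point to its nearest center, then re-estimate centers."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[nearest].append(p)
        # New center = mean of assigned points (keep old center if a cluster empties)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

pts = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
final_centers = kmeans(pts, [0.0, 6.0])
```

Here the two centers converge to roughly 1.0 and 5.0, the means of the two point groups; no class labels are used anywhere.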

  10. PatReco: Problem Solving • Data Collection • Data Analysis • Feature Selection • Model Selection • Model Training • Classification • Classifier Evaluation


  14. Evaluation • Training Data Set • 1234 examples of class 1 and class 2 • Testing/Evaluation Data Set • 134 examples of class 1 and class 2 • Misclassification Error Rate • Training: 11.61% (150 errors) • Testing: 13.43% (18 errors) • Correct for chance (Training 22%, Testing 26%) • Why?

  15. PatReco: Discriminant Functions for Gaussians Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005


  17. Discriminant Functions • Define class boundaries (instead of class characteristics) • Dualism: • Parametric class description → Bayes classifier → Decision boundary → Parametric Discriminant Functions

  18. Normal Density • 1D • Multi-D • Full covariance • Diagonal covariance • Diagonal covariance + univariate • Mixture of Gaussians • Usually diagonal covariance
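The densities listed above can be written out directly; a sketch of the 1-D normal, the diagonal-covariance multivariate normal (a product of per-dimension 1-D terms), and a 1-D mixture of Gaussians:

```python
import math

def gaussian_1d(x, mean, var):
    """Univariate normal density N(x; mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def gaussian_diag(x, mean, var):
    """Multivariate normal with diagonal covariance:
    the joint density factors into a product of 1-D densities."""
    p = 1.0
    for xi, mi, vi in zip(x, mean, var):
        p *= gaussian_1d(xi, mi, vi)
    return p

def gmm_1d(x, weights, means, variances):
    """Mixture of Gaussians: a weighted sum of component densities
    (weights must sum to 1)."""
    return sum(w * gaussian_1d(x, m, v)
               for w, m, v in zip(weights, means, variances))
```

The full-covariance case additionally needs the inverse and determinant of the covariance matrix; the diagonal case above is the one the slide notes is most common for mixtures.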

  19. Gaussian Discriminant Functions • Same variance ALL classes • Hyper-planes • Different variance among classes • Hyper-quadratics (hyper-parabolas, hyper-ellipses etc.)

  20. Hyper-Planes • When the covariance matrix is common across Gaussian classes • The decision boundary is a hyper-plane that is perpendicular to the line connecting the means of the Gaussian distributions • If the a-priori probabilities of the classes are equal the hyper-plane cuts the line connecting the Gaussian means in the middle → Euclidean classifier
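The equal-covariance, equal-prior case above reduces to a nearest-mean (Euclidean) classifier; a sketch with two invented 2-D class means, whose decision boundary is the perpendicular bisector of the segment joining them:

```python
def euclidean_classify(x, means):
    """Euclidean (nearest-mean) classifier: valid when all classes share
    the same covariance and have equal priors."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(means, key=lambda c: sq_dist(x, means[c]))

# Invented class means; the boundary is the vertical line x = 2,
# perpendicular to the mean-connecting segment and cutting it in the middle.
means = {"w1": (0.0, 0.0), "w2": (4.0, 0.0)}
```

Any point left of x = 2 is assigned to w1 and any point right of it to w2, regardless of its second coordinate.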


  22. Hyper-Quadratics • When the Gaussian class variances are different the boundary can be • a hyper-plane, multiple hyper-planes, a hyper-sphere, a hyper-parabola, a hyper-ellipsoid etc. • The boundary in general is NOT perpendicular to the line connecting the Gaussian means • If the a-priori probabilities of the classes are equal the resulting classifier is a Mahalanobis classifier
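The effect of unequal variances can be sketched with a minimum-Mahalanobis-distance classifier (diagonal covariances and equal priors assumed; the class names and numbers are invented). A point that is Euclidean-closer to a low-variance class can still be assigned to a high-variance class:

```python
def mahalanobis_sq(x, mean, var):
    """Squared Mahalanobis distance for a diagonal covariance
    (one variance per dimension)."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var))

def mahalanobis_classify(x, classes):
    """classes: name -> (mean, diagonal variances); equal priors assumed."""
    return min(classes, key=lambda c: mahalanobis_sq(x, *classes[c]))

# Invented classes: one tight, one spread out
classes = {
    "narrow": ((0.0, 0.0), (0.25, 0.25)),  # small variances
    "wide":   ((4.0, 0.0), (9.0, 9.0)),    # large variances
}
```

The point (1.5, 0) lies Euclidean-closer to the "narrow" mean, yet its Mahalanobis distance to "wide" is smaller, so the curved (hyper-quadratic) boundary assigns it to "wide".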

  23. Conclusions • Parametric statistical models describe class characteristics by modeling the observation probabilities p(x|class) • Discriminant functions describe class boundaries parametrically • Parametric statistical models have an equivalent parametric discriminant function • For Gaussian p(x|class) distributions the decision boundaries are hyper-planes or hyper-quadratics

  24. PatReco: Detection Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005

  25. Detection • Goal: Detect an Event • Hit (event present and detected: success) • Miss / False Reject (event present but not detected: failure) • False Alarm (event absent but detected) • Correct Reject (event absent and not detected)
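The detection outcomes can be tallied from paired ground-truth/decision labels (a sketch; the toy labels are invented, and a miss is the same outcome as a false reject):

```python
def detection_counts(truth, decisions):
    """Tally the four detection outcomes: hit, miss (false reject),
    false alarm, and correct reject."""
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_reject": 0}
    for present, detected in zip(truth, decisions):
        if present and detected:
            counts["hit"] += 1
        elif present and not detected:
            counts["miss"] += 1  # also called a false reject
        elif detected:
            counts["false_alarm"] += 1
        else:
            counts["correct_reject"] += 1
    return counts

truth =     [1, 1, 1, 0, 0, 0]  # 1 = event present
decisions = [1, 1, 0, 1, 0, 0]  # 1 = detector fired
counts = detection_counts(truth, decisions)
```

From these counts one can derive hit and false-alarm rates, the usual axes for evaluating a detector.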

  26. PatReco: Estimation/Training Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005

  27. Estimation/Training • Goal: Given observed data, (re-)estimate the parameters of the model, e.g., for a Gaussian model estimate the mean and variance of each class

  28. Supervised-Unsupervised • Supervised training: All data has been (manually) labeled, i.e., assigned to classes • Unsupervised training: Data is not assigned a class label

  29. Observable data • Fully observed data: all information necessary for training is available (features, class labels etc.) • Partially observed data: some of the features or some of the class labels are missing

  30. Supervised Training(fully observable data) • Maximum likelihood estimation (ML) • Maximum a posteriori estimation (MAP) • Bayesian estimation (BE)

  31. Training process • Collected data used for training consists of the following examples: D = {x1, x2, … xN} • Step 1: Label each example with the corresponding class label ω1, ω2, ... ωΚ • Step 2: For each class separately, estimate the model parameters using ML, MAP, or BE and the corresponding training examples D1, D2, … DK

  32. Training Process: Step 1 D = {x1, x2, x3, x4, x5, … xN} Label manually ω1, ω2, ... ωΚ D1 = {x11, x12, x13, … x1N1} D2 = {x21, x22, x23, … x2N2} ………… DK = {xK1, xK2, xK3, … xKNk}
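Steps 1 and 2 above can be sketched for 1-D Gaussian classes with ML estimation (the data and labels are invented; MAP or Bayesian estimation would replace only the per-class estimation step):

```python
def train_per_class(data, labels):
    """Step 1: split D into D1..DK by class label.
    Step 2: ML-estimate a 1-D Gaussian (mean, variance) per class."""
    subsets = {}
    for x, lbl in zip(data, labels):
        subsets.setdefault(lbl, []).append(x)
    models = {}
    for lbl, xs in subsets.items():
        n = len(xs)
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / n  # ML variance (divide by n)
        models[lbl] = (mean, var)
    return models

# Invented labeled training set D with labels omega_1 = "w1", omega_2 = "w2"
D = [1.0, 1.2, 0.8, 3.0, 3.4, 2.6]
y = ["w1", "w1", "w1", "w2", "w2", "w2"]
models = train_per_class(D, y)
```

Because the data are fully observed and labeled, each class is estimated independently from its own subset, exactly as in the two-step process above.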
