Review Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005
PatReco: Introduction Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005
PatReco: Applications • Speech/audio/music/sounds • Speech recognition, speaker verification/identification • Image/video • OCR, AVASR, face identification, fingerprint identification, video segmentation • Text/Language • Machine translation, document classification, language modeling, text understanding • Medical/Biology • Disease diagnosis, DNA sequencing, gene-disease models • Other Data • User modeling (books/music), linguistic analysis (web), games
Basic Concepts • Why statistical modeling? • Variability: differences between two examples of the same class in training • Mismatch: differences between two examples of the same class (one in training, one in testing) • Learning modes: • Supervised learning: class labels known • Unsupervised learning: class labels unknown • Reinforcement learning: only positive/negative feedback is available
Basic Concepts • Feature selection • Separate classes, Low correlation • Model selection • Model type, Model order • Prior knowledge • E.g., a priori class probability • Missing features/observations • Modeling of time series • Correlation in time (model?), segmentation
PatReco: Algorithms • Parametric vs Non-Parametric • Supervised vs Unsupervised • Basic Algorithms: • Bayesian • Non-parametric • Discriminant Functions • Non-Metric Methods
PatReco: Algorithms • Bayesian methods • Formulation (describe class characteristics) • Bayes classifier (see the rule below) • Maximum likelihood estimation • Bayesian learning • Expectation-Maximization (EM) • Markov models, hidden Markov models • Bayesian Nets • Non-parametric • Parzen windows • Nearest Neighbour
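For reference, the Bayes classifier listed above picks the class with the maximum posterior probability; a standard statement of the rule (not spelled out on the slide) is:

\hat{\omega} = \arg\max_i P(\omega_i \mid \mathbf{x}) = \arg\max_i p(\mathbf{x} \mid \omega_i)\, P(\omega_i)

where p(x|ωi) is the class-conditional observation density and P(ωi) the class prior; the evidence p(x) is common to all classes and can be dropped from the maximization.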
PatReco: Algorithms • Discriminant Functions • Formulation (describe boundary) • Learning: gradient descent • Perceptron • MSE = minimum squared error • LMS = least mean squares • Neural Net generalizations • Support vector machines • Non-Metric Methods • Classification and Regression Trees • String Matching
PatReco: Algorithms • Unsupervised Learning: • Mixture of Gaussians • K-means (a minimal sketch follows below) • Other topics (not covered): • Multi-layered Neural Nets • Stochastic Learning (Simulated Annealing) • Genetic Algorithms • Fuzzy Algorithms • etc.
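As a concrete illustration of unsupervised learning, here is a minimal K-means sketch in Python (NumPy only; the array X, the cluster count k, and all names are my assumptions for illustration, not from the slides):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-means: X is (N, d); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct training points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # assignments stable: converged
        centroids = new_centroids
    return centroids, labels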
PatReco: Problem Solving • Data Collection • Data Analysis • Feature Selection • Model Selection • Model Training • Classification • Classifier Evaluation
Evaluation • Training Data Set • 1234 examples of class 1 and class 2 • Testing/Evaluation Data Set • 134 examples of class 1 and class 2 • Misclassification Error Rate • Training: 11.61% (150 errors) • Testing: 13.43% (18 errors) • Correct for chance (Training 22%, Testing 26%) • Why?
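A chance-corrected score compares the classifier's error to what random guessing would achieve on the same class proportions. As a rough illustration (my own sketch using a kappa-style correction; the slide does not specify which correction it uses), the following computes the raw misclassification rate and a chance-corrected accuracy:

import numpy as np

def error_rate(y_true, y_pred):
    """Raw misclassification rate."""
    return np.mean(np.asarray(y_true) != np.asarray(y_pred))

def chance_corrected_accuracy(y_true, y_pred):
    """Cohen's kappa: accuracy corrected for chance agreement."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = np.mean(y_true == y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    # Expected agreement if predictions were drawn at random
    # with the observed class proportions
    p_chance = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in classes)
    return (acc - p_chance) / (1.0 - p_chance)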
PatReco: Discriminant Functions for Gaussians Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005
Discriminant Functions • Define class boundaries (instead of class characteristics) • Duality: a parametric class description combined with the Bayes classifier induces a decision boundary, which is itself a parametric discriminant function (made concrete below)
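In standard notation (not spelled out on the slide), assign x to the class with the largest discriminant, and read the boundary off where two discriminants tie:

g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i), \qquad \text{decide } \omega_i \text{ if } g_i(\mathbf{x}) > g_j(\mathbf{x}) \text{ for all } j \neq i

The boundary between classes i and j is the set {x : gi(x) = gj(x)}; plugging a parametric p(x|ωi) into gi gives the equivalent parametric discriminant function.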
Normal Density (standard forms below) • 1D • Multi-D • Full covariance • Diagonal covariance • Diagonal covariance + univariate • Mixture of Gaussians • Usually with diagonal covariances
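For completeness, the densities the slide refers to (these are textbook forms, stated here for reference):

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad \text{(1-D)}

p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol\mu)^\top\Sigma^{-1}(\mathbf{x}-\boldsymbol\mu)\right) \quad \text{(multi-D)}

p(\mathbf{x}) = \sum_{m=1}^{M} w_m\,\mathcal{N}(\mathbf{x};\boldsymbol\mu_m,\Sigma_m), \qquad \sum_{m} w_m = 1 \quad \text{(mixture of Gaussians)}

With a diagonal Σ the multi-D density factors into a product of univariate Gaussians.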
Gaussian Discriminant Functions • Same covariance for ALL classes • Hyper-planes • Different covariance among classes • Hyper-quadrics (hyper-paraboloids, hyper-ellipsoids, etc.)
Hyper-Planes • When the covariance matrix is common across Gaussian classes, the decision boundary is a hyper-plane; for Σ = σ²I it is perpendicular to the line connecting the means of the Gaussian distributions • If the a priori class probabilities are equal, the hyper-plane bisects the line segment connecting the Gaussian means (Euclidean, i.e., minimum-distance, classifier)
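A brief sketch of why (standard result for a shared covariance Σ, not derived on the slide): the log-posterior discriminants reduce to linear functions,

g_i(\mathbf{x}) = \mathbf{w}_i^\top\mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \Sigma^{-1}\boldsymbol\mu_i, \qquad w_{i0} = -\tfrac{1}{2}\boldsymbol\mu_i^\top\Sigma^{-1}\boldsymbol\mu_i + \ln P(\omega_i),

so every boundary gi(x) = gj(x) is a hyper-plane; for equal priors it passes through the midpoint (μi + μj)/2 of the segment joining the means.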
Hyper-Quadrics • When the Gaussian class covariances differ, the boundary can be • a hyper-plane, multiple hyper-planes, a hyper-sphere, a hyper-paraboloid, a hyper-ellipsoid, etc. • The boundary is, in general, NOT perpendicular to the line connecting the Gaussian means • If the a priori class probabilities are equal, the resulting classifier is a Mahalanobis classifier
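For reference, the general class-dependent-covariance discriminant behind these shapes (standard form, not spelled out on the slide):

g_i(\mathbf{x}) = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol\mu_i)^\top\Sigma_i^{-1}(\mathbf{x}-\boldsymbol\mu_i) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

The quadratic terms no longer cancel between classes when Σi ≠ Σj, so the boundaries gi(x) = gj(x) are hyper-quadrics; the first term is minus one half of the squared Mahalanobis distance of x from μi.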
Conclusions • Parametric statistical models describe class characteristics by modeling the observation probabilities p(x|class) • Discriminant functions describe class boundaries parametrically • Every parametric statistical model has an equivalent parametric discriminant function • For Gaussian p(x|class) distributions the decision boundaries are hyper-planes or hyper-quadrics
PatReco: Detection Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005
Detection • Goal: detect an event; four outcomes are possible • Hit (event present, detected): success • Miss (event present, not detected): a false rejection • False Alarm (event absent, detected): a false acceptance • Correct Rejection (event absent, not detected)
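A small sketch (my own, not from the slides) that tallies these four outcomes from binary ground truth and decisions and reports the two standard rates:

import numpy as np

def detection_rates(event, detected):
    """event, detected: boolean arrays; returns (hit rate, false-alarm rate)."""
    event, detected = np.asarray(event, bool), np.asarray(detected, bool)
    hits = np.sum(event & detected)
    misses = np.sum(event & ~detected)           # false rejections
    false_alarms = np.sum(~event & detected)     # false acceptances
    correct_rejects = np.sum(~event & ~detected)
    hit_rate = hits / max(hits + misses, 1)
    fa_rate = false_alarms / max(false_alarms + correct_rejects, 1)
    return hit_rate, fa_rate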
PatReco: Estimation/Training Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005
Estimation/Training • Goal: Given observed data, (re-)estimate the parameters of the model, e.g., for a Gaussian model, estimate the mean and variance for each class
Supervised-Unsupervised • Supervised training: All data has been (manually) labeled, i.e., assigned to classes • Unsupervised training: Data is not assigned a class label
Observable data • Fully observed data: all information necessary for training is available (features, class labels etc.) • Partially observed data: some of the features or some of the class labels are missing
Supervised Training (fully observed data) • Maximum likelihood estimation (ML) • Maximum a posteriori estimation (MAP) • Bayesian estimation (BE)
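For the Gaussian case, the ML estimates take the familiar closed form (standard results, stated here for reference):

\hat{\boldsymbol\mu} = \frac{1}{N}\sum_{k=1}^{N}\mathbf{x}_k, \qquad \hat{\Sigma} = \frac{1}{N}\sum_{k=1}^{N}(\mathbf{x}_k - \hat{\boldsymbol\mu})(\mathbf{x}_k - \hat{\boldsymbol\mu})^\top

MAP estimation additionally weighs in a prior over the parameters, and Bayesian estimation keeps the full posterior over the parameters instead of a point estimate.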
Training process • The collected training data consist of the examples D = {x1, x2, …, xN} • Step 1: Label each example with the corresponding class label ω1, ω2, …, ωK • Step 2: For each class separately, estimate the model parameters using ML, MAP, or BE and the corresponding training examples D1, D2, …, DK
Training Process: Step 1
D = {x1, x2, x3, x4, x5, …, xN}
Label manually with ω1, ω2, …, ωK:
D1 = {x11, x12, x13, …, x1N1}
D2 = {x21, x22, x23, …, x2N2}
…
DK = {xK1, xK2, xK3, …, xKNK}
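A minimal end-to-end sketch of this training process in Python (NumPy only; the array names, the diagonal-covariance choice, and the frequency-based priors are my assumptions, not from the slides): partition the labeled data by class, then fit each class by ML.

import numpy as np

def train_gaussian_classifier(X, y):
    """Step 1: split examples by class label. Step 2: ML-estimate a
    Gaussian (mean, diagonal covariance) and a prior for each class."""
    models = {}
    for c in np.unique(y):
        Xc = X[y == c]                      # D_c: examples labeled with class c
        models[c] = {
            "prior": len(Xc) / len(X),      # P(omega_c) from relative frequency
            "mean": Xc.mean(axis=0),        # ML mean estimate
            "var": Xc.var(axis=0) + 1e-9,   # ML (diagonal) variance, regularized
        }
    return models

def classify(models, x):
    """Bayes rule with the trained per-class Gaussians (log domain)."""
    def log_score(m):
        return (np.log(m["prior"])
                - 0.5 * np.sum(np.log(2 * np.pi * m["var"]))
                - 0.5 * np.sum((x - m["mean"]) ** 2 / m["var"]))
    return max(models, key=lambda c: log_score(models[c]))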