Andy {andy@ulsan.islab.ac.kr}
LPP-HOG: A New Local Image Descriptor for Fast Human Detection
Qing Jun Wang and Ru Bo Zhang
IEEE International Symposium on Knowledge Acquisition and Modeling Workshop, pp. 640-643, 21-22 Dec. 2008, Wuhan
Problem setting
• Goal: design an algorithm for human detection able to perform in real time
• Proposed solution:
• Use Histograms of Oriented Gradients (HOG) as the feature vector
• Decrease the feature-space dimensionality using Locality Preserving Projection (LPP)
• Train a Support Vector Machine (SVM) classifier in the reduced feature space
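The three stages above can be sketched as a pipeline. Everything here is a hypothetical stand-in (the descriptor is faked, the projection matrix is random); it only shows how the pieces connect, not the authors' implementation.

```python
import numpy as np

def compute_hog(window):
    # Placeholder for HOG extraction: a 4096-D descriptor, matching the
    # parameter slide (8x16 blocks x 2x2 cells x 8 bins = 4096).
    return np.random.default_rng(0).random(4096)

def lpp_project(feature, A):
    # Project the descriptor with a learned LPP matrix A of shape (d, 4096).
    return A @ feature

def svm_score(z, w, b):
    # Linear SVM decision value in the reduced space.
    return float(w @ z + b)

d = 32                                        # assumed reduced dimensionality
A = np.random.default_rng(1).random((d, 4096))
w, b = np.zeros(d), 0.0                       # untrained classifier for the sketch
score = svm_score(lpp_project(compute_hog(None), A), w, b)
```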
HOG general scheme
Typical person detection scheme using SVM
In practice, the effect is very small (about 1%) while some computational time is required*
*Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 2005. Vol. II, pp. 886-893.
Accumulate weighted votes over spatial cells
• How many bins should the histogram have?
• Should we use oriented or non-oriented gradients?
• How should the vote weights be selected?
• Should we use overlapping blocks or not? If yes, how big should the overlap be?
• What block size should we use?
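A minimal sketch of vote accumulation for one cell, assuming 8 bins over 0-180° (non-oriented gradients) and magnitude weighting. The linear split of each vote between the two nearest bin centers is a common HOG refinement assumed here, not taken from the paper.

```python
import numpy as np

def cell_histogram(angles_deg, magnitudes, n_bins=8):
    # Soft voting: each pixel's gradient magnitude is split linearly
    # between the two nearest orientation-bin centers.
    hist = np.zeros(n_bins)
    width = 180.0 / n_bins                    # 22.5 degrees per bin
    for a, m in zip(angles_deg, magnitudes):
        pos = (a % 180.0) / width - 0.5       # position among bin centers
        lo = int(np.floor(pos)) % n_bins
        hi = (lo + 1) % n_bins
        frac = pos - np.floor(pos)
        hist[lo] += m * (1.0 - frac)
        hist[hi] += m * frac
    return hist

# 11.25 deg sits exactly on bin 0's center; 22.5 deg is halfway between
# the centers of bins 0 and 1, so its vote is split evenly.
h = cell_histogram([11.25, 22.5], [1.0, 1.0])
```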
Contrast normalization: L2-norm followed by clipping (limiting the maximum value of each component of v to 0.2) and renormalizing
Making feature vector Variants of HOG descriptors. (a) A rectangular HOG (R-HOG) descriptor with 3 × 3 blocks of cells. (b) Circular HOG (C-HOG) descriptor with the central cell divided into angular sectors as in shape contexts. (c) A C-HOG descriptor with a single central cell.
HOG feature vector for one block

Angle:
  0  15  25  25
  5  20  20  10
 10  15  25  30
  5  10  10   5

Magnitude:
 45  95 101 110
 20  30  30  40
 47  97 101 120
 50  70  70  80

Binary voting vs. magnitude voting. The feature vector extends while the window moves.
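The two voting schemes named on the slide differ only in what each pixel contributes: binary voting adds 1 to the matching bin, magnitude voting adds the gradient magnitude, so strong edges dominate. A small illustration (the angles and magnitudes below are made up):

```python
import numpy as np

def vote(angles_deg, magnitudes, n_bins=8, binary=False):
    # Hard-assign each pixel to the bin covering its angle (0-180 deg),
    # contributing either a count of 1 or its gradient magnitude.
    hist = np.zeros(n_bins)
    width = 180.0 / n_bins
    for a, m in zip(angles_deg, magnitudes):
        hist[int(a % 180.0 // width) % n_bins] += 1.0 if binary else m
    return hist

angles = [10.0, 10.0, 100.0]
mags = [5.0, 1.0, 2.0]
b = vote(angles, mags, binary=True)    # pixel counts per bin
w = vote(angles, mags, binary=False)   # magnitude-weighted votes
```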
HOG example In each triplet: (1) the input image, (2) the corresponding R-HOG feature vector (only the dominant orientation of each cell is shown), (3) the dominant orientations selected by the SVM (obtained by multiplying the feature vector by the corresponding weights from the linear SVM).
Problem setting for SVM
A hyper-plane in the feature space: w^T x + b = 0, with (unit-length) normal vector n = w / ||w||.
Points with w^T x + b > 0 are labeled +1; points with w^T x + b < 0 are labeled -1.
Problem setting for SVM
• How would you classify these points using a linear discriminant function in order to minimize the error rate?
There are an infinite number of answers! Which one is the best?
Large Margin Linear Classifier
The points lying on the planes w^T x + b = 1 and w^T x + b = -1 are the support vectors. For support vectors x+ and x-, we know that w^T x+ + b = 1 and w^T x- + b = -1, so the margin width is
M = (x+ - x-) · n = 2 / ||w||
Large Margin Linear Classifier
• Formulation 1: maximize the margin width 2 / ||w||
such that w^T x_i + b >= 1 for y_i = +1 and w^T x_i + b <= -1 for y_i = -1
• Formulation 2 (equivalent): minimize (1/2) ||w||^2
under the same constraints
• Formulation 3 (compact): minimize (1/2) ||w||^2
such that y_i (w^T x_i + b) >= 1 for all i
Solving the Optimization Problem
Primal: minimize (1/2) ||w||^2 s.t. y_i (w^T x_i + b) >= 1
Lagrangian function: L(w, b, a) = (1/2) ||w||^2 - sum_i a_i [ y_i (w^T x_i + b) - 1 ], with a_i >= 0
This is quadratic programming with linear constraints.
Solving the Optimization Problem
Lagrangian dual problem: maximize sum_i a_i - (1/2) sum_i sum_j a_i a_j y_i y_j x_i^T x_j
s.t. a_i >= 0 for all i, and sum_i a_i y_i = 0
Solving the Optimization Problem
From the KKT condition, we know: a_i [ y_i (w^T x_i + b) - 1 ] = 0
Thus, only the support vectors (points on w^T x + b = ±1) have a_i > 0.
The solution has the form: w = sum_i a_i y_i x_i, and b = y_k - w^T x_k for any support vector x_k.
Solving the Optimization Problem
The linear discriminant function is: g(x) = w^T x + b = sum_i a_i y_i x_i^T x + b
Notice it relies on a dot product between the test point x and the support vectors x_i.
Also keep in mind that solving the optimization problem involved computing the dot products x_i^T x_j between all pairs of training points.
Large Margin Linear Classifier
• What if the data is not linearly separable? (noisy data, outliers, etc.)
Slack variables ξ_i can be added to allow misclassification of difficult or noisy data points.
Large Margin Linear Classifier
Formulation: minimize (1/2) ||w||^2 + C sum_i ξ_i
such that y_i (w^T x_i + b) >= 1 - ξ_i and ξ_i >= 0
The parameter C can be viewed as a way to control over-fitting.
Large Margin Linear Classifier
Formulation (Lagrangian dual problem): maximize sum_i a_i - (1/2) sum_i sum_j a_i a_j y_i y_j x_i^T x_j
such that 0 <= a_i <= C for all i, and sum_i a_i y_i = 0
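The soft-margin objective can equivalently be minimized as a hinge loss in the primal. The subgradient-descent sketch below, on a made-up 2-D dataset, is an illustration of the role of C, not the solver used in the paper; the learning rate and epoch count are arbitrary choices.

```python
import numpy as np

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=200):
    # Subgradient descent on (1/2)||w||^2 + C * sum(max(0, 1 - y(w.x + b))).
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) < 1:          # hinge active: push the margin
                w = w - lr * (w - C * yi * xi)
                b = b + lr * C * yi
            else:                              # only the regularizer acts
                w = w - lr * w
    return w, b

X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_soft_margin(X, y)
preds = np.sign(X @ w + b)
```

A larger C punishes margin violations harder (less regularization, more risk of over-fitting); a smaller C tolerates more misclassified points.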
Non-linear SVMs
Datasets that are linearly separable with noise work out great. But what are we going to do if the dataset is just too hard?
How about mapping the data to a higher-dimensional space (e.g. from x to (x, x^2))?
Non-linear SVMs: Feature Space General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable: Φ: x→φ(x)
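A tiny concrete instance of this idea, with made-up 1-D points: no threshold on x separates the classes, but after the lift φ(x) = (x, x²) a horizontal line does.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.5, 2.0])
y = np.array([1.0, -1.0, -1.0, 1.0])   # class -1 sits between the +1 points

phi = np.stack([x, x ** 2], axis=1)    # map each point to (x, x^2)

# In the lifted space the line x^2 = 1 separates the two classes:
preds = np.sign(phi[:, 1] - 1.0)
```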
Non-linear SVMs: The Kernel Trick
With this mapping, our discriminant function is now: g(x) = sum_i a_i y_i φ(x_i)^T φ(x) + b
No need to know this mapping explicitly, because we only use the dot product of feature vectors in both training and testing.
A kernel function is defined as a function that corresponds to a dot product of two feature vectors in some expanded feature space: K(x_i, x_j) = φ(x_i)^T φ(x_j)
Non-linear SVMs: The Kernel Trick
Examples of commonly-used kernel functions:
• Linear kernel: K(x_i, x_j) = x_i^T x_j
• Polynomial kernel: K(x_i, x_j) = (1 + x_i^T x_j)^p
• Gaussian (Radial-Basis Function (RBF)) kernel: K(x_i, x_j) = exp( -||x_i - x_j||^2 / (2σ^2) )
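The three kernels written out directly; the degree and sigma values below are example choices, not parameters from the paper.

```python
import numpy as np

def linear_kernel(x, z):
    return float(x @ z)

def polynomial_kernel(x, z, degree=2, c=1.0):
    # (c + x.z)^degree; c=1 matches the (1 + x^T z)^p form above.
    return float((x @ z + c) ** degree)

def rbf_kernel(x, z, sigma=1.0):
    # exp(-||x - z||^2 / (2 sigma^2))
    return float(np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2)))

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
```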
Nonlinear SVM: Optimization
Formulation (Lagrangian dual problem): maximize sum_i a_i - (1/2) sum_i sum_j a_i a_j y_i y_j K(x_i, x_j)
such that 0 <= a_i <= C and sum_i a_i y_i = 0
The solution of the discriminant function is g(x) = sum_i a_i y_i K(x_i, x) + b
The optimization technique is the same.
Support Vector Machine: Algorithm • 1. Choose a kernel function • 2. Choose a value for C • 3. Solve the quadratic programming problem (many algorithms and software packages available) • 4. Construct the discriminant function from the support vectors
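A toy end-to-end run of the four steps: choose a kernel (RBF), choose C, solve the dual (here by crude projected gradient ascent rather than a real QP package), and classify with the resulting support-vector expansion. For simplicity the equality constraint sum(a_i y_i) = 0 is dropped, which corresponds to an SVM without a bias term; this is a sketch, not a production solver.

```python
import numpy as np

def rbf(X, Z, sigma=1.0):
    # Gram matrix of the Gaussian kernel between row sets X and Z.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_dual(X, y, C=10.0, sigma=1.0, lr=0.05, steps=2000):
    # Projected gradient ascent on the dual:
    #   max  sum(a) - (1/2) a^T Q a,  Q_ij = y_i y_j K(x_i, x_j)
    # with the box constraint 0 <= a_i <= C (bias constraint omitted).
    Q = (y[:, None] * y[None, :]) * rbf(X, X, sigma)
    a = np.zeros(len(y))
    for _ in range(steps):
        a += lr * (1.0 - Q @ a)       # gradient of the dual objective
        a = np.clip(a, 0.0, C)        # project back into the box
    return a

def predict(Xtr, ytr, a, Xte, sigma=1.0):
    # Discriminant built from the support vectors (a_i > 0), no bias.
    return np.sign((a * ytr) @ rbf(Xtr, Xte, sigma))

X = np.array([[0.0, 0.0], [0.5, 0.5], [2.0, 2.0], [2.5, 2.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
a = fit_dual(X, y)
preds = predict(X, y, a, X)
```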
Summary: Support Vector Machine • 1. Large Margin Classifier • Better generalization ability & less over-fitting • 2. The Kernel Trick • Map data points to higher dimensional space in order to make them linearly separable. • Since only dot product is used, we do not need to represent the mapping explicitly.
Proposed algorithm parameters
• Bins in histogram: 8
• Cell size: 4x4 pixels
• Block size: 2x2 cells (8x8 pixels)
• Image size: 64x128 pixels (8x16 blocks)
• Feature vector size: (2x2 cells) x (8 bins) x (8x16 blocks) = 4096
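The descriptor length follows from these parameters; a non-overlapping block grid is assumed here, since 64x128 pixels with 8x8-pixel blocks gives exactly the 8x16 block count on the slide.

```python
win_w, win_h = 64, 128          # detection window in pixels
cell = 4                        # cell side in pixels
block_cells = 2                 # block side in cells (2x2 cells = 8x8 px)
bins = 8                        # orientation bins per cell

block_px = cell * block_cells   # 8 pixels per block side
blocks_x = win_w // block_px    # 8 blocks across (non-overlapping assumed)
blocks_y = win_h // block_px    # 16 blocks down

feature_len = blocks_x * blocks_y * block_cells * block_cells * bins
```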
LPP Algorithm
Main idea: find a matrix that projects the original data into a lower-dimensional space while preserving similarity between data points (points that are close to each other in the original space should remain close after projection).
LPP Algorithm
Objective: minimize sum_ij (y_i - y_j)^2 W_ij, where y_i = a^T x_i and W_ij are the affinity weights [Is it correct?]
This can be represented as a generalized eigenvalue problem: X L X^T a = λ X D X^T a, where D_ii = sum_j W_ij and L = D - W
Add the constraint a^T X D X^T a = 1 [Is it correct?]
By selecting the d smallest eigenvalues and the corresponding eigenvectors, dimensionality reduction is achieved.
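These steps can be sketched on toy data: build a heat-kernel affinity W, form D and the Laplacian L = D - W, solve the generalized eigenproblem, and keep the eigenvectors of the d smallest eigenvalues. The heat-kernel bandwidth, sample sizes, and d below are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 40))          # 5 features x 40 samples

# Affinity matrix with heat-kernel weights between all sample pairs.
d2 = ((X.T[:, None, :] - X.T[None, :, :]) ** 2).sum(-1)
W = np.exp(-d2 / d2.mean())
D = np.diag(W.sum(axis=1))                # degree matrix
L = D - W                                 # graph Laplacian

A = X @ L @ X.T                           # left side  X L X^T
B = X @ D @ X.T                           # right side X D X^T
# Generalized eigenproblem via B^{-1} A; fine for this small, well-
# conditioned example (scipy.linalg.eigh(A, B) would be the usual route).
vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
order = np.argsort(vals.real)

d = 2                                     # target dimensionality
P = vecs[:, order[:d]].real               # 5 x 2 projection matrix
Z = P.T @ X                               # reduced 2 x 40 representation
```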
Some results
[Figure: detection rate vs. reduced dimension d, comparing PCA-HOG features (labeled '*') with LPP-HOG features (labeled 'v'), alongside a detection example.]
Conclusions
• A fast human detection algorithm based on HOG features is presented
• no information about computational speed is given
• The proposed method is similar to PCA-HOG
• feature-space dimensionality is decreased using LPP
• why do we need to apply LPP instead of finding eigenvectors from the original feature space?
• some equations seem to be wrong
• Very few reference papers are cited

Navneet Dalal. "Finding People in Images and Videos". PhD thesis, Institut National Polytechnique de Grenoble / INRIA Grenoble, Grenoble, July 2006.
Navneet Dalal and Bill Triggs. "Histograms of Oriented Gradients for Human Detection". In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, USA, June 2005. Vol. II, pp. 886-893.
Paisitkriangkrai, S., Shen, C. and Zhang, J. "Performance evaluation of local features in human classification and detection". IET Computer Vision, vol. 2, issue 4, pp. 236-246, December 2008.