1 / 21

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction. Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers. 3D Human Pose Inference Difficulties Towards automatic monocular methods. Background clutter Geometric transforms

tassos
Download Presentation

Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction Atul Kanaujia, CBIM, Rutgers Cristian Sminchisescu, TTI-C Dimitris Metaxas,CBIM, Rutgers

  2. 3D Human Pose Inference DifficultiesTowards automatic monocular methods • Background clutter • Geometric transforms • Scale & viewpoint change • Illumination changes • Fast motions • (Self-)occlusion • Variability in the humanbody proportions

  3. Standard Discriminative Approach • Train structured model to predict 3D human poses given image descriptor inputs - Multi-valued predictor necessary for multiple plausible pose interpretations • Train using images and corresponding 3D human poses

  4. Problems… we address • Lack of typical training data (lab, quasi-real) • Image descriptors are unstable w.r.t. to geometric deformations and background clutter CleanQuasi-real (QReal)Real Predictor cannot generalize!

  5. This Talk Hierarchical Image Encodings Multi-Valued Semi-supervised Learning Distance Metric Learning • Learning hierarchical image descriptors • Multi-level / coarse-to-fine encodings stable w.r.t. deformation and misalignment in the training set • Metrics for noise suppression (clutter removal) • Semi-supervised generalization to multi-valued prediction

  6. Why do we need better features? Global histograms (bag of features) are robust to local deformation but sensitive to background clutter Regular grid descriptors can be made robust to clutter but are sensitive to training set misalignments & local deformations

  7. Hierarchical Image Descriptors Coarse-to-fine encodings designed to represent multiple degrees of selectivity & invariance Progressively relax rigid local spatial encodings to weaker models of geometry accumulated over larger regions • Layered encodings (e.g. spatial pyramid) or bottom-up, hierarchical aggregation based on successive template matching + max pooling (e.g. HMAX)

  8. Hierarchical Image Descriptors HMAX (Poggio et al, 2002-06) Ω = MAX S1 C1 S2 C2 16 Gabor Filter response Ω Ω Encoding Ω Select Patches / Object Parts to match against results from previous layer

  9. Multilevel Image Descriptors Spatial Pyramid (Lazebnik et al, 2006) SIFT Descriptor, followed by Vector quantization Concatenate to Descriptor Votes accumulation in a spatial region

  10. Dealing with Background Clutter Multi-level encodings are still perturbed by background Need to suppress noise Feature selection based on e.g. sparse linear regression tends to be ineffective for global descriptors

  11. Distance Metric Learning Chunklet 2 Learn Mahalanobis distance that maximizes similarity within chunklets = sets of images of people in similar poses, but differently proportioned and placed on different backgrounds • Relevant Component Analysis (Hillel et. al. 2003) Chunklet 1

  12. Suppressing Background Clutter Clean Quasi-real (QReal) Real The distance between the learned descriptors computed on different backgrounds is diminished

  13. This talk … Flexible training Hierarchical Image Encodings Multi-Valued Semi-supervised Learning Distance Metric Learning • Learning hierarchical image descriptors • Multi-level / coarse-to-fine encodings stable w.r.t. deformation and misalignment in the training set • Metrics for noise suppression & clutter removal • Semi-supervised generalization to multi-valued prediction

  14. Semi-supervised Multi-valued Prediction x1 x2 Manifold Assumption If two image descriptors are close in their intrinsic geometry (e.g. encoded by the graph Laplacian), their 3D outputs should vary smoothly x – 3D Human Pose r – Image Descriptor r1 r2

  15. Semi-supervised Multi-valued Prediction x3 x1 x2 x – 3D Human Pose r – Image Descriptor r3 r1 r2 Expert Ranking Assumption (Mixture of experts) If two image descriptors are close in their intrinsic image geometry (graph Laplacian), their 3D outputs should be smooth only if predicted by the same expert (prevent smoothing across partitions)

  16. Experiments Multi-level encodings • 5 hierarchical descriptors • ~1500d image descriptor Dataset of human poses • 56d human joint angle state vector • 5 Motions obtained with motion capture • Walk, Pantomime, Bending Pickup, Dancing and Running • 3247 x 3 images + 1000 unlabeled images Multi-valued predictor uses 5 experts

  17. Prediction Accuracy Multilevel vs. Global Descriptors Multilevel / hierarchical descriptors perform significantly better than global histograms or single-layer (fine) grids of local descriptors

  18. Metric learning improves the prediction error for global histogram descriptors Prediction AccuracyBefore / after Metric Learning

  19. Integrated scanning detection window + 3D prediction Notice: scale change, occlusions (trees), self-occlusion, fast motion Run Lola Run MovieAutomatic 3D Pose Reconstruction

  20. Integrated scanning detection window + 3D prediction Notice: scale change, fast motion, transparencies Run Lola Run Movie 3D Pose Reconstruction

  21. We have argued for • Learning of hierarchical image descriptors for better generalization under shape variability and background clutter • Semi-supervised generalization to multi-valued prediction Ongoing work • Jointly learn the features and the predictor • Scaling to large datasets (> 500K samples)

More Related