
Presentation Transcript


  1. Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests. Danhang Tang, Tsz-Ho Yu, Tae-kyun Kim, Imperial College London, UK. Presented by Su-A Kim, 3rd June 2014. ※ These slides excerpt parts of the authors' oral presentation at ICCV 2013.

  2. Challenges for Hand Pose Estimation
  • Viewpoint changes and self-occlusions
  • The discrepancy between synthetic and real data is larger than for the human body
  • Labeling is difficult and tedious!

  3. Method
  • Viewpoint changes and self-occlusions → Hierarchical Hybrid Forest
  • Discrepancy between synthetic and real data → Transductive Learning
  • Difficult and tedious labeling → Semi-supervised Learning

  4. Existing Approaches
  • Generative approach: uses explicit hand models to recover the hand pose
  – Optimization-based; relies on the previous result to optimize the current hypothesis
  – Hamer et al. ICCV 2009; Ballan et al. ECCV 2012 (motion capture); De La Gorce et al. PAMI 2010; Oikonomidis et al. ICCV 2011
  • Discriminative approach: learns a mapping from visual features to the target parameter space, such as joint labels or joint coordinates (i.e. hand poses), from a labeled training dataset
  – Classification, regression, ...
  – Each frame is handled independently, which allows recovery from errors
  – Keskin et al. ECCV 2012; Wang et al. SIGGRAPH 2009; Stenger et al. IVC 2007; Xu and Cheng ICCV 2013

  5. Discriminative Approach
  • Has achieved great success in human body pose estimation
  • Efficient: real-time
  • Accurate: operates frame by frame, not relying on tracking
  • Requires a large dataset to cover many poses
  • Trained on synthetic data, tested on real data

  6. Hierarchical Hybrid Forest
  Viewpoint classification: Q_a — evaluates the classification performance over all the viewpoint labels in the dataset.
  • STR forest quality function: Q_apv = α·Q_a + (1−α)·β·Q_p + (1−α)·(1−β)·Q_v
  • Q_a – viewpoint classification quality (information gain)

  7. Hierarchical Hybrid Forest
  Viewpoint classification: Q_a. Finger joint classification: Q_p — measures the performance of classifying each individual patch.
  • STR forest quality function: Q_apv = α·Q_a + (1−α)·β·Q_p + (1−α)·(1−β)·Q_v
  • Q_a – viewpoint classification quality (information gain)
  • Q_p – joint label classification quality (information gain); see the sketch below
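Both Q_a and Q_p are information-gain criteria over discrete labels (viewpoint labels for Q_a, joint labels for Q_p). A minimal sketch of that shared machinery, with illustrative names rather than the paper's notation:

```python
# Hedged sketch: information gain over discrete labels, as used by the
# viewpoint term Q_a and the joint-label term Q_p. The split and label
# sets here are illustrative, not the paper's exact definitions.
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy reduction achieved by splitting `parent` into two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

# Example: a split that perfectly separates viewpoint 0 from viewpoint 1.
parent = [0, 0, 0, 1, 1, 1]
print(information_gain(parent, [0, 0, 0], [1, 1, 1]))  # 1.0 bit
```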

  8. Hierarchical Hybrid Forest
  Viewpoint classification: Q_a. Finger joint classification: Q_p. Pose regression: Q_v.
  • STR forest quality function: Q_apv = α·Q_a + (1−α)·β·Q_p + (1−α)·(1−β)·Q_v
  • Q_a – viewpoint classification quality (information gain)
  • Q_p – joint label classification quality (information gain)
  • Q_v – compactness of the voting vectors (measured from the covariance of the votes, e.g. its determinant or trace); see the sketch below
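Q_v rewards splits whose voting vectors cluster tightly. The slide attributes the score to the covariance of the votes; using the negative trace of that covariance, as below, is an assumption for illustration rather than the paper's exact formula:

```python
# Hedged sketch: vote compactness in the spirit of Q_v. Scoring via the
# negative trace of the vote covariance is an assumed simplification.
import numpy as np

def vote_compactness(votes):
    """votes: (N, 3) array of 3D voting vectors cast by patches in a node.
    Returns a score that is higher when the votes are more compact."""
    cov = np.cov(votes, rowvar=False)  # 3x3 covariance of the votes
    return -np.trace(cov)              # tight cluster -> value near 0

tight = np.random.randn(100, 3) * 0.01  # nearly identical votes
loose = np.random.randn(100, 3) * 5.0   # widely scattered votes
print(vote_compactness(tight) > vote_compactness(loose))  # True
```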

  9. Hierarchical Hybrid Forest
  Viewpoint classification: Q_a. Finger joint classification: Q_p. Pose regression: Q_v.
  • STR forest quality function: Q_apv = α·Q_a + (1−α)·β·Q_p + (1−α)·(1−β)·Q_v
  • Q_a – viewpoint classification quality (information gain)
  • Q_p – joint label classification quality (information gain)
  • Q_v – compactness of the voting vectors
  • (α, β) – margin measures of the viewpoint labels and joint labels: the difference between the highest class posterior in a node and the second highest
  • Using all three terms together at every node is slow (a sketch of the blend follows below).
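How the three terms could be blended, taking the slide's formula and its margin definition at face value; the function and variable names are illustrative, and the exact way the paper schedules α and β may differ:

```python
# Hedged sketch of the blended quality
#   Q_apv = alpha*Q_a + (1 - alpha)*beta*Q_p + (1 - alpha)*(1 - beta)*Q_v
# with alpha and beta taken as node margins per the slide's description.
def margin(posteriors):
    """Difference between the two largest class posteriors in a node."""
    first, second = sorted(posteriors, reverse=True)[:2]
    return first - second

def q_apv(q_a, q_p, q_v, view_posteriors, joint_posteriors):
    alpha = margin(view_posteriors)    # margin over viewpoint labels
    beta = margin(joint_posteriors)    # margin over joint labels
    return (alpha * q_a
            + (1 - alpha) * beta * q_p
            + (1 - alpha) * (1 - beta) * q_v)

# Example node: viewpoint and joint labels are both still ambiguous,
# so most of the weight falls on the regression term Q_v.
print(q_apv(0.8, 0.5, 0.3,
            view_posteriors=[0.6, 0.4],          # alpha = 0.2
            joint_posteriors=[0.5, 0.3, 0.2]))   # beta = 0.2 -> 0.432
```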

  10. Transductive Learning
  Source space: synthetic data S. Target space: realistic data R.
  • Training data D = {R_l, R_u, S} (labeled and unlabeled)
  • Synthetic data S: generated from an articulated hand model; all labeled
  • Realistic data R: captured with a PrimeSense depth sensor
  • A small part of R, R_l, is labeled manually; the rest, R_u, is unlabeled

  11. Transductive Learning
  Source space: synthetic data S. Target space: realistic data R.
  • Training data D = {R_l, R_u, S}
  • Synthetic data S: generated from an articulated hand model, where |S| >> |R|
  • Realistic data R: captured with a PrimeSense depth sensor
  • A small part of R, R_l, is labeled manually; the rest, R_u, is unlabeled

  12. Transductive Term Q_t
  Source space: synthetic data S. Target space: realistic data R.
  • Training data D = {R_l, R_u, S}: similar data points in R_l and S are paired by nearest-neighbour search; a split function that separates a pair is penalised
  • Q_t is the ratio of associations preserved after a split: with D_L and D_R denoting the training data that pass down to the left and right child nodes respectively, a pair scores 1 when both of its members land in the same child (a sketch follows below)
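A minimal sketch of Q_t as the slide describes it: count how many of the pre-computed real/synthetic nearest-neighbour pairs a candidate split keeps together. The data structures here are illustrative:

```python
# Hedged sketch: Q_t = fraction of real/synthetic associations preserved
# by a split (both members of a pair sent to the same child node).
def q_t(pairs, goes_left):
    """pairs: iterable of (real_patch, synthetic_patch) associations.
    goes_left: split function mapping a patch to True (left) or False (right).
    """
    preserved = sum(goes_left(r) == goes_left(s) for r, s in pairs)
    return preserved / len(pairs)

# Example with 1-D "patches" and a threshold split at 0.5:
pairs = [(0.2, 0.25), (0.4, 0.6), (0.7, 0.8)]  # the middle pair is separated
print(q_t(pairs, lambda x: x < 0.5))           # 2/3
```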

  13. Semi-supervised Term Q_u
  Source space: synthetic data S. Target space: realistic data R.
  • Training data D = {R_l, R_u, S}: similar data points in R_l and S are paired; a split that separates a pair is penalised
  • Q_u evaluates the appearance similarities of all realistic patches R within a node (see the sketch below)
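Q_u needs no labels, only appearance. One possible reading, shown purely as an assumption: score a node by how tightly the descriptors of its realistic patches cluster (the paper's actual similarity measure may differ):

```python
# Hedged sketch: an unsupervised coherence score in the spirit of Q_u,
# using mean distance to the centroid of the patch descriptors. This is
# an assumed stand-in for the paper's appearance-similarity measure.
import numpy as np

def q_u(descriptors):
    """descriptors: (N, D) appearance descriptors of the realistic patches
    in a node. Higher (less negative) = more visually coherent node."""
    centroid = descriptors.mean(axis=0)
    return -np.mean(np.linalg.norm(descriptors - centroid, axis=1))

coherent = np.random.randn(50, 16) * 0.05  # near-duplicate patches
mixed = np.random.randn(50, 16) * 2.0      # dissimilar patches
print(q_u(coherent) > q_u(mixed))          # True
```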

  14. Kinematic Refinement
  • For each joint, the votes are modelled with a GMM, and the Euclidean distance between the two Gaussian modes is measured
  • Each joint is classified as high confidence or low confidence
  • The high-confidence joints are used to query a large joint-position database; for the uncertain joints, the positions closest to the query result are chosen (sketched below)
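The confidence test can be pictured as fitting a two-component GMM to each joint's votes and thresholding the gap between the two modes. The sketch below uses scikit-learn's GaussianMixture and an arbitrary threshold purely for illustration; the paper's fitting procedure and threshold are not reproduced here:

```python
# Hedged sketch: per-joint confidence from the gap between the two modes
# of a GMM fitted to that joint's votes. Threshold and library choice
# (scikit-learn) are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def is_high_confidence(votes, threshold=0.05):
    """votes: (N, 3) vote positions for one joint. High confidence when
    the two fitted Gaussian modes are close (votes are unimodal)."""
    gmm = GaussianMixture(n_components=2, random_state=0).fit(votes)
    mode_gap = np.linalg.norm(gmm.means_[0] - gmm.means_[1])
    return mode_gap < threshold

unimodal = np.random.randn(200, 3) * 0.01        # one tight cluster
bimodal = np.vstack([unimodal, unimodal + 1.0])  # two separated clusters
print(is_high_confidence(unimodal), is_high_confidence(bimodal))  # True False
```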

  15. Experimental Settings
  • Training data:
  – Synthetic data (337.5K images)
  – Real data (81K images, <1.2K labeled)
  • Evaluation data: three different testing sequences
  – Sequence A: single viewpoint (450 frames)
  – Sequence B: multiple viewpoints, slow hand movements (1000 frames)
  – Sequence C: multiple viewpoints, fast hand movements (240 frames)

  16. Self-comparison Experiment
  • The graph shows the joint classification accuracy on Sequence A.
  • The realistic and synthetic baselines produce similar accuracies.
  • Using the transductive term is better than simply pooling real and synthetic data.
  • Using all terms together achieves the best results.


  18. References
  [1] Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture. CVPR, 2014.
  [2] A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 2010.
  [3] Motion Capture of Hands in Action Using Discriminative Salient Points. ECCV, 2012.
