Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition

Waqas Sultani, Imran Saleemi. CVPR 2014.


Presentation Transcript


  1. Human Action Recognition across Datasets by Foreground-weighted Histogram Decomposition. Waqas Sultani, Imran Saleemi. CVPR 2014.
  2. Motivation
  3. Dense STIP for cross-dataset recognition. Within UCF50: 70 %. Between UCF50 and HMDB51: 55.7 %. Within Olympic Sports: 71.8 %. Between Olympic Sports and UCF50: 16.67 %. Recognition accuracy drops across datasets! Training and testing are done on similar actions across the datasets.
  4. HMDB51 vs. UCF50: is the background responsible for this drop in accuracy?
  5. Do action classifiers learn background?
  6. Experiment. Two recent challenging datasets: UCF YouTube (1100 videos, 11 actions) and UCF Sports (150 videos, 10 actions). Actor bounding boxes are available for both datasets. Extract STIP (HOG, HOF) features at a single scale with 50% spatio-temporal overlap. Foreground features: more than 50% overlap with the bounding box. Background features: less than 50% overlap with the bounding box. Dense features: all features.
  7. Experiment. Experimental setup: leave-one-group-out for UCF YouTube; leave-one-actor-out for UCF Sports.
  8. Foreground features (features with more than 50% overlap with the actor bounding box). Example videos: UCF YouTube biking, UCF Sports running. Foreground: 59.80 % (UCF YouTube), 71.92 % (UCF Sports).
  9. Dense features (all features). Example videos: UCF YouTube biking, UCF Sports running. Foreground: 59.80 % (UCF YouTube), 71.92 % (UCF Sports).
  10. Background features (features with less than 50% overlap with the actor bounding box). Example videos: UCF YouTube biking, UCF Sports running. Comparable performance with only background features, without even observing the actor.
  11. Background should be diverse but not discriminative!
  12. As action datasets are becoming large and more complex, their background may become more discriminative!
  13. Action class discriminability using GIST. Experiment 1: Compute GIST descriptors for the KTH, UCF50, and HMDB51 datasets. Cluster the GIST descriptors into 'k' clusters using k-means clustering. Estimate the point-wise mutual information (PMI) between each cluster and each action class.
  14. Action class discriminability using GIST. Experiment 1 (continued): PMI distance matrices for HMDB51, UCF50, and KTH. A small value means more inter-class confusion. Based on scene information alone, KTH is harder to classify than HMDB51 and UCF50.
  15. Action class discriminability using GIST. Experiment 2: Compute the GIST descriptor for every 50th frame in the KTH, UCF50, and HMDB51 datasets. Construct a graph whose vertices are the set of descriptors. Connected component analysis is performed on the graph using threshold 'E'.
  16. Action class discriminability using GIST. Experiment 2 (continued).
  17. Our Approach Foreground Focused Representation
  18. Foreground Focused Representation. Action localization or binary foreground/background segmentation is very challenging and difficult, akin to introducing a new problem to solve the first. Instead, estimate the confidence in each pixel being a part of the foreground, and use it to obtain the video representation.
  19. Motion gradients. Color gradients.
  20. Visual Saliency. Due to camera motion, video saliency is noisy. Graph-based image saliency (NIPS 2006).
  21. Coherence of Foreground Confidence. Initial aggregate of the confidence map. The score is max-normalized for each frame of a video. The quality of a labeling is given by: [equation not preserved in the transcript].
  22. Coherence of Foreground Confidence (continued). Inference: the message that node p sends to node q is given by: [equation not preserved in the transcript]. The belief vector of node q is given by: [equation not preserved in the transcript].
  23. Spatio-temporal coherence using a 3D-MRF. Obtain the probability of each pixel being foreground using motion gradients, color gradients, and video saliency. Output: final weights.
  24. Examples: final foreground weights. UCF50 Basketball; UCF50 Pull up; Olympic Sports Diving; HMDB51 ride-horse; UCF50 Golf swing; HMDB51 ride-bike; HMDB51 ride-bike; Olympic Sports Pole vault.
  25. Traditional bag of words (UCF YouTube biking video): histogram. Legend: foreground words, background words.
  26. Weighted k-means. To bias the codebook towards foreground features, use the feature weights during clustering. The confidence of each descriptor being on the foreground is given by: [equation not preserved in the transcript]. The goal of clustering is to minimize the following energy function: [equation not preserved in the transcript].
  27. Weighted Histogram. To reduce the contribution of background features, use each feature's foreground weight during quantization.
  28. Weighted bag of words (UCF YouTube biking video): weighted k-means, weighted histogram. Legend: background words, foreground words.
  29. Problem. There are no separate foreground and background words or vocabularies. The large number of background features can sum up to be significant.
  30. Weighted bag of words (UCF YouTube biking video): weighted k-means, weighted histogram. Legend: background words, foreground words.
  31. Weighted bag of words (UCF YouTube biking video): weighted k-means, weighted histogram. Legend: background words, foreground words.
  32. Weighted bag of words (UCF YouTube biking video): weighted k-means, weighted histogram. Legend: background words, foreground words.
  33. Foreground-confidence-based histogram decomposition. Categorize spatiotemporal regions into classes according to their weights. Compute a histogram for each region separately. Regions of two videos that have the same foreground confidence are compared only with each other. The kernel function becomes: [equation not preserved in the transcript].
  34. Partition-based weighted histograms. Features are partitioned based on their weights, which range from 1 to 0.
  35. UCF50 video and HMDB51 video: weights, partitions, weighted histograms. Final similarity: weighted summation of histogram intersections.
  36. Experimental Results. Datasets used: UCF50, HMDB51, Olympic Sports. Features used: STIP. UCF50 vs. HMDB51: 10 common actions. We choose actions which are visually similar: Biking, Golf Swing, Pull Ups, Horse Riding, Basketball. UCF50 vs. Olympic Sports: 6 common actions: Basketball, Pole Vault, Tennis Serve, Diving, Clean and Jerk, Throw Discus.
  37. Qualitative Results. Biking (UCF50 vs. HMDB51). Histogram intersection = 0.1035. Weighted histogram intersection = 0.1142. Weighted histogram decomposition = 0.1295.
  38. Qualitative Results. Golf Swing (UCF50 vs. HMDB51). Histogram intersection = 0.1684. Weighted histogram intersection = 0.2740. Weighted histogram decomposition = 0.3089.
  39. Qualitative Results. Pull Ups (UCF50 vs. HMDB51). Histogram intersection = 0.2744. Weighted histogram intersection = 0.5454. Weighted histogram decomposition = 0.5586.
  40. Qualitative Results. Histogram decomposition.
  41. Quantitative Results. Confusion matrices for UCF50 classifiers on HMDB51: unweighted; histogram decomposition.
  42. Thank you
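
The weighted histogram (slide 27) and the foreground-confidence-based histogram decomposition (slides 33 to 35) can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the equal-width partitioning of weights in [0, 1], the per-partition L1 normalisation, and the use of partition centres as summation weights are all hypothetical choices, because the slide equations did not survive in this transcript.

```python
from typing import List, Optional, Sequence


def _l1_normalise(hist: List[float]) -> List[float]:
    """Scale a histogram to sum to 1; an all-zero histogram is returned as is."""
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


def weighted_histogram(words: Sequence[int], weights: Sequence[float],
                       vocab_size: int) -> List[float]:
    """Bag of words where each feature votes with its foreground weight."""
    hist = [0.0] * vocab_size
    for word, weight in zip(words, weights):
        hist[word] += weight
    return _l1_normalise(hist)


def decomposed_histograms(words: Sequence[int], weights: Sequence[float],
                          vocab_size: int,
                          num_partitions: int) -> List[List[float]]:
    """One histogram per foreground-confidence partition.

    Assumption: weights lie in [0, 1] and partitions are equal-width bins.
    """
    hists = [[0.0] * vocab_size for _ in range(num_partitions)]
    for word, weight in zip(words, weights):
        # A weight of exactly 1.0 is placed in the last partition.
        p = min(int(weight * num_partitions), num_partitions - 1)
        hists[p][word] += 1.0
    return [_l1_normalise(h) for h in hists]


def histogram_intersection(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of bin-wise minima of two histograms."""
    return sum(min(x, y) for x, y in zip(a, b))


def decomposition_kernel(hists_a: Sequence[Sequence[float]],
                         hists_b: Sequence[Sequence[float]],
                         partition_weights: Optional[Sequence[float]] = None
                         ) -> float:
    """Weighted sum of histogram intersections, partition against partition.

    Only histograms from the same confidence partition are compared.
    Assumption: by default each partition is weighted by its bin centre,
    so high-confidence (foreground) partitions contribute more.
    """
    n = len(hists_a)
    if partition_weights is None:
        partition_weights = [(p + 0.5) / n for p in range(n)]
    return sum(w * histogram_intersection(ha, hb)
               for w, ha, hb in zip(partition_weights, hists_a, hists_b))
```

For two videos, one would quantise their features into visual words, call `decomposed_histograms` on each with the same `vocab_size` and `num_partitions`, and pass the results to `decomposition_kernel`. Because low-confidence partitions are compared only with each other and receive small weights in this sketch, a large number of background features cannot dominate the similarity.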