Human Action Recognition by Learning Bases of Action Attributes and Parts
Jia Pingping
School of Electronic Information Engineering, Tianjin University
Outline:
1. Action Classification in Still Images
2. Intuition: Action Attributes and Parts
3. Algorithm: Learning Bases of Attributes and Parts
4. Experiments: PASCAL & Stanford 40 Actions
5. Conclusion
Action Classification in Still Images
Instead of classifying an image (e.g. Riding bike) directly from low-level features, we use a high-level representation built from semantic concepts:
• Attributes: e.g. riding a bike, sitting on a bike seat, wearing a helmet, pedaling the pedals, …
• Parts: objects and human poses
• Contexts of attributes & parts
Benefits:
• Incorporates human knowledge;
• Gives more understanding of image content;
• Yields a more discriminative classifier.
Action Attributes and Parts
• Attributes: semantic descriptions of human actions. Each attribute is a pre-trained detector, e.g. a discriminative classifier such as an SVM trained to separate "Riding bike" from "Not riding bike" images (attribute classification).
• Parts-Objects: object detection.
• Parts-Poselets: poselet detection.
• a: the image feature vector, formed from the outputs of attribute classification, object detection, and poselet detection.
• Φ: the action bases. The vector a is represented through the bases with bases coefficients w, and an SVM trained on w predicts the action (e.g. Riding bike).
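A minimal sketch of how the vector a could be assembled, assuming every attribute classifier, object detector, and poselet detector is wrapped as a function that returns one confidence score per image (for a detector, e.g. its highest detection score; this is an assumption). `Scorer` and `build_action_vector` are hypothetical names, not part of the original method.

```python
from typing import Callable, List, Sequence

# A "scorer" is any pre-trained model that maps an image to one confidence score.
# The concrete models (attribute SVMs, object detectors, poselet detectors) are
# not specified here; these callables are hypothetical stand-ins.
Scorer = Callable[[object], float]


def build_action_vector(image: object,
                        attribute_clfs: Sequence[Scorer],
                        object_dets: Sequence[Scorer],
                        poselet_dets: Sequence[Scorer]) -> List[float]:
    """Concatenate all confidences into a = [attributes | objects | poselets]."""
    a = [clf(image) for clf in attribute_clfs]
    a += [det(image) for det in object_dets]
    a += [det(image) for det in poselet_dets]
    return a


if __name__ == "__main__":
    # Dummy scorers that always return 0.5, sized like the PASCAL setup
    # (14 attributes, 27 objects, 150 poselets).
    def constant(img: object) -> float:
        return 0.5

    a = build_action_vector("example.jpg", [constant] * 14, [constant] * 27, [constant] * 150)
    print(len(a))  # 191
```

With the PASCAL setup a has 14 + 27 + 150 = 191 entries; with the Stanford 40 Actions setup (45 attributes, 81 objects, 150 poselets) it has 276.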
Bases of Attributes & Parts: Training
• Input: the attribute/part vectors a of the training images
• Output: the action bases Φ and the sparse coefficients W
• Jointly estimate Φ and W.
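A sketch of the two estimation problems, assuming an elastic-net penalty on the coefficients (an ℓ1 term weighted by λ and an ℓ2 term weighted by γ, the two parameters reported in the experiments) and unit-norm bases φ_j, the columns of Φ; the exact objective and constraints used by the authors may differ. Here a_i is the attribute/part vector of training image i, N is the number of training images, and M is the number of action bases:

```latex
% Training: jointly estimate the action bases \Phi and the coefficients W = [w_1, \dots, w_N]
\min_{\Phi,\,W}\; \sum_{i=1}^{N} \Big( \tfrac{1}{2}\,\lVert a_i - \Phi w_i \rVert_2^2
  + \lambda \lVert w_i \rVert_1 + \tfrac{\gamma}{2} \lVert w_i \rVert_2^2 \Big)
\quad \text{s.t.}\quad \lVert \phi_j \rVert_2 \le 1,\; j = 1, \dots, M

% Testing: \Phi is fixed, and only the coefficients of the test vector a are estimated
w^{*} = \arg\min_{w}\; \tfrac{1}{2}\,\lVert a - \Phi w \rVert_2^2
  + \lambda \lVert w \rVert_1 + \tfrac{\gamma}{2} \lVert w \rVert_2^2
```

With Φ fixed, each w_i is a convex elastic-net problem; with W fixed, updating Φ is a constrained least-squares problem. Alternating between the two is a common strategy for such objectives (an assumption about the solver, not a statement of the authors' procedure). Testing solves only the second problem.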
Bases of Attributes & Parts: Testing
• Input: the attribute/part vector a of a test image and the learned bases Φ
• Output: the sparse coefficients w
• Estimate w with Φ fixed.
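Estimating w for a test image is a single sparse-coding problem with Φ held fixed. A minimal pure-Python sketch using proximal gradient descent (ISTA), assuming the elastic-net objective ½‖a − Φw‖² + λ‖w‖₁ + (γ/2)‖w‖² with the reported values λ = 0.1 and γ = 0.15; the solver choice and the function name `encode` are ours, not necessarily the authors'.

```python
from typing import List

Matrix = List[List[float]]  # row-major, shape P x M (P = len(a), M = number of bases)
Vector = List[float]


def soft_threshold(x: float, t: float) -> float:
    """Proximal operator of t*|x|: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def encode(a: Vector, phi: Matrix, lam: float = 0.1, gamma: float = 0.15,
           n_iter: int = 500) -> Vector:
    """Approximately solve
        min_w 0.5*||a - phi w||^2 + lam*||w||_1 + 0.5*gamma*||w||^2
    by ISTA (proximal gradient) with phi held fixed."""
    p, m = len(phi), len(phi[0])
    # Step 1/L, where L = ||phi||_F^2 + gamma upper-bounds the Lipschitz
    # constant of the smooth part's gradient (||phi||_2^2 <= ||phi||_F^2).
    step = 1.0 / (sum(v * v for row in phi for v in row) + gamma)
    w = [0.0] * m
    for _ in range(n_iter):
        # Residual r = phi w - a.
        r = [sum(phi[i][j] * w[j] for j in range(m)) - a[i] for i in range(p)]
        # Gradient of the smooth part: phi^T r + gamma * w.
        grad = [sum(phi[i][j] * r[i] for i in range(p)) + gamma * w[j]
                for j in range(m)]
        # Gradient step followed by the l1 proximal step.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(m)]
    return w


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    w = encode([1.0, 0.05, -0.5], identity)
    # With phi = I the exact solution is soft_threshold(a_j, lam) / (1 + gamma).
    print([round(x, 4) for x in w])  # [0.7826, 0.0, -0.3478]
```

The small entry 0.05 falls below the threshold λ = 0.1 and is set exactly to zero, which is where the sparsity of w comes from; the resulting w would then be fed to the SVM.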
1. PASCAL Action Dataset http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/
1. PASCAL Action Dataset
• Contains 9 classes, with 21,738 images in total;
• 50% of the images in each class are randomly selected for training/validation, and the remaining images are used for testing;
• 14 attributes, 27 objects, 150 poselets;
• The number of action bases is set to 400 and 600, respectively. λ and γ are set to 0.1 and 0.15, respectively.
Classification Result (PASCAL)
[Chart: average precision for each action class (Playing instrument, Riding bike, Riding horse, Taking photo, Phoning, Reading, Running, Walking, Using computer), comparing SURREY_MK, UCLEAR_DOSP, Poselet (Maji et al., 2011), our method using "a", and our method using "w" (400 action bases learned from attributes, objects, and poselets).]
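The results are reported as per-class average precision. A minimal sketch of non-interpolated AP and its mean over classes; note that the official PASCAL VOC development kit uses an interpolated variant, so values computed this way can differ slightly. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Non-interpolated AP: mean of the precision measured at the rank of each
    positive example, after sorting all test images by decreasing score."""
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive examples")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def mean_average_precision(
        per_class: Dict[str, Tuple[Sequence[float], Sequence[bool]]]) -> float:
    """Average the per-class APs (one binary ranking problem per action class)."""
    aps = [average_precision(s, y) for s, y in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Positives at ranks 1 and 3: AP = (1/1 + 2/3) / 2, about 0.833.
    print(average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False]))
```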
Control Experiment
[Chart: results using "a" versus "w", with the vector built from different groups of attributes and parts (A: attribute, O: object, P: poselet).]
2. Stanford 40 Actions: Brushing teeth, Calling, Applauding, Blowing bubbles, Cleaning floor, Climbing wall, Cooking, Cutting trees, Cutting vegetables, Drinking, Feeding horse, Fishing, Fixing bike, Gardening, Holding umbrella, Jumping, Playing guitar, Playing violin, Pouring liquid, Pushing cart, Reading, Repairing car, Riding bike, Riding horse, Rowing, Running, Shooting arrow, Smoking cigarette, Taking photo, Texting message, Throwing frisbee, Using computer, Using microscope, Using telescope, Walking dog, Washing dishes, Watching television, Waving hands, Writing on board, Writing on paper
http://vision.stanford.edu/Datasets/40actions.html
2. Stanford 40 Actions
• Contains 40 diverse daily human actions;
• 180 to 300 images per class, 9,532 real-world images in total;
• All images were collected from Google, Bing, and Flickr;
• Large variations in human pose, appearance, and background clutter.
[Example images: Cutting vegetables, Drinking, Feeding horse, Fixing bike, Gardening, Holding umbrella, Playing guitar, Playing violin, Pouring liquid, Reading, Repairing car, Riding bike, Shooting arrow, Smoking cigarette, Taking photo, Walking dog, Washing dishes, Watching television.]
Result:
• Randomly select 100 images from each class for training and use the remaining images for testing.
• 45 attributes, 81 objects, 150 poselets. The number of action bases is set to 400 and 600, respectively. λ and γ are set to 0.1 and 0.15, respectively.
• Compare our method with the Locality-constrained Linear Coding (LLC, Wang et al., CVPR 2010) baseline.
[Chart: average precision.]
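Both experimental protocols draw training images class by class (50% per class on PASCAL, 100 per class on Stanford 40 Actions). A minimal sketch of such a stratified random split; the function name, the fixed seed, and the rounding of fractional counts are our own assumptions.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple, Union


def split_per_class(labels: Sequence[Hashable], n_train: Union[int, float],
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split image indices class by class.

    n_train is either a per-class count (int, e.g. 100 for Stanford 40 Actions)
    or a per-class fraction (float, e.g. 0.5 for the PASCAL setup).
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train: List[int] = []
    test: List[int] = []
    for label in by_class:  # classes in order of first appearance
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        k = n_train if isinstance(n_train, int) else int(round(n_train * len(idxs)))
        train.extend(idxs[:k])
        test.extend(idxs[k:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = ["drinking"] * 180 + ["fishing"] * 300
    train, test = split_per_class(labels, 100)
    print(len(train), len(test))  # 200 280
```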
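Control Experiment (Stanford 40 Actions)
[Chart: results using "a" versus "w", with the vector built from different groups of attributes and parts (A: attribute, O: object, P: poselet).]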
Part-wise Bag-of-Words (PBoW) Representation:
• Local features
• Body part localization
• PBoW generation: head-wise, limb-wise, leg-wise, and foot-wise BoW
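A minimal sketch of the PBoW generation step. It assumes that each local feature carries an (x, y) location and a descriptor, that body part localization returns one bounding box per part, that features are hard-assigned to their nearest codeword, and that a feature counts toward every part box containing it. The codebook, boxes, and function names are hypothetical inputs, not details given on the slide.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]          # (x_min, y_min, x_max, y_max)
Feature = Tuple[float, float, Sequence[float]]   # (x, y, descriptor)

PARTS = ("head", "limb", "leg", "foot")  # fixed order of the four part-wise BoWs


def nearest_word(desc: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codeword closest to desc (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((d - c) ** 2 for d, c in zip(desc, codebook[k])))


def partwise_bow(features: Sequence[Feature], part_boxes: Dict[str, Box],
                 codebook: Sequence[Sequence[float]]) -> List[int]:
    """Concatenate head-, limb-, leg- and foot-wise BoW histograms."""
    hists = {part: [0] * len(codebook) for part in PARTS}
    for x, y, desc in features:
        word = nearest_word(desc, codebook)
        for part in PARTS:
            box = part_boxes.get(part)  # a part may be missing if localization fails
            if box is not None and box[0] <= x <= box[2] and box[1] <= y <= box[3]:
                hists[part][word] += 1
    # Features outside every part box are ignored.
    return [count for part in PARTS for count in hists[part]]


if __name__ == "__main__":
    codebook = [[0.0], [1.0]]
    feats = [(1.0, 1.0, [0.1]), (5.0, 5.0, [0.9]), (50.0, 50.0, [0.2])]
    boxes = {"head": (0.0, 0.0, 2.0, 2.0), "leg": (4.0, 4.0, 6.0, 6.0)}
    print(partwise_bow(feats, boxes, codebook))  # [1, 0, 0, 0, 0, 1, 0, 0]
```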
Local Action Attribute Method:
1. Label the action samples according to different body parts. For each part, we define a new set of low-level semantic labels and use them to re-classify the training action samples, e.g.
Head: static, vertical move, horizontal move
Limb: static, swing, …
Leg: static, …
Foot: static, …
Local Action Attribute Method:
2. For each part, train a set of attribute classifiers, one for each semantic label defined for that part.
Local Action Attribute Method:
3. For each action sample, map its low-level representation (head-wise, limb-wise, leg-wise, and foot-wise BoW) to a mid-level representation, then combine the four parts to build a new histogram representation of the sample.
Local Action Attribute Method:
4. Based on the local action attributes, we thus construct a new descriptor for each action sample, which can be used for classification (e.g. with an SVM or K-NN, trained on the training set and applied to the testing set).
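Steps 2 to 4 can be sketched as follows, assuming each trained attribute classifier is available as a function that scores a part-wise BoW histogram, that the new descriptor is the concatenation of these scores in a fixed part order, and that classification uses k-NN with Euclidean distance. `linear_scorer` is a hypothetical stand-in for whatever classifiers are actually trained (e.g. SVMs), and all function names are illustrative.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence

Scorer = Callable[[Sequence[float]], float]  # one attribute classifier for one part


def linear_scorer(weights: Sequence[float], bias: float = 0.0) -> Scorer:
    """Hypothetical stand-in for a trained attribute classifier (e.g. a linear SVM)."""
    return lambda h: sum(w * x for w, x in zip(weights, h)) + bias


def local_attribute_descriptor(part_hists: Dict[str, Sequence[float]],
                               part_scorers: Dict[str, List[Scorer]]) -> List[float]:
    """Map part-wise BoW histograms to a mid-level vector of attribute scores."""
    desc: List[float] = []
    for part in sorted(part_scorers):  # fixed part order so descriptors are comparable
        hist = part_hists[part]
        desc.extend(score(hist) for score in part_scorers[part])
    return desc


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[Hashable],
                x: Sequence[float], k: int = 1) -> Hashable:
    """Majority vote among the k nearest training descriptors (Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), i) for i, xi in enumerate(train_x))
    votes = Counter(train_y[i] for _, i in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    scorers = {"head": [linear_scorer([1.0, 0.0]), linear_scorer([0.0, 1.0])],
               "leg": [linear_scorer([1.0, 1.0])]}
    d = local_attribute_descriptor({"head": [2.0, 3.0], "leg": [1.0, 4.0]}, scorers)
    print(d)  # [2.0, 3.0, 5.0]
    print(knn_predict([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]], ["walk", "run"], d))  # walk
```

An SVM could replace `knn_predict` in step 4 without changing how the descriptor is built.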
Thank you!
School of Electronic Information Engineering, Tianjin University