Slide 1:Learning from Disagreeing Demonstrators
Bruno N. da Silva University of British Columbia bnds@cs.ubc.ca
Slide 2:Motivation
Some traditional approaches to Learning from Demonstration assume a single human expert. In some (subjective) tasks, there might not be a single expert, e.g. how to drive from point A to point B.
Slide 3:Motivation
In general, these tasks involve more than one feature. For example, in the driving domain we want to optimize both travel time and number of crashes. Different contexts lead to different tradeoffs between features, and idiosyncratic demonstrators do not reflect on their routine approach to the problem.
Slide 4:Problem definition
How can we integrate idiosyncratic (disagreeing) demonstrations to form a homogeneous and effective policy?
Slide 5:Solution
We extend the framework presented by Argall et al., 2007. The first stage consists of traditional demonstrations. In the second stage, the robot executes its policy and humans critique the execution; the robot collects the critiques and then updates its policy.
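The two stages can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the mechanism from the talk: perceptions and actions are plain strings, the hypothetical `first_stage` initialises each mapping's confidence as the number of times it was demonstrated, and the hypothetical `critique` callback stands in for a human returning +1 (praise) or -1 (knock).

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A policy entry maps (perception, action) to a confidence value.
Key = Tuple[str, str]


def first_stage(demos: Iterable[Key]) -> Dict[Key, float]:
    """Stage 1 (sketch): build confidences from traditional demonstrations.

    Hypothetical initialisation: a mapping's confidence is the number of
    times any demonstrator showed it.
    """
    conf: Dict[Key, float] = {}
    for perception, action in demos:
        conf[(perception, action)] = conf.get((perception, action), 0.0) + 1.0
    return conf


def execute(conf: Dict[Key, float], perception: str) -> str:
    """Robot execution: pick the most confident action for a perception."""
    options = {a: c for (p, a), c in conf.items() if p == perception}
    return max(options, key=options.get)


def second_stage(conf: Dict[Key, float], perceptions: List[str],
                 critique: Callable[[str, str], float],
                 rounds: int = 1, step: float = 0.4) -> Dict[Key, float]:
    """Stage 2 (sketch): execute, collect critiques, then update the policy."""
    conf = dict(conf)  # leave the stage-1 table untouched
    for _ in range(rounds):
        collected = []
        for p in perceptions:
            a = execute(conf, p)
            collected.append(((p, a), critique(p, a)))  # +1 praise, -1 knock
        for key, value in collected:
            conf[key] += step * value
    return conf
```

For example, if two demonstrators turn right at a junction and one turns left, the robot starts by turning right; if every execution of "right" is knocked, three rounds with the assumed `step` of 0.4 are enough for "left" to become the preferred action.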
Slide 6:The 1st stage of the mechanism
Slide 7:The 2nd stage of the mechanism
Slide 8:A little more concretely
The first stage can be interpreted as a set of datapoints (p_m, a_n, c): a perception p_m, an action a_n, and the confidence c in the mapping from p_m to a_n. The criticism affects the confidence: if the demonstrator praises the execution, increase c; if the demonstrator knocks the execution, decrease c.
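A datapoint (p_m, a_n, c) and the praise/knock rule can be written down directly. In this sketch the string-valued perception and the fixed `step` of 0.1 are hypothetical choices; the slides only say that praise increases c and a knock decreases it.

```python
from dataclasses import dataclass


@dataclass
class Datapoint:
    """One first-stage mapping (p_m, a_n, c)."""
    perception: str    # p_m, e.g. a discretised sensor reading (assumed)
    action: str        # a_n
    confidence: float  # c, confidence in the mapping p_m -> a_n


def apply_criticism(point: Datapoint, feedback: str, step: float = 0.1) -> None:
    """Praise raises c and a knock lowers it; `step` is a hypothetical constant."""
    if feedback == "praise":
        point.confidence += step
    elif feedback == "knock":
        point.confidence -= step
    else:
        raise ValueError(f"unknown feedback: {feedback!r}")
```

Every critique here moves c by the same amount regardless of who gave it, which is exactly the weakness the next slide addresses.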
Slide 9:But let's not be naïve
If demonstrators lie in the demonstration, they would also lie in the criticism. Therefore, associate a reputation r_i with each demonstration d_i, and update the confidence level carefully: c := c + r_i * f(feedback)
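The reputation-weighted update c := c + r_i * f(feedback) is a one-liner once f is fixed. The slides leave f unspecified, so the mapping below (praise to +0.1, knock to -0.1 via the hypothetical `FEEDBACK_VALUE` table and `scale` parameter) is an assumption made only for illustration.

```python
# Hypothetical numeric encoding of feedback; the slides leave f unspecified.
FEEDBACK_VALUE = {"praise": 1.0, "knock": -1.0}


def f(feedback: str, scale: float = 0.1) -> float:
    """Map a critique to a signed confidence change (assumed form)."""
    return scale * FEEDBACK_VALUE[feedback]


def update_confidence(c: float, r_i: float, feedback: str) -> float:
    """c := c + r_i * f(feedback); low-reputation feedback counts less."""
    return c + r_i * f(feedback)
```

With r_i = 0 a demonstrator's critiques have no effect at all, and with r_i = 0.5 they move the confidence half as far as those of a fully trusted demonstrator.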
Slide 10:Adjusting reputation ranks
Adjust r_i based on the (lack of) improvement resulting from d_i's feedback: r_i := r_i + ? * evaluation(feedback), where evaluation(.) can be interpreted as the Pareto improvement resulting from the feedback.
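One way to read evaluation(.) as a Pareto improvement is to compare the policy's costs before and after applying a demonstrator's feedback, using the features from the driving example (travel time, number of crashes), both minimised. The +1/0/-1 scoring and the `rate` parameter are hypothetical: `rate` stands in for the step size that is unreadable on the slide, and measuring costs before and after each feedback is itself an assumption of this sketch.

```python
from typing import Sequence


def pareto_improves(before: Sequence[float], after: Sequence[float]) -> bool:
    """True if `after` is no worse on every cost and strictly better on one.

    Costs are minimised, e.g. (travel_time, crashes) in the driving domain.
    """
    pairs = list(zip(after, before))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def evaluation(before: Sequence[float], after: Sequence[float]) -> float:
    """Hypothetical scoring: +1 for a Pareto improvement, -1 for a Pareto
    deterioration, 0 when the feedback only shifted the tradeoff."""
    if pareto_improves(before, after):
        return 1.0
    if pareto_improves(after, before):
        return -1.0
    return 0.0


def update_reputation(r_i: float, before: Sequence[float],
                      after: Sequence[float], rate: float = 0.05) -> float:
    """r_i := r_i + rate * evaluation(...), with `rate` a hypothetical step size."""
    return r_i + rate * evaluation(before, after)
```

Scoring a pure tradeoff (faster but with more crashes) as 0 keeps the reputation neutral when feedback merely reflects a demonstrator's idiosyncratic preference rather than an outright improvement or deterioration.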
Slide 11:Current investigations
Policy conversion? Rate of conversion? What are the long-term effects on human demonstrators: frustration? repudiation? Will critiques really be mindful?
Slide 12:Thanks!
Questions?