Learning Reactive Behavior in Autonomous Vehicles: SAMUEL • Sanaa Kamari
SAMUEL • Computer system that learns reactive behavior for autonomous vehicles. • Reactive behavior is the set of actions an AV takes in reaction to its sensor readings. • Uses a genetic algorithm (GA) to improve the decision-making rules. • Each individual in SAMUEL is an entire rule set, or strategy (see the sketch below).
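Since each individual is a whole rule set, SAMUEL follows a Pittsburgh-style encoding. Below is a minimal Python sketch of that encoding; the class names and fields are illustrative assumptions, not SAMUEL's actual data structures.

class Rule:
    """One condition-action rule; the critic adjusts its strength."""
    def __init__(self, conditions, action, strength=0.5):
        self.conditions = conditions  # e.g. {"front_sonar": (0, 20)}
        self.action = action          # e.g. {"turn": -24}
        self.strength = strength

class RuleSet:
    """One GA individual: an entire reactive strategy."""
    def __init__(self, rules):
        self.rules = rules
        self.fitness = 0.0            # set by evaluation in the world model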
Motivation for SAMUEL • Extracting rules directly from an expert is difficult; learning eases this knowledge-acquisition task. • Rules are context-based, so it is impossible to hand-code for every situation. • Given a set of conditions, the system learns the rules of operation by observing and recording its own actions. • SAMUEL uses a simulation environment to learn.
SAMUEL • Problem-specific module. • The world model and its interface. • A set of internal and external sensors. • Controllers that drive the AV simulator. • A critic component that judges the success or failure of the AV. [1]
SAMUEL (cont) Performance module: • Matches rules against the current sensor readings. • Performs conflict resolution among matching rules. • Assigns strength values to the rules. Learning module: • Uses the GA to develop reactive behavior as a set of condition-action rules. • The GA searches for the behavior that exhibits the best performance. • Behaviors are evaluated in the world model. • Behaviors are selected for duplication and modification (sketched below). [1]
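The duplicate-and-modify cycle above can be sketched as a simple generational loop. This is a hedged illustration, assuming strategies carry a fitness attribute and that evaluate_in_world_model, crossover, and mutate are supplied as hypothetical helpers.

import random

def evolve(population, generations, evaluate_in_world_model, crossover, mutate):
    for _ in range(generations):
        # Evaluate every strategy by running it in the simulated world model.
        for strategy in population:
            strategy.fitness = evaluate_in_world_model(strategy)
        population.sort(key=lambda s: s.fitness, reverse=True)
        # Select the better half, then duplicate and modify to refill.
        parents = population[:len(population) // 2]
        children = []
        while len(children) < len(population) - len(parents):
            a, b = random.sample(parents, 2)
            children.append(mutate(crossover(a, b)))
        population = parents + children
    return max(population, key=lambda s: s.fitness)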
Experiment Domain: Autonomous Underwater Vehicle navigation and collision avoidance • The AUV simulator is trained by virtually positioning the vehicle in the center of a field of 25 mines, with an objective outside the field. • The 2-D AUV must navigate through the dense mine field toward the stationary objective. • AUV actions: set speed and direction at each decision cycle. • The system does not learn a path, but a set of rules that reactively decides a move at each step (see the decision-cycle sketch below).
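To make the last point concrete, here is a sketch of one episode driven purely by reactive rules; auv and best_matching_rule are hypothetical stand-ins for the simulator interface, not names from [1].

def run_episode(auv, strategy, max_steps=500):
    """Run one trial: the rules, not a stored path, pick each move."""
    for _ in range(max_steps):
        sensors = auv.read_sensors()            # ranges to mines and the goal
        rule = best_matching_rule(strategy.rules, sensors)
        auv.apply(rule.action)                  # set speed and direction
        if auv.reached_goal() or auv.hit_mine():
            break
    return auv.reached_goal()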
Experiment Results • Performance improved greatly with both stationary and moving mines. • SAMUEL shows that reactive behavior can be learned. [1]
Domain: Robot Continuous and Embedded Learning • Goal: create autonomous systems that continue to learn throughout their lives. • Adapt a robot's behavior in response to changes in its operating environment and capabilities. • Experiment: the robot learns to adapt to failures in its sonar sensors.
Continuous and Embedded Learning Model • Execution module: controls the robot's interaction with its environment. • Learning module: continuously tests new strategies for the robot against a simulation model of its environment. [2]
Execution Module • Includes a rule-based system that operates on reactive (stimulus-response) rules, for example: IF range = [35, 45] AND front sonar < 20 AND right sonar > 50 THEN SET turn = -24 (strength 0.8). • Monitor: identifies symptoms of sonar failure. • It measures each sonar's output and compares it to recent readings and the direction of motion. • It modifies the simulation used by the learning system to replicate the failure. (A sketch of the rule encoding follows.)
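The example rule above suggests a natural encoding: each condition is an interval over a sensor reading. A sketch follows; the sensor names and inclusive bounds are assumptions (strict inequalities like < 20 are approximated by interval endpoints).

from collections import namedtuple

Rule = namedtuple("Rule", ["conditions", "action", "strength"])

# IF range = [35, 45] AND front sonar < 20 AND right sonar > 50
# THEN SET turn = -24 (strength 0.8)
example_rule = Rule(
    conditions={"range": (35, 45),
                "front_sonar": (0, 20),
                "right_sonar": (50, float("inf"))},
    action={"turn": -24},
    strength=0.8,
)

def matches(rule, sensors):
    """A rule fires when every reading falls inside its interval."""
    return all(lo <= sensors[name] <= hi
               for name, (lo, hi) in rule.conditions.items())

def best_matching_rule(rules, sensors):
    """Conflict resolution: among matching rules, pick the strongest."""
    candidates = [r for r in rules if matches(r, sensors)]
    return max(candidates, key=lambda r: r.strength) if candidates else None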
Learning Module • Uses SAMUEL, which applies a genetic algorithm to improve the decision-making rules.
Experiment • The task requires the robot to go from one side of a room to the other through an opening. • The robot is placed randomly 4 ft from the back wall. • The location of the opening is random. • The center of the front wall is 12.5 ft from the back wall.
Experiment (cont) • The robot begins with a set of default rules for moving toward the goal. • Learning starts with a simulation in which all sonars are working. • After an initial period, one or more sonars are blinded. • The monitor detects the failed sonars, and the learning simulation is adjusted to reflect the failure. • The population of competing strategies is re-initialized and learning continues. • The online robot uses the best rules discovered by the learning system since the last change to the learning simulation model (see the loop sketched below).
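A minimal sketch of this loop, under stated assumptions: monitor, sim, new_population, and evolve_one_generation are illustrative names, not the actual interface from [2].

def continuous_learning(robot, monitor, sim, population):
    """Run the current best strategy online while the GA keeps learning."""
    best = max(population, key=lambda s: s.fitness)
    while robot.active():
        robot.execute(best)                    # execution module, online
        failed = monitor.detect_failed_sonars(robot)
        if failed:
            sim.disable_sonars(failed)         # replicate failure in simulation
            population = new_population(sim)   # re-initialize strategies
            best = None                        # discard pre-failure champion
        population = evolve_one_generation(population, sim)
        candidate = max(population, key=lambda s: s.fitness)
        if best is None or candidate.fitness > best.fitness:
            best = candidate                   # best since last sim change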
Experiment Results • [Figure: Robot in motion with all sensors intact, (a) during run and (b) at goal.] • [Figure: Robot in motion after adapting to loss of three sensors (front, front-right, and right), (a) during run and (b) at goal.] [2]
Experiment Results [2] • [Figure: (a) Robot with full sensors passing directly through the doorway; (b) robot with front sonar covered; (c) robot after adapting to the covered sonar: it uses a side sonar to find the opening, then turns into it.]
References • [1] A. C. Schultz and J. J. Grefenstette, "Using a genetic algorithm to learn reactive behavior for autonomous vehicles," in Proceedings of the AIAA Guidance, Navigation, and Control Conference, Hilton Head, SC, 1992. • [2] A. C. Schultz and J. J. Grefenstette, "Continuous and Embedded Learning in Autonomous Vehicles: Adapting to Sensor Failures," in Proceedings of SPIE, vol. 4024, pp. 55-62, 2000.