Differences

This shows you the differences between two versions of the page.

Member:sungbeanJo_paper [2021/03/04 17:33] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean
Line 1:
-Behavior cloning [32, 38, 35, 25] is a form of supervised learning that can learn
-sensorimotor policies from off-line collected data. The only requirements are pairs
-of input sensory observations associated with expert actions. We use an expanded
-formulation for self-driving cars called Conditional Imitation Learning, CIL [12].
-It uses a high-level navigational command c that disambiguates imitation around
-multiple types of intersections. Given an expert policy π(x) with access to the
-environment state x, we can execute this policy to produce a dataset
-D = {⟨o_i, c_i, a_i⟩}_{i=1}^N, where o_i are sensor data observations, c_i are
-high-level commands (e.g., take the next right, left, or stay in lane), and
-a_i = π(x_i) are the resulting vehicle actions (low-level controls). Observations
-o_i = {i, v_m} contain a single image i and the ego car speed v_m [12], added for
-the system to properly react to dynamic objects on the road. Without the speed
-context, the model cannot learn if and when it should accelerate or brake to reach
-a desired speed or stop. We want to learn a policy π_θ, parametrized by θ, that
-produces actions similar to those of π based only on observations o and high-level
-commands c. The best parameters θ* are obtained by minimizing an imitation cost ℓ:
+get_config_param active timestamp_mode
+ TIME_FROM_INTERNAL_OSC
+get_config_param active multipurpose_io_mode
+ OUTPUT_OFF
+get_config_param active sync_pulse_in_polarity
+ ACTIVE_LOW
+get_config_param active nmea_in_polarity
+ ACTIVE_HIGH
+get_config_param active nmea_baud_rate
+ BAUD_9600
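
The removed paragraph introduces an imitation cost ℓ but stops before the objective itself. A standard form consistent with the definitions given there (learned policy π(·; θ), observations o_i, commands c_i, expert actions a_i), written here as a sketch rather than quoted from the source, is

  θ* = argmin_θ Σ_i ℓ(π(o_i, c_i; θ), a_i)

i.e., θ* minimizes the summed discrepancy between the policy's predicted actions and the expert actions over the dataset D.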
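
The added lines read like a plain-text query/response log from an Ouster-style lidar configuration interface. As a minimal sketch, assuming the sensor accepts these commands over a TCP socket (the port 7501, host address, and helper name below are assumptions, not taken from this page), the same queries could be scripted as:

<code python>
import socket

SENSOR_HOST = "192.0.2.10"  # placeholder address; replace with the sensor's hostname/IP
SENSOR_PORT = 7501          # assumed TCP command port

PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def query_active_params(host, port, params):
    """Send one 'get_config_param active <name>' query per parameter and collect the replies."""
    results = {}
    with socket.create_connection((host, port), timeout=5) as sock:
        stream = sock.makefile("rw", newline="\n")
        for name in params:
            stream.write("get_config_param active %s\n" % name)
            stream.flush()
            results[name] = stream.readline().strip()  # assumes a one-line reply per query
    return results

if __name__ == "__main__":
    for name, value in query_active_params(SENSOR_HOST, SENSOR_PORT, PARAMS).items():
        print(name, "=", value)
</code>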