Behavior cloning [32, 38, 35, 25] is a form of supervised learning that can learn sensorimotor policies from data collected offline. The only requirements are pairs of input sensory observations and the corresponding expert actions. We use an expanded formulation for self-driving cars called Conditional Imitation Learning (CIL) [12]. It uses a high-level navigational command $c$ that disambiguates imitation around multiple types of intersections. Given an expert policy $\pi^*(x)$ with access to the environment state $x$, we can execute this policy to produce a dataset $\mathcal{D} = \{\langle o_i, c_i, a_i \rangle\}_{i=1}^{N}$, where the $o_i$ are sensor data observations, the $c_i$ are high-level commands (e.g., take the next right, left, or stay in lane), and $a_i = \pi^*(x_i)$ are the resulting vehicle actions (low-level controls). Observations $o_i = \{\mathbf{i}, v_m\}$ contain a single image $\mathbf{i}$ and the ego car speed $v_m$ [12], added so that the system can properly react to dynamic objects on the road. Without the speed context, the model cannot learn if and when it should accelerate or brake to reach a desired speed or stop.

We want to learn a policy $\pi_\theta$, parametrized by $\theta$, that produces actions similar to those of $\pi^*$ based only on observations $o$ and high-level commands $c$. The best parameters $\theta^*$ are obtained by minimizing an imitation cost $\ell$:

$$\theta^* = \arg\min_\theta \sum_i \ell\big(\pi_\theta(o_i, c_i),\, a_i\big).$$
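To make the objective concrete, the sketch below runs one gradient step on the imitation cost with a command-conditioned policy. It is a minimal illustration, not the paper's implementation: the `BranchedCILPolicy` module, its layer sizes, the four-command set, and the choice of an L1 loss for $\ell$ are all assumptions made here; only the conditioning on $(o_i, c_i, a_i)$ triples mirrors the formulation above.

```python
import torch
import torch.nn as nn

class BranchedCILPolicy(nn.Module):
    """Minimal command-conditioned policy: a shared perception backbone and
    speed encoder feed one action head ("branch") per high-level command."""

    def __init__(self, num_commands: int = 4, action_dim: int = 3):
        super().__init__()
        # Perception backbone for the image i (placeholder CNN, sizes illustrative).
        self.perception = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        # Measurement encoder for the ego speed v_m.
        self.speed_enc = nn.Sequential(nn.Linear(1, 64), nn.ReLU())
        # One action head per command c; the command selects which head acts.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Linear(128 + 64, 128), nn.ReLU(),
                          nn.Linear(128, action_dim))
            for _ in range(num_commands)
        )

    def forward(self, image, speed, command):
        feat = torch.cat([self.perception(image), self.speed_enc(speed)], dim=1)
        # Evaluate all branches, then keep the one indexed by the command.
        out = torch.stack([b(feat) for b in self.branches], dim=1)  # (B, C, A)
        idx = command.view(-1, 1, 1).expand(-1, 1, out.size(-1))
        return out.gather(1, idx).squeeze(1)                        # (B, A)

# One step on theta* = argmin_theta sum_i l(pi_theta(o_i, c_i), a_i),
# with l taken to be an L1 loss (an assumption, not prescribed above).
policy = BranchedCILPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

image = torch.rand(8, 3, 88, 200)    # i: camera images (dummy batch)
speed = torch.rand(8, 1)             # v_m: ego car speed
command = torch.randint(0, 4, (8,))  # c: high-level navigational command
expert_action = torch.rand(8, 3)     # a_i = pi*(x_i): e.g. steer/throttle/brake

loss = loss_fn(policy(image, speed, command), expert_action)
opt.zero_grad()
loss.backward()
opt.step()
```

Routing the command through a hard branch selection, rather than feeding $c$ as just another input feature, is the design choice that keeps the network from ignoring a low-dimensional command placed next to a high-dimensional image.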