Behavioral cloning solves a regression problem in which the policy parameterization is obtained by maximizing the likelihood of the actions taken in the training data. BC works well for states that are adequately covered by the training data, but it is forced to generalize when predicting actions in states with little or no coverage, which can lead to poor behavior. Unfortunately, even if simulations are initialized in common states, the stochastic nature of the policies allows small errors in action predictions to compound over time, eventually driving the agent into states that human drivers infrequently visit and that are not adequately covered by the training data. The resulting poor predictions feed back on themselves, a phenomenon known as cascading errors [20]. In a highway driving context, cascading errors can lead to off-road driving and collisions. Because datasets rarely contain information about how human drivers behave in such situations, BC policies can act erratically when they encounter these states. Behavioral cloning has been used successfully to produce driving policies for simple behaviors such as freeway car-following, where the state and action spaces can be adequately covered by the training set. When applied to learning general driving models with nuanced behavior and the potential to drive out of lane, BC produces accurate predictions only up to a few seconds into the future [5, 6].
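The maximum-likelihood view of BC, and its failure off the training distribution, can be sketched in a few lines. The setup below is entirely hypothetical (a scalar toy "expert controller", not any driving model from the text): under a Gaussian action model, maximizing the likelihood of demonstrated actions reduces to least-squares regression, so we fit a linear policy to expert (state, action) pairs drawn from a narrow state range, then compare prediction error in and out of that range.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert policy, unknown to the learner (stand-in for a human driver).
def expert_action(s):
    return np.sin(s)

# Demonstrations only cover states the expert actually visits: s in [0, 1].
S = rng.uniform(0.0, 1.0, size=(500, 1))
A = expert_action(S) + 0.01 * rng.normal(size=S.shape)

# Under a Gaussian noise model, maximizing the likelihood of the demonstrated
# actions is equivalent to least-squares regression on (state, action) pairs.
w, *_ = np.linalg.lstsq(S, A, rcond=None)

# In-distribution: the cloned policy predicts the expert's action accurately.
s_in = 0.5
err_in = abs(float(w[0, 0] * s_in) - float(expert_action(s_in)))

# Out of distribution (s = 3.0, never seen in training): the policy is forced
# to extrapolate and its prediction error grows sharply -- the seed of the
# cascading-error problem once such errors compound during a rollout.
s_out = 3.0
err_out = abs(float(w[0, 0] * s_out) - float(expert_action(s_out)))

print(f"error at s={s_in}: {err_in:.3f}")
print(f"error at s={s_out}: {err_out:.3f}")
```

In a closed-loop simulation the out-of-distribution errors do not stay isolated: each poor action perturbs the next state further from the data distribution, which is exactly the compounding described above.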