Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_paper [2021/03/04 22:09] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean

Added in the current revision (2021/04/21): sensor configuration queries and their responses.

get_config_param active timestamp_mode
TIME_FROM_INTERNAL_OSC
get_config_param active multipurpose_io_mode
OUTPUT_OFF
get_config_param active sync_pulse_in_polarity
ACTIVE_LOW
get_config_param active nmea_in_polarity
ACTIVE_HIGH
get_config_param active nmea_baud_rate
BAUD_9600
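These query/response pairs appear to follow the plain-text TCP command interface of Ouster lidar sensors, which by convention listen on port 7501 and answer each command with a single line. Below is a minimal sketch of reading these values programmatically; the hostname is a hypothetical placeholder, and the parameter list is taken from the responses above.

import socket

# Parameters queried above; "active" reads the currently running configuration.
PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def get_config_param(host, name, port=7501):
    # One command per connection; the sensor replies with a single line.
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(f"get_config_param active {name}\n".encode())
        return sock.makefile().readline().strip()

if __name__ == "__main__":
    host = "os1-991234567890.local"  # hypothetical sensor hostname
    for name in PARAMS:
        print(name, "=", get_config_param(host, name))
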
Removed from the previous revision (2021/03/04): the text of the paper.

Abstract—Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5-scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands.

Imitation learning is receiving renewed interest as a promising approach to training autonomous driving systems. Demonstrations of human driving are easy to collect at scale. Given such demonstrations, imitation learning can be used to train a model that maps perceptual inputs to control commands; for example, mapping camera images to steering and acceleration. This approach has been applied to lane following [27], [4] and to off-road obstacle avoidance. However, these systems have characteristic limitations. For example, the network trained by Bojarski et al. [4] was given control over lane and road following only. When a lane change or a turn from one road to another was required, the human driver had to take control.

Why has imitation learning not scaled up to fully autonomous urban driving? One limitation is in the assumption that the optimal action can be inferred from the perceptual input alone. This assumption often does not hold in practice: for instance, when a car approaches an intersection, the camera input is not sufficient to predict whether the car should turn left, right, or go straight. Mathematically, the mapping from the image to the control command is no longer a function. Fitting a function approximator is thus bound to run into difficulties. This had already been observed in the work of Pomerleau: “Currently upon reaching a fork, the network may output two widely discrepant travel directions, one for each choice. The result is often an oscillation in the dictated travel direction” [27]. Even if the network can resolve the ambiguity in favor of some course of action, it may not be the one desired by the passenger, who lacks a communication channel for controlling the network itself.

In this paper, we address this challenge with conditional imitation learning. At training time, the model is given not only the perceptual input and the control signal, but also a representation of the expert’s intention. At test time, the network can be given corresponding commands, which resolve the ambiguity in the perceptuomotor mapping and allow the trained model to be controlled by a passenger or a topological planner, just as mapping applications and passengers provide turn-by-turn directions to human drivers. The trained network is thus freed from the task of planning and can devote its representational capacity to driving. This enables scaling imitation learning to vision-based driving in complex urban environments.

We evaluate the presented approach in realistic simulations of urban driving and on a 1/5-scale robotic truck. Both systems are shown in Figure 1. Simulation allows us to thoroughly analyze the importance of different modeling decisions, carefully compare the approach to relevant baselines, and conduct detailed ablation studies. Experiments with the physical system demonstrate that the approach can be successfully deployed in the physical world. Recordings of both systems are provided in the supplementary video.

We begin by describing the standard imitation learning setup and then proceed to our command-conditional formulation. Consider a controller that interacts with the environment over discrete time steps. At each time step t, the controller receives an observation o_t and takes an action a_t. The basic idea behind imitation learning is to train a controller that mimics an expert. The training data is a set of observation-action pairs D = {⟨o_i, a_i⟩}_{i=1}^N generated by the expert. The assumption is that the expert is successful at performing the task of interest and that a controller trained to mimic the expert will also perform the task well. This is a supervised learning problem, in which the parameters θ of a function approximator F(o; θ) are optimized to fit the mapping of observations to actions:

minimize_θ Σ_i ℓ(F(o_i; θ), a_i),

where ℓ is a per-sample loss.

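As a concrete illustration of this objective, here is a minimal sketch assuming PyTorch, with mean squared error standing in for the per-sample loss ℓ and a small MLP standing in for the controller F; the dimensions and names are illustrative, not the architecture used in the paper.

import torch
import torch.nn as nn

# Hypothetical controller F(o; theta): observation features -> two control
# outputs (e.g., steering and acceleration).
F = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(F.parameters(), lr=1e-4)

# Stand-in batch of expert observation-action pairs (o_i, a_i).
obs_batch = torch.randn(32, 128)
act_batch = torch.randn(32, 2)

# One gradient step on the imitation objective sum_i l(F(o_i; theta), a_i).
loss = nn.functional.mse_loss(F(obs_batch), act_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
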
An implicit assumption behind this formulation is that the expert’s actions are fully explained by the observations; that is, there exists a function E that maps observations to the expert’s actions: a_i = E(o_i). If this assumption holds, a sufficiently expressive approximator will be able to fit the function E given enough data. This explains the success of imitation learning on tasks such as lane following. However, in more complex scenarios the assumption that the mapping of observations to actions is a function breaks down. Consider a driver approaching an intersection. The driver’s subsequent actions are not explained by the observations, but are additionally affected by the driver’s internal state, such as the intended destination. The same observations could lead to different actions, depending on this latent state. This could be modeled as stochasticity, but a stochastic formulation misses the underlying causes of the behavior. Moreover, even if a controller trained to imitate demonstrations of urban driving did learn to make turns and avoid collisions, it would still not constitute a useful driving system. It would wander the streets, making arbitrary decisions at intersections. A passenger in such a vehicle would not be able to communicate the intended direction of travel to the controller, or give it commands regarding which turns to take.

To address this, we begin by explicitly modeling the expert’s internal state by a vector h, which together with the observation explains the expert’s action: a_i = E(o_i, h_i). Vector h can include information about the expert’s intentions, goals, and prior knowledge. The standard imitation learning objective can then be rewritten as

minimize_θ Σ_i ℓ(F(o_i; θ), E(o_i, h_i)).

It is now clear that the expert’s action is affected by information that is not provided to the controller F.

We expose the latent state h to the controller by introducing an additional command input: c = c(h). At training time, the command c is provided by the expert. It need not constitute the entire latent state h, but should provide useful information about the expert’s decision-making. For example, human drivers already use turn signals to communicate their intent when approaching intersections; these turn signals can be used as commands in our formulation. At test time, commands can be used to affect the behavior of the controller. These test-time commands can come from a human user or a planning module. In urban driving, a typical command would be “turn right at the next intersection”, which can be provided by a navigation system or a passenger. The training dataset becomes D = {⟨o_i, c_i, a_i⟩}_{i=1}^N.

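As a concrete illustration, here is a minimal sketch of a command-conditional controller F(o, c; θ), again assuming PyTorch. It uses a branched design, one output head per command with the command selecting the head, which is in the spirit of the architectures evaluated in this line of work; the class name, dimensions, and three-command set are illustrative assumptions.

import torch
import torch.nn as nn

NUM_COMMANDS = 3  # assumed command set, e.g., turn left / turn right / go straight

class CommandConditionalPolicy(nn.Module):
    def __init__(self, obs_dim=128, act_dim=2):
        super().__init__()
        # Shared perception trunk: observation -> feature vector.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # One action head per command; the command c selects the branch.
        self.heads = nn.ModuleList(
            [nn.Linear(64, act_dim) for _ in range(NUM_COMMANDS)]
        )

    def forward(self, obs, command):
        feat = self.trunk(obs)
        # Evaluate every branch, then gather the branch indexed by the command.
        branches = torch.stack([head(feat) for head in self.heads], dim=1)
        idx = command.view(-1, 1, 1).expand(-1, 1, branches.size(-1))
        return branches.gather(1, idx).squeeze(1)

policy = CommandConditionalPolicy()
obs = torch.randn(4, 128)                   # stand-in observations o_i
cmd = torch.randint(0, NUM_COMMANDS, (4,))  # stand-in commands c_i
actions = policy(obs, cmd)                  # F(o_i, c_i; theta), shape (4, 2)

Training then proceeds exactly as in the unconditional case, with the per-sample loss ℓ(F(o_i, c_i; θ), a_i) computed against the expert action a_i.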