Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_paper [2021/03/04 22:00] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean

Line 1:
Added in the current revision:

get_config_param active timestamp_mode
TIME_FROM_INTERNAL_OSC
get_config_param active multipurpose_io_mode
OUTPUT_OFF
get_config_param active sync_pulse_in_polarity
ACTIVE_LOW
get_config_param active nmea_in_polarity
ACTIVE_HIGH
get_config_param active nmea_baud_rate
BAUD_9600

(A minimal sketch of querying these parameters programmatically is given at the end of this page.)

Removed from the previous revision:

Abstract—Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5-scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands.
Imitation learning is receiving renewed interest as a promising approach to training autonomous driving systems. Demonstrations of human driving are easy to collect at scale. Given such demonstrations, imitation learning can be used to train a model that maps perceptual inputs to control commands; for example, mapping camera images to steering and acceleration. This approach has been applied to lane following [27], [4] and off-road obstacle avoidance. However, these systems have characteristic limitations. For example, the network trained by Bojarski et al. [4] was given control over lane and road following only. When a lane change or a turn from one road to another was required, the human driver had to take control.
Why has imitation learning not scaled up to fully autonomous urban driving? One limitation is in the assumption that the optimal action can be inferred from the perceptual input alone. This assumption often does not hold in practice: for instance, when a car approaches an intersection, the camera input is not sufficient to predict whether the car should turn left, right, or go straight. Mathematically, the mapping from the image to the control command is no longer a function. Fitting a function approximator is thus bound to run into difficulties. This had already been observed in the work of Pomerleau: “Currently upon reaching a fork, the network may output two widely discrepant travel directions, one for each choice. The result is often an oscillation in the dictated travel direction” [27]. Even if the network can resolve the ambiguity in favor of some course of action, it may not be the one desired by the passenger, who lacks a communication channel for controlling the network itself.
In this paper, we address this challenge with conditional imitation learning. At training time, the model is given not only the perceptual input and the control signal, but also a representation of the expert’s intention. At test time, the network can be given corresponding commands, which resolve the ambiguity in the perceptuomotor mapping and allow the trained model to be controlled by a passenger or a topological planner, just as mapping applications and passengers provide turn-by-turn directions to human drivers. The trained network is thus freed from the task of planning and can devote its representational capacity to driving. This enables scaling imitation learning to vision-based driving in complex urban environments.
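To make the conditioning idea concrete, below is a minimal PyTorch-style sketch of one way to realize it: a shared image encoder feeds several small heads, and the high-level command (e.g. turn left, turn right, go straight, follow lane) selects which head produces the action. This follows the branched flavor of conditioning described above only in spirit; the class name, layer sizes, and four-command vocabulary are illustrative assumptions, not the paper's exact architecture.

<code python>
# Sketch of a command-conditional (branched) driving policy.
# Assumptions: 4 commands, 2 action outputs (steering, acceleration),
# small illustrative encoder; none of these values come from the paper.
import torch
import torch.nn as nn

class BranchedPolicy(nn.Module):
    def __init__(self, num_commands: int = 4, num_actions: int = 2):
        super().__init__()
        # Shared perception module: camera image -> feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 128), nn.ReLU(),
        )
        # One small head per command; the command index picks the head.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, num_actions))
            for _ in range(num_commands)
        ])

    def forward(self, image: torch.Tensor, command: torch.Tensor) -> torch.Tensor:
        features = self.encoder(image)                          # (B, 128)
        all_branches = torch.stack(
            [branch(features) for branch in self.branches], dim=1
        )                                                        # (B, num_commands, num_actions)
        idx = command.view(-1, 1, 1).expand(-1, 1, all_branches.size(-1))
        return all_branches.gather(1, idx).squeeze(1)            # (B, num_actions)

# Example: actions for a batch of frames under command index 1 ("turn left", illustrative).
policy = BranchedPolicy()
images = torch.randn(8, 3, 88, 200)
commands = torch.full((8,), 1, dtype=torch.long)
actions = policy(images, commands)   # shape (8, 2): e.g. steering, acceleration
</code>

The design choice the sketch illustrates is that the command is not merely an extra input feature: it gates which output head is used, so each head can specialize in one maneuver while the perception encoder stays shared.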
We evaluate the presented approach in realistic simulations of urban driving and on a 1/5-scale robotic truck. Both systems are shown in Figure 1. Simulation allows us to thoroughly analyze the importance of different modeling decisions, carefully compare the approach to relevant baselines, and conduct detailed ablation studies. Experiments with the physical system demonstrate that the approach can be successfully deployed in the physical world. Recordings of both systems are provided in the supplementary video.
We begin by describing the standard imitation learning setup and then proceed to our command-conditional formulation. Consider a controller that interacts with the environment over discrete time steps. At each time step t, the controller receives an observation o_t and takes an action a_t. The basic idea behind imitation learning is to train a controller that mimics an expert. The training data is a set of observation-action pairs D = {⟨o_i, a_i⟩}_{i=1}^{N} generated by the expert. The assumption is that the expert is successful at performing the task of interest and that a controller trained to mimic the expert will also perform the task well. This is a supervised learning problem, in which the parameters of a function approximator are optimized to fit the observation-to-action mapping defined by the dataset.
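For reference, a standard way to write the resulting supervised objective is given below. The symbols F (the function approximator with parameters θ) and ℓ (a per-sample loss, e.g. squared error) are not named in the excerpt above and are introduced here only for illustration.

<code latex>
\min_{\theta} \; \sum_{i=1}^{N} \ell\bigl(F(o_i;\, \theta),\; a_i\bigr)
</code>

Under the command-conditional formulation described earlier, F would additionally receive the expert's command c_i, i.e. the per-sample term becomes ℓ(F(o_i, c_i; θ), a_i).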
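Finally, as referenced in the "Added in the current revision" block at the top of this page, here is a minimal sketch of how those configuration parameters could be queried programmatically. It assumes an Ouster-style sensor TCP configuration interface on port 7501; the hostname is a placeholder, the helper name query_active_params is hypothetical, and the port and protocol should be verified against the sensor's own documentation.

<code python>
# Query "active" configuration parameters from a sensor over a plain TCP
# command interface (assumed Ouster-style, port 7501). Placeholder host.
import socket

SENSOR_HOST = "192.168.1.100"   # placeholder address, not from this page
SENSOR_PORT = 7501              # assumed TCP configuration port

PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def query_active_params(host, port, params):
    """Send 'get_config_param active <name>' for each parameter and collect replies."""
    results = {}
    with socket.create_connection((host, port), timeout=2.0) as sock:
        for name in params:
            sock.sendall(f"get_config_param active {name}\n".encode())
            # Replies are short single lines (e.g. "TIME_FROM_INTERNAL_OSC"),
            # so a single recv is enough for this sketch.
            results[name] = sock.recv(4096).decode().strip()
    return results

if __name__ == "__main__":
    for name, value in query_active_params(SENSOR_HOST, SENSOR_PORT, PARAMS).items():
        print(name, "=", value)
</code>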