Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_paper [2021/03/04 21:16] sungbean |
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | Abstract—Deep networks trained on demonstrations of human | + | get_config_param active timestamp_mode |
- | driving have learned to follow roads and avoid obstacles. | + | TIME_FROM_INTERNAL_OSC |
- | However, driving policies trained via imitation learning cannot | + | get_config_param active multipurpose_io_mode |
- | be controlled at test time. A vehicle trained end-to-end to imitate | + | OUTPUT_OFF |
- | an expert cannot be guided to take a specific turn at an upcoming | + | get_config_param active sync_pulse_in_polarity |
- | intersection. This limits the utility of such systems. We | + | ACTIVE_LOW |
- | propose to condition imitation learning on high-level command | + | get_config_param active nmea_in_polarity |
- | input. At test time, the learned driving policy functions as a | + | ACTIVE_HIGH |
- | chauffeur that handles sensorimotor coordination but continues | + | get_config_param active nmea_baud_rate |
- | to respond to navigational commands. We evaluate different | + | BAUD_9600 |
- | architectures for conditional imitation learning in vision-based | + | |
- | driving. We conduct experiments in realistic three-dimensional | + | |
- | simulations of urban driving and on a 1/5 scale robotic truck | + | |
- | that is trained to drive in a residential area. Both systems | + | |
- | drive based on visual input yet remain responsive to high-level | + | |
- | navigational commands. | + | |
- | Imitation learning is receiving renewed interest as a | + | |
- | promising approach to training autonomous driving systems. | + | |
- | Demonstrations of human driving are easy to collect | + | |
- | at scale. Given such demonstrations, imitation learning can | + | |
- | be used to train a model that maps perceptual inputs to | + | |
- | control commands; for example, mapping camera images to | + | |
- | steering and acceleration. This approach has been applied to | + | |
- | lane following [27], [4] and off-road obstacle avoidance. | + | |
- | However, these systems have characteristic limitations. For | + | |
- | example, the network trained by Bojarski et al. [4] was given | + | |
- | control over lane and road following only. When a lane | + | |
- | change or a turn from one road to another was required, | + | |
- | the human driver had to take control. | + | |
- | Why has imitation learning not scaled up to fully autonomous | ||
- | urban driving? One limitation is in the assumption | ||
- | that the optimal action can be inferred from the perceptual | ||
- | input alone. This assumption often does not hold in practice: | ||
- | for instance, when a car approaches an intersection, the | ||
- | camera input is not sufficient to predict whether the car | ||
- | should turn left, right, or go straight. Mathematically, the | ||
- | mapping from the image to the control command is no longer | ||
- | a function. Fitting a function approximator is thus bound to | ||
- | run into difficulties. This had already been observed in the | ||
- | work of Pomerleau: “Currently upon reaching a fork, the | ||
- | network may output two widely discrepant travel directions, | ||
- | one for each choice. The result is often an oscillation in | ||
- | the dictated travel direction” [27]. Even if the network can | ||
- | resolve the ambiguity in favor of some course of action, it | ||
- | may not be the one desired by the passenger, who lacks a | ||
- | communication channel for controlling the network itself. | ||
- | |||
- | In this paper, we address this challenge with conditional | ||
- | imitation learning. At training time, the model is given | ||
- | not only the perceptual input and the control signal, but | ||
- | also a representation of the expert’s intention. At test time, | ||
- | the network can be given corresponding commands, which | ||
- | resolve the ambiguity in the perceptuomotor mapping and | ||
- | allow the trained model to be controlled by a passenger | ||
- | or a topological planner, just as mapping applications and | ||
- | passengers provide turn-by-turn directions to human drivers. | ||
- | The trained network is thus freed from the task of planning | ||
- | and can devote its representational capacity to driving. This | ||
- | enables scaling imitation learning to vision-based driving in | ||
- | complex urban environments. | ||
- | |||
- | We evaluate the presented approach in realistic simulations | ||
- | of urban driving and on a 1/5 scale robotic truck. Both | ||
- | systems are shown in Figure 1. Simulation allows us to | ||
- | thoroughly analyze the importance of different modeling | ||
- | decisions, carefully compare the approach to relevant baselines, | ||
- | and conduct detailed ablation studies. Experiments | ||
- | with the physical system demonstrate that the approach can | ||
- | be successfully deployed in the physical world. Recordings | ||
- | of both systems are provided in the supplementary video. |
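
The removed text argues that, at an intersection, the mapping from image to control command "is no longer a function", and that conditioning on a high-level command resolves the ambiguity. The following is a minimal sketch of that argument only, not the architecture from the paper: a lookup table of mean steering values stands in for the network, and the demonstrations, the observation label `"intersection"`, the command names, and the steering scale are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical demonstrations: (observation, command, steering).
# The observation is identical in every row; only the expert's intention differs.
# Steering scale (assumed): -1.0 = full left, +1.0 = full right.
demos = [
    ("intersection", "left", -1.0),
    ("intersection", "right", 1.0),
    ("intersection", "straight", 0.0),
    ("intersection", "left", -1.0),
    ("intersection", "right", 1.0),
]


def fit_unconditional(demos):
    """Least-squares steering per observation: the mean over all demonstrations."""
    by_obs = defaultdict(list)
    for obs, _cmd, steer in demos:
        by_obs[obs].append(steer)
    return {obs: mean(values) for obs, values in by_obs.items()}


def fit_conditional(demos):
    """Least-squares steering per (observation, command) pair."""
    by_key = defaultdict(list)
    for obs, cmd, steer in demos:
        by_key[(obs, cmd)].append(steer)
    return {key: mean(values) for key, values in by_key.items()}


unconditional = fit_unconditional(demos)
conditional = fit_conditional(demos)

# Without a command, left and right turns average out to one compromise value.
print(unconditional["intersection"])
# With a command, each intention gets its own steering target.
print(conditional[("intersection", "left")], conditional[("intersection", "right")])
```

In this toy setting the unconditional fit collapses conflicting demonstrations into a single averaged steering value, while the conditional fit keeps one target per command, which is the ambiguity the text describes.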