Member:sungbeanJo_paper

Current revision [2021/04/21 22:08] (sungbean):
get_config_param active timestamp_mode
  TIME_FROM_INTERNAL_OSC
get_config_param active multipurpose_io_mode
  OUTPUT_OFF
get_config_param active sync_pulse_in_polarity
  ACTIVE_LOW
get_config_param active nmea_in_polarity
  ACTIVE_HIGH
get_config_param active nmea_baud_rate
  BAUD_9600
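
The readout above appears to be the plain-text get_config_param interface that Ouster-style lidars expose over TCP. As a rough sketch of how the same readout could be scripted, the snippet below queries each parameter in turn; the sensor address is a placeholder and the command port 7501 is an assumption, so both may need adjusting for the actual device and firmware.

<code python>
import socket

SENSOR_HOST = "192.0.2.10"  # placeholder sensor address; replace with the real one
TCP_PORT = 7501             # assumed plain-text command port; adjust for your firmware

PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def query_param(cmd_file, name):
    """Send one get_config_param query and return the single-line reply."""
    cmd_file.write("get_config_param active %s\n" % name)
    cmd_file.flush()
    return cmd_file.readline().strip()

def main():
    # Open one TCP connection and reuse it for all queries.
    with socket.create_connection((SENSOR_HOST, TCP_PORT), timeout=5.0) as sock:
        cmd_file = sock.makefile("rw", newline="\n")
        for name in PARAMS:
            print("%-25s %s" % (name, query_param(cmd_file, name)))

if __name__ == "__main__":
    main()
</code>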

Previous revision [2021/03/04 20:07] (sungbean):

Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5 scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands.
  
Imitation learning is receiving renewed interest as a promising approach to training autonomous driving systems. Demonstrations of human driving are easy to collect at scale. Given such demonstrations, imitation learning can be used to train a model that maps perceptual inputs to control commands; for example, mapping camera images to steering and acceleration. This approach has been applied to lane following [27], [4] and off-road obstacle avoidance [22]. However, these systems have characteristic limitations. For example, the network trained by Bojarski et al. [4] was given control over lane and road following only. When a lane change or a turn from one road to another was required, the human driver had to take control.

Why has imitation learning not scaled up to fully autonomous urban driving? One limitation lies in the assumption that the optimal action can be inferred from the perceptual input alone. This assumption often does not hold in practice: for instance, when a car approaches an intersection, the camera input is not sufficient to predict whether the car should turn left, right, or go straight. Mathematically, the mapping from the image to the control command is no longer a function. Fitting a function approximator is thus bound to run into difficulties. This had already been observed in the work of Pomerleau: “Currently upon reaching a fork, the network may output two widely discrepant travel directions, one for each choice. The result is often an oscillation in the dictated travel direction” [27]. Even if the network can resolve the ambiguity in favor of some course of action, it may not be the one desired by the passenger, who lacks a communication channel for controlling the network itself.

In this paper, we address this challenge with conditional imitation learning. At training time, the model is given not only the perceptual input and the control signal, but also a representation of the expert’s intention. At test time, the network can be given corresponding commands, which resolve the ambiguity in the perceptuomotor mapping and allow the trained model to be controlled by a passenger or a topological planner, just as mapping applications and passengers provide turn-by-turn directions to human drivers. The trained network is thus freed from the task of planning and can devote its representational capacity to driving. This enables scaling imitation learning to vision-based driving in complex urban environments.

We evaluate the presented approach in realistic simulations of urban driving and on a 1/5 scale robotic truck. Both systems are shown in Figure 1. Simulation allows us to thoroughly analyze the importance of different modeling decisions, carefully compare the approach to relevant baselines, and conduct detailed ablation studies. Experiments with the physical system demonstrate that the approach can be successfully deployed in the physical world. Recordings of both systems are provided in the supplementary video.

We begin by describing the standard imitation learning setup and then proceed to our command-conditional formulation. Consider a controller that interacts with the environment over discrete time steps. At each time step t, the controller receives an observation o_t and takes an action a_t. The basic idea behind imitation learning is to train a controller that mimics an expert. The training data is a set of observation-action pairs D = {⟨o_i, a_i⟩}_{i=1}^{N} generated by the expert. The assumption is that the expert is successful at performing the task of interest and that a controller trained to mimic the expert will also perform the task well. This is a supervised learning problem, in which the parameters θ of a function approximator F(o; θ) must be optimized to fit the mapping of observations to actions:

  minimize_θ  Σ_i ℓ( F(o_i; θ), a_i ),

where ℓ measures the discrepancy between predicted and demonstrated actions.
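
As a concrete illustration of this supervised objective, the sketch below fits a function approximator F(o; θ) to expert observation-action pairs by minimizing a squared-error loss in PyTorch. The network shape, input dimensions, and choice of loss are illustrative assumptions, not the model used in the work summarized here.

<code python>
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Function approximator F(o; theta) mapping an observation to an action."""
    def __init__(self, obs_dim=128, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim),  # e.g. steering and acceleration
        )

    def forward(self, obs):
        return self.net(obs)

def fit(policy, loader, epochs=10, lr=1e-4):
    """Minimize sum_i l(F(o_i; theta), a_i) over expert demonstrations."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for obs, act in loader:  # batches of (o_i, a_i) pairs from the expert
            opt.zero_grad()
            loss = loss_fn(policy(obs), act)
            loss.backward()
            opt.step()
    return policy
</code>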

An implicit assumption behind this formulation is that the expert’s actions are fully explained by the observations; that is, there exists a function E that maps observations to the expert’s actions: a_i = E(o_i). If this assumption holds, a sufficiently expressive approximator will be able to fit the function E given enough data. This explains the success of imitation learning on tasks such as lane following. However, in more complex scenarios the assumption that the mapping of observations to actions is a function breaks down. Consider a driver approaching an intersection. The driver’s subsequent actions are not explained by the observations, but are additionally affected by the driver’s internal state, such as the intended destination. The same observations could lead to different actions, depending on this latent state. This could be modeled as stochasticity, but a stochastic formulation misses the underlying causes of the behavior. Moreover, even if a controller trained to imitate demonstrations of urban driving did learn to make turns and avoid collisions, it would still not constitute a useful driving system. It would wander the streets, making arbitrary decisions at intersections. A passenger in such a vehicle would not be able to communicate the intended direction of travel to the controller, or give it commands regarding which turns to take.

To address this, we begin by explicitly modeling the expert’s internal state by a vector h, which together with the observation explains the expert’s action: a_i = E(o_i, h_i). The vector h can include information about the expert’s intentions, goals, and prior knowledge. The standard imitation learning objective can then be rewritten as

  minimize_θ  Σ_i ℓ( F(o_i; θ), E(o_i, h_i) ).

It is now clear that the expert’s action is affected by information that is not provided to the controller F.
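
As an illustration of where this leads, the sketch below gives the controller an explicit command input c alongside the observation, so that the loss becomes ℓ(F(o_i, c_i; θ), a_i) instead of asking F to infer the latent intention. The one-hot command encoding, feature dimensions, and simple concatenation are assumptions made for this sketch, not necessarily the architecture evaluated in the work summarized here.

<code python>
import torch
import torch.nn as nn

class ConditionalPolicy(nn.Module):
    """F(o, c; theta): action prediction conditioned on a high-level command."""
    def __init__(self, feat_dim=128, num_commands=4, act_dim=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim + num_commands, 256), nn.ReLU(),
            nn.Linear(256, act_dim),
        )

    def forward(self, obs_features, command_onehot):
        # The command resolves the ambiguity that the observation alone leaves open
        # (e.g. turn left, turn right, or go straight at an intersection).
        x = torch.cat([obs_features, command_onehot], dim=-1)
        return self.head(x)

def conditional_imitation_loss(policy, obs_features, commands, expert_actions):
    # Sum over i of l(F(o_i, c_i; theta), a_i), here with a squared-error l.
    return nn.functional.mse_loss(policy(obs_features, commands), expert_actions)
</code>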