Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_paper [2021/03/04 22:09] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean

Added in the current revision (2021/04/21): sensor configuration queries and their responses.

get_config_param active timestamp_mode
TIME_FROM_INTERNAL_OSC
get_config_param active multipurpose_io_mode
OUTPUT_OFF
get_config_param active sync_pulse_in_polarity
ACTIVE_LOW
get_config_param active nmea_in_polarity
ACTIVE_HIGH
get_config_param active nmea_baud_rate
BAUD_9600
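These query/response pairs appear to follow the plain-text TCP command interface of Ouster lidar sensors, which by convention listen on port 7501 and answer each command with a single line. Below is a minimal sketch of reading these values programmatically; the hostname is a hypothetical placeholder, and the parameter list is taken from the responses above.

import socket

# Parameters queried above; "active" reads the currently running configuration.
PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def get_config_param(host, name, port=7501):
    # One command per connection; the sensor replies with a single line.
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(f"get_config_param active {name}\n".encode())
        return sock.makefile().readline().strip()

if __name__ == "__main__":
    host = "os1-991234567890.local"  # hypothetical sensor hostname
    for name in PARAMS:
        print(name, "=", get_config_param(host, name))
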
Removed from the previous revision (2021/03/04): the text of the paper.

Abstract—Deep networks trained on demonstrations of human driving have learned to follow roads and avoid obstacles. However, driving policies trained via imitation learning cannot be controlled at test time. A vehicle trained end-to-end to imitate an expert cannot be guided to take a specific turn at an upcoming intersection. This limits the utility of such systems. We propose to condition imitation learning on high-level command input. At test time, the learned driving policy functions as a chauffeur that handles sensorimotor coordination but continues to respond to navigational commands. We evaluate different architectures for conditional imitation learning in vision-based driving. We conduct experiments in realistic three-dimensional simulations of urban driving and on a 1/5-scale robotic truck that is trained to drive in a residential area. Both systems drive based on visual input yet remain responsive to high-level navigational commands.

Imitation learning is receiving renewed interest as a promising approach to training autonomous driving systems. Demonstrations of human driving are easy to collect at scale. Given such demonstrations, imitation learning can be used to train a model that maps perceptual inputs to control commands; for example, mapping camera images to steering and acceleration. This approach has been applied to lane following [27], [4] and to off-road obstacle avoidance. However, these systems have characteristic limitations. For example, the network trained by Bojarski et al. [4] was given control over lane and road following only. When a lane change or a turn from one road to another was required, the human driver had to take control.

Why has imitation learning not scaled up to fully autonomous urban driving? One limitation is in the assumption that the optimal action can be inferred from the perceptual input alone. This assumption often does not hold in practice: for instance, when a car approaches an intersection, the camera input is not sufficient to predict whether the car should turn left, right, or go straight. Mathematically, the mapping from the image to the control command is no longer a function. Fitting a function approximator is thus bound to run into difficulties. This had already been observed in the work of Pomerleau: “Currently upon reaching a fork, the network may output two widely discrepant travel directions, one for each choice. The result is often an oscillation in the dictated travel direction” [27]. Even if the network can resolve the ambiguity in favor of some course of action, it may not be the one desired by the passenger, who lacks a communication channel for controlling the network itself.

In this paper, we address this challenge with conditional imitation learning. At training time, the model is given not only the perceptual input and the control signal, but also a representation of the expert’s intention. At test time, the network can be given corresponding commands, which resolve the ambiguity in the perceptuomotor mapping and allow the trained model to be controlled by a passenger or a topological planner, just as mapping applications and passengers provide turn-by-turn directions to human drivers. The trained network is thus freed from the task of planning and can devote its representational capacity to driving. This enables scaling imitation learning to vision-based driving in complex urban environments.

We evaluate the presented approach in realistic simulations of urban driving and on a 1/5-scale robotic truck. Both systems are shown in Figure 1. Simulation allows us to thoroughly analyze the importance of different modeling decisions, carefully compare the approach to relevant baselines, and conduct detailed ablation studies. Experiments with the physical system demonstrate that the approach can be successfully deployed in the physical world. Recordings of both systems are provided in the supplementary video.

We begin by describing the standard imitation learning setup and then proceed to our command-conditional formulation. Consider a controller that interacts with the environment over discrete time steps. At each time step t, the controller receives an observation o_t and takes an action a_t. The basic idea behind imitation learning is to train a controller that mimics an expert. The training data is a set of observation-action pairs D = {⟨o_i, a_i⟩}_{i=1}^N generated by the expert. The assumption is that the expert is successful at performing the task of interest and that a controller trained to mimic the expert will also perform the task well. This is a supervised learning problem, in which the parameters θ of a function approximator F(o; θ) are optimized to fit the mapping of observations to actions:

minimize_θ Σ_i ℓ(F(o_i; θ), a_i),

where ℓ is a per-sample loss.

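As a concrete illustration of this objective, here is a minimal sketch assuming PyTorch, with mean squared error standing in for the per-sample loss ℓ and a small MLP standing in for the controller F; the dimensions and names are illustrative, not the architecture used in the paper.

import torch
import torch.nn as nn

# Hypothetical controller F(o; theta): observation features -> two control
# outputs (e.g., steering and acceleration).
F = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(F.parameters(), lr=1e-4)

# Stand-in batch of expert observation-action pairs (o_i, a_i).
obs_batch = torch.randn(32, 128)
act_batch = torch.randn(32, 2)

# One gradient step on the imitation objective sum_i l(F(o_i; theta), a_i).
loss = nn.functional.mse_loss(F(obs_batch), act_batch)
optimizer.zero_grad()
loss.backward()
optimizer.step()
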
An implicit assumption behind this formulation is that the expert’s actions are fully explained by the observations; that is, there exists a function E that maps observations to the expert’s actions: a_i = E(o_i). If this assumption holds, a sufficiently expressive approximator will be able to fit the function E given enough data. This explains the success of imitation learning on tasks such as lane following. However, in more complex scenarios the assumption that the mapping of observations to actions is a function breaks down. Consider a driver approaching an intersection. The driver’s subsequent actions are not explained by the observations, but are additionally affected by the driver’s internal state, such as the intended destination. The same observations could lead to different actions, depending on this latent state. This could be modeled as stochasticity, but a stochastic formulation misses the underlying causes of the behavior. Moreover, even if a controller trained to imitate demonstrations of urban driving did learn to make turns and avoid collisions, it would still not constitute a useful driving system. It would wander the streets, making arbitrary decisions at intersections. A passenger in such a vehicle would not be able to communicate the intended direction of travel to the controller, or give it commands regarding which turns to take.

To address this, we begin by explicitly modeling the expert’s internal state by a vector h, which together with the observation explains the expert’s action: a_i = E(o_i, h_i). Vector h can include information about the expert’s intentions, goals, and prior knowledge. The standard imitation learning objective can then be rewritten as

minimize_θ Σ_i ℓ(F(o_i; θ), E(o_i, h_i)).

It is now clear that the expert’s action is affected by information that is not provided to the controller F.

We expose the latent state h to the controller by introducing an additional command input: c = c(h). At training time, the command c is provided by the expert. It need not constitute the entire latent state h, but should provide useful information about the expert’s decision-making. For example, human drivers already use turn signals to communicate their intent when approaching intersections; these turn signals can be used as commands in our formulation. At test time, commands can be used to affect the behavior of the controller. These test-time commands can come from a human user or a planning module. In urban driving, a typical command would be “turn right at the next intersection”, which can be provided by a navigation system or a passenger. The training dataset becomes D = {⟨o_i, c_i, a_i⟩}_{i=1}^N.

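As a concrete illustration, here is a minimal sketch of a command-conditional controller F(o, c; θ), again assuming PyTorch. It uses a branched design, one output head per command with the command selecting the head, which is in the spirit of the architectures evaluated in this line of work; the class name, dimensions, and three-command set are illustrative assumptions.

import torch
import torch.nn as nn

NUM_COMMANDS = 3  # assumed command set, e.g., turn left / turn right / go straight

class CommandConditionalPolicy(nn.Module):
    def __init__(self, obs_dim=128, act_dim=2):
        super().__init__()
        # Shared perception trunk: observation -> feature vector.
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        # One action head per command; the command c selects the branch.
        self.heads = nn.ModuleList(
            [nn.Linear(64, act_dim) for _ in range(NUM_COMMANDS)]
        )

    def forward(self, obs, command):
        feat = self.trunk(obs)
        # Evaluate every branch, then gather the branch indexed by the command.
        branches = torch.stack([head(feat) for head in self.heads], dim=1)
        idx = command.view(-1, 1, 1).expand(-1, 1, branches.size(-1))
        return branches.gather(1, idx).squeeze(1)

policy = CommandConditionalPolicy()
obs = torch.randn(4, 128)                   # stand-in observations o_i
cmd = torch.randint(0, NUM_COMMANDS, (4,))  # stand-in commands c_i
actions = policy(obs, cmd)                  # F(o_i, c_i; theta), shape (4, 2)

Training then proceeds exactly as in the unconditional case, with the per-sample loss ℓ(F(o_i, c_i; θ), a_i) computed against the expert action a_i.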