Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_paper [2021/03/09 11:02] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean
Line 1:

Removed (revision 2021/03/09 11:02):

We are interested in a specific setting of imitation learning, the problem of learning to perform a task from expert demonstrations, in which the learner is given only samples of trajectories from the expert, is not allowed to query the expert for more data while training, and is not provided a reinforcement signal of any kind. There are two main approaches suitable for this setting: behavioral cloning [20], which learns a policy as a supervised learning problem over state-action pairs from expert trajectories; and inverse reinforcement learning [25, 18], which finds a cost function under which the expert is uniquely optimal.

Behavioral cloning, while appealingly simple, only tends to succeed with large amounts of data, due to compounding error caused by covariate shift [23, 24]. Inverse reinforcement learning (IRL), on the other hand, learns a cost function that prioritizes entire trajectories over others, so compounding error, a problem for methods that fit single-timestep decisions, is not an issue. Accordingly, IRL has succeeded in a wide range of problems, from predicting behaviors of taxi drivers [31] to planning footsteps for quadruped robots [22].

Unfortunately, many IRL algorithms are extremely expensive to run, requiring reinforcement learning in an inner loop. Scaling IRL methods to large environments has thus been the focus of much recent work [7, 14]. Fundamentally, however, IRL learns a cost function, which explains expert behavior but does not directly tell the learner how to act. Given that the learner's true goal often is to take actions imitating the expert (indeed, many IRL algorithms are evaluated on the quality of the optimal actions of the costs they learn), why, then, must we learn a cost function, if doing so possibly incurs significant computational expense yet fails to directly yield actions?

We desire an algorithm that tells us explicitly how to act by directly learning a policy. To develop such an algorithm, we begin in Section 3, where we characterize the policy given by running reinforcement learning on a cost function learned by maximum causal entropy IRL [31, 32]. Our characterization introduces a framework for directly learning policies from data, bypassing any intermediate IRL step. Then, we instantiate our framework in Sections 4 and 5 with a new model-free imitation learning algorithm. We show that our resulting algorithm is intimately connected to generative adversarial networks [9], a technique from the deep learning community that has led to recent successes in modeling distributions of natural images: our algorithm harnesses generative adversarial training to fit distributions of states and actions defining expert behavior. We test our algorithm in Section 6, where we find that it outperforms competing methods by a wide margin in training policies for complex, high-dimensional physics-based control tasks over various amounts of expert data.

Added (revision 2021/04/21 22:08, current):

get_config_param active timestamp_mode
TIME_FROM_INTERNAL_OSC
get_config_param active multipurpose_io_mode
OUTPUT_OFF
get_config_param active sync_pulse_in_polarity
ACTIVE_LOW
get_config_param active nmea_in_polarity
ACTIVE_HIGH
get_config_param active nmea_baud_rate
BAUD_9600
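The added lines read as a query/response transcript from a lidar sensor's plaintext TCP command interface; the `get_config_param active <name>` form matches Ouster's documented command set. Below is a minimal sketch of how such a transcript could be reproduced programmatically. It assumes the sensor answers newline-terminated commands on TCP port 7501 (the command port in Ouster's documentation); the hostname is a placeholder to replace with your sensor's address.

```python
# Minimal sketch: query the "active" config parameters shown above over the
# sensor's plaintext TCP command interface. Assumes an Ouster-style sensor that
# answers newline-terminated "get_config_param active <name>" queries on TCP
# port 7501; hostname and port are placeholders to adapt to your setup.
import socket

SENSOR_HOSTNAME = "os1-xxxx.local"  # placeholder; replace with your sensor's hostname or IP
TCP_PORT = 7501                     # command port per Ouster's documentation

PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def get_config_param(sock_file, name):
    """Send one query and return the sensor's single-line reply."""
    sock_file.write(f"get_config_param active {name}\n")
    sock_file.flush()
    return sock_file.readline().strip()

def main():
    with socket.create_connection((SENSOR_HOSTNAME, TCP_PORT), timeout=5.0) as sock:
        # Wrap the socket in a text-mode file object for line-oriented I/O.
        with sock.makefile("rw", newline="\n") as f:
            for name in PARAMS:
                print(name, "=", get_config_param(f, name))

if __name__ == "__main__":
    main()
```

Run against a reachable sensor, each printed line should pair a parameter with its active value, for example `timestamp_mode = TIME_FROM_INTERNAL_OSC`, matching the responses recorded above.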