Differences

This shows you the differences between two versions of the page.

Member:sungbeanJo_paper [2021/03/05 12:39] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean
Line 1: Line 1:
-Reinforcement learning (RL) instead assumes that drivers in the real world follow an expert policy π_E whose actions maximize the expected, global return, weighted by a discount factor γ ∈ [0, 1). The local reward function r(s_t, a_t) may be unknown, but it fully characterizes expert behavior, such that any policy optimizing R(π, r) will perform indistinguishably from π_E.
-Learning with respect to R(π, r) has several advantages over maximum likelihood behavioral cloning (BC) in the context of sequential decision making [21]. First, r(s_t, a_t) is defined for all state-action pairs, allowing an agent to receive a learning signal even from unusual states. In contrast, BC only receives a learning signal for those states represented in a labeled, finite dataset. Second, unlike labels, rewards allow a learner to establish preferences between mildly undesirable behavior (e.g., hard braking) and extremely undesirable behavior (e.g., collisions). And finally, RL maximizes the global, expected return on a trajectory, rather than local instructions for each observation. Once preferences are learned, a policy may take mildly undesirable actions now in order to avoid awful situations later. As such, reinforcement learning algorithms provide robustness against cascading errors.
+get_config_param active timestamp_mode
+    TIME_FROM_INTERNAL_OSC
+get_config_param active multipurpose_io_mode
+    OUTPUT_OFF
+get_config_param active sync_pulse_in_polarity
+    ACTIVE_LOW
+get_config_param active nmea_in_polarity
+    ACTIVE_HIGH
+get_config_param active nmea_baud_rate
+    BAUD_9600
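
For reference, the "expected, global return" R(π, r) that the removed paragraph optimizes is the usual discounted return of RL. A standard formulation in the excerpt's notation, where the expectation is taken over trajectories (s_0, a_0, s_1, a_1, ...) generated by following policy π, is:

  R(\pi, r) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \qquad \gamma \in [0, 1)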
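
The added lines are query/response pairs from an Ouster lidar's plain-text TCP configuration interface. A minimal sketch of reading the same parameters programmatically is shown below; the hostname is a placeholder and port 7501 is assumed to be the sensor's TCP command port, so adjust both for the actual device.

  import socket

  # Assumptions: hostname is a placeholder; the sensor answers
  # "get_config_param active <name>" on its TCP command port (commonly 7501)
  # with a single reply line.
  SENSOR_HOSTNAME = "os1-XXXXXXXXXXXX.local"  # placeholder hostname
  TCP_PORT = 7501

  PARAMS = [
      "timestamp_mode",
      "multipurpose_io_mode",
      "sync_pulse_in_polarity",
      "nmea_in_polarity",
      "nmea_baud_rate",
  ]

  with socket.create_connection((SENSOR_HOSTNAME, TCP_PORT), timeout=5.0) as sock:
      with sock.makefile("r") as replies:  # read replies line by line
          for name in PARAMS:
              sock.sendall(f"get_config_param active {name}\n".encode())
              print(f"{name} = {replies.readline().strip()}")

The same queries can also be issued interactively, e.g. by connecting with netcat to the sensor's command port and typing the commands exactly as they appear in the diff above.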