Differences

This shows you the differences between two versions of the page.

Member:sungbeanJo_paper [2021/03/05 12:39] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean
Line 1: Line 1:
-Reinforcement learning (RL) instead assumes that drivers in the real world follow an expert policy π_E whose actions maximize the expected, global return, weighted by a discount factor γ ∈ [0, 1). The local reward function r(s_t, a_t) may be unknown, but it fully characterizes expert behavior, such that any policy optimizing R(π, r) will perform indistinguishably from π_E.
-Learning with respect to R(π, r) has several advantages over maximum likelihood behavioral cloning (BC) in the context of sequential decision making [21]. First, r(s_t, a_t) is defined for all state-action pairs, allowing an agent to receive a learning signal even from unusual states. In contrast, BC only receives a learning signal for those states represented in a labeled, finite dataset. Second, unlike labels, rewards allow a learner to establish preferences between mildly undesirable behavior (e.g., hard braking) and extremely undesirable behavior (e.g., collisions). And finally, RL maximizes the global, expected return on a trajectory, rather than local instructions for each observation. Once preferences are learned, a policy may take mildly undesirable actions now in order to avoid awful situations later. As such, reinforcement learning algorithms provide robustness against cascading errors.
+get_config_param active timestamp_mode
+    TIME_FROM_INTERNAL_OSC
+get_config_param active multipurpose_io_mode
+    OUTPUT_OFF
+get_config_param active sync_pulse_in_polarity
+    ACTIVE_LOW
+get_config_param active nmea_in_polarity
+    ACTIVE_HIGH
+get_config_param active nmea_baud_rate
+    BAUD_9600
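
For reference, the "expected, global return" R(π, r) that the removed paragraph optimizes is the usual discounted return of RL. A standard formulation in the excerpt's notation, where the expectation is taken over trajectories (s_0, a_0, s_1, a_1, ...) generated by following policy π, is:

  R(\pi, r) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right], \qquad \gamma \in [0, 1)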
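
The added lines are query/response pairs from an Ouster lidar's plain-text TCP configuration interface. A minimal sketch of reading the same parameters programmatically is shown below; the hostname is a placeholder and port 7501 is assumed to be the sensor's TCP command port, so adjust both for the actual device.

  import socket

  # Assumptions: hostname is a placeholder; the sensor answers
  # "get_config_param active <name>" on its TCP command port (commonly 7501)
  # with a single reply line.
  SENSOR_HOSTNAME = "os1-XXXXXXXXXXXX.local"  # placeholder hostname
  TCP_PORT = 7501

  PARAMS = [
      "timestamp_mode",
      "multipurpose_io_mode",
      "sync_pulse_in_polarity",
      "nmea_in_polarity",
      "nmea_baud_rate",
  ]

  with socket.create_connection((SENSOR_HOSTNAME, TCP_PORT), timeout=5.0) as sock:
      with sock.makefile("r") as replies:  # read replies line by line
          for name in PARAMS:
              sock.sendall(f"get_config_param active {name}\n".encode())
              print(f"{name} = {replies.readline().strip()}")

The same queries can also be issued interactively, e.g. by connecting with netcat to the sensor's command port and typing the commands exactly as they appear in the diff above.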