Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
Member:sungbeanJo_paper [2021/03/05 12:39] sungbean |
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | Reinforcement learning (RL) instead assumes that drivers | + | get_config_param active timestamp_mode |
- | in the real world follow an expert policy E whose actions | + | TIME_FROM_INTERNAL_OSC |
- | maximize the expected, global return | + | get_config_param active multipurpose_io_mode |
- | weighted by a discount factor | + | OUTPUT_OFF |
- | 2 [0; 1). The local reward | + | get_config_param active sync_pulse_in_polarity |
- | function r(st; at) may be unknown, but fully characterizes | + | ACTIVE_LOW |
- | expert behavior such that any policy optimizing R(; r) will | + | get_config_param active nmea_in_polarity |
- | perform indistinguishably from E. | + | ACTIVE_HIGH |
- | Learning with respect to R(; r) has several advantages | + | get_config_param active nmea_baud_rate |
- | over maximum likelihood BC in the context of sequential | + | BAUD_9600 |
- | decision making [21]. First, r(st; at) is defined for all stateaction | + | |
- | pairs, allowing an agent to receive a learning signal | + | |
- | even from unusual states. In contrast, BC only receives a | + | |
- | learning signal for those states represented in a labeled, finite | + | |
- | dataset. Second, unlike labels, rewards allow a learner to | + | |
- | establish preferences between mildly undesirable behavior | + | |
- | (e.g., hard braking) and extremely undesirable behavioral | + | |
- | (e.g., collisions). And finally, RL maximizes the global, expected | + | |
- | return on a trajectory, rather than local instructions for | + | |
- | each observation. Once preferences are learned, a policy may | + | |
- | take mildly undesirable actions now in order to avoid awful | + | |
- | situations later. As such, reinforcement learning algorithms | + | |
- | provide robustness against cascading errors. | + |