Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_paper [2021/03/05 12:39] sungbean
Member:sungbeanJo_paper [2021/04/21 22:08] (current) sungbean
Line 1 (removed):

Reinforcement learning (RL) instead assumes that drivers in the real world follow an expert policy π_E whose actions maximize the expected, global return R(π_E, r), weighted by a discount factor γ ∈ [0, 1). The local reward function r(s_t, a_t) may be unknown, but it fully characterizes expert behavior, such that any policy optimizing R(π, r) will perform indistinguishably from π_E.
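The display equation defining this return appears to have been lost in extraction. A minimal reconstruction, assuming the standard discounted-return definition over a horizon T (π, γ, r, s_t, and a_t are as introduced above):

\[
R(\pi, r) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right]
\]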
Learning with respect to R(π, r) has several advantages over maximum likelihood behavioral cloning (BC) in the context of sequential decision making [21]. First, r(s_t, a_t) is defined for all state-action pairs, allowing an agent to receive a learning signal even from unusual states. In contrast, BC only receives a learning signal for those states represented in a labeled, finite dataset. Second, unlike labels, rewards allow a learner to establish preferences between mildly undesirable behavior (e.g., hard braking) and extremely undesirable behavior (e.g., collisions). Finally, RL maximizes the global, expected return on a trajectory, rather than local instructions for each observation. Once preferences are learned, a policy may take mildly undesirable actions now in order to avoid awful situations later. As such, reinforcement learning algorithms provide robustness against cascading errors.

Line 1 (added):

get_config_param active timestamp_mode          TIME_FROM_INTERNAL_OSC
get_config_param active multipurpose_io_mode    OUTPUT_OFF
get_config_param active sync_pulse_in_polarity  ACTIVE_LOW
get_config_param active nmea_in_polarity        ACTIVE_HIGH
get_config_param active nmea_baud_rate          BAUD_9600
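These lines read as query/response pairs from a lidar sensor's line-oriented TCP configuration interface (the parameter names match Ouster's `get_config_param active <param>` command set). A minimal sketch of reproducing them programmatically; the host address and port 7501 are assumptions, not confirmed by this page:

```python
import socket

SENSOR_HOST = "192.168.1.100"  # assumption: sensor IP on the local network
SENSOR_PORT = 7501             # assumption: TCP command port used by Ouster sensors

PARAMS = [
    "timestamp_mode",
    "multipurpose_io_mode",
    "sync_pulse_in_polarity",
    "nmea_in_polarity",
    "nmea_baud_rate",
]

def get_config_param(f, param):
    """Send 'get_config_param active <param>' and return the one-line reply."""
    f.write(f"get_config_param active {param}\n")
    f.flush()
    return f.readline().strip()

with socket.create_connection((SENSOR_HOST, SENSOR_PORT), timeout=5.0) as sock:
    # A text-mode file wrapper gives line-oriented reads/writes on the socket.
    f = sock.makefile("rw", newline="\n")
    for param in PARAMS:
        print(f"{param} = {get_config_param(f, param)}")
```

If the assumptions hold, the output should mirror the list above, e.g. `timestamp_mode = TIME_FROM_INTERNAL_OSC`.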