For all controllers, the observation o is the currently observed image at 200 × 88 pixel resolution. For the measurement m, we used the current speed of the car, when available (in the physical system the speed estimates were very noisy, so we refrained from using them). All networks are composed of modules with identical architectures (e.g., the ConvNet architecture is the same in all conditions); the differences lie in the configuration of modules and branches, as shown in Figure 3.

The image module consists of 8 convolutional and 2 fully connected layers. The convolution kernel size is 5 in the first layer and 3 in the following layers. The first, third, and fifth convolutional layers have a stride of 2. The number of channels increases from 32 in the first convolutional layer to 256 in the last. Fully connected layers contain 512 units each. All modules except the image module are implemented as standard multilayer perceptrons. We used ReLU nonlinearities after all hidden layers, performed batch normalization after convolutional layers, applied 50% dropout after fully connected hidden layers, and used 20% dropout after convolutional layers.

Actions are two-dimensional vectors that collate steering angle and acceleration: $a = \langle s, a \rangle$. Given a predicted action $a$ and a ground-truth action $a_{gt}$, the per-sample loss function is defined as

$$\ell(a, a_{gt}) = \|s - s_{gt}\|^2 + \lambda_a \|a - a_{gt}\|^2$$

All models were trained using the Adam solver [16] with minibatches of 120 samples and an initial learning rate of 0.0002. For the command-conditional models, minibatches were constructed to contain an equal number of samples for each command.
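As a concrete illustration, below is a minimal sketch of the image module in PyTorch. The layer counts, kernel sizes, strides, channel endpoints, unit counts, and dropout rates follow the description above; the intermediate channel widths (doubling every two layers) and the ordering of batch normalization, dropout, and ReLU within each block are assumptions, since the text does not specify them.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel, stride):
    """Convolution -> batch norm -> 20% dropout -> ReLU (ordering assumed)."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride=stride, padding=kernel // 2),
        nn.BatchNorm2d(out_ch),
        nn.Dropout2d(0.2),
        nn.ReLU(inplace=True),
    )

class ImageModule(nn.Module):
    def __init__(self):
        super().__init__()
        # 8 convolutional layers: kernel 5 in the first, 3 afterwards;
        # stride 2 in the first, third, and fifth; channels grow from
        # 32 to 256 (intermediate widths are assumed).
        chans = [3, 32, 32, 64, 64, 128, 128, 256, 256]
        kernels = [5] + [3] * 7
        strides = [2, 1, 2, 1, 2, 1, 1, 1]
        self.conv = nn.Sequential(*[
            conv_block(chans[i], chans[i + 1], kernels[i], strides[i])
            for i in range(8)
        ])
        # Two fully connected layers of 512 units, each with 50% dropout.
        # For a 200 x 88 input, the conv stack above outputs 256 x 11 x 25.
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 11 * 25, 512), nn.Dropout(0.5), nn.ReLU(inplace=True),
            nn.Linear(512, 512), nn.Dropout(0.5), nn.ReLU(inplace=True),
        )

    def forward(self, x):  # x: (batch, 3, 88, 200)
        return self.fc(self.conv(x))
```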
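A per-sample implementation of the loss above could look as follows; the weight lambda_a is a hyperparameter whose value is not given in this excerpt, so the default here is purely illustrative.

```python
def action_loss(pred, gt, lambda_a=0.5):
    """Loss on actions a = <steering s, acceleration a>.

    pred, gt: tensors of shape (batch, 2) holding [steering, acceleration];
    lambda_a weighs the acceleration term (illustrative default).
    """
    steer_err = (pred[:, 0] - gt[:, 0]) ** 2
    accel_err = (pred[:, 1] - gt[:, 1]) ** 2
    return (steer_err + lambda_a * accel_err).mean()
```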
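The command-balanced minibatch construction can be sketched as a simple index sampler; the command ids, the number of distinct commands, and the sampling scheme are assumptions about the dataset, not given in this excerpt.

```python
import random
from collections import defaultdict

def balanced_minibatches(commands, batch_size=120):
    """Yield index minibatches with an equal number of samples per command.

    commands: a list giving the command id of each training sample
    (e.g., with 4 commands and batch_size 120, 30 samples per command).
    """
    by_cmd = defaultdict(list)
    for idx, cmd in enumerate(commands):
        by_cmd[cmd].append(idx)
    per_cmd = batch_size // len(by_cmd)
    while True:
        batch = []
        for idxs in by_cmd.values():
            batch.extend(random.sample(idxs, per_cmd))  # no repeats within a batch
        random.shuffle(batch)
        yield batch

# Training setup from the text (names here are illustrative):
#   optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
#   for indices in balanced_minibatches(commands, batch_size=120):
#       ...one Adam step on the corresponding minibatch...
```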