This is an old revision of the document!
We use an input image that is resized to 288×512×3 with bilinear interpolation algorithm, hence to reduce computational burdens for a real-time detection. For the images with different aspect ratios, we cropped the height to match the ratio. Following a common practice in image classification tasks, we subtracted the mean RGB value to achieve zerocentered inputs, which are originally in different scales. Note that our dataset contains images where the camera gains are automatically calibrated to obtain high-quality images. During the testing process, we also used a cropped image in the center part of the image, where traffic lights are commonly observed in that area. Thus, a batch of two images (i.e., whole and cropped images) are fed into our detector.