This is an old revision of the document!
Here, we propose a method that accurately and reliably detects traffic lights from a stream of images captured by a front-view dash-cam attached to the windshield. As we depicted in Figure 2, the proposed method contains two major steps: (1) coarse-grained traffic light detector (Section III-B) and (2) spatiotemporal filtering (Section III-D) of the traffic lights candidates. In the first step (coarse-grained detector), traffic light candidates from each image are collected by utilizing a deep neural object detection architecture. The main focus of this step is to discover the true traffic lights as many as possible (i.e., reducing the number of false negatives). Thus, it is possible that the traffic light candidate collection may contain false positives. In the second step (spatiotemporal filtering), we eliminate such erroneously detected traffic lights by simultaneously considering other traffic lights over time and space. To distinguish between true and false traffic lights, we use a point-based reward system where each detected traffic lights earn rewards with respect to features extracted from both spatial and temporal domains.