Differences
This shows you the differences between two versions of the page.
Member:sungbeanJo_tran [2021/01/17 14:38] sungbean |
Member:sungbeanJo_tran [2021/01/18 16:21] (current) sungbean |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | Here, we propose a method that accurately and reliably | + | Choosing scales and aspect ratios for default boxes. To handle different object scales, |
- | detects traffic lights from a stream of images captured by | + | some methods [4,9] suggest processing the image at different sizes and combining the |
- | a front-view dash-cam attached to the windshield. As we | + | results afterwards. However, by utilizing feature maps from several different layers in a |
- | depicted in Figure 2, the proposed method contains two | + | single network for prediction we can mimic the same effect, while also sharing parameters across all object scales. Previous works [10,11] have shown that using feature maps |
- | major steps: (1) coarse-grained traffic light detector (Section | + | from the lower layers can improve semantic segmentation quality because the lower |
- | III-B) and (2) spatiotemporal filtering (Section III-D) of the | + | layers capture more fine details of the input objects. Similarly, [12] showed that adding |
- | traffic light candidates. In the first step (coarse-grained detector), traffic light candidates from each image are collected | + | global context pooled from a feature map can help smooth the segmentation results. |
- | by utilizing a deep neural object detection architecture. The | + | Motivated by these methods, we use both the lower and upper feature maps for detection. Figure 1 shows two exemplar feature maps (8×8 and 4×4) which are used in the |
- | main focus of this step is to discover as many true traffic | + | framework. In practice, we can use many more with small computational overhead. |
- | lights as possible (i.e., reducing the number of false | + | Feature maps from different levels within a network are known to have different |
- | negatives). Thus, it is possible that the traffic light candidate | + | (empirical) receptive field sizes [13]. Fortunately, within the SSD framework, the default boxes do not necessarily need to correspond to the actual receptive fields of each |
- | collection may contain false positives. In the second step | + | layer. We design the tiling of default boxes so that specific feature maps learn to be |
- | (spatiotemporal filtering), we eliminate such erroneously | + | responsive to particular scales of the objects. Suppose we want to use m feature maps |
- | detected traffic lights by simultaneously considering other | + | for prediction. The scale of the default boxes for each feature map is computed as: |
- | traffic lights over time and space. To distinguish between true | + | |
- | and false traffic lights, we use a point-based reward system | + | |
- | where each detected traffic light earns rewards with respect | + |
- | to features extracted from both spatial and temporal domains. | + |
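
The added text ends at "computed as:" without the formula itself. As a minimal sketch, assuming the linear spacing rule from the SSD paper that this passage quotes, s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in [1, m], the per-feature-map scales could be computed as below. The function name `default_box_scales` is hypothetical, and the defaults s_min = 0.2 and s_max = 0.9 are taken from that paper, not from this page.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return one default-box scale per feature map, spaced linearly.

    Assumption: s_min and s_max default to the values used in the SSD paper;
    they are not stated on this page.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # A single feature map has no spacing to compute; use the lowest scale.
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    # k runs from 1 to m, so the first map gets s_min and the last gets s_max.
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


if __name__ == "__main__":
    # Six feature maps, as an illustrative choice of m.
    print([round(s, 2) for s in default_box_scales(6)])
```

With m = 6 the scales run from 0.2 on the lowest feature map to 0.9 on the highest in equal steps of about 0.14, which matches the idea that each feature map is made responsible for a particular range of object sizes.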