Differences

This shows you the differences between two versions of the page.

--- Member:sungbeanJo_tran [2021/01/17 14:38] sungbean
+++ Member:sungbeanJo_tran [2021/01/18 16:21] (current) sungbean
@@ Line 1: / Line 1: @@
-Here, we propose a method that accurately and reliably
-detects traffic lights in a stream of images captured by
-a front-view dash-cam attached to the windshield. As
-depicted in Figure 2, the proposed method consists of two
-major steps: (1) a coarse-grained traffic light detector (Section
-III-B) and (2) spatiotemporal filtering (Section III-D) of the
-traffic light candidates. In the first step (coarse-grained detector), traffic light candidates are collected from each image
-using a deep neural object detection architecture. The
-main goal of this step is to find as many of the true traffic lights
-as possible (i.e., to minimize the number of false
-negatives). Consequently, the collected traffic light candidates
-may contain false positives. In the second step
-(spatiotemporal filtering), we eliminate such erroneously
-detected traffic lights by simultaneously considering other
-traffic lights over time and space. To distinguish between true
-and false traffic lights, we use a point-based reward system
-in which each detected traffic light earns rewards according
-to features extracted from both the spatial and temporal domains.
+Choosing scales and aspect ratios for default boxes. To handle different object scales,
+some methods [4,9] suggest processing the image at different sizes and combining the
+results afterwards. However, by utilizing feature maps from several different layers in a
+single network for prediction, we can mimic the same effect while also sharing parameters across all object scales. Previous works [10,11] have shown that using feature maps
+from the lower layers can improve semantic segmentation quality because the lower
+layers capture finer details of the input objects. Similarly, [12] showed that adding
+global context pooled from a feature map can help smooth the segmentation results.
+Motivated by these methods, we use both the lower and upper feature maps for detection. Figure 1 shows two exemplar feature maps (8×8 and 4×4) that are used in the
+framework. In practice, we can use many more with small computational overhead.
+Feature maps from different levels within a network are known to have different
+(empirical) receptive field sizes [13]. Fortunately, within the SSD framework, the default boxes do not necessarily need to correspond to the actual receptive fields of each
+layer. We design the tiling of default boxes so that specific feature maps learn to be
+responsive to particular scales of the objects. Suppose we want to use m feature maps
+for prediction. The scale of the default boxes for each feature map is computed as:
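The added (`+`) text stops just before the scale formula, so the sketch below assumes the linear spacing rule from the original SSD paper: s_k = s_min + (s_max − s_min)(k − 1)/(m − 1) for k = 1, …, m, with s_min = 0.2 and s_max = 0.9. It also assumes that paper's aspect ratios {1, 2, 3, 1/2, 1/3}, with box width s_k·√a and height s_k/√a, plus one extra square box of scale √(s_k·s_{k+1}). The helper names and the choice s_{m+1} = 1.0 for the last map are illustrative assumptions, not taken from this page. The usage at the bottom tiles the two exemplar maps of Figure 1 (8×8 and 4×4).

```python
import math
from typing import List, Tuple

# A default box as (cx, cy, w, h), all normalized to [0, 1].
Box = Tuple[float, float, float, float]


def default_box_scales(m: int, s_min: float = 0.2, s_max: float = 0.9) -> List[float]:
    """Scales s_1..s_m spaced linearly from s_min to s_max (values assumed from the SSD paper)."""
    if m == 1:
        return [s_min]  # avoid dividing by m - 1 = 0
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def box_shapes(s_k: float, s_next: float,
               aspect_ratios: Tuple[float, ...] = (1.0, 2.0, 3.0, 1 / 2, 1 / 3)
               ) -> List[Tuple[float, float]]:
    """(w, h) pairs for one feature map: w = s_k*sqrt(a), h = s_k/sqrt(a),
    plus an extra square box of scale sqrt(s_k * s_next)."""
    shapes = [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in aspect_ratios]
    s_extra = math.sqrt(s_k * s_next)
    shapes.append((s_extra, s_extra))
    return shapes


def tile_default_boxes(fmap_size: int, shapes: List[Tuple[float, float]]) -> List[Box]:
    """Center every shape on each cell of a fmap_size x fmap_size feature map."""
    boxes: List[Box] = []
    for i in range(fmap_size):
        for j in range(fmap_size):
            cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size
            boxes.extend((cx, cy, w, h) for w, h in shapes)
    return boxes


# Usage: the two exemplar feature maps from Figure 1 (8x8 and 4x4), so m = 2.
fmap_sizes = [8, 4]
scales = default_box_scales(len(fmap_sizes))
all_boxes: List[Box] = []
for k, (size, s) in enumerate(zip(fmap_sizes, scales)):
    # Assumption: s_{m+1} = 1.0 for the last map, since the formula only defines s_1..s_m.
    s_next = scales[k + 1] if k + 1 < len(scales) else 1.0
    all_boxes += tile_default_boxes(size, box_shapes(s, s_next))
print(len(all_boxes))  # 8*8*6 + 4*4*6 = 480
```

Because each feature map gets its own s_k, the 8×8 map (small cells) receives the small boxes and the 4×4 map the large ones, which is the per-layer scale specialization the paragraph describes; no box has to match that layer's actual receptive field.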

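The removed (`-`) text describes the spatiotemporal filter only at a high level. The sketch below is a hypothetical point-based reward filter, not the authors' method. The reward rules are assumptions chosen for illustration, as are the `Candidate` type and all thresholds. A candidate earns one point for each earlier frame, within a short window, that contains an overlapping candidate. It earns one more point if another candidate in the same frame sits at a similar height. It is kept only if its total reaches `min_points`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical detector output: one box in one frame (pixel coordinates)."""
    frame: int
    x: float  # top-left x
    y: float  # top-left y
    w: float
    h: float


def iou(a: Candidate, b: Candidate) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def reward_filter(cands: List[Candidate], window: int = 3, iou_thr: float = 0.3,
                  y_tol: float = 0.5, min_points: int = 2) -> List[Candidate]:
    """Keep candidates whose accumulated spatial + temporal points reach min_points."""
    by_frame: Dict[int, List[Candidate]] = {}
    for c in cands:
        by_frame.setdefault(c.frame, []).append(c)

    kept = []
    for c in cands:
        points = 0
        # Temporal reward: +1 per earlier frame in the window with an overlapping candidate.
        for f in range(c.frame - window, c.frame):
            if any(iou(c, o) >= iou_thr for o in by_frame.get(f, [])):
                points += 1
        # Spatial reward: +1 if another candidate in the same frame is at a similar height.
        if any(o is not c and abs(o.y - c.y) <= y_tol * c.h for o in by_frame[c.frame]):
            points += 1
        if points >= min_points:
            kept.append(c)
    return kept


# Usage: two neighboring lights seen in frames 0-3, plus a one-off false positive.
cands = [Candidate(f, 100, 50, 10, 25) for f in range(4)]   # light A
cands += [Candidate(f, 140, 52, 10, 25) for f in range(4)]  # light B, same height as A
fp = Candidate(3, 300, 200, 12, 12)                          # appears once, elsewhere
cands.append(fp)
kept = reward_filter(cands)
print(len(kept), fp in kept)  # 6 False
```

Because this version only looks backward in time, candidates in the first frames have little chance to collect temporal points (here the frame-0 boxes are dropped); an offline variant could also count later frames.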