Differences

This shows you the differences between two versions of the page.

Member:sungbeanJo_tran [2021/01/17 14:34]
sungbean
Member:sungbeanJo_tran [2021/01/18 16:21] (current)
sungbean
Line 1: Line 1:
-Traffic light detector strongly requires showing reliable performance in real-time and working for both small
-(i.e., 3x9 pixels) and large objects with low false positive and
-low false negative rates, while maintaining high detection
-accuracy. For example, a false red traffic light will lead the
-autonomous vehicle to abruptly stop while driving, while
-a missed red light will cause the vehicle to go through an
-intersection originally with red lights in its course of driving.
-In this coarse-grained traffic light detection step, we focus
-on reducing false negative (FN) rates, that is, on collecting as many true
-traffic lights as possible. We utilize the Single-Shot multi-box
-Detector (SSD) [5] that has been shown to be an effective
-tool for an object detection task. Note that we use the SSD
-architecture that has shown improved detection accuracy in
-other benchmarks than YOLO network architecture, which
-was utilized in the existing work by Behrendt et al. [1]. More
-modern architecture, such as Mask R-CNN [6], may provide
-better detection accuracy, but we leave this comparison for
-future work. The SSD model is based on a convolutional
-network and takes the whole image as an input and predicts
-a fixed-size collection of bounding boxes and corresponding
-confidence scores for the presence of object instances in
-those boxes. The final detections are then produced
-by a non-maximum suppression step: all detection boxes
-are sorted on the basis of their predicted scores, and the
-detection with the maximum score is then selected, while other
-detections with a significant overlap are suppressed. As we
-described in Figure 2, we use a standard VGG-16 network
-architecture [7] as a base convolutional network, which is
-pre-trained on ImageNet Large Scale Visual Recognition
+Choosing scales and aspect ratios for default boxes. To handle different object scales,
+some methods [4,9] suggest processing the image at different sizes and combining the
+results afterwards. However, by utilizing feature maps from several different layers in
+a single network for prediction we can mimic the same effect, while also sharing parameters across all object scales. Previous works [10,11] have shown that using feature maps
+from the lower layers can improve semantic segmentation quality because the lower
+layers capture more fine details of the input objects. Similarly, [12] showed that adding
+global context pooled from a feature map can help smooth the segmentation results.
+Motivated by these methods, we use both the lower and upper feature maps for detection. Figure 1 shows two exemplar feature maps (8×8 and 4×4) which are used in the
+framework. In practice, we can use many more with small computational overhead.
+Feature maps from different levels within a network are known to have different
+(empirical) receptive field sizes [13]. Fortunately, within the SSD framework, the default boxes do not necessarily need to correspond to the actual receptive fields of each
+layer. We design the tiling of default boxes so that specific feature maps learn to be
+responsive to particular scales of the objects. Suppose we want to use m feature maps
+for prediction. The scale of the default boxes for each feature map is computed as:
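
The added revision stops at "The scale of the default boxes for each feature map is computed as:" without giving the rule itself. As a hedged illustration only: the published SSD paper spaces the scales linearly between a smallest and a largest value, s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in 1..m, with s_min = 0.2 and s_max = 0.9, and derives box width and height from a scale and an aspect ratio. The sketch below assumes that rule. The function names `default_box_scales` and `box_shape`, and the handling of m = 1, are hypothetical choices made for this example and do not come from this page.

```python
import math


def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return one default-box scale per feature map, spaced linearly.

    Assumes the linear rule from the SSD paper:
        s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1),  k = 1..m
    The defaults 0.2 and 0.9 are the values quoted in that paper.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # The formula divides by (m - 1); returning s_min here is an
        # assumption of this sketch, not something the paper specifies.
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


def box_shape(scale, aspect_ratio):
    """Return (width, height) of a default box relative to the image size.

    Uses w = s * sqrt(a) and h = s / sqrt(a), so the box area stays s**2
    for every aspect ratio a.
    """
    root = math.sqrt(aspect_ratio)
    return scale * root, scale / root


if __name__ == "__main__":
    scales = default_box_scales(6)
    print([round(s, 2) for s in scales])
    print(box_shape(scales[0], 2.0))
```

With m = 6 the lowest feature map gets scale 0.2 and the highest gets 0.9, so lower layers are steered toward small objects and upper layers toward large ones, which matches the motivation given in the added text.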