Differences

This shows you the differences between two versions of the page.

Member:sungbeanJo_tran [2021/01/17 14:38]
sungbean
Member:sungbeanJo_tran [2021/01/18 16:21] (current)
sungbean
Line 1: Line 1:
-We use an input image that is resized to 288×512×3 with
-bilinear interpolation algorithm, hence to reduce computational burdens for a real-time detection. For the images with
-different aspect ratios, we cropped the height to match the
-ratio. Following a common practice in image classification
-tasks, we subtracted the mean RGB value to achieve zero-centered inputs, which are originally in different scales. Note
-that our dataset contains images where the camera gains
-are automatically calibrated to obtain high-quality images.
-During the testing process, we also used a cropped image
-in the center part of the image, where traffic lights are
-commonly observed in that area. Thus, a batch of two images
-(i.e., whole and cropped images) are fed into our detector.
+Choosing scales and aspect ratios for default boxes. To handle different object scales,
+some methods [4,9] suggest processing the image at different sizes and combining the
+results afterwards. However, by utilizing feature maps from several different layers in a
+single network for prediction we can mimic the same effect, while also sharing parameters across all object scales. Previous works [10,11] have shown that using feature maps
+from the lower layers can improve semantic segmentation quality because the lower
+layers capture more fine details of the input objects. Similarly, [12] showed that adding
+global context pooled from a feature map can help smooth the segmentation results.
+Motivated by these methods, we use both the lower and upper feature maps for detection. Figure 1 shows two exemplar feature maps (8×8 and 4×4) which are used in the
+framework. In practice, we can use many more with small computational overhead.
+Feature maps from different levels within a network are known to have different
+(empirical) receptive field sizes [13]. Fortunately, within the SSD framework, the default boxes do not necessarily need to correspond to the actual receptive fields of each
+layer. We design the tiling of default boxes so that specific feature maps learn to be
+responsive to particular scales of the objects. Suppose we want to use m feature maps
+for prediction. The scale of the default boxes for each feature map is computed as:
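
The equation itself is not part of this revision. Assuming it is the linear spacing rule from the SSD paper, s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in [1, m], with s_min = 0.2 and s_max = 0.9, the scales can be sketched as below. The function name default_box_scales and the handling of m = 1 are illustrative assumptions, not something the page specifies.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return the default-box scales s_1..s_m, one per feature map.

    Scales are spaced linearly from s_min (lowest feature map) to
    s_max (highest feature map). The defaults 0.2 and 0.9 are the
    values assumed from the SSD paper.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # Assumption: with a single feature map the formula's (m - 1)
        # denominator is undefined, so fall back to s_min.
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    # k runs over 1..m, matching the indexing used in the formula.
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


if __name__ == "__main__":
    for k, s in enumerate(default_box_scales(6), start=1):
        print(f"feature map {k}: scale {s:.2f}")
```

Each scale is a fraction of the input image size, so lower feature maps are assigned smaller boxes and upper feature maps larger ones.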