Differences

This shows you the differences between two versions of the page.

--- Member:sungbeanJo_tran [2021/01/17 14:38]
sungbean
+++ Member:sungbeanJo_tran [2021/01/18 16:21] (current)
sungbean
@@ Line 1: / Line 1: @@
-We use an input image that is resized to 288×512×3 with
+Choosing scales and aspect ratios for default boxes. To handle different object scales,
-bilinear interpolation to reduce the computational burden for real-time detection. For images with
+some methods [4,9] suggest processing the image at different sizes and combining the
-different aspect ratios, we cropped the height to match the
+results afterwards. However, by utilizing feature maps from several different layers in a
-ratio. Following a common practice in image classification
+single network for prediction we can mimic the same effect, while also sharing parameters across all object scales. Previous works [10,11] have shown that using feature maps
-tasks, we subtracted the mean RGB value to achieve zero-centered inputs, which are originally in different scales. Note
+from the lower layers can improve semantic segmentation quality because the lower
-that our dataset contains images where the camera gains
+layers capture more fine details of the input objects. Similarly, [12] showed that adding
-are automatically calibrated to obtain high-quality images.
+global context pooled from a feature map can help smooth the segmentation results.
-During the testing process, we also used a cropped image
+Motivated by these methods, we use both the lower and upper feature maps for detection. Figure 1 shows two exemplar feature maps (8×8 and 4×4) which are used in the
-in the center part of the image, where traffic lights are
+framework. In practice, we can use many more with small computational overhead.
-commonly observed. Thus, a batch of two images
+Feature maps from different levels within a network are known to have different
-(i.e., whole and cropped images) is fed into our detector
+(empirical) receptive field sizes [13]. Fortunately, within the SSD framework, the default boxes do not necessarily need to correspond to the actual receptive fields of each
+layer. We design the tiling of default boxes so that specific feature maps learn to be
+responsive to particular scales of the objects. Suppose we want to use m feature maps
+for prediction. The scale of the default boxes for each feature map is computed as:
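The hunk above ends before the formula itself. As a reading aid, the published SSD paper spaces the scales linearly between a smallest and a largest value, s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in [1, m], with s_min = 0.2 and s_max = 0.9. The following is a minimal sketch of that computation, not part of the recorded revision. The function name `default_box_scales` and the handling of m = 1 are assumptions made for illustration.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return one default-box scale per feature map, lowest layer first.

    Follows the linear spacing used in the SSD paper:
        s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1),  k = 1..m
    The defaults 0.2 and 0.9 are the values reported in that paper.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # Assumption: the formula is undefined for m = 1, so fall back to s_min.
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    return [s_min + step * (k - 1) for k in range(1, m + 1)]


if __name__ == "__main__":
    # Scales are fractions of the input image size, one per feature map.
    for k, s in enumerate(default_box_scales(6), start=1):
        print(f"feature map {k}: scale {s:.2f}")
```

With six feature maps the scales rise in equal steps of 0.14, so the lowest map handles the smallest objects and the highest map the largest.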
