Differences

This shows you the differences between two versions of the page.

Member:sungbeanJo_tran [2021/01/17 14:35]
sungbean
Member:sungbeanJo_tran [2021/01/18 16:21] (current)
sungbean
Line 1: Line 1:
-Auxiliary structures – convolutional predictor and the additional convolutional feature
-extractor – are used following the work by Liu et al. [5].
-1) Training objective: The loss function L (= Lloc+Lconf)
-is a weighted sum of two types of loss: (1) the localization
-loss Lloc measures Smooth L1 loss between the predicted
-and the ground-truth bounding box in a feature space. (2)
-The confidence loss Lconf is a softmax loss over multiple
-classes confidences. For more rigorous details, refer to [5]
-2) Data augmentation: To train a robust detector to
-various object sizes, we use random cropping (the size of
-each sampled image is [0.5, 1] of the original image size
-with fixed aspect ratio) and flipping to yield consistent
-improvement. Following [5], we also sample an image so
-that the minimum jaccard overlap with the objects is {0.1,
-0.3, 0.5, 0.7, 0.9}. Note that each sampled image is then
-resized to a fixed size followed by photometric distortions
-with respect to brightness, contrast, and saturation.
+Choosing scales and aspect ratios for default boxes To handle different object scales,
+some methods [4,9] suggest processing the image at different sizes and combining the
+results afterwards. However, by utilizing feature maps from several different layers in a
+single network for prediction we can mimic the same effect, while also sharing parameters
+across all object scales. Previous works [10,11] have shown that using feature maps
+from the lower layers can improve semantic segmentation quality because the lower
+layers capture more fine details of the input objects. Similarly, [12] showed that adding
+global context pooled from a feature map can help smooth the segmentation results.
+Motivated by these methods, we use both the lower and upper feature maps for
+detection. Figure 1 shows two exemplar feature maps (8×8 and 4×4) which are used in the
+framework. In practice, we can use many more with small computational overhead.
+Feature maps from different levels within a network are known to have different
+(empirical) receptive field sizes [13]. Fortunately, within the SSD framework, the default
+boxes do not necessary need to correspond to the actual receptive fields of each
+layer. We design the tiling of default boxes so that specific feature maps learn to be
+responsive to particular scales of the objects. Suppose we want to use m feature maps
+for prediction. The scale of the default boxes for each feature map is computed as:
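
The added passage ends at the scale of the default boxes for each of the m feature maps. In the SSD paper the scales are spaced linearly between a smallest scale s_min and a largest scale s_max: s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1) for k in [1, m], with s_min = 0.2 and s_max = 0.9. A minimal Python sketch under that assumption follows; the function name and the m = 1 fallback are hypothetical and not taken from the page.

```python
def default_box_scales(m, s_min=0.2, s_max=0.9):
    """Return linearly spaced default-box scales s_1..s_m.

    Each scale is a fraction of the input image size. The defaults
    0.2 and 0.9 are the values stated in the SSD paper.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if m == 1:
        # Assumption for illustration: with a single feature map,
        # fall back to the smallest scale (the formula divides by m - 1).
        return [s_min]
    step = (s_max - s_min) / (m - 1)
    # List index k = 0..m-1 corresponds to the formula's k - 1.
    return [s_min + step * k for k in range(m)]


scales = default_box_scales(6)
for k, s in enumerate(scales, start=1):
    print(f"feature map {k}: scale {s:.2f}")
```

With six feature maps the lowest layer gets scale 0.2 and the highest gets 0.9, so lower layers are assigned the smaller objects.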