- Object localization
Classification with localization
Defining the target label y
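A minimal sketch of how such a target label could be built. It assumes the layout y = [pc, bx, by, bh, bw, c1, c2, c3], reusing the [pc, bx, by, bh, bw] prediction format and the three classes named later in these notes. The function name `make_target` and the use of `None` for "don't care" entries are illustrative assumptions.

```python
# Assumed class list, taken from the classes mentioned later in these notes.
CLASSES = ["pedestrian", "car", "motorcycle"]


def make_target(box=None, class_name=None):
    """Build y = [pc, bx, by, bh, bw, c1, c2, c3] (hypothetical helper).

    box is (bx, by, bh, bw) or None when the image contains no object.
    """
    if box is None:
        # No object: pc = 0 and the remaining entries are "don't care".
        return [0.0] + [None] * (4 + len(CLASSES))
    bx, by, bh, bw = box
    one_hot = [1.0 if name == class_name else 0.0 for name in CLASSES]
    return [1.0, bx, by, bh, bw] + one_hot


y_car = make_target(box=(0.5, 0.7, 0.3, 0.4), class_name="car")
y_background = make_target()
```

When pc is 0, a training loss would typically ignore every entry except pc, which is why those entries are left undefined here.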
- Landmark detection
Output additional landmark coordinates (x, y) to locate the key points in the image.
- Object detection
Using the sliding windows detection algorithm
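As a rough illustration, the window positions visited by a sliding windows pass can be enumerated as below. The square window, the stride parameter, and the function name `sliding_windows` are assumptions for this sketch. Each yielded crop would then be passed to a classifier.

```python
def sliding_windows(image_h, image_w, window, stride):
    """Yield (top, left, bottom, right) for each square window position."""
    for top in range(0, image_h - window + 1, stride):
        for left in range(0, image_w - window + 1, stride):
            yield (top, left, top + window, left + window)


# A 16 x 16 image scanned with a 14 x 14 window and stride 2.
windows = list(sliding_windows(16, 16, window=14, stride=2))
```

Running the classifier once per window is the expensive part, since the cost grows with the number of positions.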
- Convolutional implementation of sliding windows
[Sermanet et al., 2014, OverFeat: Integrated recognition, localization and detection using convolutional networks]
Turning FC layers into convolutional layers
An FC layer is equivalent to a convolutional layer whose output volume is 1 * 1 * channels
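A small shape calculation can show why this conversion helps. The layer sizes below (a 5 x 5 convolution, a 2 x 2 max pool, then an FC layer rewritten as a 5 x 5 convolution) are assumed for illustration and are not taken from these notes. Under those assumptions, feeding a larger image through the same network yields a grid of outputs, one per window position, in a single forward pass.

```python
def conv_out(n, f, stride=1):
    """Spatial output size of a 'valid' convolution or pooling layer."""
    return (n - f) // stride + 1


def output_grid(input_size):
    """Spatial size of the final output for an assumed small network."""
    n = conv_out(input_size, 5)        # 5 x 5 convolution
    n = conv_out(n, 2, stride=2)       # 2 x 2 max pool, stride 2
    n = conv_out(n, 5)                 # FC layer rewritten as a 5 x 5 convolution
    return n                           # later 1 x 1 convolutions keep this size


sizes = {s: output_grid(s) for s in (14, 16, 28)}
# 14 -> 1 (a single prediction), 16 -> 2 (a 2 x 2 grid), 28 -> 8 (an 8 x 8 grid)
```

The computation shared between overlapping windows is done once, which is what makes this cheaper than cropping and classifying each window separately.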
- Bounding box predictions
YOLO algorithm [Redmon et al., 2015, You Only Look Once: Unified real-time object detection]
- look at the midpoint of each object
- assign the object to the one grid cell that contains its midpoint
- use the image classification and localization algorithm to predict the object's bounding box
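The midpoint assignment can be sketched as follows. The 3 x 3 grid, the image-relative coordinates in [0, 1], and the function name `assign_to_cell` are assumptions made for this example.

```python
def assign_to_cell(x_mid, y_mid, grid_size=3):
    """Map an object's midpoint to its grid cell (hypothetical helper).

    x_mid and y_mid are image-relative coordinates in [0, 1].
    Returns (row, col, bx, by), where bx and by are the midpoint's
    position relative to the top-left corner of the chosen cell.
    """
    # Clamp so that a midpoint exactly on the right/bottom edge stays in range.
    col = min(int(x_mid * grid_size), grid_size - 1)
    row = min(int(y_mid * grid_size), grid_size - 1)
    bx = x_mid * grid_size - col
    by = y_mid * grid_size - row
    return row, col, bx, by


center = assign_to_cell(0.5, 0.5)       # midpoint of the image
top_right = assign_to_cell(0.9, 0.1)    # near the top-right corner
```

Only the cell that contains the midpoint is responsible for the object, even if the object's box spans several cells.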
- Intersection over union
More generally, IoU is a measure of the overlap between two bounding boxes.
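A minimal sketch of the computation: IoU is the area of the intersection divided by the area of the union. The corner format (x1, y1, x2, y2) is an assumption chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clipped at 0 for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # intersection 1, union 7
```

The result ranges from 0 for boxes that do not overlap to 1 for identical boxes.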
- Non-max suppression
Each output prediction is [pc, bx, by, bh, bw]. Discard all boxes with pc <= 0.6.
While there are any remaining boxes:
. Pick the box with the largest pc and output it as a prediction.
. Discard any remaining box with IoU >= 0.5 with the box output in the previous step.
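The steps above can be sketched as follows. The sketch assumes (bx, by) is the box center and (bh, bw) its height and width, all in one shared coordinate system. The helper names are illustrative.

```python
def to_corners(pred):
    """Convert (pc, bx, by, bh, bw) to corners (x1, y1, x2, y2)."""
    _, bx, by, bh, bw = pred
    return (bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2)


def iou(a, b):
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(preds, pc_threshold=0.6, iou_threshold=0.5):
    """Keep the highest-pc box in each cluster of overlapping boxes."""
    # Discard all boxes with pc <= pc_threshold.
    remaining = sorted((p for p in preds if p[0] > pc_threshold),
                       key=lambda p: p[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)            # box with the largest pc
        kept.append(best)
        # Discard boxes with IoU >= iou_threshold against the box just kept.
        remaining = [p for p in remaining
                     if iou(to_corners(p), to_corners(best)) < iou_threshold]
    return kept


preds = [
    (0.9, 5.0, 5.0, 4.0, 4.0),
    (0.8, 5.5, 5.0, 4.0, 4.0),    # overlaps the first box heavily
    (0.7, 20.0, 20.0, 4.0, 4.0),  # a separate object
    (0.5, 30.0, 30.0, 4.0, 4.0),  # below the pc threshold
]
kept = non_max_suppression(preds)
```

In this example the second box is suppressed by the first, and the fourth is dropped by the pc threshold before the loop starts.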
- Anchor boxes
[Redmon et al., 2015, You Only Look Once: Unified real-time object detection]
- for each grid cell, get 2 predicted bounding boxes.
- get rid of low-probability predictions.
- for each class (pedestrian, car, motorcycle), use non-max suppression to generate the final predictions.
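A rough sketch of the last two steps: filter by score, then run non-max suppression separately for each class. The prediction format (class_name, score, corner box), the function name `detect`, and the 0.6 and 0.5 thresholds (borrowed from the non-max suppression notes) are assumptions for this example.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detect(predictions, score_threshold=0.6, iou_threshold=0.5):
    """predictions: list of (class_name, score, (x1, y1, x2, y2))."""
    # Get rid of low-probability predictions and group the rest by class.
    by_class = defaultdict(list)
    for cls, score, box in predictions:
        if score > score_threshold:
            by_class[cls].append((score, box))

    final = []
    for cls, candidates in by_class.items():
        # Non-max suppression within a single class.
        candidates.sort(key=lambda c: c[0], reverse=True)
        while candidates:
            score, box = candidates.pop(0)
            final.append((cls, score, box))
            candidates = [c for c in candidates if iou(c[1], box) < iou_threshold]
    return final


predictions = [
    ("car", 0.9, (0, 0, 4, 4)),
    ("car", 0.8, (0.5, 0, 4.5, 4)),      # duplicate of the first car
    ("pedestrian", 0.85, (0, 0, 4, 4)),  # same location, different class
    ("motorcycle", 0.3, (10, 10, 12, 12)),
]
final = detect(predictions)
```

Because suppression runs per class, the pedestrian box survives even though it overlaps the car box completely.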
- Region proposals