diff --git a/docs/ops/detection/Proposal_1.md b/docs/ops/detection/Proposal_1.md
index 65858eb9587..84a0a1c8c93 100644
--- a/docs/ops/detection/Proposal_1.md
+++ b/docs/ops/detection/Proposal_1.md
@@ -8,7 +8,7 @@
 
 **Detailed description**
 
-*Proposal* has three inputs: a tensor with probabilities whether particular bounding box corresponds to background and foreground, a tensor with logits for each of the bounding boxes, a tensor with input image size in the [`image_height`, `image_width`, `scale_height_and_width`] or [`image_height`, `image_width`, `scale_height`, `scale_width`] format. The produced tensor has two dimensions `[batch_size * post_nms_topn, 5]`.
+*Proposal* has three inputs: a tensor with probabilities of whether a particular bounding box corresponds to background or foreground, a tensor with deltas (`bbox_deltas`) for each of the bounding boxes, and a tensor with the input image size in the [`image_height`, `image_width`, `scale_height_and_width`] or [`image_height`, `image_width`, `scale_height`, `scale_width`] format. The produced tensor has the shape `[batch_size * post_nms_topn, 5]`.
 *Proposal* layer does the following with the input tensor:
 1. Generates initial anchor boxes. Left top corner of all boxes is at (0, 0). Width and height of boxes are calculated from *base_size* with *scale* and *ratio* attributes.
 2. For each point in the first input tensor:
@@ -31,16 +31,14 @@
     * **Required**: *yes*
 
 * *pre_nms_topn*
-
-    * **Description**: *pre_nms_topn* is the number of bounding boxes before the NMS operation. For example, *pre_nms_topn* equal to 15 means that the minimum box size is 15.
+    * **Description**: *pre_nms_topn* is the number of bounding boxes before the NMS operation. For example, *pre_nms_topn* equal to 15 means that only the 15 boxes with the highest scores are passed to NMS.
     * **Range of values**: a positive integer number
     * **Type**: `int`
     * **Default value**: None
     * **Required**: *yes*
 
 * *post_nms_topn*
-
-    * **Description**: *post_nms_topn* is the number of bounding boxes after the NMS operation. For example, *post_nms_topn* equal to 15 means that the maximum box size is 15.
+    * **Description**: *post_nms_topn* is the number of bounding boxes after the NMS operation. For example, *post_nms_topn* equal to 15 means that the 15 boxes with the highest scores are kept after NMS.
     * **Range of values**: a positive integer number
     * **Type**: `int`
     * **Default value**: None
@@ -112,7 +110,7 @@
 
 * *box_size_scale*
 
-    * **Description**: *box_size_scale* specifies the scale factor applied to logits of box sizes before decoding.
+    * **Description**: *box_size_scale* specifies the scale factor applied to the box size deltas (`bbox_deltas`) before decoding.
     * **Range of values**: a positive floating-point number
     * **Type**: `float`
     * **Default value**: 1.0
@@ -120,7 +118,7 @@
 
 * *box_coordinate_scale*
 
-    * **Description**: *box_coordinate_scale* specifies the scale factor applied to logits of box coordinates before decoding.
+    * **Description**: *box_coordinate_scale* specifies the scale factor applied to the box coordinate deltas (`bbox_deltas`) before decoding.
     * **Range of values**: a positive floating-point number
     * **Type**: `float`
     * **Default value**: 1.0
@@ -140,7 +138,7 @@
 
 * **1**: 4D input floating point tensor with class prediction scores. Required.
 
-* **2**: 4D input floating point tensor with box logits. Required.
+* **2**: 4D input floating point tensor with bounding box deltas. Required.
 
 * **3**: 1D input floating tensor 3 or 4 elements: [`image_height`, `image_width`, `scale_height_and_width`] or [`image_height`, `image_width`, `scale_height`, `scale_width`]. Required.
 
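The anchor generation and delta decoding in steps 1 and 2, including the *box_size_scale* and *box_coordinate_scale* attributes whose wording this hunk changes, can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the OpenVINO implementation: it follows the py-faster-rcnn anchor scheme (one anchor per (*ratio*, *scale*) pair, *ratio* read as height/width, every anchor centered on the base box `[0, 0, base_size - 1, base_size - 1]`, which is one reading of "left top corner at (0, 0)"), it assumes the two scale attributes divide the deltas before decoding, as the TensorFlow Object Detection API box coders do, and the helpers `generate_anchors`, `shift`, and `decode` and the `offset` parameter (the Caffe-style `+1` pixel convention) are hypothetical names.

```python
import math
from typing import List, Tuple

# (x0, y0, x1, y1) corners; with offset=1.0 the corners are inclusive pixel indices
Box = Tuple[float, float, float, float]


def generate_anchors(base_size: int, ratios: List[float], scales: List[float]) -> List[Box]:
    """Step 1 (sketch): one anchor per (ratio, scale) pair, centred on the base box.

    Assumption: py-faster-rcnn convention, where ratio = height / width.
    """
    ctr = 0.5 * (base_size - 1)  # centre of the base box [0, 0, base_size - 1, base_size - 1]
    area = float(base_size * base_size)
    anchors = []
    for ratio in ratios:
        # change the aspect ratio while keeping the area approximately constant
        w = round(math.sqrt(area / ratio))
        h = round(w * ratio)
        for scale in scales:
            sw, sh = w * scale, h * scale
            anchors.append((ctr - 0.5 * (sw - 1), ctr - 0.5 * (sh - 1),
                            ctr + 0.5 * (sw - 1), ctr + 0.5 * (sh - 1)))
    return anchors


def shift(anchor: Box, col: int, row: int, feat_stride: int) -> Box:
    """Step 2 (sketch): pin an anchor to the image position of feature-map point (row, col)."""
    dx, dy = col * feat_stride, row * feat_stride
    return (anchor[0] + dx, anchor[1] + dy, anchor[2] + dx, anchor[3] + dy)


def decode(anchor: Box, deltas: Tuple[float, float, float, float],
           box_coordinate_scale: float = 1.0, box_size_scale: float = 1.0,
           offset: float = 1.0) -> Box:
    """Step 2 (sketch): apply (dx, dy, d_log_w, d_log_h) deltas to an anchor.

    Assumption: the scale attributes divide the deltas before decoding
    (TensorFlow Object Detection API box-coder style).
    """
    dx, dy, dw, dh = deltas
    dx, dy = dx / box_coordinate_scale, dy / box_coordinate_scale
    dw, dh = dw / box_size_scale, dh / box_size_scale
    w = anchor[2] - anchor[0] + offset
    h = anchor[3] - anchor[1] + offset
    cx, cy = anchor[0] + 0.5 * w, anchor[1] + 0.5 * h
    # the centre moves proportionally to the anchor size; the size changes in log space
    pcx, pcy = cx + dx * w, cy + dy * h
    pw, ph = w * math.exp(dw), h * math.exp(dh)
    return (pcx - 0.5 * pw, pcy - 0.5 * ph,
            pcx + 0.5 * pw - offset, pcy + 0.5 * ph - offset)
```

With `base_size=16`, `ratios=[0.5, 1.0, 2.0]`, and `scales=[8.0]`, `generate_anchors` returns `(-84.0, -40.0, 99.0, 55.0)`, `(-56.0, -56.0, 71.0, 71.0)`, and `(-36.0, -80.0, 51.0, 95.0)`, and `decode` with all-zero deltas returns the anchor unchanged.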
diff --git a/docs/ops/detection/Proposal_4.md b/docs/ops/detection/Proposal_4.md
new file mode 100644
index 00000000000..a2008e10f2f
--- /dev/null
+++ b/docs/ops/detection/Proposal_4.md
@@ -0,0 +1,194 @@
+## Proposal
+
+**Versioned name**: *Proposal-4*
+
+**Category**: *Object detection*
+
+**Short description**: *Proposal* operation filters bounding boxes and outputs only those with the highest prediction confidence.
+
+**Detailed description**
+
+*Proposal* has three inputs: a 4D tensor of shape `[batch_size, 2*K, H, W]` with probabilities of whether a particular
+bounding box corresponds to background or foreground, a 4D tensor of shape `[batch_size, 4*K, H, W]` with deltas for each
+of the bounding boxes, and a tensor with the input image size in the `[image_height, image_width, scale_height_and_width]` or
+`[image_height, image_width, scale_height, scale_width]` format. `K` is the number of anchors, and `H` and `W` are the height and
+width of the feature map. The operation produces two tensors:
+the first, mandatory tensor of shape `[batch_size * post_nms_topn, 5]` with proposed boxes, and
+the second, optional tensor of shape `[batch_size * post_nms_topn]` with probabilities (sometimes referred to as scores).
+
+*Proposal* layer does the following with the input tensors:
+1. Generates initial anchor boxes. The top-left corner of all boxes is at (0, 0). The width and height of the boxes are calculated from *base_size* using the *scale* and *ratio* attributes.
+2. For each point in the first input tensor:
+    * pins anchor boxes to the image according to the second input tensor, which contains four deltas for each box: for the *x* and *y* of the center, for the *width*, and for the *height*
+    * finds the score in the first input tensor
+3. Filters out boxes with a size less than *min_size*
+4. Sorts all proposals (*box*, *score*) by score from highest to lowest
+5. Takes the top *pre_nms_topn* proposals
+6. Calculates intersections between boxes and filters out every box whose \f$intersection/union > nms\_thresh\f$ relative to a higher-scoring box
+7. Takes the top *post_nms_topn* proposals
+8. Returns the top proposals and, optionally, their probabilities
+
+**Attributes**:
+* *base_size*
+
+    * **Description**: *base_size* is the size of the anchor to which the *scale* and *ratio* attributes are applied.
+    * **Range of values**: a positive integer number
+    * **Type**: `int`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *pre_nms_topn*
+    * **Description**: *pre_nms_topn* is the number of bounding boxes before the NMS operation. For example, *pre_nms_topn* equal to 15 means that only the 15 boxes with the highest scores are passed to NMS.
+    * **Range of values**: a positive integer number
+    * **Type**: `int`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *post_nms_topn*
+    * **Description**: *post_nms_topn* is the number of bounding boxes after the NMS operation. For example, *post_nms_topn* equal to 15 means that the 15 boxes with the highest scores are kept after NMS.
+    * **Range of values**: a positive integer number
+    * **Type**: `int`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *nms_thresh*
+
+    * **Description**: *nms_thresh* is the intersection-over-union threshold used by NMS. For example, *nms_thresh* equal to 0.5 means that a box is filtered out if its intersection over union with a higher-scoring box is greater than 0.5.
+    * **Range of values**: a positive floating-point number
+    * **Type**: `float`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *feat_stride*
+
+    * **Description**: *feat_stride* is the step size (in pixels) by which anchors are slid over the image. For example, *feat_stride* equal to 16 means that anchors at neighboring feature map points are 16 pixels apart.
+    * **Range of values**: a positive integer
+    * **Type**: `int`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *min_size*
+
+    * **Description**: *min_size* is the minimum size of a box to be taken into consideration. For example, *min_size* equal to 35 means that all boxes with a size less than 35 are filtered out.
+    * **Range of values**: a positive integer number
+    * **Type**: `int`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *ratio*
+
+    * **Description**: *ratio* is the list of aspect ratios used for anchor generation.
+    * **Range of values**: a list of floating-point numbers
+    * **Type**: `float[]`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *scale*
+
+    * **Description**: *scale* is the list of scales used for anchor generation.
+    * **Range of values**: a list of floating-point numbers
+    * **Type**: `float[]`
+    * **Default value**: None
+    * **Required**: *yes*
+
+* *clip_before_nms*
+
+    * **Description**: *clip_before_nms* is a flag that specifies whether to clip bounding boxes before non-maximum suppression.
+    * **Range of values**: True or False
+    * **Type**: `boolean`
+    * **Default value**: True
+    * **Required**: *no*
+
+* *clip_after_nms*
+
+    * **Description**: *clip_after_nms* is a flag that specifies whether to clip bounding boxes after non-maximum suppression.
+    * **Range of values**: True or False
+    * **Type**: `boolean`
+    * **Default value**: False
+    * **Required**: *no*
+
+* *normalize*
+
+    * **Description**: *normalize* is a flag that specifies whether to normalize output boxes to the *[0,1]* interval.
+    * **Range of values**: True or False
+    * **Type**: `boolean`
+    * **Default value**: False
+    * **Required**: *no*
+
+* *box_size_scale*
+
+    * **Description**: *box_size_scale* specifies the scale factor applied to the box size deltas before decoding.
+    * **Range of values**: a positive floating-point number
+    * **Type**: `float`
+    * **Default value**: 1.0
+    * **Required**: *no*
+
+* *box_coordinate_scale*
+
+    * **Description**: *box_coordinate_scale* specifies the scale factor applied to the box coordinate deltas before decoding.
+    * **Range of values**: a positive floating-point number
+    * **Type**: `float`
+    * **Default value**: 1.0
+    * **Required**: *no*
+
+* *framework*
+
+    * **Description**: *framework* specifies how the box coordinates are calculated.
+    * **Range of values**:
+        * "" (empty string) - calculate box coordinates like in Caffe*
+        * *tensorflow* - calculate box coordinates like in the TensorFlow* Object Detection API models
+    * **Type**: `string`
+    * **Default value**: "" (empty string)
+    * **Required**: *no*
+
+**Inputs**:
+
+* **1**: 4D tensor of type *T* and shape `[batch_size, 2*K, H, W]` with class prediction scores. Required.
+
+* **2**: 4D tensor of type *T* and shape `[batch_size, 4*K, H, W]` with deltas for each bounding box. Required.
+
+* **3**: 1D tensor of type *T* with 3 or 4 elements: `[image_height, image_width, scale_height_and_width]` or `[image_height, image_width, scale_height, scale_width]`. Required.
+
+**Outputs**:
+
+* **1**: tensor of type *T* and shape `[batch_size * post_nms_topn, 5]` with proposed boxes.
+
+* **2**: tensor of type *T* and shape `[batch_size * post_nms_topn]` with probabilities. *Optional*.
+
+**Types**
+
+* *T*: floating point type.
+
+**Example**
+
+```xml
+<layer ... type="Proposal" ...>
+    <data ... />
+    <input>
+        <port id="0">
+            <dim>7</dim>
+            <dim>4</dim>
+            <dim>28</dim>
+            <dim>28</dim>
+        </port>
+        <port id="1">
+            <dim>7</dim>
+            <dim>8</dim>
+            <dim>28</dim>
+            <dim>28</dim>
+        </port>
+        <port id="2">
+            <dim>3</dim>
+        </port>
+    </input>
+    <output>
+        <port id="3">
+            <dim>7000</dim>
+            <dim>5</dim>
+        </port>
+        <port id="4">
+            <dim>7000</dim>
+        </port>
+    </output>
+</layer>
+```
\ No newline at end of file
diff --git a/docs/ops/opset4.md b/docs/ops/opset4.md
index 1714eb0a4e5..5802ae97666 100644
--- a/docs/ops/opset4.md
+++ b/docs/ops/opset4.md
@@ -95,7 +95,7 @@ declared in `namespace opset4`.
 * [PReLU](activation/PReLU_1.md)
 * [PriorBoxClustered](detection/PriorBoxClustered_1.md)
 * [PriorBox](detection/PriorBox_1.md)
-* [Proposal](detection/Proposal_1.md)
+* [Proposal](detection/Proposal_4.md)
 * [PSROIPooling](detection/PSROIPooling_1.md)
 * [Range](generation/Range_4.md)
 * [ReLU](activation/ReLU_1.md)
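Steps 3 to 8 of the *Proposal-4* description (size filtering, score sorting, *pre_nms_topn*, NMS with *nms_thresh*, and *post_nms_topn*), plus the optional clipping controlled by *clip_before_nms*, can be sketched for a single image as follows. This is a minimal greedy-NMS sketch, not the OpenVINO implementation: the helpers `clip`, `iou`, and `select_proposals`, the `offset` parameter (the Caffe-style `+1` pixel convention), the clipping bounds, and the reading of "size less than *min_size*" as "width or height less than *min_size*" are all assumptions. It returns only the surviving proposals rather than padding them to the fixed `[batch_size * post_nms_topn, 5]` output, since this document does not say how unused rows are filled.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def clip(box: Box, img_h: float, img_w: float, offset: float = 1.0) -> Box:
    """Clip a box to the image (assumed bounds: [0, img_w - offset] x [0, img_h - offset])."""
    x0, y0, x1, y1 = box
    return (max(0.0, min(x0, img_w - offset)), max(0.0, min(y0, img_h - offset)),
            max(0.0, min(x1, img_w - offset)), max(0.0, min(y1, img_h - offset)))


def iou(a: Box, b: Box, offset: float = 1.0) -> float:
    """Intersection over union of two boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + offset
    ih = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)


def select_proposals(boxes: List[Box], scores: List[float], min_size: float,
                     pre_nms_topn: int, post_nms_topn: int, nms_thresh: float,
                     offset: float = 1.0) -> List[Tuple[float, Box]]:
    """Steps 3-8 (sketch) for one image; returns (score, box) pairs, best first."""
    # Step 3: drop boxes whose width or height is below min_size (assumed meaning of "size")
    cands = [(s, b) for b, s in zip(boxes, scores)
             if b[2] - b[0] + offset >= min_size and b[3] - b[1] + offset >= min_size]
    # Steps 4-5: sort by score, highest first, and keep the top pre_nms_topn
    cands.sort(key=lambda sb: sb[0], reverse=True)
    cands = cands[:pre_nms_topn]
    # Steps 6-7: greedy NMS; a box survives only if its IoU with every kept box
    # is at most nms_thresh, and selection stops after post_nms_topn survivors
    kept: List[Tuple[float, Box]] = []
    for score, box in cands:
        if all(iou(box, other, offset) <= nms_thresh for _, other in kept):
            kept.append((score, box))
            if len(kept) == post_nms_topn:
                break
    # Step 8: the kept boxes together with their (optional) scores
    return kept
```

Greedy NMS in this form compares each candidate with every kept box, so its cost grows quadratically with the number of candidates; capping the list with *pre_nms_topn* before suppression keeps that cost bounded.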