Specify ROIAlign-9 (#11067)

Bo Liu 2022-05-05 16:27:58 +08:00 committed by GitHub
parent e68613a2fc
commit d560cf19a3
3 changed files with 121 additions and 1 deletions

@ -161,6 +161,7 @@
RNNCell-3 <openvino_docs_ops_sequence_RNNCell_3>
RNNSequence-5 <openvino_docs_ops_sequence_RNNSequence_5>
ROIAlign-3 <openvino_docs_ops_detection_ROIAlign_3>
ROIAlign-9 <openvino_docs_ops_detection_ROIAlign_9>
ROIPooling-1 <openvino_docs_ops_detection_ROIPooling_1>
Roll-7 <openvino_docs_ops_movement_Roll_7>
Round-5 <openvino_docs_ops_arithmetic_Round_5>

@ -0,0 +1,119 @@
# ROIAlign {#openvino_docs_ops_detection_ROIAlign_9}
**Versioned name**: *ROIAlign-9*
**Category**: *Object detection*
**Short description**: *ROIAlign* is a *pooling layer* that takes feature maps of non-uniform input sizes and outputs a feature map of a fixed size.
**Detailed description**: [Reference](https://arxiv.org/abs/1703.06870).
*ROIAlign* performs the following steps for each Region of Interest (ROI) on each input feature map:
1. Multiply the box coordinates by *spatial_scale* to produce box coordinates relative to the input feature map size; the exact transformation depends on the *aligned_mode* attribute.
2. Divide the box into bins according to the *sampling_ratio* attribute.
3. Apply bilinear interpolation with 4 points in each bin, then apply maximum or average pooling, as selected by the *mode* attribute, to produce each output feature map element.
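
Step 3 can be illustrated with a minimal pure-Python sketch. It is not the OpenVINO reference implementation: the helper names `bilinear_sample` and `pool_samples` are hypothetical, the feature map is a plain list of rows, and clamping out-of-range sampling points to the border is an assumption made only for this example.

```python
import math


def bilinear_sample(feature, y, x):
    """Interpolate a 2D feature map (list of rows) at the real-valued point (y, x)."""
    height, width = len(feature), len(feature[0])
    # Assumption of this sketch: points outside the map are clamped to its border.
    y = min(max(y, 0.0), height - 1)
    x = min(max(x, 0.0), width - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    # Blend the 4 neighbouring elements, first along x, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def pool_samples(feature, points, mode="avg"):
    """Pool the interpolated values at the given (y, x) points with 'max' or 'avg'."""
    values = [bilinear_sample(feature, y, x) for y, x in points]
    if mode == "max":
        return max(values)
    if mode == "avg":
        return sum(values) / len(values)
    raise ValueError("mode must be 'max' or 'avg'")
```

Each output element would then be obtained by calling `pool_samples` with the sampling points of the corresponding cell of the `pooled_h x pooled_w` output grid.
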
**Attributes**
* *pooled_h*
    * **Description**: *pooled_h* is the height of the ROI output feature map.
    * **Range of values**: a positive integer
    * **Type**: `int`
    * **Required**: *yes*
* *pooled_w*
    * **Description**: *pooled_w* is the width of the ROI output feature map.
    * **Range of values**: a positive integer
    * **Type**: `int`
    * **Required**: *yes*
* *sampling_ratio*
    * **Description**: *sampling_ratio* is the number of bins over height and width used to calculate each output feature map element. If the value is 0, an adaptive number of elements over height and width is used: `ceil(roi_height / pooled_h)` and `ceil(roi_width / pooled_w)` respectively.
    * **Range of values**: a non-negative integer
    * **Type**: `int`
    * **Required**: *yes*
* *spatial_scale*
    * **Description**: *spatial_scale* is a multiplicative spatial scale factor that translates ROI coordinates from their input spatial scale to the scale used when pooling.
    * **Range of values**: a positive floating-point number
    * **Type**: `float`
    * **Required**: *yes*
* *mode*
    * **Description**: *mode* specifies the pooling method used to produce output feature map elements.
    * **Range of values**:
        * *max* - maximum pooling
        * *avg* - average pooling
    * **Type**: `string`
    * **Required**: *yes*
* *aligned_mode*
    * **Description**: *aligned_mode* specifies how a coordinate in the original tensor is transformed to a coordinate in the resized tensor.
    * **Range of values**: name of the transformation mode in string format. In the formulas below, `spatial_scale` is `resized_shape[x] / original_shape[x]`, where `resized_shape[x]` is the size of the resized tensor along axis `x`, `original_shape[x]` is the size of the original tensor along axis `x`, and `x_original` is a coordinate along axis `x`, for any axis `x` of the input:
        * *asymmetric* - the coordinate in the resized tensor along axis `x` is `x_original * spatial_scale`
        * *tf_half_pixel_for_nn* - the coordinate in the resized tensor along axis `x` is `x_original * spatial_scale - 0.5`
        * *half_pixel* - the coordinate in the resized tensor along axis `x` is `((x_original + 0.5) * spatial_scale) - 0.5`
    * **Type**: `string`
    * **Default value**: *asymmetric*
    * **Required**: *no*
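
The three *aligned_mode* formulas can be written out directly. The following is a minimal sketch for illustration only; the function name `transform_coordinate` is hypothetical and not part of any OpenVINO API.

```python
def transform_coordinate(x_original, spatial_scale, aligned_mode="asymmetric"):
    """Map a coordinate of the original tensor to the resized tensor."""
    if aligned_mode == "asymmetric":
        return x_original * spatial_scale
    if aligned_mode == "tf_half_pixel_for_nn":
        return x_original * spatial_scale - 0.5
    if aligned_mode == "half_pixel":
        return (x_original + 0.5) * spatial_scale - 0.5
    raise ValueError(f"unknown aligned_mode: {aligned_mode!r}")
```

For `x_original = 4.0` and `spatial_scale = 0.5`, the three modes give `2.0`, `1.5`, and `1.75` respectively.
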
**Inputs**:
* **1**: 4D input tensor of shape `[N, C, H, W]` with feature maps of type *T*. **Required.**
* **2**: 2D input tensor of shape `[NUM_ROIS, 4]` describing the boxes, each given as a 4-element tuple `[x_1, y_1, x_2, y_2]` in relative coordinates of type *T*. **Required.**
  The box height and width are calculated as follows:
    * If *aligned_mode* is *asymmetric*: `roi_width = max(spatial_scale * (x_2 - x_1), 1.0)`, `roi_height = max(spatial_scale * (y_2 - y_1), 1.0)`, so malformed boxes are treated as boxes of size `1 x 1`.
    * Otherwise: `roi_width = spatial_scale * (x_2 - x_1)`, `roi_height = spatial_scale * (y_2 - y_1)`.
* **3**: 1D input tensor of shape `[NUM_ROIS]` with batch indices of type *IND_T*. **Required.**
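
As a rough illustration of the box size rules for input 2 and of the adaptive case of *sampling_ratio*, the sketch below combines both. The helper names `roi_size` and `sampling_counts` are hypothetical, and the code only restates the formulas given in this specification.

```python
import math


def roi_size(box, spatial_scale, aligned_mode="asymmetric"):
    """Return (roi_height, roi_width) for a box [x_1, y_1, x_2, y_2]."""
    x_1, y_1, x_2, y_2 = box
    roi_width = spatial_scale * (x_2 - x_1)
    roi_height = spatial_scale * (y_2 - y_1)
    if aligned_mode == "asymmetric":
        # Malformed boxes are forced to a size of at least 1 x 1.
        roi_width = max(roi_width, 1.0)
        roi_height = max(roi_height, 1.0)
    return roi_height, roi_width


def sampling_counts(roi_height, roi_width, pooled_h, pooled_w, sampling_ratio):
    """Return the number of elements over height and width per output element."""
    if sampling_ratio > 0:
        return sampling_ratio, sampling_ratio
    # sampling_ratio == 0 selects the adaptive number of elements.
    return math.ceil(roi_height / pooled_h), math.ceil(roi_width / pooled_w)
```

With `sampling_ratio = 0`, a box of size `8 x 16` pooled to `6 x 6` would use `ceil(8 / 6) = 2` elements over height and `ceil(16 / 6) = 3` over width.
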
**Outputs**:
* **1**: 4D output tensor of shape `[NUM_ROIS, C, pooled_h, pooled_w]` with feature maps of type *T*.
**Types**
* *T*: any supported floating-point type.
* *IND_T*: any supported integer type.
**Example**
```xml
<layer ... type="ROIAlign" ... >
    <data pooled_h="6" pooled_w="6" spatial_scale="16.0" sampling_ratio="2" mode="avg" aligned_mode="half_pixel"/>
    <input>
        <port id="0">
            <dim>7</dim>
            <dim>256</dim>
            <dim>200</dim>
            <dim>200</dim>
        </port>
        <port id="1">
            <dim>1000</dim>
            <dim>4</dim>
        </port>
        <port id="2">
            <dim>1000</dim>
        </port>
    </input>
    <output>
        <port id="3" precision="FP32">
            <dim>1000</dim>
            <dim>256</dim>
            <dim>6</dim>
            <dim>6</dim>
        </port>
    </output>
</layer>
```
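
The port dimensions in the example follow the input and output shape rules above. A small sketch of that shape relation, using a hypothetical helper `roi_align_output_shape` that is not part of any OpenVINO API:

```python
def roi_align_output_shape(feature_shape, rois_shape, batch_indices_shape,
                           pooled_h, pooled_w):
    """Derive [NUM_ROIS, C, pooled_h, pooled_w] from the three input shapes."""
    if len(feature_shape) != 4:
        raise ValueError("input 1 must have shape [N, C, H, W]")
    _, channels, _, _ = feature_shape
    if len(rois_shape) != 2 or rois_shape[1] != 4:
        raise ValueError("input 2 must have shape [NUM_ROIS, 4]")
    num_rois = rois_shape[0]
    if tuple(batch_indices_shape) != (num_rois,):
        raise ValueError("input 3 must have shape [NUM_ROIS]")
    return (num_rois, channels, pooled_h, pooled_w)


# Shapes taken from the XML example: ports 0, 1 and 2, then pooled_h and pooled_w.
shape = roi_align_output_shape((7, 256, 200, 200), (1000, 4), (1000,), 6, 6)
```

Here `shape` matches the dimensions of output port 3 in the example.
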

@ -144,7 +144,7 @@ declared in `namespace opset9`.
* [ReverseSequence](movement/ReverseSequence_1.md)
* [RNNCell](sequence/RNNCell_3.md)
* [RNNSequence](sequence/RNNSequence_5.md)
* [ROIAlign](detection/ROIAlign_3.md)
* [ROIAlign](detection/ROIAlign_9.md)
* [ROIPooling](detection/ROIPooling_1.md)
* [Roll](movement/Roll_7.md)
* [Round](arithmetic/Round_5.md)