Specify ROIAlign-9 (#11067)

2022-05-05 16:27:58 +08:00 · d560cf19a3
commit d560cf19a3
parent e68613a2fc
3 changed files with 121 additions and 1 deletions
--- a/docs/OV_Runtime_UG/Operations_specifications.md
+++ b/docs/OV_Runtime_UG/Operations_specifications.md
@ -161,6 +161,7 @@
   RNNCell-3 <openvino_docs_ops_sequence_RNNCell_3>
   RNNSequence-5 <openvino_docs_ops_sequence_RNNSequence_5>
   ROIAlign-3 <openvino_docs_ops_detection_ROIAlign_3>
+   ROIAlign-9 <openvino_docs_ops_detection_ROIAlign_9>
   ROIPooling-1 <openvino_docs_ops_detection_ROIPooling_1>
   Roll-7 <openvino_docs_ops_movement_Roll_7>
   Round-5 <openvino_docs_ops_arithmetic_Round_5>
--- a/docs/ops/detection/ROIAlign_9.md
+++ b/docs/ops/detection/ROIAlign_9.md
@ -0,0 +1,119 @@
+# ROIAlign {#openvino_docs_ops_detection_ROIAlign_9}
+
+**Versioned name**: *ROIAlign-9*
+
+**Category**: *Object detection*
+
+**Short description**: *ROIAlign* is a *pooling layer* that is applied to feature maps of non-uniform input sizes and outputs a feature map of a fixed size.
+
+**Detailed description**: [Reference](https://arxiv.org/abs/1703.06870).
+
+*ROIAlign* performs the following for each Region of Interest (ROI) for each input feature map:
+1. Multiply the box coordinates by *spatial_scale*, using the transformation selected by the *aligned_mode* attribute, to produce box coordinates relative to the input feature map size.
+2. Divide the box into bins according to the *sampling_ratio* attribute.
+3. Apply bilinear interpolation with 4 points in each bin, then apply maximum or average pooling according to the *mode* attribute to produce each output feature map element.
+
+**Attributes**
+
+* *pooled_h*
+
+  * **Description**: *pooled_h* is the height of the ROI output feature map.
+  * **Range of values**: a positive integer
+  * **Type**: `int`
+  * **Required**: *yes*
+
+* *pooled_w*
+
+  * **Description**: *pooled_w* is the width of the ROI output feature map.
+  * **Range of values**: a positive integer
+  * **Type**: `int`
+  * **Required**: *yes*
+
+* *sampling_ratio*
+
+  * **Description**: *sampling_ratio* is the number of bins over height and width to use to calculate each output feature map element. If the value
+  is 0, an adaptive number of elements over height and width is used: `ceil(roi_height / pooled_h)` and `ceil(roi_width / pooled_w)` respectively.
+  * **Range of values**: a non-negative integer
+  * **Type**: `int`
+  * **Required**: *yes*
+
+* *spatial_scale*
+
+  * **Description**: *spatial_scale* is a multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling.
+  * **Range of values**: a positive floating-point number
+  * **Type**: `float`
+  * **Required**: *yes*
+
+* *mode*
+
+  * **Description**: *mode* specifies the pooling method used to produce output feature map elements.
+  * **Range of values**:
+    * *max* - maximum pooling
+    * *avg* - average pooling
+  * **Type**: string
+  * **Required**: *yes*
+
+* *aligned_mode*
+
+  * **Description**: *aligned_mode* specifies how to transform a coordinate in the original tensor to the resized tensor.
+  * **Range of values**: name of the transformation mode in string format. Here *spatial_scale* is `resized_shape[x] / original_shape[x]`, where `resized_shape[x]` is the shape of the resized tensor in axis x, `original_shape[x]` is the shape of the original tensor in axis x, and `x_original` is a coordinate in axis x, for any axis x from the input axes:
+    * *asymmetric* - the coordinate in the resized tensor axis x is calculated as `x_original * spatial_scale`
+    * *tf_half_pixel_for_nn* - the coordinate in the resized tensor axis x is calculated as `x_original * spatial_scale - 0.5`
+    * *half_pixel* - the coordinate in the resized tensor axis x is calculated as `((x_original + 0.5) * spatial_scale) - 0.5`
+  * **Type**: string
+  * **Default value**: *asymmetric*
+  * **Required**: *no*
+
+**Inputs**:
+
+*   **1**: 4D input tensor of shape `[N, C, H, W]` with feature maps of type *T*. **Required.**
+
+*   **2**: 2D input tensor of shape `[NUM_ROIS, 4]` describing the boxes, each given as a 4-element tuple `[x_1, y_1, x_2, y_2]` in relative coordinates of type *T*. **Required.**
+The box height and width are calculated as follows:
+    * If *aligned_mode* equals *asymmetric*: `roi_width = max(spatial_scale * (x_2 - x_1), 1.0)`, `roi_height = max(spatial_scale * (y_2 - y_1), 1.0)`, so malformed boxes are treated as boxes of size `1 x 1`.
+    * Otherwise: `roi_width = spatial_scale * (x_2 - x_1)`, `roi_height = spatial_scale * (y_2 - y_1)`.
+
+*   **3**: 1D input tensor of shape `[NUM_ROIS]` with batch indices of type *IND_T*. **Required.**
+
+**Outputs**:
+
+*   **1**: 4D output tensor of shape `[NUM_ROIS, C, pooled_h, pooled_w]` with feature maps of type *T*.
+
+**Types**
+
+* *T*: any supported floating-point type.
+
+* *IND_T*: any supported integer type.
+
+
+**Example**
+
+```xml
+<layer ... type="ROIAlign" ... >
+    <data pooled_h="6" pooled_w="6" spatial_scale="16.0" sampling_ratio="2" mode="avg" aligned_mode="half_pixel"/>
+    <input>
+        <port id="0">
+            <dim>7</dim>
+            <dim>256</dim>
+            <dim>200</dim>
+            <dim>200</dim>
+        </port>
+        <port id="1">
+            <dim>1000</dim>
+            <dim>4</dim>
+        </port>
+        <port id="2">
+            <dim>1000</dim>
+        </port>
+    </input>
+    <output>
+        <port id="3" precision="FP32">
+            <dim>1000</dim>
+            <dim>256</dim>
+            <dim>6</dim>
+            <dim>6</dim>
+        </port>
+    </output>
+</layer>
+```
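The coordinate and size rules in the new `ROIAlign_9.md` can be sketched in plain Python as follows. This is an illustrative reading of the spec text, not the OpenVINO implementation: the helper names `transform_roi` and `samples_per_bin` are hypothetical, and the code only restates the three *aligned_mode* formulas, the `roi_width` / `roi_height` rules from input 2, and the adaptive case of *sampling_ratio*.

```python
import math

# Hypothetical helpers that restate the formulas from the ROIAlign-9 spec text.

def transform_roi(box, spatial_scale, aligned_mode="asymmetric"):
    """Map [x_1, y_1, x_2, y_2] to feature-map coordinates.

    Returns (x_1, y_1, x_2, y_2, roi_width, roi_height).
    """
    if aligned_mode == "asymmetric":
        pre, post = 0.0, 0.0   # x * spatial_scale
    elif aligned_mode == "tf_half_pixel_for_nn":
        pre, post = 0.0, 0.5   # x * spatial_scale - 0.5
    elif aligned_mode == "half_pixel":
        pre, post = 0.5, 0.5   # (x + 0.5) * spatial_scale - 0.5
    else:
        raise ValueError(f"unknown aligned_mode: {aligned_mode}")

    x1, y1, x2, y2 = ((v + pre) * spatial_scale - post for v in box)

    # ROI size as given for input 2 of the spec.
    roi_width = spatial_scale * (box[2] - box[0])
    roi_height = spatial_scale * (box[3] - box[1])
    if aligned_mode == "asymmetric":
        # Malformed boxes are treated as boxes of size 1 x 1.
        roi_width = max(roi_width, 1.0)
        roi_height = max(roi_height, 1.0)
    return x1, y1, x2, y2, roi_width, roi_height


def samples_per_bin(roi_height, roi_width, pooled_h, pooled_w, sampling_ratio):
    """Number of elements over height and width per output element."""
    if sampling_ratio > 0:
        return sampling_ratio, sampling_ratio
    # sampling_ratio == 0: adaptive number of elements.
    return math.ceil(roi_height / pooled_h), math.ceil(roi_width / pooled_w)
```

For example, under these assumptions `transform_roi([1.0, 2.0, 3.0, 6.0], 0.5, "half_pixel")` shifts each coordinate by half a pixel before and after scaling, while the ROI size depends only on the scaled coordinate differences.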
--- a/docs/ops/opset9.md
+++ b/docs/ops/opset9.md
@ -144,7 +144,7 @@ declared in `namespace opset9`.
 * [ReverseSequence](movement/ReverseSequence_1.md)
 * [RNNCell](sequence/RNNCell_3.md)
 * [RNNSequence](sequence/RNNSequence_5.md)
-* [ROIAlign](detection/ROIAlign_3.md)
+* [ROIAlign](detection/ROIAlign_9.md)
 * [ROIPooling](detection/ROIPooling_1.md)
 * [Roll](movement/Roll_7.md)
 * [Round](arithmetic/Round_5.md)
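Step 3 of the ROIAlign-9 description (bilinear interpolation followed by *max* or *avg* pooling) can be sketched for a single bin of one feature map as below. This is a minimal sketch under stated assumptions, not the reference implementation: the function names `bilinear` and `pool_bin` are hypothetical, sample points are assumed to be evenly spaced at sub-cell centers inside the bin, and out-of-range coordinates are assumed to be clamped to the feature map border. The spec text above does not state either of those two details.

```python
def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2D list fmap[H][W] at the point (y, x).

    Assumption: coordinates are clamped to the feature map border.
    """
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(y), int(x)                        # top-left neighbour
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)  # bottom-right neighbour
    ly, lx = y - y0, x - x0                        # fractional offsets
    # Weighted sum of the 4 neighbouring points.
    return (fmap[y0][x0] * (1 - ly) * (1 - lx)
            + fmap[y0][x1] * (1 - ly) * lx
            + fmap[y1][x0] * ly * (1 - lx)
            + fmap[y1][x1] * ly * lx)


def pool_bin(fmap, y_start, x_start, bin_h, bin_w, samples_h, samples_w, mode="avg"):
    """Pool the interpolated samples of one bin into one output element.

    Assumption: samples sit at the centers of a samples_h x samples_w grid.
    """
    values = []
    for iy in range(samples_h):
        y = y_start + (iy + 0.5) * bin_h / samples_h
        for ix in range(samples_w):
            x = x_start + (ix + 0.5) * bin_w / samples_w
            values.append(bilinear(fmap, y, x))
    if mode == "max":
        return max(values)
    if mode == "avg":
        return sum(values) / len(values)
    raise ValueError(f"unknown mode: {mode}")
```

Repeating `pool_bin` over all `pooled_h * pooled_w` bins of every ROI and every channel would yield the `[NUM_ROIS, C, pooled_h, pooled_w]` output described in the spec.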