
ROIAlign

Versioned name: ROIAlign-3

Category: Object detection

Short description: ROIAlign is a pooling layer that is applied to regions of non-uniform size in the input feature maps and outputs a feature map of a fixed size for each region.

Detailed description: Reference.

ROIAlign performs the following steps for each Region of Interest (ROI) on each input feature map:

  1. Multiply box coordinates by spatial_scale to produce box coordinates relative to the input feature map size.
  2. Divide the box into bins according to the sampling_ratio attribute.
  3. Apply bilinear interpolation with 4 points in each bin, then apply maximum or average pooling, as selected by the mode attribute, to produce an output feature map element.
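The three steps above can be sketched in plain Python for a single ROI on a single-channel feature map. This is only an illustrative sketch, not the reference implementation: the function names `bilinear` and `roi_align_single` are hypothetical, and the placement of sample points inside a bin and the handling of points at the map border are assumptions borrowed from common ROIAlign implementations, since the text above does not specify them.

```python
import math


def bilinear(fmap, y, x):
    """Bilinearly interpolate the 2D list `fmap` at the real-valued point (y, x)."""
    height, width = len(fmap), len(fmap[0])
    # Assumption: points clearly outside the map contribute zero.
    if y < -1.0 or y > height or x < -1.0 or x > width:
        return 0.0
    # Assumption: points slightly outside are clamped to the border.
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    ly, lx = y - y0, x - x0
    # Blend the 4 neighbouring points, first along x, then along y.
    top = fmap[y0][x0] * (1 - lx) + fmap[y0][x1] * lx
    bottom = fmap[y1][x0] * (1 - lx) + fmap[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


def roi_align_single(fmap, box, pooled_h, pooled_w,
                     spatial_scale, sampling_ratio, mode):
    """Pool one ROI of one channel into a pooled_h x pooled_w grid."""
    bx1, by1, bx2, by2 = box
    # Step 1: scale the box to feature-map coordinates.
    x1, y1 = bx1 * spatial_scale, by1 * spatial_scale
    roi_w = max(spatial_scale * (bx2 - bx1), 1.0)
    roi_h = max(spatial_scale * (by2 - by1), 1.0)
    # Step 2: split the box into output bins and pick the sample count.
    bin_h, bin_w = roi_h / pooled_h, roi_w / pooled_w
    n_y = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_h / pooled_h)
    n_x = sampling_ratio if sampling_ratio > 0 else math.ceil(roi_w / pooled_w)
    out = []
    for ph in range(pooled_h):
        row = []
        for pw in range(pooled_w):
            # Step 3: interpolate at evenly spaced points, then pool.
            samples = [
                bilinear(fmap,
                         y1 + ph * bin_h + (iy + 0.5) * bin_h / n_y,
                         x1 + pw * bin_w + (ix + 0.5) * bin_w / n_x)
                for iy in range(n_y) for ix in range(n_x)
            ]
            if mode == "max":
                row.append(max(samples))
            else:
                row.append(sum(samples) / len(samples))
        out.append(row)
    return out
```

A full operation would repeat this for every channel of the feature map selected by the ROI's batch index.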

Attributes

  • pooled_h

    • Description: pooled_h is the height of the ROI output feature map.
    • Range of values: a positive integer
    • Type: int
    • Default value: None
    • Required: yes
  • pooled_w

    • Description: pooled_w is the width of the ROI output feature map.
    • Range of values: a positive integer
    • Type: int
    • Default value: None
    • Required: yes
  • sampling_ratio

    • Description: sampling_ratio is the number of bins over height and width to use to calculate each output feature map element. If the value is 0, an adaptive number of elements over height and width is used: ceil(roi_height / pooled_h) and ceil(roi_width / pooled_w), respectively.
    • Range of values: a non-negative integer
    • Type: int
    • Default value: None
    • Required: yes
  • spatial_scale

    • Description: spatial_scale is a multiplicative spatial scale factor to translate ROI coordinates from their input spatial scale to the scale used when pooling.
    • Range of values: a positive floating-point number
    • Type: float
    • Default value: None
    • Required: yes
  • mode

    • Description: mode specifies a method to perform pooling to produce output feature map elements.
    • Range of values:
      • max - maximum pooling
      • avg - average pooling
    • Type: string
    • Default value: None
    • Required: yes
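The value ranges listed above, and the adaptive behaviour of sampling_ratio, can be captured in a small container. This is a sketch under stated assumptions: `ROIAlignAttrs` and `samples_per_axis` are hypothetical names introduced for illustration and are not part of any OpenVINO API.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ROIAlignAttrs:
    """Hypothetical holder for the ROIAlign attributes (not an OpenVINO class)."""
    pooled_h: int
    pooled_w: int
    sampling_ratio: int
    spatial_scale: float
    mode: str

    def __post_init__(self):
        # Enforce the "Range of values" stated for each attribute.
        if self.pooled_h <= 0 or self.pooled_w <= 0:
            raise ValueError("pooled_h and pooled_w must be positive integers")
        if self.sampling_ratio < 0:
            raise ValueError("sampling_ratio must be a non-negative integer")
        if self.spatial_scale <= 0:
            raise ValueError("spatial_scale must be a positive number")
        if self.mode not in ("max", "avg"):
            raise ValueError("mode must be 'max' or 'avg'")

    def samples_per_axis(self, roi_height, roi_width):
        """Return the (height, width) counts that sampling_ratio resolves to."""
        if self.sampling_ratio > 0:
            return self.sampling_ratio, self.sampling_ratio
        # sampling_ratio == 0: adaptive counts derived from the ROI size.
        return (math.ceil(roi_height / self.pooled_h),
                math.ceil(roi_width / self.pooled_w))
```

For instance, with pooled_h = pooled_w = 6 and sampling_ratio = 0, an ROI of height 14.0 and width 6.0 resolves to ceil(14 / 6) = 3 elements over height and ceil(6 / 6) = 1 over width.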

Inputs:

  • 1: 4D input tensor of shape [N, C, H, W] with feature maps of type T. Required.

  • 2: 2D input tensor of shape [NUM_ROIS, 4] describing the boxes as 4-element tuples [x_1, y_1, x_2, y_2] in relative coordinates of type T. The box width and height are calculated as follows: roi_width = max(spatial_scale * (x_2 - x_1), 1.0), roi_height = max(spatial_scale * (y_2 - y_1), 1.0), so malformed boxes are treated as boxes of size 1 x 1. Required.

  • 3: 1D input tensor of shape [NUM_ROIS] with batch indices of type IND_T. Required.
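The width and height formulas given for the second input can be checked with a few lines of Python. The helper name `roi_size` is hypothetical and the box values are made up for illustration.

```python
def roi_size(box, spatial_scale):
    """Return (roi_width, roi_height) for a box [x_1, y_1, x_2, y_2]."""
    x_1, y_1, x_2, y_2 = box
    # Clamping to 1.0 turns malformed boxes (x_2 < x_1 or y_2 < y_1)
    # into boxes of size 1 x 1.
    roi_width = max(spatial_scale * (x_2 - x_1), 1.0)
    roi_height = max(spatial_scale * (y_2 - y_1), 1.0)
    return roi_width, roi_height


well_formed = roi_size((0.0, 0.0, 32.0, 64.0), 0.25)
malformed = roi_size((10.0, 10.0, 5.0, 5.0), 0.5)
```

Here `well_formed` is (8.0, 16.0), while the box with swapped corners yields (1.0, 1.0).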

Outputs:

  • 1: 4D output tensor of shape [NUM_ROIS, C, pooled_h, pooled_w] with feature maps of type T.

Types

  • T: any supported floating-point type.

  • IND_T: any supported integer type.

Example

<layer ... type="ROIAlign" ... >
    <data pooled_h="6" pooled_w="6" spatial_scale="16.0" sampling_ratio="2" mode="avg"/>
    <input>
        <port id="0">
            <dim>7</dim>
            <dim>256</dim>
            <dim>200</dim>
            <dim>200</dim>
        </port>
        <port id="1">
            <dim>1000</dim>
            <dim>4</dim>
        </port>
        <port id="2">
            <dim>1000</dim>
        </port>
    </input>
    <output>
        <port id="3" precision="FP32">
            <dim>1000</dim>
            <dim>256</dim>
            <dim>6</dim>
            <dim>6</dim>
        </port>
    </output>    
</layer>
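As a sanity check, the output dimensions declared in such a layer can be compared with the shape [NUM_ROIS, C, pooled_h, pooled_w] described under Outputs. The sketch below uses Python's standard xml.etree.ElementTree module. The elided `...` attributes of the layer are replaced with placeholder `id` and `name` values that are assumptions made only so that the snippet parses, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Copy of the example layer; id and name are placeholder assumptions.
LAYER_XML = """
<layer id="0" name="roi_align" type="ROIAlign">
    <data pooled_h="6" pooled_w="6" spatial_scale="16.0" sampling_ratio="2" mode="avg"/>
    <input>
        <port id="0"><dim>7</dim><dim>256</dim><dim>200</dim><dim>200</dim></port>
        <port id="1"><dim>1000</dim><dim>4</dim></port>
        <port id="2"><dim>1000</dim></port>
    </input>
    <output>
        <port id="3" precision="FP32">
            <dim>1000</dim><dim>256</dim><dim>6</dim><dim>6</dim>
        </port>
    </output>
</layer>
"""


def port_dims(port):
    """Read the <dim> children of a <port> element as integers."""
    return [int(dim.text) for dim in port.findall("dim")]


def expected_and_declared_dims(layer_xml):
    """Return (expected, declared) output dims for a ROIAlign layer."""
    layer = ET.fromstring(layer_xml.strip())
    data = layer.find("data")
    feature_map, rois, _batch_indices = [
        port_dims(port) for port in layer.find("input").findall("port")
    ]
    # [NUM_ROIS, C, pooled_h, pooled_w]
    expected = [rois[0], feature_map[1],
                int(data.get("pooled_h")), int(data.get("pooled_w"))]
    declared = port_dims(layer.find("output").find("port"))
    return expected, declared


expected, declared = expected_and_declared_dims(LAYER_XML)
```

For the example layer both lists are [1000, 256, 6, 6]: the number of ROIs comes from the second input, the channel count from the first, and the spatial size from the pooled_h and pooled_w attributes.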