MulticlassNonMaxSuppression
Versioned name: MulticlassNonMaxSuppression-9
Category: Sorting and maximization
Short description: MulticlassNonMaxSuppression performs multi-class non-maximum suppression of the boxes with predicted scores.
Detailed description: MulticlassNonMaxSuppression is a multi-phase operation. It implements the non-maximum suppression algorithm described below:
1. Let B = [b_0,...,b_n] be the list of initial detection boxes and S = [s_0,...,s_n] be the list of corresponding scores.
2. Let D = [] be an initial collection of resulting boxes. Let adaptive_threshold = iou_threshold.
3. If B is empty, go to step 9.
4. Take the box with the highest score. Suppose that it is the box b with the score s.
5. Delete b from B.
6. If the score s is greater than or equal to score_threshold, add b to D, else go to step 9.
7. If nms_eta < 1 and adaptive_threshold > 0.5, update adaptive_threshold *= nms_eta.
8. For each input box b_i from B and the corresponding score s_i, set s_i = 0 when iou(b, b_i) > adaptive_threshold, and go to step 3.
9. Return D, a collection of the corresponding scores S, and the number of elements in D.
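The steps above can be sketched for a single class as follows. This is a minimal illustration, not the reference implementation: the function names are hypothetical, suppressed boxes are dropped from B instead of having their score set to 0, and the +1 offset used for non-normalized coordinates is an assumption about how pixel boxes are measured.

```python
from typing import List, Sequence

Box = Sequence[float]  # [xmin, ymin, xmax, ymax]


def iou(a: Box, b: Box, normalized: bool = True) -> float:
    """Intersection over union of two boxes.

    Assumption: with non-normalized (pixel) coordinates the extent of a
    box includes both end pixels, hence the +1 offset.
    """
    off = 0.0 if normalized else 1.0
    iw = min(a[2], b[2]) - max(a[0], b[0]) + off
    ih = min(a[3], b[3]) - max(a[1], b[1]) + off
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + off) * (a[3] - a[1] + off)
    area_b = (b[2] - b[0] + off) * (b[3] - b[1] + off)
    return inter / (area_a + area_b - inter)


def nms_single_class(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.0,
    score_threshold: float = 0.0,
    nms_eta: float = 1.0,
    normalized: bool = True,
) -> List[int]:
    """Return indices of the kept boxes, highest score first."""
    # B, ordered so that the best remaining box is always first.
    remaining = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []  # D
    adaptive_threshold = iou_threshold
    while remaining:  # step 3
        best = remaining.pop(0)  # steps 4 and 5
        if scores[best] < score_threshold:  # step 6
            break
        kept.append(best)
        if nms_eta < 1 and adaptive_threshold > 0.5:  # step 7
            adaptive_threshold *= nms_eta
        # Step 8, simplified: suppressed boxes are removed outright.
        remaining = [
            i for i in remaining
            if iou(boxes[best], boxes[i], normalized) <= adaptive_threshold
        ]
    return kept
```

With nms_eta below 1 the threshold shrinks after each kept box while it stays above 0.5, so later iterations suppress more aggressively.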
This algorithm is applied independently to each class of each batch element. For each class, the operation feeds at most the nms_top_k highest-scoring candidate boxes to this algorithm.
The total number of output boxes of each batch element must not exceed keep_top_k.
Boxes of background_class are skipped and thus eliminated.
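The per-class filtering can be sketched for one batch element as shown here. The helper name and the nms_fn callable are hypothetical, and the choice to keep the highest-scoring detections when keep_top_k is exceeded is an assumption; the text above only states that the limit must not be exceeded.

```python
from typing import Callable, List, Sequence, Tuple

Detection = Tuple[int, float, int]  # (class_id, score, box_index)


def select_for_batch_element(
    scores_per_class: Sequence[Sequence[float]],
    nms_fn: Callable[[List[int], Sequence[float]], List[int]],
    nms_top_k: int = -1,
    keep_top_k: int = -1,
    background_class: int = -1,
) -> List[Detection]:
    """Run a per-class NMS callable and apply the top-k limits.

    nms_fn is a hypothetical callable that receives the candidate box
    indices (best score first) and the class scores, and returns the
    indices that survive suppression.
    """
    detections: List[Detection] = []
    for class_id, scores in enumerate(scores_per_class):
        if class_id == background_class:
            continue  # boxes of the background class are skipped
        candidates = sorted(
            range(len(scores)), key=lambda i: scores[i], reverse=True
        )
        if nms_top_k > -1:
            candidates = candidates[:nms_top_k]  # limit the NMS input
        for box_index in nms_fn(candidates, scores):
            detections.append((class_id, scores[box_index], box_index))
    if keep_top_k > -1 and len(detections) > keep_top_k:
        # Assumption: the highest-scoring detections are the ones kept.
        detections.sort(key=lambda d: d[1], reverse=True)
        detections = detections[:keep_top_k]
    return detections
```

Passing the suppression step as a callable keeps the sketch focused on nms_top_k, keep_top_k and background_class.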
Attributes:
- sort_result
  - Description: sort_result specifies the order of output elements.
  - Range of values: class, score, none
    - class - sort selected boxes by class id (ascending).
    - score - sort selected boxes by score (descending).
    - none - do not guarantee the order.
  - Type: string
  - Default value: none
  - Required: no
- sort_result_across_batch
  - Description: sort_result_across_batch is a flag that specifies whether selected boxes are sorted across batches.
  - Range of values: true or false
    - true - sort selected boxes across batches.
    - false - do not sort selected boxes across batches (boxes are sorted per batch element).
  - Type: boolean
  - Default value: false
  - Required: no
- output_type
  - Description: the tensor type of outputs selected_indices and selected_num.
  - Range of values: i64 or i32
  - Type: string
  - Default value: i64
  - Required: no
- iou_threshold
  - Description: intersection over union threshold.
  - Range of values: a floating-point number
  - Type: float
  - Default value: 0
  - Required: no
- score_threshold
  - Description: minimum score for a box to be considered for processing.
  - Range of values: a floating-point number
  - Type: float
  - Default value: 0
  - Required: no
- nms_top_k
  - Description: maximum number of boxes to be selected per class.
  - Range of values: an integer
  - Type: int
  - Default value: -1, meaning to keep all boxes
  - Required: no
- keep_top_k
  - Description: maximum number of boxes to be selected per batch element.
  - Range of values: an integer
  - Type: int
  - Default value: -1, meaning to keep all boxes
  - Required: no
- background_class
  - Description: the background class id.
  - Range of values: an integer
  - Type: int
  - Default value: -1, meaning to keep all classes
  - Required: no
- normalized
  - Description: normalized is a flag that indicates whether boxes are normalized or not.
  - Range of values: true or false
    - true - the box coordinates are normalized.
    - false - the box coordinates are not normalized.
  - Type: boolean
  - Default value: true
  - Required: no
- nms_eta
  - Description: eta parameter for adaptive NMS.
  - Range of values: a floating-point number in the closed range [0, 1.0].
  - Type: float
  - Default value: 1.0
  - Required: no
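The interaction of sort_result and sort_result_across_batch can be illustrated with a small sketch. The Det record and the function name are hypothetical, and the tie-breaking order (class id, then box index, or score within a class) is an assumption, since the attribute descriptions only fix the primary sort key.

```python
from typing import List, NamedTuple


class Det(NamedTuple):
    batch_index: int
    class_id: int
    score: float
    box_index: int


def order_detections(
    dets: List[Det],
    sort_result: str = "none",
    sort_result_across_batch: bool = False,
) -> List[Det]:
    """Order detections according to the two sorting attributes."""
    if sort_result == "none":
        return list(dets)  # no order is guaranteed; keep the input order
    if sort_result == "score":
        # Descending score; the tie-breakers are an assumption.
        key = lambda d: (-d.score, d.class_id, d.box_index)
    elif sort_result == "class":
        # Ascending class id; ordering within a class is an assumption.
        key = lambda d: (d.class_id, -d.score, d.box_index)
    else:
        raise ValueError(f"unsupported sort_result: {sort_result}")
    if sort_result_across_batch:
        return sorted(dets, key=key)
    # Per batch element: group by batch index first, then apply the key.
    return sorted(dets, key=lambda d: (d.batch_index,) + key(d))
```

For example, with sort_result set to score, a detection from batch element 1 can precede one from batch element 0 only when sort_result_across_batch is true.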
Inputs:
There are two input formats. The first format takes two inputs, and the boxes are shared by all classes.
- 1: boxes - tensor of type T and shape [num_batches, num_boxes, 4] with box coordinates. The box coordinates are laid out as [xmin, ymin, xmax, ymax]. Required.
- 2: scores - tensor of type T and shape [num_batches, num_classes, num_boxes] with box scores. The tensor type must be the same as that of boxes. Required.
The second format takes three inputs. Each class has its own boxes, which are not shared.
- 1: boxes - tensor of type T and shape [num_classes, num_boxes, 4] with box coordinates. The box coordinates are laid out as [xmin, ymin, xmax, ymax]. Required.
- 2: scores - tensor of type T and shape [num_classes, num_boxes] with box scores. The tensor type must be the same as that of boxes. Required.
- 3: roisnum - tensor of type T_IND and shape [num_batches] with the number of boxes in each image. num_batches is the number of images. Each element of this tensor is the number of boxes for the corresponding image. The sum of all elements is num_boxes. Required.
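The shape rules of the two input formats can be checked with a short sketch. The function name is hypothetical; it only inspects static shapes given as tuples and does not verify that the elements of roisnum sum to num_boxes.

```python
from typing import Optional, Sequence, Tuple


def infer_dims(
    boxes_shape: Sequence[int],
    scores_shape: Sequence[int],
    roisnum_shape: Optional[Sequence[int]] = None,
) -> Tuple[int, int, int]:
    """Return (num_batches, num_classes, num_boxes) or raise ValueError."""
    if len(boxes_shape) != 3 or boxes_shape[2] != 4:
        raise ValueError("boxes must be a rank-3 tensor with last dimension 4")
    if roisnum_shape is None:
        # Two-input format: boxes are shared by all classes.
        num_batches, num_boxes, _ = boxes_shape
        if (
            len(scores_shape) != 3
            or scores_shape[0] != num_batches
            or scores_shape[2] != num_boxes
        ):
            raise ValueError(
                "scores must have shape [num_batches, num_classes, num_boxes]"
            )
        return num_batches, scores_shape[1], num_boxes
    # Three-input format: each class has its own boxes.
    num_classes, num_boxes, _ = boxes_shape
    if tuple(scores_shape) != (num_classes, num_boxes):
        raise ValueError("scores must have shape [num_classes, num_boxes]")
    if len(roisnum_shape) != 1:
        raise ValueError("roisnum must have shape [num_batches]")
    return roisnum_shape[0], num_classes, num_boxes
```

Note that the first dimension of boxes means num_batches in the two-input format and num_classes in the three-input format.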
Outputs:
- 1: selected_outputs - tensor of type T, which must be the same as that of boxes, and shape [number of selected boxes, 6] containing the selected boxes with score and class as tuples [class_id, box_score, xmin, ymin, xmax, ymax].
- 2: selected_indices - tensor of type T_IND and shape [number of selected boxes, 1] containing the selected indices in the flattened boxes, which are absolute values across batches. Therefore, possible valid values are in the range [0, num_batches * num_boxes - 1].
- 3: selected_num - 1D tensor of type T_IND and shape [num_batches] representing the number of selected boxes for each batch element.
When there is no box selected, selected_num is filled with 0. selected_outputs is an empty tensor of shape [0, 6], and selected_indices is an empty tensor of shape [0, 1].
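A minimal sketch of how the three outputs relate for the two-input format is shown here. The function name and the nested-list representation are hypothetical, and the flattened index formula batch_index * num_boxes + box_index is inferred from the stated range [0, num_batches * num_boxes - 1] rather than quoted from an implementation.

```python
from typing import List, Sequence, Tuple

Detection = Tuple[int, float, int]  # (class_id, score, box_index)


def assemble_outputs(
    boxes: Sequence[Sequence[Sequence[float]]],
    detections_per_batch: Sequence[Sequence[Detection]],
):
    """Build selected_outputs, selected_indices and selected_num.

    boxes has shape [num_batches, num_boxes, 4] (two-input format).
    """
    num_boxes = len(boxes[0]) if boxes else 0
    selected_outputs: List[List[float]] = []
    selected_indices: List[List[int]] = []
    selected_num: List[int] = []
    for batch_index, detections in enumerate(detections_per_batch):
        for class_id, score, box_index in detections:
            xmin, ymin, xmax, ymax = boxes[batch_index][box_index]
            selected_outputs.append(
                [float(class_id), score, xmin, ymin, xmax, ymax]
            )
            # Assumed flattening: index into boxes viewed as
            # [num_batches * num_boxes, 4].
            selected_indices.append([batch_index * num_boxes + box_index])
        selected_num.append(len(detections))
    return selected_outputs, selected_indices, selected_num
```

When no detections are passed in, the sketch returns two empty lists and a selected_num filled with zeros, matching the empty-output case described above.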
Types:
- T: floating-point type.
- T_IND: int64 or int32.
Example
<layer ... type="MulticlassNonMaxSuppression" ... >
    <data sort_result="score" output_type="i64" sort_result_across_batch="false" iou_threshold="0.2" score_threshold="0.5" nms_top_k="-1" keep_top_k="-1" background_class="-1" normalized="false" nms_eta="0.0"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>100</dim>
            <dim>4</dim>
        </port>
        <port id="1">
            <dim>3</dim>
            <dim>5</dim>
            <dim>100</dim>
        </port>
    </input>
    <output>
        <port id="5" precision="FP32">
            <dim>-1</dim> <!-- "-1" means an undefined dimension calculated during the model inference -->
            <dim>6</dim>
        </port>
        <port id="6" precision="I64">
            <dim>-1</dim>
            <dim>1</dim>
        </port>
        <port id="7" precision="I64">
            <dim>3</dim>
        </port>
    </output>
</layer>
An example of the second format, with three inputs:
<layer ... type="MulticlassNonMaxSuppression" ... >
    <data sort_result="score" output_type="i64" sort_result_across_batch="false" iou_threshold="0.2" score_threshold="0.5" nms_top_k="-1" keep_top_k="-1" background_class="-1" normalized="false" nms_eta="0.0"/>
    <input>
        <port id="0">
            <dim>3</dim>
            <dim>100</dim>
            <dim>4</dim>
        </port>
        <port id="1">
            <dim>3</dim>
            <dim>100</dim>
        </port>
        <port id="2">
            <dim>10</dim>
        </port>
    </input>
    <output>
        <port id="5" precision="FP32">
            <dim>-1</dim> <!-- "-1" means an undefined dimension calculated during the model inference -->
            <dim>6</dim>
        </port>
        <port id="6" precision="I64">
            <dim>-1</dim>
            <dim>1</dim>
        </port>
        <port id="7" precision="I64">
            <dim>10</dim> <!-- num_batches, equal to the length of roisnum -->
        </port>
    </output>
</layer>