From da54a40fa1f36710c713d4372958015a411c23e0 Mon Sep 17 00:00:00 2001
From: iliya mironov
Date: Thu, 17 Dec 2020 14:05:24 +0300
Subject: [PATCH] Add spec for CTCGreedyDecoderSecLen (#3250)

* Add spec for CTCGreedyDecoder
* Update spec
* Fix spec according to code rewiev
* Update spec
* Update spec
* Update spec according to review
* Update spec
* Update spec
* Update spec
* Update example spec
* Fix space in spec
* Fix spec
* Fix spec according to review
* fix spec
* update spec
* Update spec
* Change format outputs in spec
* Hot fix
* Minor fixes
* Add new attribute for op in spec
* change input
* Add precision to outputs
* Fix input in spec
* Update spec
* Update CTCGreedyDecoderSeqLen_6.md fix mistakes
* Change first input layout
* fix example

Co-authored-by: Your Name
---
 docs/doxygen/ie_docs.xml                      |  1 +
 docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md | 99 +++++++++++++++++++
 2 files changed, 100 insertions(+)
 create mode 100644 docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md

diff --git a/docs/doxygen/ie_docs.xml b/docs/doxygen/ie_docs.xml
index 9028504e31d..7095f03bb3b 100644
--- a/docs/doxygen/ie_docs.xml
+++ b/docs/doxygen/ie_docs.xml
@@ -89,6 +89,7 @@
+
diff --git a/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md b/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md
new file mode 100644
index 00000000000..f59c1636f1e
--- /dev/null
+++ b/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md
@@ -0,0 +1,99 @@
+## CTCGreedyDecoderSeqLen {#openvino_docs_ops_sequence_CTCGreedyDecoderSeqLen_6}
+
+**Versioned name**: *CTCGreedyDecoderSeqLen-6*
+
+**Category**: Sequence processing
+
+**Short description**: *CTCGreedyDecoderSeqLen* performs greedy decoding of the logits provided as the first input. The sequence lengths are provided as the second input.
+
+**Detailed description**:
+
+This operation is similar to the [TensorFlow CTCGreedyDecoder](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder).
+
+The operation *CTCGreedyDecoderSeqLen* implements best path decoding.
+Decoding is done in two steps:
+
+1. Concatenate the most probable class at each time step, which yields the best path.
+
+2. Remove duplicate consecutive elements if the attribute *merge_repeated* is true, and then remove all blank elements.
+
+Sequences in the batch can have different lengths. The sequence lengths are given in the second input, the integer tensor `sequence_length`.
+
+The main difference between [CTCGreedyDecoder](CTCGreedyDecoder_1.md) and CTCGreedyDecoderSeqLen is the second input. CTCGreedyDecoder takes a 2D floating point tensor with sequence masks for each sequence in the batch, while CTCGreedyDecoderSeqLen takes a 1D integer tensor with sequence lengths.
+
+**Attributes**
+
+* *merge_repeated*
+
+  * **Description**: *merge_repeated* is a flag for merging repeated labels during the CTC calculation. If the value is false, the sequence `ABB*B*B` (where '*' is the blank class) is decoded as `ABBBB`. If the value is true, it is decoded as `ABBB`.
+  * **Range of values**: true or false
+  * **Type**: `boolean`
+  * **Default value**: true
+  * **Required**: *No*
+
+* *classes_index_type*
+
+  * **Description**: the type of the output tensor with class indices
+  * **Range of values**: "i64" or "i32"
+  * **Type**: string
+  * **Default value**: "i32"
+  * **Required**: *No*
+
+* *sequence_length_type*
+
+  * **Description**: the type of the output tensor with sequence lengths
+  * **Range of values**: "i64" or "i32"
+  * **Type**: string
+  * **Default value**: "i32"
+  * **Required**: *No*
+
+**Inputs**
+
+* **1**: `data` - input tensor of type *T_F* and shape `[N, T, C]` with a batch of sequences, where `N` is the batch size, `T` is the maximum sequence length, and `C` is the number of classes. **Required.**
+
+* **2**: `sequence_length` - input tensor of type *T_I* and shape `[N]` with sequence lengths. Each sequence length must be less than or equal to `T`. **Required.**
+
+* **3**: `blank_index` - scalar or 1D tensor with 1 element of type *T_I*. Specifies the class index to use for the blank class. The `blank_index` class is not saved to the result sequence; it is used only for post-processing. Default value is `C-1`. **Optional**.
+
+**Outputs**
+
+* **1**: Output tensor of type *T_IND1* and shape `[N, T]` containing the decoded classes. All elements that do not encode sequence classes are filled with -1.
+
+* **2**: Output tensor of type *T_IND2* and shape `[N]` containing the length of the decoded class sequence for each batch element.
+
+**Types**
+
+* *T_F*: any supported floating point type.
+
+* *T_I*: `int32` or `int64`.
+
+* *T_IND1*: `int32` or `int64`, depending on the `classes_index_type` attribute.
+
+* *T_IND2*: `int32` or `int64`, depending on the `sequence_length_type` attribute.
+
+**Example**
+
+```xml
+
+
+
+ 8
+ 20
+ 128
+
+
+ 8
+
+
+
+
+
+ 8
+ 20
+
+
+ 8
+
+
+
+```
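
The two decoding steps in the spec above can be sketched in plain Python. This is only an illustrative reading of the spec, not the OpenVINO implementation: the function name `ctc_greedy_decode_seq_len`, the helper `one_hot`, the nested-list tensor layout, and the tie-breaking rule (lowest class index wins) are all assumptions made for the example.

```python
def ctc_greedy_decode_seq_len(data, sequence_length, blank_index=None,
                              merge_repeated=True):
    """Hypothetical reference sketch of the spec'd behaviour.

    data: nested list of shape [N][T][C]; sequence_length: list of N ints.
    Returns (classes of shape [N][T] padded with -1, lengths of shape [N]).
    """
    max_time = len(data[0])
    num_classes = len(data[0][0])
    if blank_index is None:
        blank_index = num_classes - 1  # default from the spec: C-1

    decoded, lengths = [], []
    for logits, seq_len in zip(data, sequence_length):
        if seq_len > max_time:
            raise ValueError("sequence length must be <= T")
        # Step 1: best path, i.e. the most probable class per time step.
        # Assumption: ties are resolved in favour of the lowest class index.
        path = [max(range(num_classes), key=frame.__getitem__)
                for frame in logits[:seq_len]]
        # Step 2: optionally merge consecutive repeats, then drop blanks.
        result, previous = [], None
        for cls in path:
            repeated = merge_repeated and cls == previous
            if cls != blank_index and not repeated:
                result.append(cls)
            previous = cls
        lengths.append(len(result))
        decoded.append(result + [-1] * (max_time - len(result)))
    return decoded, lengths


def one_hot(index, num_classes):
    """Hypothetical helper: a logit frame whose argmax is `index`."""
    return [1.0 if c == index else 0.0 for c in range(num_classes)]


# The `ABB*B*B` case from the merge_repeated description, with A=0, B=1 and
# the blank class at C-1 = 2. The eighth frame lies beyond the sequence length.
A, B, BLANK = 0, 1, 2
frames = [A, B, B, BLANK, B, BLANK, B, A]
data = [[one_hot(i, 3) for i in frames]]

merged = ctc_greedy_decode_seq_len(data, [7])
unmerged = ctc_greedy_decode_seq_len(data, [7], merge_repeated=False)
print(merged)    # ([[0, 1, 1, 1, -1, -1, -1, -1]], [4])
print(unmerged)  # ([[0, 1, 1, 1, 1, -1, -1, -1]], [5])
```

With `merge_repeated` enabled the result corresponds to `ABBB`, and with it disabled to `ABBBB`, matching the attribute description. Frames past `sequence_length` are ignored, and both outputs are padded with -1 up to `T`.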