diff --git a/docs/ops/activation/Clamp_1.md b/docs/ops/activation/Clamp_1.md index a0ff821c75c..c32bc3f5ba0 100644 --- a/docs/ops/activation/Clamp_1.md +++ b/docs/ops/activation/Clamp_1.md @@ -1,5 +1,7 @@ # Clamp {#openvino_docs_ops_activation_Clamp_1} +@sphinxdirective + **Versioned name**: *Clamp-1* **Category**: *Activation function* @@ -8,15 +10,17 @@ **Detailed description**: -*Clamp* performs clipping operation over the input tensor element-wise. Element values of the output are within the range `[min, max]`. +*Clamp* performs clipping operation over the input tensor element-wise. Element values of the output are within the range ``[min, max]``. + * Input values that are smaller than *min* are replaced with *min* value. * Input values that are greater than *max* are replaced with *max* value. -* Input values within the range `[min, max]` remain unchanged. +* Input values within the range ``[min, max]`` remain unchanged. Let *min_value* and *max_value* be *min* and *max*, respectively. The mathematical formula of *Clamp* is as follows: -\f[ -clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big) -\f] + +.. math:: + + clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big) **Attributes**: @@ -24,14 +28,14 @@ clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big * **Description**: *min* is the lower bound of values in the output. * **Range of values**: arbitrary floating-point number - * **Type**: `float` + * **Type**: ``float`` * **Required**: *yes* * *max* * **Description**: *max* is the upper bound of values in the output. * **Range of values**: arbitrary floating-point number - * **Type**: `float` + * **Type**: ``float`` * **Required**: *yes* **Inputs**: @@ -45,22 +49,24 @@ clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big **Types** * *T*: any numeric type. 
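As a quick illustration of the *Clamp* formula above, here is a minimal floating-point sketch in pure Python (illustrative only — the function name is ours, and this is not the OpenVINO implementation; the integral ceil/floor conversion of *min*/*max* is not modeled):

```python
def clamp(values, min_value, max_value):
    """Element-wise clamp: min(max(x, min_value), max_value)."""
    return [min(max(x, min_value), max_value) for x in values]

print(clamp([-5.0, 0.5, 7.0], min_value=0.0, max_value=1.0))  # [0.0, 0.5, 1.0]
```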
-* **Note**: In case of integral numeric type, ceil is used to convert *min* from `float` to *T* and floor is used to convert *max* from `float` to *T*. +* **Note**: In case of integral numeric type, ceil is used to convert *min* from ``float`` to *T* and floor is used to convert *max* from ``float`` to *T*. **Example** -```xml - - - - - 256 - - - - - 256 - - - -``` +.. code-block:: cpp + + + + + + 256 + + + + + 256 + + + + +@endsphinxdirective diff --git a/docs/ops/arithmetic/Ceiling_1.md b/docs/ops/arithmetic/Ceiling_1.md index fb0d387db7b..6c2139ff741 100644 --- a/docs/ops/arithmetic/Ceiling_1.md +++ b/docs/ops/arithmetic/Ceiling_1.md @@ -1,17 +1,18 @@ # Ceiling {#openvino_docs_ops_arithmetic_Ceiling_1} +@sphinxdirective + **Versioned name**: *Ceiling-1* **Category**: *Arithmetic unary* **Short description**: *Ceiling* performs element-wise ceiling operation with given tensor. -**Detailed description**: For each element from the input tensor calculates corresponding -element in the output tensor with the following formula: +**Detailed description**: For each element from the input tensor calculates corresponding element in the output tensor with the following formula: -\f[ -a_{i} = \lceil a_{i} \rceil -\f] +.. math:: + + a_{i} = \lceil a_{i} \rceil **Attributes**: *Ceiling* operation has no attributes. @@ -27,24 +28,26 @@ a_{i} = \lceil a_{i} \rceil * *T*: any numeric type. - **Examples** *Example 1* -```xml - - - - 256 - 56 - - - - - 256 - 56 - - - -``` +.. 
code-block:: cpp + + + + + 256 + 56 + + + + + 256 + 56 + + + + +@endsphinxdirective + diff --git a/docs/ops/arithmetic/Cos_1.md b/docs/ops/arithmetic/Cos_1.md index 8eeb2a1f789..585d9bb48f9 100644 --- a/docs/ops/arithmetic/Cos_1.md +++ b/docs/ops/arithmetic/Cos_1.md @@ -1,5 +1,7 @@ # Cos {#openvino_docs_ops_arithmetic_Cos_1} +@sphinxdirective + **Versioned name**: *Cos-1* **Category**: *Arithmetic unary* @@ -8,9 +10,9 @@ **Detailed description**: *Cos* performs element-wise cosine operation on a given input tensor, based on the following mathematical formula: -\f[ -a_{i} = cos(a_{i}) -\f] +.. math:: + + a_{i} = cos(a_{i}) **Attributes**: *Cos* operation has no attributes. @@ -26,22 +28,23 @@ a_{i} = cos(a_{i}) * *T*: any numeric type. - **Example** -```xml - - - - 256 - 56 - - - - - 256 - 56 - - - -``` +.. code-block:: cpp + + + + + 256 + 56 + + + + + 256 + 56 + + + + +@endsphinxdirective diff --git a/docs/ops/arithmetic/Cosh_1.md b/docs/ops/arithmetic/Cosh_1.md index 48a2a34ce07..5622ef1d2ff 100644 --- a/docs/ops/arithmetic/Cosh_1.md +++ b/docs/ops/arithmetic/Cosh_1.md @@ -1,5 +1,7 @@ # Cosh {#openvino_docs_ops_arithmetic_Cosh_1} +@sphinxdirective + **Versioned name**: *Cosh-1* **Category**: *Arithmetic unary* @@ -8,9 +10,9 @@ **Detailed description**: *Cosh* performs element-wise hyperbolic cosine (cosh) operation on a given input tensor, based on the following mathematical formula: -\f[ -a_{i} = cosh(a_{i}) -\f] +.. math:: + + a_{i} = cosh(a_{i}) **Attributes**: *Cosh* operation has no attributes. @@ -28,19 +30,22 @@ a_{i} = cosh(a_{i}) **Example** -```xml - - - - 256 - 56 - - - - - 256 - 56 - - - -``` +.. 
code-block:: cpp + + + + + 256 + 56 + + + + + 256 + 56 + + + + +@endsphinxdirective + diff --git a/docs/ops/arithmetic/CumSum_3.md b/docs/ops/arithmetic/CumSum_3.md index 3e2c766aca9..0e07383583a 100644 --- a/docs/ops/arithmetic/CumSum_3.md +++ b/docs/ops/arithmetic/CumSum_3.md @@ -1,42 +1,45 @@ # CumSum {#openvino_docs_ops_arithmetic_CumSum_3} +@sphinxdirective + **Versioned name**: *CumSum-3* **Category**: *Arithmetic unary* **Short description**: *CumSum* performs cumulative summation of the input elements along the given axis. -**Detailed description**: *CumSum* performs cumulative summation of the input elements along the `axis` specified by the second input. By default, the `j-th` output element is the inclusive sum of the first `j` elements in the given sequence, and the first element in the sequence is copied to the output as is. -In the `exclusive` mode the `j-th` output element is the sum of the first `j-1` elements and the first element in the output sequence is `0`. -To perform the summation in the opposite direction of the axis, set reverse attribute to `true`. +**Detailed description**: *CumSum* performs cumulative summation of the input elements along the ``axis`` specified by the second input. By default, the ``j-th`` output element is the inclusive sum of the first ``j`` elements in the given sequence, and the first element in the sequence is copied to the output as is. +In the ``exclusive`` mode the ``j-th`` output element is the sum of the first ``j-1`` elements and the first element in the output sequence is ``0``. +To perform the summation in the opposite direction of the axis, set reverse attribute to ``true``. **Attributes**: * *exclusive* -* **Description**: If the attribute is set to `true`, then exclusive sums are returned, the `j-th` element is not included in the `j-th` sum. Otherwise, the inclusive sum of the first `j` elements for the `j-th` element is calculated. 
+ * **Description**: If the attribute is set to ``true``, then exclusive sums are returned, the ``j-th`` element is not included in the ``j-th`` sum. Otherwise, the inclusive sum of the first ``j`` elements for the ``j-th`` element is calculated. * **Range of values**: - * `false` - include the top element - * `true` - do not include the top element - * **Type**: `boolean` - * **Default value**: `false` + + * ``false`` - include the top element + * ``true`` - do not include the top element + * **Type**: ``boolean`` + * **Default value**: ``false`` * **Required**: *no* * *reverse* - * **Description**: If set to `true` will perform the sums in reverse direction. + * **Description**: If set to ``true`` will perform the sums in reverse direction. * **Range of values**: - * `false` - do not perform sums in reverse direction - * `true` - perform sums in reverse direction - * **Type**: `boolean` - * **Default value**: `false` + + * ``false`` - do not perform sums in reverse direction + * ``true`` - perform sums in reverse direction + * **Type**: ``boolean`` + * **Default value**: ``false`` * **Required**: *no* **Inputs** * **1**: A tensor of type *T* and rank greater or equal to 1. **Required.** - -* **2**: Axis index along which the cumulative sum is performed. A scalar of type *T_AXIS*. Negative value means counting dimensions from the back. Default value is `0`. **Optional.** +* **2**: Axis index along which the cumulative sum is performed. A scalar of type *T_AXIS*. Negative value means counting dimensions from the back. Default value is ``0``. **Optional.** **Outputs** @@ -46,78 +49,81 @@ To perform the summation in the opposite direction of the axis, set reverse attr * *T*: any numeric type. -* *T_AXIS*: `int64` or `int32`. +* *T_AXIS*: ``int64`` or ``int32``. **Examples** *Example 1* -```xml - - - - 5 - - - - - - 5 - - - -``` +.. code-block:: cpp + + + + < !-- input value is: [1., 2., 3., 4., 5.] 
--> + 5 + + < !-- axis value is: 0 --> + + + < !-- output value is: [1., 3., 6., 10., 15.] --> + 5 + + + *Example 2* -```xml - - - - 5 - - - - - - 5 - - - -``` +.. code-block:: cpp + + + + < !-- input value is: [1., 2., 3., 4., 5.] --> + 5 + + < !-- axis value is: 0 --> + + + < !-- output value is: [0., 1., 3., 6., 10.] --> + 5 + + + *Example 3* -```xml - - - - 5 - - - - - - 5 - - - -``` +.. code-block:: cpp + + + + < !-- input value is: [1., 2., 3., 4., 5.] --> + 5 + + < !-- axis value is: 0 --> + + + < !-- output value is: [15., 14., 12., 9., 5.] --> + 5 + + + *Example 4* -```xml - - - - 5 - - - - - - 5 - - - -``` +.. code-block:: cpp + + + + < -- input value is: [1., 2., 3., 4., 5.] --> + 5 + + < -- axis value is: 0 --> + + + < -- output value is: [14., 12., 9., 5., 0.] --> + 5 + + + + +@endsphinxdirective + diff --git a/docs/ops/convolution/ConvolutionBackpropData_1.md b/docs/ops/convolution/ConvolutionBackpropData_1.md index fadcc01ed08..ac612c3a445 100644 --- a/docs/ops/convolution/ConvolutionBackpropData_1.md +++ b/docs/ops/convolution/ConvolutionBackpropData_1.md @@ -1,5 +1,7 @@ # ConvolutionBackpropData {#openvino_docs_ops_convolution_ConvolutionBackpropData_1} +@sphinxdirective + **Versioned name**: *ConvolutionBackpropData-1* **Category**: *Convolution* @@ -8,33 +10,33 @@ **Detailed description**: -ConvolutionBackpropData takes the input tensor, weights tensor and output shape and computes the output tensor of a given shape. The shape of the output can be specified as an input 1D integer tensor explicitly or determined by other attributes implicitly. If output shape is specified as an explicit input, shape of the output exactly matches the specified size and required amount of padding is computed. More thorough explanation can be found in [Transposed Convolutions](https://arxiv.org/abs/1603.07285). +ConvolutionBackpropData takes the input tensor, weights tensor and output shape and computes the output tensor of a given shape. 
The shape of the output can be specified as an input 1D integer tensor explicitly or determined by other attributes implicitly. If output shape is specified as an explicit input, shape of the output exactly matches the specified size and required amount of padding is computed. More thorough explanation can be found in `Transposed Convolutions <https://arxiv.org/abs/1603.07285>`__.

-ConvolutionBackpropData accepts the same set of attributes as a regular Convolution operation and additionally `output_padding` attribute, but they are interpreted in a "backward way", so they are applied to the output of ConvolutionBackpropData, but not to the input. Refer to a regular [Convolution](Convolution_1.md) operation for detailed description of each Convolution attribute.
+ConvolutionBackpropData accepts the same set of attributes as a regular Convolution operation and additionally the ``output_padding`` attribute, but they are interpreted in a "backward way", so they are applied to the output of ConvolutionBackpropData, but not to the input. Refer to a regular :doc:`Convolution <openvino_docs_ops_convolution_Convolution_1>` operation for a detailed description of each Convolution attribute.

-When output shape is specified as an input tensor `output_shape` then it specifies only spatial dimensions. No batch or channel dimension should be passed along with spatial dimensions. If `output_shape` is omitted, then `pads_begin`, `pads_end` or `auto_pad` are used to determine output spatial shape `[O_z, O_y, O_x]` by input spatial shape `[I_z, I_y, I_x]` in the following way:
+When output shape is specified as an input tensor ``output_shape``, it specifies only spatial dimensions. No batch or channel dimension should be passed along with spatial dimensions. If ``output_shape`` is omitted, then ``pads_begin``, ``pads_end`` or ``auto_pad`` are used to determine the output spatial shape ``[O_z, O_y, O_x]`` from the input spatial shape ``[I_z, I_y, I_x]`` in the following way:

-```
-if auto_pads != None:
-    pads_begin[i] = 0
-    pads_end[i] = 0
+.. 
code-block:: cpp + + if auto_pads != None: + pads_begin[i] = 0 + pads_end[i] = 0 + + Y_i = stride[i] * (X_i - 1) + ((K_i - 1) * dilations[i] + 1) - pads_begin[i] - pads_end[i] + output_padding[i] -Y_i = stride[i] * (X_i - 1) + ((K_i - 1) * dilations[i] + 1) - pads_begin[i] - pads_end[i] + output_padding[i] -``` +where ``K_i`` filter kernel dimension along spatial axis ``i``. -where `K_i` filter kernel dimension along spatial axis `i`. +If ``output_shape`` is specified, ``pads_begin`` and ``pads_end`` are ignored, and ``auto_pad`` defines how to distribute padding amount around the tensor. In this case pads are determined based on the next formulas to correctly align input and output tensors: - If `output_shape` is specified, `pads_begin` and `pads_end` are ignored, and `auto_pad` defines how to distribute padding amount around the tensor. In this case pads are determined based on the next formulas to correctly align input and output tensors: - -``` -total_padding[i] = stride[i] * (X_i - 1) + ((K_i - 1) * dilations[i] + 1) - output_shape[i] + output_padding[i] -if auto_pads != SAME_UPPER: - pads_begin[i] = total_padding[i] // 2 - pads_end[i] = total_padding[i] - pads_begin[i] -else: - pads_end[i] = total_padding[i] // 2 - pads_begin[i] = total_padding[i] - pads_end[i] -``` +.. code-block:: cpp + + total_padding[i] = stride[i] * (X_i - 1) + ((K_i - 1) * dilations[i] + 1) - output_shape[i] + output_padding[i] + if auto_pads != SAME_UPPER: + pads_begin[i] = total_padding[i] // 2 + pads_end[i] = total_padding[i] - pads_begin[i] + else: + pads_end[i] = total_padding[i] // 2 + pads_begin[i] = total_padding[i] - pads_end[i] **Attributes** @@ -42,14 +44,14 @@ else: * **Description**: *strides* has the same definition as *strides* for a regular Convolution but applied in the backward way, for the output tensor. 
* **Range of values**: positive integers - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * *pads_begin* * **Description**: *pads_begin* has the same definition as *pads_begin* for a regular Convolution but applied in the backward way, for the output tensor. May be omitted specified, in which case pads are calculated automatically. * **Range of values**: non-negative integers - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * **Note**: the attribute is ignored when *auto_pad* attribute is specified. @@ -57,7 +59,7 @@ else: * **Description**: *pads_end* has the same definition as *pads_end* for a regular Convolution but applied in the backward way, for the output tensor. May be omitted, in which case pads are calculated automatically. * **Range of values**: non-negative integers - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * **Note**: the attribute is ignored when *auto_pad* attribute is specified. @@ -65,44 +67,44 @@ else: * **Description**: *dilations* has the same definition as *dilations* for a regular Convolution but applied in the backward way, for the output tensor. * **Range of values**: positive integers - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * *auto_pad* * **Description**: *auto_pad* has the same definition as *auto_pad* for a regular Convolution but applied in the backward way, for the output tensor. - * *explicit*: use explicit padding values from `pads_begin` and `pads_end`. + + * *explicit*: use explicit padding values from ``pads_begin`` and ``pads_end``. * *same_upper* the input is padded to match the output size. In case of odd padding value an extra padding is added at the end. * *same_lower* the input is padded to match the output size. In case of odd padding value an extra padding is added at the beginning. * *valid* - do not use padding. 
- * **Type**: `string` + * **Type**: ``string`` * **Default value**: None * **Required**: *no* * **Note**: *pads_begin* and *pads_end* attributes are ignored when *auto_pad* is specified. * *output_padding* - * **Description**: *output_padding* adds additional amount of paddings per each spatial axis in the `output` tensor. It unlocks more elements in the output allowing them to be computed. Elements are added at the higher coordinate indices for the spatial dimensions. Number of elements in *output_padding* list matches the number of spatial dimensions in `data` and `output` tensors. + * **Description**: *output_padding* adds additional amount of paddings per each spatial axis in the ``output`` tensor. It unlocks more elements in the output allowing them to be computed. Elements are added at the higher coordinate indices for the spatial dimensions. Number of elements in *output_padding* list matches the number of spatial dimensions in ``data`` and ``output`` tensors. * **Range of values**: non-negative integer values - * **Type**: `int[]` + * **Type**: ``int[]`` * **Default value**: all zeros * **Required**: *no* **Inputs**: -* **1**: Input tensor of type *T1* and rank 3, 4 or 5. Layout is `[N, C_INPUT, Z, Y, X]` (number of batches, number of input channels, spatial axes Z, Y, X). **Required.** - -* **2**: Convolution kernel tensor of type *T1* and rank 3, 4 or 5. Layout is `[C_INPUT, C_OUTPUT, Z, Y, X]` (number of input channels, number of output channels, spatial axes Z, Y, X). Spatial size of the kernel is derived from the shape of this input and aren't specified by any attribute. **Required.** - -* **3**: `output_shape` is 1D tensor of type *T2* that specifies spatial shape of the output. If specified, *padding amount* is deduced from relation of input and output spatial shapes according to formulas in the description. If not specified, *output shape* is calculated based on the `pads_begin` and `pads_end` or completely according to `auto_pad`. 
**Optional.** -* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute: - * 1D convolution (input tensors rank 3) means that there is only one spatial axis X, - * 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X, - * 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X. +* **1**: Input tensor of type *T1* and rank 3, 4 or 5. Layout is ``[N, C_INPUT, Z, Y, X]`` (number of batches, number of input channels, spatial axes Z, Y, X). **Required.** +* **2**: Convolution kernel tensor of type *T1* and rank 3, 4 or 5. Layout is ``[C_INPUT, C_OUTPUT, Z, Y, X]`` (number of input channels, number of output channels, spatial axes Z, Y, X). Spatial size of the kernel is derived from the shape of this input and aren't specified by any attribute. **Required.** +* **3**: ``output_shape`` is 1D tensor of type *T2* that specifies spatial shape of the output. If specified, *padding amount* is deduced from relation of input and output spatial shapes according to formulas in the description. If not specified, *output shape* is calculated based on the ``pads_begin`` and ``pads_end`` or completely according to ``auto_pad``. **Optional.** +* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute: + + * 1D convolution (input tensors rank 3) means that there is only one spatial axis X, + * 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X, + * 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X. **Outputs**: -* **1**: Output tensor of type *T1* and rank 3, 4 or 5. Layout is `[N, C_OUTPUT, Z, Y, X]` (number of batches, number of kernel output channels, spatial axes Z, Y, X). +* **1**: Output tensor of type *T1* and rank 3, 4 or 5. 
Layout is ``[N, C_OUTPUT, Z, Y, X]`` (number of batches, number of kernel output channels, spatial axes Z, Y, X). **Types**: @@ -113,93 +115,96 @@ else: *Example 1: 2D ConvolutionBackpropData* -```xml - - - - - 1 - 20 - 224 - 224 - - - 20 - 10 - 3 - 3 - - - - - 1 - 10 - 447 - 447 - - - -``` +.. code-block:: cpp + + + + + + 1 + 20 + 224 + 224 + + + 20 + 10 + 3 + 3 + + + + + 1 + 10 + 447 + 447 + + + *Example 2: 2D ConvolutionBackpropData with output_padding* -```xml - - - - - 1 - 20 - 2 - 2 - - - 20 - 10 - 3 - 3 - - - - - 1 - 10 - 8 - 8 - - - -``` +.. code-block:: cpp + + + + + + 1 + 20 + 2 + 2 + + + 20 + 10 + 3 + 3 + + + + + 1 + 10 + 8 + 8 + + + *Example 3: 2D ConvolutionBackpropData with output_shape input* -```xml - - - - - 1 - 20 - 224 - 224 - - - 20 - 10 - 3 - 3 - - - 2 - - - - - 1 - 10 - 450 - 450 - - - -``` +.. code-block:: cpp + + + + + + 1 + 20 + 224 + 224 + + + 20 + 10 + 3 + 3 + + + 2 < !-- output_shape value is: [450, 450]--> + + + + + 1 + 10 + 450 + 450 + + + + +@endsphinxdirective + diff --git a/docs/ops/convolution/Convolution_1.md b/docs/ops/convolution/Convolution_1.md index 43cb3161edd..8fb3554eb73 100644 --- a/docs/ops/convolution/Convolution_1.md +++ b/docs/ops/convolution/Convolution_1.md @@ -1,92 +1,105 @@ # Convolution {#openvino_docs_ops_convolution_Convolution_1} +@sphinxdirective + **Versioned name**: *Convolution-1* **Category**: *Convolution* **Short description**: Computes 1D, 2D or 3D convolution (cross-correlation to be precise) of input and kernel tensors. -**Detailed description**: Basic building block of convolution is a dot product of input patch and kernel. Whole operation consist of multiple such computations over multiple input patches and kernels. More thorough explanation can be found in [Convolutional Neural Networks](http://cs231n.github.io/convolutional-networks/#conv) and [Convolution operation](https://medium.com/apache-mxnet/convolutions-explained-with-ms-excel-465d6649831c). 
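The convolutional output-size relation, n_out = (n_in + 2p - k) / s + 1, can be checked numerically with a small helper (an illustrative sketch — the function name and the begin/end padding split are ours; the 128 → 63 case with kernel 4 and stride 2 matches the 1D example in this file):

```python
def conv_output_size(n_in, kernel, stride, pad_begin=0, pad_end=0):
    """Output length along one spatial axis: (n_in + p_b + p_e - k) // s + 1."""
    return (n_in + pad_begin + pad_end - kernel) // stride + 1

# 1D case from the example section of this file: 128 -> 63 (kernel 4, stride 2).
print(conv_output_size(128, kernel=4, stride=2))  # 63
```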
+**Detailed description**: Basic building block of convolution is a dot product of input patch and kernel. Whole operation consist of multiple such computations over multiple input patches and kernels. More thorough explanation can be found in `Convolutional Neural Networks `__ and `Convolution operation `__ . For the convolutional layer, the number of output features in each dimension is calculated using the formula: -\f[ -n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1 -\f] + +.. math:: + + n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1 The receptive field in each layer is calculated using the formulas: -* Jump in the output feature map: - \f[ - j_{out} = j_{in} \cdot s - \f] -* Size of the receptive field of output feature: - \f[ - r_{out} = r_{in} + ( k - 1 ) \cdot j_{in} - \f] -* Center position of the receptive field of the first output feature: - \f[ - start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in} - \f] -* Output is calculated using the following formula: - \f[ - out = \sum_{i = 0}^{n}w_{i}x_{i} + b - \f] + +* Jump in the output feature map: + + .. math:: + + j_{out} = j_{in} \cdot s + +* Size of the receptive field of output feature: + + .. math:: + + r_{out} = r_{in} + ( k - 1 ) \cdot j_{in} + +* Center position of the receptive field of the first output feature: + + .. math:: + + start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in} + +* Output is calculated using the following formula: + + .. math:: + + out = \sum_{i = 0}^{n}w_{i}x_{i} + b **Attributes**: * *strides* - * **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the `(z, y, x)` axes for 3D convolutions and `(y, x)` axes for 2D convolutions. For example, *strides* equal `4,2,1` means sliding the filter 4 pixel at a time over depth dimension, 2 over height dimension and 1 over width dimension. 
+ * **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the ``(z, y, x)`` axes for 3D convolutions and ``(y, x)`` axes for 2D convolutions. For example, *strides* equal ``4,2,1`` means sliding the filter 4 pixel at a time over depth dimension, 2 over height dimension and 1 over width dimension. * **Range of values**: integer values starting from 0 - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * *pads_begin* - * **Description**: *pads_begin* is a number of pixels to add to the beginning along each axis. For example, *pads_begin* equal `1,2` means adding 1 pixel to the top of the input and 2 to the left of the input. + * **Description**: *pads_begin* is a number of pixels to add to the beginning along each axis. For example, *pads_begin* equal ``1,2`` means adding 1 pixel to the top of the input and 2 to the left of the input. * **Range of values**: integer values starting from 0 - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * **Note**: the attribute is ignored when *auto_pad* attribute is specified. * *pads_end* - * **Description**: *pads_end* is a number of pixels to add to the ending along each axis. For example, *pads_end* equal `1,2` means adding 1 pixel to the bottom of the input and 2 to the right of the input. + * **Description**: *pads_end* is a number of pixels to add to the ending along each axis. For example, *pads_end* equal ``1,2`` means adding 1 pixel to the bottom of the input and 2 to the right of the input. * **Range of values**: integer values starting from 0 - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * **Note**: the attribute is ignored when *auto_pad* attribute is specified. * *dilations* - * **Description**: *dilations* denotes the distance in width and height between elements (weights) in the filter. 
For example, *dilation* equal `1,1` means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. *dilation* equal `2,2` means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1. + * **Description**: *dilations* denotes the distance in width and height between elements (weights) in the filter. For example, *dilation* equal ``1,1`` means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. *dilation* equal ``2,2`` means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1. * **Range of values**: integer value starting from 0 - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* * *auto_pad* * **Description**: *auto_pad* how the padding is calculated. Possible values: + * *explicit* - use explicit padding values from *pads_begin* and *pads_end*. * *same_upper* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the end. * *same_lower* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the beginning. * *valid* - do not use padding. - * **Type**: `string` + * **Type**: ``string`` * **Default value**: explicit * **Required**: *no* * **Note**: *pads_begin* and *pads_end* attributes are ignored when *auto_pad* is specified. **Inputs**: -* **1**: Input tensor of type *T* and rank 3, 4 or 5. Layout is `[N, C_IN, Z, Y, X]` (number of batches, number of channels, spatial axes Z, Y, X). **Required.** -* **2**: Kernel tensor of type *T* and rank 3, 4 or 5. Layout is `[C_OUT, C_IN, Z, Y, X]` (number of output channels, number of input channels, spatial axes Z, Y, X). 
**Required.** -* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute: - * 1D convolution (input tensors rank 3) means that there is only one spatial axis X - * 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X - * 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X +* **1**: Input tensor of type *T* and rank 3, 4 or 5. Layout is ``[N, C_IN, Z, Y, X]`` (number of batches, number of channels, spatial axes Z, Y, X). **Required.** +* **2**: Kernel tensor of type *T* and rank 3, 4 or 5. Layout is ``[C_OUT, C_IN, Z, Y, X]`` (number of output channels, number of input channels, spatial axes Z, Y, X). **Required.** +* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute: + + * 1D convolution (input tensors rank 3) means that there is only one spatial axis X + * 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X + * 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X **Outputs**: -* **1**: Output tensor of type *T* and rank 3, 4 or 5. Layout is `[N, C_OUT, Z, Y, X]` (number of batches, number of kernel output channels, spatial axes Z, Y, X). +* **1**: Output tensor of type *T* and rank 3, 4 or 5. Layout is ``[N, C_OUT, Z, Y, X]`` (number of batches, number of kernel output channels, spatial axes Z, Y, X). **Types**: @@ -95,87 +108,96 @@ The receptive field in each layer is calculated using the formulas: **Example**: 1D Convolution -```xml - - - - - 1 - 5 - 128 - - - 16 - 5 - 4 - - - - - 1 - 16 - 63 - - - -``` + +.. code-block:: cpp + + + + + + 1 + 5 + 128 + + + 16 + 5 + 4 + + + + + 1 + 16 + 63 + + + + + 2D Convolution -```xml - - - - - 1 - 3 - 224 - 224 - - - 64 - 3 - 5 - 5 - - - - - 1 - 64 - 224 - 224 - - - -``` + +.. 
code-block:: cpp + + + + + + 1 + 3 + 224 + 224 + + + 64 + 3 + 5 + 5 + + + + + 1 + 64 + 224 + 224 + + + 3D Convolution -```xml - - - - - 1 - 7 - 320 - 320 - 320 - - - 32 - 7 - 3 - 3 - 3 - - - - - 1 - 32 - 106 - 106 - 106 - - - -``` + +.. code-block:: cpp + + + + + + 1 + 7 + 320 + 320 + 320 + + + 32 + 7 + 3 + 3 + 3 + + + + + 1 + 32 + 106 + 106 + 106 + + + + + +@endsphinxdirective + diff --git a/docs/ops/infrastructure/Constant_1.md b/docs/ops/infrastructure/Constant_1.md index 3018057638f..790cf37b80b 100644 --- a/docs/ops/infrastructure/Constant_1.md +++ b/docs/ops/infrastructure/Constant_1.md @@ -1,5 +1,7 @@ # Constant {#openvino_docs_ops_infrastructure_Constant_1} +@sphinxdirective + **Versioned name**: *Constant-1* **Category**: *Infrastructure* @@ -12,28 +14,28 @@ * **Description**: specifies position in binary file with weights where the content of the constant begins; value in bytes * **Range of values**: non-negative integer value - * **Type**: `int` + * **Type**: ``int`` * **Required**: *yes* * *size* * **Description**: size of constant content in binary files; value in bytes * **Range of values**: positive integer bigger than zero - * **Type**: `int` + * **Type**: ``int`` * **Required**: *yes* * *element_type* * **Description**: the type of element of output tensor * **Range of values**: u1, u8, u16, u32, u64, i8, i16, i32, i64, f16, f32, boolean, bf16 - * **Type**: `string` + * **Type**: ``string`` * **Required**: *yes* * *shape* * **Description**: the shape of the output tensor * **Range of values**: list of non-negative integers, empty list is allowed, which means 0D or scalar tensor - * **Type**: `int[]` + * **Type**: ``int[]`` * **Required**: *yes* @@ -47,14 +49,17 @@ **Example** -```xml - - - - - 8 - 8 - - - -``` +.. 
code-block:: cpp + + + + + + 8 + 8 + + + + +@endsphinxdirective + diff --git a/docs/ops/movement/Concat_1.md b/docs/ops/movement/Concat_1.md index a6f71c2e9c6..cad367213d3 100644 --- a/docs/ops/movement/Concat_1.md +++ b/docs/ops/movement/Concat_1.md @@ -1,5 +1,7 @@ # Concat {#openvino_docs_ops_movement_Concat_1} +@sphinxdirective + **Versioned name**: *Concat-1* **Category**: *Data movement* @@ -11,17 +13,17 @@ * *axis* * **Description**: *axis* specifies dimension to concatenate along - * **Range of values**: integer number. Negative value means counting dimension from the end. The range is `[-R, R-1]`, where `R` is the rank of all inputs. + * **Range of values**: integer number. Negative value means counting dimension from the end. The range is ``[-R, R-1]``, where ``R`` is the rank of all inputs. * **Type**: int * **Required**: *yes* **Inputs**: -* **1..N**: Arbitrary number of input tensors of type *T*. Types of all tensors should match. Rank of all tensors should match. The rank is positive, so scalars as inputs are not allowed. Shapes for all inputs should match at every position except `axis` position. At least one input is required. +* **1..N**: Arbitrary number of input tensors of type *T*. Types of all tensors should match. Rank of all tensors should match. The rank is positive, so scalars as inputs are not allowed. Shapes for all inputs should match at every position except ``axis`` position. At least one input is required. **Outputs**: -* **1**: Tensor of the same type *T* as input tensor and shape `[d1, d2, ..., d_axis, ...]`, where `d_axis` is a sum of sizes of input tensors along `axis` dimension. +* **1**: Tensor of the same type *T* as input tensor and shape ``[d1, d2, ..., d_axis, ...]``, where ``d_axis`` is a sum of sizes of input tensors along ``axis`` dimension. **Types** @@ -29,72 +31,74 @@ **Examples** -```xml - - - - - 1 - 8 - 50 - 50 - - - 1 - 16 - 50 - 50 - - - 1 - 32 - 50 - 50 - - - - - 1 - 56 - 50 - 50 - - - +.. 
code-block:: cpp
+
+    <layer id="1" name="concat" type="Concat">
+        <data axis="1"/>
+        <input>
+            <port id="0">
+                <dim>1</dim>
+                <dim>8</dim>   <!-- axis for concatenation -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+            <port id="1">
+                <dim>1</dim>
+                <dim>16</dim>  <!-- axis for concatenation -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+            <port id="2">
+                <dim>1</dim>
+                <dim>32</dim>  <!-- axis for concatenation -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+        </input>
+        <output>
+            <port id="3">
+                <dim>1</dim>
+                <dim>56</dim>  <!-- concatenated axis: 8 + 16 + 32 = 56 -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+        </output>
+    </layer>

-
+.. code-block:: cpp
+
+    <layer id="1" name="concat" type="Concat">
+        <data axis="-3"/>
+        <input>
+            <port id="0">
+                <dim>1</dim>
+                <dim>8</dim>   <!-- axis for concatenation -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+            <port id="1">
+                <dim>1</dim>
+                <dim>16</dim>  <!-- axis for concatenation -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+            <port id="2">
+                <dim>1</dim>
+                <dim>32</dim>  <!-- axis for concatenation -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+        </input>
+        <output>
+            <port id="3">
+                <dim>1</dim>
+                <dim>56</dim>  <!-- concatenated axis: 8 + 16 + 32 = 56 -->
+                <dim>50</dim>
+                <dim>50</dim>
+            </port>
+        </output>
+    </layer>
+
+
+@endsphinxdirective

-```
diff --git a/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md b/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md
index 4bcbf6d312e..75c39446d93 100644
--- a/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md
+++ b/docs/ops/sequence/CTCGreedyDecoderSeqLen_6.md
@@ -1,5 +1,7 @@
# CTCGreedyDecoderSeqLen {#openvino_docs_ops_sequence_CTCGreedyDecoderSeqLen_6}

+@sphinxdirective
+
**Versioned name**: *CTCGreedyDecoderSeqLen-6*

**Category**: *Sequence processing*
@@ -8,7 +10,7 @@

**Detailed description**:

-This operation is similar to the [TensorFlow CTCGreedyDecoder](https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder).
+This operation is similar to the `TensorFlow CTCGreedyDecoder <https://www.tensorflow.org/api_docs/python/tf/nn/ctc_greedy_decoder>`__.

The operation *CTCGreedyDecoderSeqLen* implements best path decoding.

@@ -17,17 +19,17 @@ Decoding is done in two steps:

2. Remove duplicate consecutive elements if the attribute *merge_repeated* is true and then remove all blank elements.

-Sequences in the batch can have different length. The lengths of sequences are coded in the second input integer tensor `sequence_length`.
+Sequences in the batch can have different lengths. The lengths of sequences are coded in the second input integer tensor ``sequence_length``.
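The two decoding steps above can be sketched in NumPy. This is a hedged illustration, not the reference implementation: the function name is hypothetical, and it assumes the output conventions of this spec (unused output positions filled with -1, plus a second output holding the decoded lengths).

```python
import numpy as np

def ctc_greedy_decode_seq_len(data, sequence_length, blank_index, merge_repeated=True):
    # data: [N, T, C] logits; sequence_length: [N] valid lengths per sequence.
    N, T, C = data.shape
    classes = np.full((N, T), -1, dtype=np.int64)  # unused tail stays -1
    lengths = np.zeros(N, dtype=np.int64)
    for n in range(N):
        # Step 1: most probable class per valid time step (best path).
        best = data[n, :sequence_length[n]].argmax(axis=-1)
        decoded, prev = [], None
        for c in best:
            # Step 2: optionally merge repeats, then drop blank elements.
            if not (merge_repeated and c == prev) and c != blank_index:
                decoded.append(int(c))
            prev = c
        classes[n, :len(decoded)] = decoded
        lengths[n] = len(decoded)
    return classes, lengths
```

With blank class ``3``, a best path ``[0, 1, 1, 3, 1, 3, 1]`` (``ABB*B*B``) decodes to ``[0, 1, 1, 1]`` (``ABBB``) when *merge_repeated* is true and to ``[0, 1, 1, 1, 1]`` (``ABBBB``) otherwise, matching the attribute description.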
-The main difference between [CTCGreedyDecoder](CTCGreedyDecoder_1.md) and CTCGreedyDecoderSeqLen is in the second input. CTCGreedyDecoder uses 2D input floating-point tensor with sequence masks for each sequence in the batch while CTCGreedyDecoderSeqLen uses 1D integer tensor with sequence lengths.
+The main difference between :doc:`CTCGreedyDecoder <openvino_docs_ops_sequence_CTCGreedyDecoder_1>` and CTCGreedyDecoderSeqLen is in the second input. CTCGreedyDecoder uses 2D input floating-point tensor with sequence masks for each sequence in the batch while CTCGreedyDecoderSeqLen uses 1D integer tensor with sequence lengths.

**Attributes**

* *merge_repeated*

-  * **Description**: *merge_repeated* is a flag for merging repeated labels during the CTC calculation. If the value is false the sequence `ABB*B*B` (where '*' is the blank class) will look like `ABBBB`. But if the value is true, the sequence will be `ABBB`.
+  * **Description**: *merge_repeated* is a flag for merging repeated labels during the CTC calculation. If the value is false the sequence ``ABB*B*B`` (where '*' is the blank class) will look like ``ABBBB``. But if the value is true, the sequence will be ``ABBB``.
  * **Range of values**: true or false
-  * **Type**: `boolean`
+  * **Type**: ``boolean``
  * **Default value**: true
  * **Required**: *no*

@@ -49,52 +51,49 @@ The main difference between [CTCGreedyDecoder](CTCGreedyDecoder_1.md) and CTCGre

**Inputs**

-* **1**: `data` - input tensor of type *T_F* of shape `[N, T, C]` with a batch of sequences. Where `T` is the maximum sequence length, `N` is the batch size and `C` is the number of classes. **Required.**
-
-* **2**: `sequence_length` - input tensor of type *T_I* of shape `[N]` with sequence lengths. The values of sequence length must be less or equal to `T`. **Required.**
-
-* **3**: `blank_index` - scalar or 1D tensor with 1 element of type *T_I*. Specifies the class index to use for the blank class.
Regardless of the value of `merge_repeated` attribute, if the output index for a given batch and time step corresponds to the `blank_index`, no new element is emitted. Default value is `C-1`. **Optional.**
+* **1**: ``data`` - input tensor of type *T_F* of shape ``[N, T, C]`` with a batch of sequences. Where ``T`` is the maximum sequence length, ``N`` is the batch size and ``C`` is the number of classes. **Required.**
+* **2**: ``sequence_length`` - input tensor of type *T_I* of shape ``[N]`` with sequence lengths. The values of sequence length must be less than or equal to ``T``. **Required.**
+* **3**: ``blank_index`` - scalar or 1D tensor with 1 element of type *T_I*. Specifies the class index to use for the blank class. Regardless of the value of ``merge_repeated`` attribute, if the output index for a given batch and time step corresponds to the ``blank_index``, no new element is emitted. Default value is ``C-1``. **Optional.**

**Output**

-* **1**: Output tensor of type *T_IND1* shape `[N, T]` and containing the decoded classes. All elements that do not code sequence classes are filled with -1.
-
-* **2**: Output tensor of type *T_IND2* shape `[N]` and containing length of decoded class sequence for each batch.
+* **1**: Output tensor of type *T_IND1* shape ``[N, T]`` and containing the decoded classes. All elements that do not code sequence classes are filled with -1.
+* **2**: Output tensor of type *T_IND2* shape ``[N]`` and containing length of decoded class sequence for each batch.

**Types**

* *T_F*: any supported floating-point type.
-
-* *T_I*: `int32` or `int64`.
-
-* *T_IND1*: `int32` or `int64` and depends on `classes_index_type` attribute.
-
-* *T_IND2*: `int32` or `int64` and depends on `sequence_length_type` attribute.
+* *T_I*: ``int32`` or ``int64``.
+* *T_IND1*: ``int32`` or ``int64`` and depends on ``classes_index_type`` attribute.
+* *T_IND2*: ``int32`` or ``int64`` and depends on ``sequence_length_type`` attribute.
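The relationship to the mask-based second input of CTCGreedyDecoder can be shown with a small hypothetical helper: a ``[T, N]`` mask of ones followed by zeros carries the same information as the 1D ``sequence_length`` input used by this operation.

```python
import numpy as np

def mask_to_lengths(sequence_mask):
    # sequence_mask: [T, N], ones up to each sequence end, zeros afterwards;
    # summing over the time axis recovers the per-sequence lengths.
    return sequence_mask.astype(np.int64).sum(axis=0)

mask = np.array([[1, 1],
                 [1, 1],
                 [1, 0]], dtype=np.float32)  # T=3, N=2
assert mask_to_lengths(mask).tolist() == [3, 2]
```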
**Example**

-```xml
-<layer ... type="CTCGreedyDecoderSeqLen">
-    <data merge_repeated="true"/>
-    <input>
-        <port id="0">
-            <dim>8</dim>
-            <dim>20</dim>
-            <dim>128</dim>
-        </port>
-        <port id="1">
-            <dim>8</dim>
-        </port>
-        <port id="2"/> <!-- blank_index = 120 -->
-    </input>
-    <output>
-        <port id="3">
-            <dim>8</dim>
-            <dim>20</dim>
-        </port>
-        <port id="4">
-            <dim>8</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+    <layer ... type="CTCGreedyDecoderSeqLen">
+        <data merge_repeated="true"/>
+        <input>
+            <port id="0">
+                <dim>8</dim>
+                <dim>20</dim>
+                <dim>128</dim>
+            </port>
+            <port id="1">
+                <dim>8</dim>
+            </port>
+            <port id="2"/> < !-- blank_index = 120 -->
+        </input>
+        <output>
+            <port id="3">
+                <dim>8</dim>
+                <dim>20</dim>
+            </port>
+            <port id="4">
+                <dim>8</dim>
+            </port>
+        </output>
+    </layer>
+
+
+@endsphinxdirective
+
diff --git a/docs/ops/sequence/CTCGreedyDecoder_1.md b/docs/ops/sequence/CTCGreedyDecoder_1.md
index 5fd281fd84e..738532bf853 100644
--- a/docs/ops/sequence/CTCGreedyDecoder_1.md
+++ b/docs/ops/sequence/CTCGreedyDecoder_1.md
@@ -1,67 +1,72 @@
# CTCGreedyDecoder {#openvino_docs_ops_sequence_CTCGreedyDecoder_1}

+@sphinxdirective
+
**Versioned name**: *CTCGreedyDecoder-1*

**Category**: *Sequence processing*

**Short description**: *CTCGreedyDecoder* performs greedy decoding on the logits given in input (best path).

-**Detailed description**:
-Given an input sequence \f$X\f$ of length \f$T\f$, *CTCGreedyDecoder* assumes the probability of a length \f$T\f$ character sequence \f$C\f$ is given by
-\f[
-p(C|X) = \prod_{t=1}^{T} p(c_{t}|X)
-\f]
+**Detailed description**: Given an input sequence :math:`X` of length :math:`T`, *CTCGreedyDecoder* assumes the probability of a length :math:`T` character sequence :math:`C` is given by

-Sequences in the batch can have different length. The lengths of sequences are coded as values 1 and 0 in the second input tensor `sequence_mask`. Value `sequence_mask[j, i]` specifies whether there is a sequence symbol at index `i` in the sequence `i` in the batch of sequences. If there is no symbol at `j`-th position `sequence_mask[j, i] = 0`, and `sequence_mask[j, i] = 1` otherwise. Starting from `j = 0`, `sequence_mass[j, i]` are equal to 1 up to the particular index `j = last_sequence_symbol`, which is defined independently for each sequence `i`. For `j > last_sequence_symbol`, values in `sequence_mask[j, i]` are all zeros.
+..
math::
+
+   p(C|X) = \prod_{t=1}^{T} p(c_{t}|X)

+Sequences in the batch can have different lengths. The lengths of sequences are coded as values 1 and 0 in the second input tensor ``sequence_mask``. Value ``sequence_mask[j, i]`` specifies whether there is a sequence symbol at index ``j`` in the sequence ``i`` in the batch of sequences. If there is no symbol at ``j``-th position ``sequence_mask[j, i] = 0``, and ``sequence_mask[j, i] = 1`` otherwise. Starting from ``j = 0``, ``sequence_mask[j, i]`` are equal to 1 up to the particular index ``j = last_sequence_symbol``, which is defined independently for each sequence ``i``. For ``j > last_sequence_symbol``, values in ``sequence_mask[j, i]`` are all zeros.
+
-**Note**: Regardless of the value of `ctc_merge_repeated` attribute, if the output index for a given batch and time step corresponds to the `blank_index`, no new element is emitted.
+**Note**: Regardless of the value of ``ctc_merge_repeated`` attribute, if the output index for a given batch and time step corresponds to the ``blank_index``, no new element is emitted.

**Attributes**

* *ctc_merge_repeated*

  * **Description**: *ctc_merge_repeated* is a flag for merging repeated labels during the CTC calculation.
-  * **Range of values**: `true` or `false`
-  * **Type**: `boolean`
-  * **Default value**: `true`
+  * **Range of values**: ``true`` or ``false``
+  * **Type**: ``boolean``
+  * **Default value**: ``true``
  * **Required**: *no*

**Inputs**

-* **1**: `data` - input tensor with batch of sequences of type *T_F* and shape `[T, N, C]`, where `T` is the maximum sequence length, `N` is the batch size and `C` is the number of classes. **Required.**
-
-* **2**: `sequence_mask` - input tensor with sequence masks for each sequence in the batch of type *T_F* populated with values `0` and `1` and shape `[T, N]`.
**Required.**
+* **1**: ``data`` - input tensor with batch of sequences of type *T_F* and shape ``[T, N, C]``, where ``T`` is the maximum sequence length, ``N`` is the batch size and ``C`` is the number of classes. **Required.**
+* **2**: ``sequence_mask`` - input tensor with sequence masks for each sequence in the batch of type *T_F* populated with values ``0`` and ``1`` and shape ``[T, N]``. **Required.**

**Output**

-* **1**: Output tensor of type *T_F* and shape `[N, T, 1, 1]` which is filled with integer elements containing final sequence class indices. A final sequence can be shorter that the size `T` of the tensor, all elements that do not code sequence classes are filled with `-1`.
+* **1**: Output tensor of type *T_F* and shape ``[N, T, 1, 1]`` which is filled with integer elements containing final sequence class indices. A final sequence can be shorter than the size ``T`` of the tensor, all elements that do not code sequence classes are filled with ``-1``.

**Types**
+
* *T_F*: any supported floating-point type.

**Example**

-```xml
-<layer ... type="CTCGreedyDecoder">
-    <data ctc_merge_repeated="true"/>
-    <input>
-        <port id="0">
-            <dim>20</dim>
-            <dim>8</dim>
-            <dim>128</dim>
-        </port>
-        <port id="1">
-            <dim>20</dim>
-            <dim>8</dim>
-        </port>
-    </input>
-    <output>
-        <port id="2">
-            <dim>8</dim>
-            <dim>20</dim>
-            <dim>1</dim>
-            <dim>1</dim>
-        </port>
-    </output>
-</layer>
-```
+..
code-block:: cpp
+
+    <layer ... type="CTCGreedyDecoder">
+        <data ctc_merge_repeated="true"/>
+        <input>
+            <port id="0">
+                <dim>20</dim>
+                <dim>8</dim>
+                <dim>128</dim>
+            </port>
+            <port id="1">
+                <dim>20</dim>
+                <dim>8</dim>
+            </port>
+        </input>
+        <output>
+            <port id="2">
+                <dim>8</dim>
+                <dim>20</dim>
+                <dim>1</dim>
+                <dim>1</dim>
+            </port>
+        </output>
+    </layer>
+
+
+@endsphinxdirective
+
diff --git a/docs/ops/sequence/CTCLoss_4.md b/docs/ops/sequence/CTCLoss_4.md
index 4f6ad270023..2473552b35f 100644
--- a/docs/ops/sequence/CTCLoss_4.md
+++ b/docs/ops/sequence/CTCLoss_4.md
@@ -1,5 +1,7 @@
# CTCLoss {#openvino_docs_ops_sequence_CTCLoss_4}

+@sphinxdirective
+
**Versioned name**: *CTCLoss-4*

**Category**: *Sequence processing*
@@ -8,57 +10,50 @@

**Detailed description**:

-*CTCLoss* operation is presented in [Connectionist Temporal Classification - Labeling Unsegmented Sequence Data with Recurrent Neural Networks: Graves et al., 2016](http://www.cs.toronto.edu/~graves/icml_2006.pdf)
+*CTCLoss* operation is presented in `Connectionist Temporal Classification - Labeling Unsegmented Sequence Data with Recurrent Neural Networks: Graves et al., 2006 <http://www.cs.toronto.edu/~graves/icml_2006.pdf>`__

-*CTCLoss* estimates likelihood that a target `labels[i,:]` can occur (or is real) for given input sequence of logits `logits[i,:,:]`.
-Briefly, *CTCLoss* operation finds all sequences aligned with a target `labels[i,:]`, computes log-probabilities of the aligned sequences using `logits[i,:,:]`
-and computes a negative sum of these log-probabilies.
+*CTCLoss* estimates likelihood that a target ``labels[i,:]`` can occur (or is real) for given input sequence of logits ``logits[i,:,:]``. Briefly, *CTCLoss* operation finds all sequences aligned with a target ``labels[i,:]``, computes log-probabilities of the aligned sequences using ``logits[i,:,:]`` and computes a negative sum of these log-probabilities.

-Input sequences of logits `logits` can have different lengths. The length of each sequence `logits[i,:,:]` equals `logit_length[i]`.
-A length of target sequence `labels[i,:]` equals `label_length[i]`. The length of the target sequence must not be greater than the length of corresponding input sequence `logits[i,:,:]`.
+Input sequences of logits ``logits`` can have different lengths.
The length of each sequence ``logits[i,:,:]`` equals ``logit_length[i]``. +A length of target sequence ``labels[i,:]`` equals ``label_length[i]``. The length of the target sequence must not be greater than the length of corresponding input sequence ``logits[i,:,:]``. Otherwise, the operation behaviour is undefined. *CTCLoss* calculation scheme: -1. Compute probability of `j`-th character at time step `t` for `i`-th input sequence from `logits` using softmax formula: -\f[ -p_{i,t,j} = \frac{\exp(logits[i,t,j])}{\sum^{K}_{k=0}{\exp(logits[i,t,k])}} -\f] +1. Compute probability of ``j``-th character at time step ``t`` for ``i``-th input sequence from ``logits`` using softmax formula: -2. For a given `i`-th target from `labels[i,:]` find all aligned paths. -A path `S = (c1,c2,...,cT)` is aligned with a target `G=(g1,g2,...,gT)` if both chains are equal after decoding. -The decoding extracts substring of length `label_length[i]` from a target `G`, merges repeated characters in `G` in case *preprocess_collapse_repeated* equal to true and -finds unique elements in the order of character occurrence in case *unique* equal to true. -The decoding merges repeated characters in `S` in case *ctc_merge_repeated* equal to true and removes blank characters represented by `blank_index`. -By default, `blank_index` is equal to `C-1`, where `C` is a number of classes including the blank. -For example, in case default *ctc_merge_repeated*, *preprocess_collapse_repeated*, *unique* and `blank_index` a target sequence `G=(0,3,2,2,2,2,2,4,3)` of a length `label_length[i]=4` is processed -to `(0,3,2,2)` and a path `S=(0,0,4,3,2,2,4,2,4)` of a length `logit_length[i]=9` is also processed to `(0,3,2,2)`, where `C=5`. -There exist other paths that are also aligned with `G`, for instance, `0,4,3,3,2,4,2,2,2`. Paths checked for alignment with a target `label[:,i]` must be of length `logit_length[i] = L_i`. 
-Compute probabilities of these aligned paths (alignments) as follows: -\f[ -p(S) = \prod_{t=1}^{L_i} p_{i,t,ct} -\f] +.. math:: + + p_{i,t,j} = \frac{\exp(logits[i,t,j])}{\sum^{K}_{k=0}{\exp(logits[i,t,k])}} + +2. For a given ``i``-th target from ``labels[i,:]`` find all aligned paths. A path ``S = (c1,c2,...,cT)`` is aligned with a target ``G=(g1,g2,...,gT)`` if both chains are equal after decoding. The decoding extracts substring of length ``label_length[i]`` from a target ``G``, merges repeated characters in ``G`` in case *preprocess_collapse_repeated* equal to true and finds unique elements in the order of character occurrence in case *unique* equal to true. The decoding merges repeated characters in ``S`` in case *ctc_merge_repeated* equal to true and removes blank characters represented by ``blank_index``. By default, ``blank_index`` is equal to ``C-1``, where ``C`` is a number of classes including the blank. For example, in case default *ctc_merge_repeated*, *preprocess_collapse_repeated*, *unique* and ``blank_index`` a target sequence ``G=(0,3,2,2,2,2,2,4,3)`` of a length ``label_length[i]=4`` is processed to ``(0,3,2,2)`` and a path ``S=(0,0,4,3,2,2,4,2,4)`` of a length ``logit_length[i]=9`` is also processed to ``(0,3,2,2)``, where ``C=5``. There exist other paths that are also aligned with ``G``, for instance, ``0,4,3,3,2,4,2,2,2``. Paths checked for alignment with a target ``label[:,i]`` must be of length ``logit_length[i] = L_i``. Compute probabilities of these aligned paths (alignments) as follows: + +.. math:: + + p(S) = \prod_{t=1}^{L_i} p_{i,t,ct} 3. Finally, compute negative log of summed up probabilities of all found alignments: -\f[ -CTCLoss = - \ln \sum_{S} p(S) -\f] -**Note 1**: This calculation scheme does not provide steps for optimal implementation and primarily serves for better explanation. +.. 
math::
+
+   CTCLoss = - \ln \sum_{S} p(S)

-**Note 2**: This is recommended to compute a log-probability \f$ \ln p(S)\f$ for an aligned path as a sum of log-softmax of input logits. It helps to avoid underflow and overflow during calculation.
+**Note 1**: This calculation scheme does not provide steps for optimal implementation and primarily serves for better explanation.
+
+**Note 2**: It is recommended to compute a log-probability :math:`\ln p(S)` for an aligned path as a sum of log-softmax of input logits. It helps to avoid underflow and overflow during calculation.

Having log-probabilities for aligned paths, log of summed up probabilities for these paths can be computed as follows:
-\f[
-\ln(a + b) = \ln(a) + \ln(1 + \exp(\ln(b) - \ln(a)))
-\f]
+
+.. math::
+
+   \ln(a + b) = \ln(a) + \ln(1 + \exp(\ln(b) - \ln(a)))

**Attributes**

* *preprocess_collapse_repeated*

-  * **Description**: *preprocess_collapse_repeated* is a flag for a preprocessing step before loss calculation, wherein repeated labels in `labels[i,:]` passed to the loss are merged into single labels.
+  * **Description**: *preprocess_collapse_repeated* is a flag for a preprocessing step before loss calculation, wherein repeated labels in ``labels[i,:]`` passed to the loss are merged into single labels.
  * **Range of values**: true or false
-  * **Type**: `boolean`
+  * **Type**: ``boolean``
  * **Default value**: false
  * **Required**: *no*

@@ -66,66 +61,64 @@

* *ctc_merge_repeated*

  * **Description**: *ctc_merge_repeated* is a flag for merging repeated characters in a potential alignment during the CTC loss calculation.
  * **Range of values**: true or false
-  * **Type**: `boolean`
+  * **Type**: ``boolean``
  * **Default value**: true
  * **Required**: *no*

* *unique*

-  * **Description**: *unique* is a flag to find unique elements for a target `labels[i,:]` before matching with potential alignments.
Unique elements in the processed `labels[i,:]` are sorted in the order of their occurrence in original `labels[i,:]`. For example, the processed sequence for `labels[i,:]=(0,1,1,0,1,3,3,2,2,3)` of length `label_length[i]=10` will be `(0,1,3,2)` in case *unique* equal to true. + * **Description**: *unique* is a flag to find unique elements for a target ``labels[i,:]`` before matching with potential alignments. Unique elements in the processed ``labels[i,:]`` are sorted in the order of their occurrence in original ``labels[i,:]``. For example, the processed sequence for ``labels[i,:]=(0,1,1,0,1,3,3,2,2,3)`` of length ``label_length[i]=10`` will be ``(0,1,3,2)`` in case *unique* equal to true. * **Range of values**: true or false - * **Type**: `boolean` + * **Type**: ``boolean`` * **Default value**: false * **Required**: *no* **Inputs** -* **1**: `logits` - Input tensor with a batch of sequences of logits. Type of elements is *T_F*. Shape of the tensor is `[N, T, C]`, where `N` is the batch size, `T` is the maximum sequence length and `C` is the number of classes including the blank. **Required.** - -* **2**: `logit_length` - 1D input tensor of type *T1* and of a shape `[N]`. The tensor must consist of non-negative values not greater than `T`. Lengths of input sequences of logits `logits[i,:,:]`. **Required.** - -* **3**: `labels` - 2D tensor with shape `[N, T]` of type *T2*. A length of a target sequence `labels[i,:]` is equal to `label_length[i]` and must contain of integers from a range `[0; C-1]` except `blank_index`. **Required.** - -* **4**: `label_length` - 1D tensor of type *T1* and of a shape `[N]`. The tensor must consist of non-negative values not greater than `T` and `label_length[i] <= logit_length[i]` for all possible `i`. **Required.** - -* **5**: `blank_index` - Scalar of type *T2*. Set the class index to use for the blank label. Default value is `C-1`. **Optional.** +* **1**: ``logits`` - Input tensor with a batch of sequences of logits. 
Type of elements is *T_F*. Shape of the tensor is ``[N, T, C]``, where ``N`` is the batch size, ``T`` is the maximum sequence length and ``C`` is the number of classes including the blank. **Required.**
+* **2**: ``logit_length`` - 1D input tensor of type *T1* and of a shape ``[N]``. The tensor must consist of non-negative values not greater than ``T``. Lengths of input sequences of logits ``logits[i,:,:]``. **Required.**
+* **3**: ``labels`` - 2D tensor with shape ``[N, T]`` of type *T2*. A length of a target sequence ``labels[i,:]`` is equal to ``label_length[i]`` and must consist of integers from a range ``[0; C-1]`` except ``blank_index``. **Required.**
+* **4**: ``label_length`` - 1D tensor of type *T1* and of a shape ``[N]``. The tensor must consist of non-negative values not greater than ``T`` and ``label_length[i] <= logit_length[i]`` for all possible ``i``. **Required.**
+* **5**: ``blank_index`` - Scalar of type *T2*. Set the class index to use for the blank label. Default value is ``C-1``. **Optional.**

**Output**

-* **1**: Output tensor with shape `[N]`, negative sum of log-probabilities of alignments. Type of elements is *T_F*.
+* **1**: Output tensor with shape ``[N]``, negative sum of log-probabilities of alignments. Type of elements is *T_F*.

**Types**

* *T_F*: any supported floating-point type.
-
-* *T1*, *T2*: `int32` or `int64`.
+* *T1*, *T2*: ``int32`` or ``int64``.

**Example**

-```xml
-<layer ... type="CTCLoss">
-    <input>
-        <port id="0">
-            <dim>8</dim>
-            <dim>20</dim>
-            <dim>128</dim>
-        </port>
-        <port id="1">
-            <dim>8</dim>
-        </port>
-        <port id="2">
-            <dim>8</dim>
-            <dim>20</dim>
-        </port>
-        <port id="3">
-            <dim>8</dim>
-        </port>
-        <port id="4"/> <!-- blank_index value is: 120 -->
-    </input>
-    <output>
-        <port id="5">
-            <dim>8</dim>
-        </port>
-    </output>
-</layer>
-```
+..
code-block:: cpp
+
+    <layer ... type="CTCLoss">
+        <input>
+            <port id="0">
+                <dim>8</dim>
+                <dim>20</dim>
+                <dim>128</dim>
+            </port>
+            <port id="1">
+                <dim>8</dim>
+            </port>
+            <port id="2">
+                <dim>8</dim>
+                <dim>20</dim>
+            </port>
+            <port id="3">
+                <dim>8</dim>
+            </port>
+            <port id="4"/> < !-- blank_index value is: 120 -->
+        </input>
+        <output>
+            <port id="5">
+                <dim>8</dim>
+            </port>
+        </output>
+    </layer>
+
+
+@endsphinxdirective
+
diff --git a/docs/ops/type/ConvertLike_1.md b/docs/ops/type/ConvertLike_1.md
index e08f9341cc4..a08947e09a0 100644
--- a/docs/ops/type/ConvertLike_1.md
+++ b/docs/ops/type/ConvertLike_1.md
@@ -1,10 +1,12 @@
# ConvertLike {#openvino_docs_ops_type_ConvertLike_1}

+@sphinxdirective
+
**Versioned name**: *ConvertLike-1*

**Category**: *Type conversion*

-**Short description**: *ConvertLike* operation performs element-wise conversion on a given input tensor `data` to the element type of an additional input tensor `like`.
+**Short description**: *ConvertLike* operation performs element-wise conversion on a given input tensor ``data`` to the element type of an additional input tensor ``like``.

**Detailed description**

@@ -12,20 +14,22 @@ Conversion from one supported type to another supported type is always allowed.

Output elements are represented as follows:

+.. code-block:: cpp
+
   o[i] = Convert[destination_type=type(b)](a[i])

-where `a` and `b` correspond to `data` and `like` input tensors, respectively.
+where ``a`` and ``b`` correspond to ``data`` and ``like`` input tensors, respectively.

**Attributes**: *ConvertLike* operation has no attributes.

**Inputs**

-* **1**: `data` - A tensor of type *T1* and arbitrary shape. **Required.**
-* **2**: `like` - A tensor of type *T2* and arbitrary shape. **Required.**
+* **1**: ``data`` - A tensor of type *T1* and arbitrary shape. **Required.**
+* **2**: ``like`` - A tensor of type *T2* and arbitrary shape. **Required.**

**Outputs**

-* **1**: The result of element-wise *ConvertLike* operation applied to input tensor `data`. A tensor of type *T2* and the same shape as `data` input tensor.
+* **1**: The result of element-wise *ConvertLike* operation applied to input tensor ``data``. A tensor of type *T2* and the same shape as ``data`` input tensor.
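In NumPy terms the behaviour can be sketched as follows (``convert_like`` is an illustrative helper, not part of the operation set, under the assumption that only the element type of ``like`` matters, not its values or shape):

```python
import numpy as np

def convert_like(data, like):
    # Element-wise conversion of `data` to the element type of `like`;
    # the values and shape of `like` are ignored.
    return data.astype(like.dtype)

ints = np.array([[1, 2], [3, 4]], dtype=np.int32)
like = np.zeros(3, dtype=np.float32)  # any data, any shape
out = convert_like(ints, like)
assert out.dtype == np.float32 and out.shape == ints.shape
```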
**Types**

@@ -34,22 +38,25 @@

**Example**

-```xml
-<layer ... type="ConvertLike">
-    <input>
-        <port id="0">      <!-- type: int32 -->
-            <dim>256</dim>
-            <dim>56</dim>
-        </port>
-        <port id="1">      <!-- type: float32 -->
-            <dim>3</dim>   <!-- any data -->
-        </port>
-    </input>
-    <output>
-        <port id="2">      <!-- result type: float32 -->
-            <dim>256</dim>
-            <dim>56</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+    <layer ... type="ConvertLike">
+        <input>
+            <port id="0">      < !-- type: int32 -->
+                <dim>256</dim>
+                <dim>56</dim>
+            </port>
+            <port id="1">      < !-- type: float32 -->
+                <dim>3</dim>   < !-- any data -->
+            </port>
+        </input>
+        <output>
+            <port id="2">      < !-- result type: float32 -->
+                <dim>256</dim>
+                <dim>56</dim>
+            </port>
+        </output>
+    </layer>
+
+
+@endsphinxdirective
+
diff --git a/docs/ops/type/Convert_1.md b/docs/ops/type/Convert_1.md
index 36ceb522b97..90b5deb968b 100644
--- a/docs/ops/type/Convert_1.md
+++ b/docs/ops/type/Convert_1.md
@@ -1,5 +1,7 @@
# Convert {#openvino_docs_ops_type_Convert_1}

+@sphinxdirective
+
**Versioned name**: *Convert-1*

**Category**: *Type conversion*
@@ -8,22 +10,17 @@

**Detailed description**

-Conversion from one supported type to another supported type is always allowed. User must be aware of precision loss
-and value change caused by range difference between two types. For example, a 32-bit float `3.141592` may be round
-to a 32-bit int `3`.
+Conversion from one supported type to another supported type is always allowed. User must be aware of precision loss and value change caused by range difference between two types. For example, a 32-bit float ``3.141592`` may be rounded to a 32-bit int ``3``.

-Conversion of negative signed integer to unsigned integer value happens in accordance with c++ standard. Notably,
-result is the unique value of the destination unsigned type that is congruent to the source integer modulo 2^N (where
-N is the bit width of the destination type). For example, when an int32 value `-1` is converted to uint32 the result
-will be `uint32 max` which is `4,294,967,295`.
+Conversion of negative signed integer to unsigned integer value happens in accordance with the C++ standard. Notably, the result is the unique value of the destination unsigned type that is congruent to the source integer modulo 2^N (where N is the bit width of the destination type).
For example, when an int32 value ``-1`` is converted to uint32 the result will be ``uint32 max`` which is ``4,294,967,295``.

The result of unsupported conversions is undefined. Output elements are represented as follows:

-\f[
-o_{i} = Convert(a_{i})
-\f]
+.. math::
+
+   o_{i} = Convert(a_{i})

-where `a` corresponds to the input tensor.
+where ``a`` corresponds to the input tensor.

**Attributes**:

* *destination_type*

  * **Description**: the destination type.
  * **Range of values**: one of the supported types *T*
-  * **Type**: `string`
+  * **Type**: ``string``
  * **Required**: *yes*

**Inputs**

@@ -48,20 +45,23 @@

**Example**

-```xml
-<layer ... type="Convert">
-    <data destination_type="f32"/>
-    <input>
-        <port id="0">      <!-- type: i32 -->
-            <dim>256</dim>
-            <dim>56</dim>
-        </port>
-    </input>
-    <output>
-        <port id="1">      <!-- result type: f32 -->
-            <dim>256</dim>
-            <dim>56</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+    <layer ... type="Convert">
+        <data destination_type="f32"/>
+        <input>
+            <port id="0">      < !-- type: i32 -->
+                <dim>256</dim>
+                <dim>56</dim>
+            </port>
+        </input>
+        <output>
+            <port id="1">      < !-- result type: f32 -->
+                <dim>256</dim>
+                <dim>56</dim>
+            </port>
+        </output>
+    </layer>
+
+
+@endsphinxdirective
+
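The two conversions called out in the detailed description can be checked with NumPy casts, which follow the same truncation and modulo-2^N rules (a sketch for illustration, not the reference implementation):

```python
import numpy as np

x = np.array([3.141592], dtype=np.float32)
assert x.astype(np.int32)[0] == 3  # precision loss: fractional part dropped

neg = np.array([-1], dtype=np.int32)
assert neg.astype(np.uint32)[0] == 4294967295  # -1 congruent to 2**32 - 1
```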