DOCS shift to rst - Opsets B (#17169)
* Update BatchNormInference_1.md
* Update BatchNormInference_5.md
* Update BatchToSpace_2.md
* Update BinaryConvolution_1.md
* Update Broadcast_1.md
* Update Broadcast_3.md
* Update Bucketize_3.md
* fix
* fix-2
This commit is contained in:
parent
acd424bb5e
commit
49b5d039db
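
The conversion in this commit is largely mechanical: Markdown inline code becomes double-backtick RST literals, Markdown links become anonymous RST hyperlinks, and Doxygen `\f$...\f$` math becomes the `:math:` role. A minimal Python sketch of those three substitutions follows; it only summarizes the patterns visible in the diffs below and is not the tooling actually used for the commit.

```python
import re

def md_line_to_rst(line: str) -> str:
    # `code` -> ``code``  (single-backtick literals become RST literals)
    line = re.sub(r"(?<!`)`([^`]+)`(?!`)", r"``\1``", line)
    # \f$ expr \f$ -> :math:`expr`  (Doxygen inline math -> RST math role)
    line = re.sub(r"\\f\$(.+?)\\f\$", r":math:`\1`", line)
    # [text](url) -> `text <url>`__  (Markdown link -> anonymous RST link)
    line = re.sub(r"\[([^\]]+)\]\(([^)]+)\)", r"`\1 <\2>`__", line)
    return line

print(md_line_to_rst("similar to [Reference](https://www.tensorflow.org/api_docs)"))
# similar to `Reference <https://www.tensorflow.org/api_docs>`__
```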
Bucketize_3.md

@@ -1,14 +1,16 @@
 # Bucketize {#openvino_docs_ops_condition_Bucketize_3}

+@sphinxdirective
+
 **Versioned name**: *Bucketize-3*

 **Category**: *Condition*

-**Short description**: *Bucketize* bucketizes the input based on boundaries. This is similar to [Reference](https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/bucketize).
+**Short description**: *Bucketize* bucketizes the input based on boundaries. This is similar to `Reference <https://www.tensorflow.org/api_docs/cc/class/tensorflow/ops/bucketize>`__ .

 **Detailed description**: *Bucketize* computes a bucket index for each element from the first input and outputs a tensor of the first input shape. Buckets are defined with boundaries from the second input.

-For example, if the first input tensor is `[[3, 50], [10, -1]]` and the second input is `[0, 5, 10]` with included right bound, the output will be `[[1, 3], [2, 0]]`.
+For example, if the first input tensor is ``[[3, 50], [10, -1]]`` and the second input is ``[0, 5, 10]`` with included right bound, the output will be ``[[1, 3], [2, 0]]``.

 **Attributes**

@@ -16,7 +18,7 @@ For example, if the first input tensor is `[[3, 50], [10, -1]]` and the second i
 * **Description**: the output tensor type
 * **Range of values**: "i64" or "i32"
-* **Type**: string
+* **Type**: ``string``
 * **Default value**: "i64"
 * **Required**: *no*

@@ -24,20 +26,21 @@ For example, if the first input tensor is `[[3, 50], [10, -1]]` and the second i
 * **Description**: indicates whether bucket includes the right or the left edge of interval.
 * **Range of values**:

   * true - bucket includes the right interval edge
   * false - bucket includes the left interval edge
-* **Type**: `boolean`
+* **Type**: ``boolean``
 * **Default value**: true
 * **Required**: *no*

 **Inputs**:

 * **1**: N-D tensor of *T* type with elements for the bucketization. **Required.**
 * **2**: 1-D tensor of *T_BOUNDARIES* type with sorted unique boundaries for buckets. **Required.**

 **Outputs**:

 * **1**: Output tensor with bucket indices of *T_IND* type. If the second input is empty, the bucket index for all elements is equal to 0. The output tensor shape is the same as the first input tensor shape.

 **Types**

@@ -45,26 +48,29 @@ For example, if the first input tensor is `[[3, 50], [10, -1]]` and the second i
 * *T_BOUNDARIES*: any numeric type.

-* *T_IND*: `int32` or `int64`.
+* *T_IND*: ``int32`` or ``int64``.

 **Example**

-```xml
-<layer ... type="Bucketize">
-    <input>
-        <port id="0">
-            <dim>49</dim>
-            <dim>11</dim>
-        </port>
-        <port id="1">
-            <dim>5</dim>
-        </port>
-    </input>
-    <output>
-        <port id="1">
-            <dim>49</dim>
-            <dim>11</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+   <layer ... type="Bucketize">
+       <input>
+           <port id="0">
+               <dim>49</dim>
+               <dim>11</dim>
+           </port>
+           <port id="1">
+               <dim>5</dim>
+           </port>
+       </input>
+       <output>
+           <port id="1">
+               <dim>49</dim>
+               <dim>11</dim>
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective
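
The Bucketize example above can be checked with NumPy: `np.digitize` with `right=True` implements the same "right bound included" bucketing. A sanity-check sketch, assuming `with_right_bound` is left at its default of `true`:

```python
import numpy as np

data = np.array([[3, 50], [10, -1]])
boundaries = np.array([0, 5, 10])

# right=True mirrors with_right_bound="true": each bucket includes its
# right edge, so -1 <= 0 falls into bucket 0 and 50 > 10 into bucket 3.
out = np.digitize(data, boundaries, right=True)
assert out.tolist() == [[1, 3], [2, 0]]   # matches the documented output
```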
BinaryConvolution_1.md

@@ -1,31 +1,34 @@
 # BinaryConvolution {#openvino_docs_ops_convolution_BinaryConvolution_1}

+@sphinxdirective
+
 **Versioned name**: *BinaryConvolution-1*

 **Category**: *Convolution*

 **Short description**: Computes 2D convolution of binary input and binary kernel tensors.

-**Detailed description**: The operation behaves as regular *Convolution* but uses specialized algorithm for computations on binary data. More thorough explanation can be found in [Understanding Binary Neural Networks](https://sushscience.wordpress.com/2017/10/01/understanding-binary-neural-networks/) and [Bitwise Neural Networks](https://saige.sice.indiana.edu/wp-content/uploads/icml2015_mkim.pdf).
+**Detailed description**: The operation behaves as regular *Convolution* but uses specialized algorithm for computations on binary data. More thorough explanation can be found in `Understanding Binary Neural Networks <https://sushscience.wordpress.com/2017/10/01/understanding-binary-neural-networks/>`__ and `Bitwise Neural Networks <https://saige.sice.indiana.edu/wp-content/uploads/icml2015_mkim.pdf>`__.

 Computation algorithm for mode *xnor-popcount*:

-- `X = XNOR(input_patch, filter)`, where XNOR is bitwise operation on two bit streams
-- `P = popcount(X)`, where popcount is the number of `1` bits in the `X` bit stream
-- `Output = 2 * P - B`, where `B` is the total number of bits in the `P` bit stream
+- ``X = XNOR(input_patch, filter)``, where XNOR is bitwise operation on two bit streams
+- ``P = popcount(X)``, where popcount is the number of ``1`` bits in the ``X`` bit stream
+- ``Output = 2 \* P - B``, where ``B`` is the total number of bits in the ``P`` bit stream

 **Attributes**:

 * *strides*

-  * **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the `(y, x)` axes for 2D convolutions. For example, *strides* equal `2,1` means sliding the filter 2 pixel at a time over height dimension and 1 over width dimension.
+  * **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the ``(y, x)`` axes for 2D convolutions. For example, *strides* equal ``2,1`` means sliding the filter 2 pixel at a time over height dimension and 1 over width dimension.
   * **Range of values**: integer values starting from 0
   * **Type**: int[]
   * **Required**: *yes*

 * *pads_begin*

-  * **Description**: *pads_begin* is a number of pixels to add to the beginning along each axis. For example, *pads_begin* equal `1,2` means adding 1 pixel to the top of the input and 2 to the left of the input.
+  * **Description**: *pads_begin* is a number of pixels to add to the beginning along each axis. For example, *pads_begin* equal ``1,2`` means adding 1 pixel to the top of the input and 2 to the left of the input.
   * **Range of values**: integer values starting from 0
   * **Type**: int[]
   * **Required**: *yes*

@@ -33,7 +36,7 @@ Computation algorithm for mode *xnor-popcount*:
 * *pads_end*

-  * **Description**: *pads_end* is a number of pixels to add to the ending along each axis. For example, *pads_end* equal `1,2` means adding 1 pixel to the bottom of the input and 2 to the right of the input.
+  * **Description**: *pads_end* is a number of pixels to add to the ending along each axis. For example, *pads_end* equal ``1,2`` means adding 1 pixel to the bottom of the input and 2 to the right of the input.
   * **Range of values**: integer values starting from 0
   * **Type**: int[]
   * **Required**: *yes*

@@ -41,30 +44,32 @@ Computation algorithm for mode *xnor-popcount*:
 * *dilations*

-  * **Description**: *dilations* denotes the distance in width and height between elements (weights) in the filter. For example, *dilation* equal `1,1` means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. *dilation* equal `2,2` means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1.
+  * **Description**: *dilations* denotes the distance in width and height between elements (weights) in the filter. For example, *dilation* equal ``1,1`` means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. *dilation* equal ``2,2`` means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1.
   * **Range of values**: integer value starting from 0
   * **Type**: int[]
   * **Required**: *yes*

 * *mode*

-  * **Description**: *mode* defines how input tensor `0/1` values and weights `0/1` are interpreted as real numbers and how the result is computed.
+  * **Description**: *mode* defines how input tensor ``0/1`` values and weights ``0/1`` are interpreted as real numbers and how the result is computed.
   * **Range of values**:

     * *xnor-popcount*
-  * **Type**: `string`
+  * **Type**: ``string``
   * **Required**: *yes*
-  * **Note**: value `0` in inputs is interpreted as `-1`, value `1` as `1`
+  * **Note**: value ``0`` in inputs is interpreted as ``-1``, value ``1`` as ``1``

 * *pad_value*

   * **Description**: *pad_value* is a floating-point value used to fill pad area.
   * **Range of values**: a floating-point number
-  * **Type**: `float`
+  * **Type**: ``float``
   * **Required**: *yes*

 * *auto_pad*

   * **Description**: *auto_pad* how the padding is calculated. Possible values:

     * *explicit* - use explicit padding values from *pads_begin* and *pads_end*.
     * *same_upper* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the end.
     * *same_lower* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the beginning.

@@ -76,47 +81,51 @@ Computation algorithm for mode *xnor-popcount*:
 **Inputs**:

-* **1**: Input tensor of type *T1* and rank 4. Layout is `[N, C_IN, Y, X]` (number of batches, number of channels, spatial axes Y, X). **Required.**
-* **2**: Kernel tensor of type *T2* and rank 4. Layout is `[C_OUT, C_IN, Y, X]` (number of output channels, number of input channels, spatial axes Y, X). **Required.**
+* **1**: Input tensor of type *T1* and rank 4. Layout is ``[N, C_IN, Y, X]`` (number of batches, number of channels, spatial axes Y, X). **Required.**
+* **2**: Kernel tensor of type *T2* and rank 4. Layout is ``[C_OUT, C_IN, Y, X]`` (number of output channels, number of input channels, spatial axes Y, X). **Required.**
 * **Note**: Interpretation of tensor values is defined by *mode* attribute.

 **Outputs**:

-* **1**: Output tensor of type *T3* and rank 4. Layout is `[N, C_OUT, Y, X]` (number of batches, number of kernel output channels, spatial axes Y, X).
+* **1**: Output tensor of type *T3* and rank 4. Layout is ``[N, C_OUT, Y, X]`` (number of batches, number of kernel output channels, spatial axes Y, X).

 **Types**:

-* *T1*: any numeric type with values `0` or `1`.
-* *T2*: `u1` type with binary values `0` or `1`.
+* *T1*: any numeric type with values ``0`` or ``1``.
+* *T2*: ``u1`` type with binary values ``0`` or ``1``.
 * *T3*: *T1* type with full range of values.

 **Example**:

 2D Convolution

-```xml
-<layer type="BinaryConvolution" ...>
-    <data dilations="1,1" pads_begin="2,2" pads_end="2,2" strides="1,1" mode="xnor-popcount" pad_value="0" auto_pad="explicit"/>
-    <input>
-        <port id="0">
-            <dim>1</dim>
-            <dim>3</dim>
-            <dim>224</dim>
-            <dim>224</dim>
-        </port>
-        <port id="1">
-            <dim>64</dim>
-            <dim>3</dim>
-            <dim>5</dim>
-            <dim>5</dim>
-        </port>
-    </input>
-    <output>
-        <port id="2" precision="FP32">
-            <dim>1</dim>
-            <dim>64</dim>
-            <dim>224</dim>
-            <dim>224</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+   <layer type="BinaryConvolution" ...>
+       <data dilations="1,1" pads_begin="2,2" pads_end="2,2" strides="1,1" mode="xnor-popcount" pad_value="0" auto_pad="explicit"/>
+       <input>
+           <port id="0">
+               <dim>1</dim>
+               <dim>3</dim>
+               <dim>224</dim>
+               <dim>224</dim>
+           </port>
+           <port id="1">
+               <dim>64</dim>
+               <dim>3</dim>
+               <dim>5</dim>
+               <dim>5</dim>
+           </port>
+       </input>
+       <output>
+           <port id="2" precision="FP32">
+               <dim>1</dim>
+               <dim>64</dim>
+               <dim>224</dim>
+               <dim>224</dim>
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective
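
The xnor-popcount identity above (`Output = 2 * P - B`) is the ±1 dot product written in bit operations: agreements minus disagreements equals `P - (B - P)`. A small sketch verifying it on one patch; illustrative only, not OpenVINO's kernel:

```python
import numpy as np

def binary_dot(input_patch: np.ndarray, filt: np.ndarray) -> int:
    x = ~(input_patch ^ filt) & 1      # XNOR of the two bit streams
    p = int(x.sum())                   # popcount: number of 1 bits
    b = x.size                         # total number of bits
    return 2 * p - b                   # equals the +/-1 dot product

patch = np.array([1, 0, 1, 1], dtype=np.uint8)
filt  = np.array([1, 1, 0, 1], dtype=np.uint8)
# Reference: map 0 -> -1, 1 -> +1 and take an ordinary dot product.
ref = int(((2 * patch.astype(int) - 1) * (2 * filt.astype(int) - 1)).sum())
assert binary_dot(patch, filt) == ref == 0
```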
BatchToSpace_2.md

@@ -1,35 +1,49 @@
 # BatchToSpace {#openvino_docs_ops_movement_BatchToSpace_2}

+@sphinxdirective
+
 **Versioned name**: *BatchToSpace-2*

 **Category**: *Data movement*

-**Short description**: *BatchToSpace* operation permutes the batch dimension on a given input `data` into blocks in the spatial dimensions specified by `block_shape` input. The spatial dimensions are then optionally cropped according to `crops_begin` and `crops_end` inputs to produce the output.
+**Short description**: *BatchToSpace* operation permutes the batch dimension on a given input ``data`` into blocks in the spatial dimensions specified by ``block_shape`` input. The spatial dimensions are then optionally cropped according to ``crops_begin`` and ``crops_end`` inputs to produce the output.

 **Detailed description**

-*BatchToSpace* operation is equivalent to the following operation steps on the input `data` with shape `[batch, D_1, D_2, ..., D_{N-1}]` and `block_shape`, `crops_begin`, `crops_end` inputs with shape `[N]` to produce the output tensor \f$y\f$.
-
-1. Reshape `data` input to produce a tensor of shape \f$[B_1, \dots, B_{N - 1}, \frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1, D_2, \dots, D_{N - 1}]\f$
-\f[x^{\prime} = reshape(data, [B_1, \dots, B_{N - 1}, \frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1, D_2, \dots, D_{N - 1}])\f]
-
-2. Permute dimensions of \f$x^{\prime}\f$ to produce a tensor of shape \f$[\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1, B_1, D_2, B_2, \dots, D_{N-1}, B_{N - 1}]\f$
-\f[x^{\prime\prime} = transpose(x', [N, N + 1, 0, N + 2, 1, \dots, N + N - 1, N - 1])\f]
-
-3. Reshape \f$x^{\prime\prime}\f$ to produce a tensor of shape \f$[\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1 \times B_1, D_2 \times B_2, \dots, D_{N - 1} \times B_{N - 1}]\f$
-\f[x^{\prime\prime\prime} = reshape(x^{\prime\prime}, [\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1 \times B_1, D_2 \times B_2, \dots, D_{N - 1} \times B_{N - 1}])\f]
-
-4. Crop the start and end of spatial dimensions of \f$x^{\prime\prime\prime}\f$ according to `crops_begin` and `crops_end` inputs to produce the output \f$y\f$ of shape:
-\f[\left[\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, crop(D_1 \times B_1, CB_1, CE_1), crop(D_2 \times B_2, CB_2, CE_2), \dots , crop(D_{N - 1} \times B_{N - 1}, CB_{N - 1}, CE_{N - 1})\right]\f]
+*BatchToSpace* operation is equivalent to the following operation steps on the input ``data`` with shape ``[batch, D_1, D_2, ..., D_{N-1}]`` and ``block_shape``, ``crops_begin``, ``crops_end`` inputs with shape ``[N]`` to produce the output tensor :math:`y`.
+
+1. Reshape ``data`` input to produce a tensor of shape :math:`[B_1, \dots, B_{N - 1}, \frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1, D_2, \dots, D_{N - 1}]`
+
+   .. math::
+
+      x^{\prime} = reshape(data, [B_1, \dots, B_{N - 1}, \frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1, D_2, \dots, D_{N - 1}])
+
+2. Permute dimensions of :math:`x^{\prime}` to produce a tensor of shape :math:`[\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1, B_1, D_2, B_2, \dots, D_{N-1}, B_{N - 1}]`
+
+   .. math::
+
+      x^{\prime\prime} = transpose(x', [N, N + 1, 0, N + 2, 1, \dots, N + N - 1, N - 1])
+
+3. Reshape :math:`x^{\prime\prime}` to produce a tensor of shape :math:`[\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1 \times B_1, D_2 \times B_2, \dots, D_{N - 1} \times B_{N - 1}]`
+
+   .. math::
+
+      x^{\prime\prime\prime} = reshape(x^{\prime\prime}, [\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, D_1 \times B_1, D_2 \times B_2, \dots, D_{N - 1} \times B_{N - 1}])
+
+4. Crop the start and end of spatial dimensions of :math:`x^{\prime\prime\prime}` according to ``crops_begin`` and ``crops_end`` inputs to produce the output :math:`y` of shape:
+
+   .. math::
+
+      \left[\frac{batch}{\left(B_1 \times \dots \times B_{N - 1}\right)}, crop(D_1 \times B_1, CB_1, CE_1), crop(D_2 \times B_2, CB_2, CE_2), \dots , crop(D_{N - 1} \times B_{N - 1}, CB_{N - 1}, CE_{N - 1})\right]

 Where

-- \f$B_i\f$ = block_shape[i]
-- \f$B_0\f$ is expected to be 1
-- \f$CB_i\f$ = crops_begin[i]
-- \f$CE_i\f$ = crops_end[i]
-- \f$CB_0\f$ and \f$CE_0\f$ are expected to be 0
-- \f$CB_i + CE_i \leq D_i \times B_i \f$
+- :math:`B_i` = block_shape[i]
+- :math:`B_0` is expected to be 1
+- :math:`CB_i` = crops_begin[i]
+- :math:`CE_i` = crops_end[i]
+- :math:`CB_0` and :math:`CE_0` are expected to be 0
+- :math:`CB_i + CE_i \leq D_i \times B_i`

 *BatchToSpace* operation is the reverse of *SpaceToBatch* operation.

@@ -37,17 +51,17 @@ Where
 **Inputs**

-* **1**: `data` - A tensor of type *T* and rank greater than or equal to 2. Layout is `[batch, D_1, D_2 ... D_{N-1}]` (number of batches, spatial axes). **Required.**
-* **2**: `block_shape` - Specifies the block sizes of `batch` axis of `data` input which are moved to the corresponding spatial axes. A 1D tensor of type *T_INT* and shape `[N]`. All element values must be greater than or equal to 1.`block_shape[0]` is expected to be 1. **Required.**
-* **3**: `crops_begin` - Specifies the amount to crop from the beginning along each axis of `data` input. A 1D tensor of type *T_INT* and shape `[N]`. All element values must be greater than or equal to 0. `crops_begin[0]` is expected to be 0. **Required.**
-* **4**: `crops_end` - Specifies the amount to crop from the ending along each axis of `data` input. A 1D tensor of type *T_INT* and shape `[N]`. All element values must be greater than or equal to 0. `crops_end[0]` is expected to be 0. **Required.**
-* **Note**: `N` corresponds to the rank of `data` input.
-* **Note**: `batch` axis of `data` input must be evenly divisible by the cumulative product of `block_shape` elements.
-* **Note**: It is required that `crops_begin[i] + crops_end[i] <= block_shape[i] * input_shape[i]`.
+* **1**: ``data`` - A tensor of type *T* and rank greater than or equal to 2. Layout is ``[batch, D_1, D_2 ... D_{N-1}]`` (number of batches, spatial axes). **Required.**
+* **2**: ``block_shape`` - Specifies the block sizes of ``batch`` axis of ``data`` input which are moved to the corresponding spatial axes. A 1D tensor of type *T_INT* and shape ``[N]``. All element values must be greater than or equal to 1. ``block_shape[0]`` is expected to be 1. **Required.**
+* **3**: ``crops_begin`` - Specifies the amount to crop from the beginning along each axis of ``data`` input. A 1D tensor of type *T_INT* and shape ``[N]``. All element values must be greater than or equal to 0. ``crops_begin[0]`` is expected to be 0. **Required.**
+* **4**: ``crops_end`` - Specifies the amount to crop from the ending along each axis of ``data`` input. A 1D tensor of type *T_INT* and shape ``[N]``. All element values must be greater than or equal to 0. ``crops_end[0]`` is expected to be 0. **Required.**
+* **Note**: ``N`` corresponds to the rank of ``data`` input.
+* **Note**: ``batch`` axis of ``data`` input must be evenly divisible by the cumulative product of ``block_shape`` elements.
+* **Note**: It is required that ``crops_begin[i] + crops_end[i] <= block_shape[i] \* input_shape[i]``.

 **Outputs**

-* **1**: Permuted tensor of type *T* with the same rank as `data` input tensor, and shape `[batch / (block_shape[0] * block_shape[1] * ... * block_shape[N - 1]), D_1 * block_shape[1] - crops_begin[1] - crops_end[1], D_2 * block_shape[2] - crops_begin[2] - crops_end[2], ..., D_{N - 1} * block_shape[N - 1] - crops_begin[N - 1] - crops_end[N - 1]`.
+* **1**: Permuted tensor of type *T* with the same rank as ``data`` input tensor, and shape ``[batch / (block_shape[0] \* block_shape[1] \* ... \* block_shape[N - 1]), D_1 \* block_shape[1] - crops_begin[1] - crops_end[1], D_2 \* block_shape[2] - crops_begin[2] - crops_end[2], ..., D_{N - 1} \* block_shape[N - 1] - crops_begin[N - 1] - crops_end[N - 1]``.

 **Types**

@@ -56,64 +70,67 @@ Where
 **Examples**

-*Example: 2D input tensor `data`*
+Example: 2D input tensor ``data``

-```xml
-<layer type="BatchToSpace" ...>
-    <input>
-        <port id="0"> <!-- data -->
-            <dim>10</dim> <!-- batch -->
-            <dim>2</dim> <!-- spatial dimension 1 -->
-        </port>
-        <port id="1"> <!-- block_shape value: [1, 5] -->
-            <dim>2</dim>
-        </port>
-        <port id="2"> <!-- crops_begin value: [0, 2] -->
-            <dim>2</dim>
-        </port>
-        <port id="3"> <!-- crops_end value: [0, 0] -->
-            <dim>2</dim>
-        </port>
-    </input>
-    <output>
-        <port id="3">
-            <dim>2</dim> <!-- data.shape[0] / (block_shape.shape[0] * block_shape.shape[1]) -->
-            <dim>8</dim> <!-- data.shape[1] * block_shape.shape[1] - crops_begin[1] - crops_end[1]-->
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+   <layer type="BatchToSpace" ...>
+       <input>
+           <port id="0"> < !-- data -->
+               <dim>10</dim> < !-- batch -->
+               <dim>2</dim> < !-- spatial dimension 1 -->
+           </port>
+           <port id="1"> < !-- block_shape value: [1, 5] -->
+               <dim>2</dim>
+           </port>
+           <port id="2"> < !-- crops_begin value: [0, 2] -->
+               <dim>2</dim>
+           </port>
+           <port id="3"> < !-- crops_end value: [0, 0] -->
+               <dim>2</dim>
+           </port>
+       </input>
+       <output>
+           <port id="3">
+               <dim>2</dim> < !-- data.shape[0] / (block_shape.shape[0] * block_shape.shape[1]) -->
+               <dim>8</dim> < !-- data.shape[1] * block_shape.shape[1] - crops_begin[1] - crops_end[1]-->
+           </port>
+       </output>
+   </layer>

-*Example: 5D input tensor `data`*
+Example: 5D input tensor ``data``

-```xml
-<layer type="BatchToSpace" ...>
-    <input>
-        <port id="0"> <!-- data -->
-            <dim>48</dim> <!-- batch -->
-            <dim>3</dim> <!-- spatial dimension 1 -->
-            <dim>3</dim> <!-- spatial dimension 2 -->
-            <dim>1</dim> <!-- spatial dimension 3 -->
-            <dim>3</dim> <!-- spatial dimension 4 -->
-        </port>
-        <port id="1"> <!-- block_shape value: [1, 2, 4, 3, 1] -->
-            <dim>5</dim>
-        </port>
-        <port id="2"> <!-- crops_begin value: [0, 0, 1, 0, 0] -->
-            <dim>5</dim>
-        </port>
-        <port id="3"> <!-- crops_end value: [0, 0, 1, 0, 0] -->
-            <dim>5</dim>
-        </port>
-    </input>
-    <output>
-        <port id="3">
-            <dim>2</dim> <!-- data.shape[0] / (block_shape.shape[0] * block_shape.shape[1] * ... * block_shape.shape[4]) -->
-            <dim>6</dim> <!-- data.shape[1] * block_shape.shape[1] - crops_begin[1] - crops_end[1]-->
-            <dim>10</dim> <!-- data.shape[2] * block_shape.shape[2] - crops_begin[2] - crops_end[2] -->
-            <dim>3</dim> <!-- data.shape[3] * block_shape.shape[3] - crops_begin[3] - crops_end[3] -->
-            <dim>3</dim> <!-- data.shape[4] * block_shape.shape[4] - crops_begin[4] - crops_end[4] -->
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+   <layer type="BatchToSpace" ...>
+       <input>
+           <port id="0"> < !-- data -->
+               <dim>48</dim> < !-- batch -->
+               <dim>3</dim> < !-- spatial dimension 1 -->
+               <dim>3</dim> < !-- spatial dimension 2 -->
+               <dim>1</dim> < !-- spatial dimension 3 -->
+               <dim>3</dim> < !-- spatial dimension 4 -->
+           </port>
+           <port id="1"> < !-- block_shape value: [1, 2, 4, 3, 1] -->
+               <dim>5</dim>
+           </port>
+           <port id="2"> < !-- crops_begin value: [0, 0, 1, 0, 0] -->
+               <dim>5</dim>
+           </port>
+           <port id="3"> < !-- crops_end value: [0, 0, 1, 0, 0] -->
+               <dim>5</dim>
+           </port>
+       </input>
+       <output>
+           <port id="3">
+               <dim>2</dim> < !-- data.shape[0] / (block_shape.shape[0] * block_shape.shape[1] * ... * block_shape.shape[4]) -->
+               <dim>6</dim> < !-- data.shape[1] * block_shape.shape[1] - crops_begin[1] - crops_end[1]-->
+               <dim>10</dim> < !-- data.shape[2] * block_shape.shape[2] - crops_begin[2] - crops_end[2] -->
+               <dim>3</dim> < !-- data.shape[3] * block_shape.shape[3] - crops_begin[3] - crops_end[3] -->
+               <dim>3</dim> < !-- data.shape[4] * block_shape.shape[4] - crops_begin[4] - crops_end[4] -->
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective
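
Steps 1-4 above translate directly into NumPy reshape/transpose/slice calls. A sketch reproducing the 2D example (`data` of shape `[10, 2]`, `block_shape = [1, 5]`, `crops_begin = [0, 2]`), under the stated constraints that `block_shape[0] == 1` and the batch axis is never cropped:

```python
import numpy as np

def batch_to_space(data, block_shape, crops_begin, crops_end):
    # Direct transcription of steps 1-4 above; illustrative, not the plugin code.
    n = data.ndim                          # block_shape/crops all have shape [n]
    b = [int(v) for v in block_shape]      # B_0 is expected to be 1
    rest = data.shape[0] // int(np.prod(b))
    # 1. split the batch axis into the block factors B_1..B_{n-1}
    x = data.reshape(b[1:] + [rest] + list(data.shape[1:]))
    # 2. interleave each block factor behind its spatial axis
    perm = [n - 1]
    for i in range(n - 1):
        perm += [n + i, i]
    x = x.transpose(perm)
    # 3. merge every (D_i, B_i) pair into one axis of size D_i * B_i
    x = x.reshape([rest] + [data.shape[i + 1] * b[i + 1] for i in range(n - 1)])
    # 4. crop the start/end of each spatial axis
    idx = [slice(None)] + [slice(cb, d - ce) for cb, ce, d
                           in zip(crops_begin[1:], crops_end[1:], x.shape[1:])]
    return x[tuple(idx)]

out = batch_to_space(np.arange(20).reshape(10, 2), [1, 5], [0, 2], [0, 0])
assert out.shape == (2, 8)   # matches the 2D example's output shape
```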
Broadcast_1.md

@@ -1,5 +1,7 @@
 # Broadcast {#openvino_docs_ops_movement_Broadcast_1}

+@sphinxdirective
+
 **Versioned name**: *Broadcast-1*

 **Category**: *Data movement*

@@ -8,23 +10,24 @@
 **Detailed description**:

-*Broadcast* takes the first tensor `data` and, following broadcasting rules that are specified by `mode` attribute and the 3rd input `axes_mapping`, builds a new tensor with shape matching the 2nd input tensor `target_shape`. `target_shape` input is a 1D integer tensor that represents required shape of the output.
+*Broadcast* takes the first tensor ``data`` and, following broadcasting rules that are specified by ``mode`` attribute and the 3rd input ``axes_mapping``, builds a new tensor with shape matching the 2nd input tensor ``target_shape``. ``target_shape`` input is a 1D integer tensor that represents required shape of the output.

-Attribute `mode` and the 3rd input `axes_mapping` are relevant for cases when rank of the input `data` tensor doesn't match the size of the `target_shape` input. They both define how axes from `data` shape are mapped to the output axes. If `mode` is set to `numpy`, it means that the standard one-directional numpy broadcasting rules are applied. These rules are described in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md), when only one-directional broadcasting is applied: input tensor `data` is broadcasted to `target_shape` but not vice-versa.
+Attribute ``mode`` and the 3rd input ``axes_mapping`` are relevant for cases when rank of the input ``data`` tensor doesn't match the size of the ``target_shape`` input. They both define how axes from ``data`` shape are mapped to the output axes. If ``mode`` is set to ``numpy``, it means that the standard one-directional numpy broadcasting rules are applied. These rules are described in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`, when only one-directional broadcasting is applied: input tensor ``data`` is broadcasted to ``target_shape`` but not vice-versa.

-In case if `mode` is set to `explicit`, then 3rd input `axes_mapping` comes to play. It contains a list of axis indices, each index maps an axis from the 1st input tensor `data` to axis in the output. The size of `axis_mapping` should match the rank of input `data` tensor, so all axes from `data` tensor should be mapped to axes of the output.
+In case if ``mode`` is set to ``explicit``, then 3rd input ``axes_mapping`` comes to play. It contains a list of axis indices, each index maps an axis from the 1st input tensor ``data`` to axis in the output. The size of ``axis_mapping`` should match the rank of input ``data`` tensor, so all axes from ``data`` tensor should be mapped to axes of the output.

-For example, `axes_mapping = [1]` enables broadcasting of a tensor with shape `[C]` to shape `[N,C,H,W]` by replication of initial tensor along dimensions 0, 2 and 3. Another example is broadcasting of tensor with shape `[H,W]` to shape `[N,H,W,C]` with `axes_mapping = [1, 2]`. Both examples requires `mode` set to `explicit` and providing mentioned `axes_mapping` input, because such operations cannot be expressed with `axes_mapping` set to `numpy`.
+For example, ``axes_mapping = [1]`` enables broadcasting of a tensor with shape ``[C]`` to shape ``[N,C,H,W]`` by replication of initial tensor along dimensions 0, 2 and 3. Another example is broadcasting of tensor with shape ``[H,W]`` to shape ``[N,H,W,C]`` with ``axes_mapping = [1, 2]``. Both examples requires ``mode`` set to ``explicit`` and providing mentioned ``axes_mapping`` input, because such operations cannot be expressed with ``axes_mapping`` set to ``numpy``.

 **Attributes**:

 * *mode*

-  * **Description**: specifies rules used for mapping of `input` tensor axes to output shape axes.
+  * **Description**: specifies rules used for mapping of ``input`` tensor axes to output shape axes.
   * **Range of values**:

-    * *numpy* - numpy broadcasting rules, aligned with ONNX Broadcasting. Description is available in <a href="https://github.com/onnx/onnx/blob/master/docs/Broadcasting.md">ONNX docs</a>.; only one-directional broadcasting is applied from `data` to `target_shape`. If this attribute value is used, then the 3rd input for the operation shouldn't be provided.
-    * *explicit* - mapping of the input `data` shape axes to output shape is provided as an explicit 3rd input.
+    * *numpy* - numpy broadcasting rules, aligned with ONNX Broadcasting. Description is available in `ONNX docs <https://github.com/onnx/onnx/blob/master/docs/Broadcasting.md>`__.; only one-directional broadcasting is applied from ``data`` to ``target_shape``. If this attribute value is used, then the 3rd input for the operation shouldn't be provided.
+    * *explicit* - mapping of the input ``data`` shape axes to output shape is provided as an explicit 3rd input.
   * **Type**: string
   * **Default value**: "numpy"
   * **Required**: *no*

@@ -32,86 +35,87 @@ For example, `axes_mapping = [1]` enables broadcasting of a tensor with shape `[
 **Inputs**:

-* **1**: `data` - source tensor of any type and shape that is being broadcasted. **Required.**
-* **2**: `taget_shape` - 1D integer tensor describing output shape. **Required.**
-* **3**: `axes_mapping` - 1D integer tensor describing a list of axis indices, each index maps an axis from the 1st input tensor `data` to axis in the output. The index values in this tensor should be sorted, that disables on-the-fly transpositions of input `data` tensor while the broadcasting. `axes_mapping` input is optional depending on `mode` value.
+* **1**: ``data`` - source tensor of any type and shape that is being broadcasted. **Required.**
+* **2**: ``taget_shape`` - 1D integer tensor describing output shape. **Required.**
+* **3**: ``axes_mapping`` - 1D integer tensor describing a list of axis indices, each index maps an axis from the 1st input tensor ``data`` to axis in the output. The index values in this tensor should be sorted, that disables on-the-fly transpositions of input ``data`` tensor while the broadcasting. ``axes_mapping`` input is optional depending on ``mode`` value.

 **Outputs**:

-* **1**: Output tensor with replicated content from the 1st tensor `data` and with shape matched `target_shape`.
+* **1**: Output tensor with replicated content from the 1st tensor ``data`` and with shape matched ``target_shape``.

 **Example**

-```xml
-<layer ... type="Broadcast" ...>
-    <data mode="numpy"/>
-    <input>
-        <port id="0">
-            <dim>16</dim>
-            <dim>1</dim>
-            <dim>1</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 16, 50, 50] -->
-        </port>
-        <!-- the 3rd input shouldn't be provided with mode="numpy" -->
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>16</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-    </output>
-</layer>
-
-<layer ... type="Broadcast" ...>
-    <data mode="explicit"/>
-    <input>
-        <port id="0">
-            <dim>16</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 16, 50, 50] -->
-        </port>
-        <port id="1">
-            <dim>1</dim> <!--The tensor contains 1 elements: [1] -->
-        </port>
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>16</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-    </output>
-</layer>
-
-<layer ... type="Broadcast" ...>
-    <data mode="explicit"/>
-    <input>
-        <port id="0">
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 50, 50, 16] -->
-        </port>
-        <port id="1">
-            <dim>2</dim> <!--The tensor contains 2 elements: [1, 2] -->
-        </port>
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-            <dim>16</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+   <layer ... type="Broadcast" ...>
+       <data mode="numpy"/>
+       <input>
+           <port id="0">
+               <dim>16</dim>
+               <dim>1</dim>
+               <dim>1</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 16, 50, 50] -->
+           </port>
+           < !-- the 3rd input shouldn't be provided with mode="numpy" -->
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>16</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+       </output>
+   </layer>
+
+   <layer ... type="Broadcast" ...>
+       <data mode="explicit"/>
+       <input>
+           <port id="0">
+               <dim>16</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 16, 50, 50] -->
+           </port>
+           <port id="1">
+               <dim>1</dim> < !--The tensor contains 1 elements: [1] -->
+           </port>
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>16</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+       </output>
+   </layer>
+
+   <layer ... type="Broadcast" ...>
+       <data mode="explicit"/>
+       <input>
+           <port id="0">
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 50, 50, 16] -->
+           </port>
+           <port id="1">
+               <dim>2</dim> < !--The tensor contains 2 elements: [1, 2] -->
+           </port>
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+               <dim>16</dim>
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective
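
Both Broadcast-1 modes can be mimicked with `np.broadcast_to`; a sketch of the semantics described above, with the spec text remaining authoritative:

```python
import numpy as np

data = np.arange(16, dtype=np.float32)     # shape [C] = [16]
target_shape = (1, 16, 50, 50)

# mode="numpy": one-directional numpy rules; a [16, 1, 1] tensor aligns
# right against [1, 16, 50, 50] and is replicated along the other axes.
numpy_mode = np.broadcast_to(data.reshape(16, 1, 1), target_shape)

# mode="explicit", axes_mapping=[1]: axis 0 of `data` lands on output
# axis 1; every unmapped output axis is filled by replication.
shape = [1] * len(target_shape)
shape[1] = data.shape[0]                   # axes_mapping = [1]
explicit = np.broadcast_to(data.reshape(shape), target_shape)

assert numpy_mode.shape == explicit.shape == (1, 16, 50, 50)
```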
Broadcast_3.md

@@ -1,5 +1,7 @@
 # Broadcast {#openvino_docs_ops_movement_Broadcast_3}

+@sphinxdirective
+
 **Versioned name**: *Broadcast-3*

 **Category**: *Data movement*

@@ -8,142 +10,143 @@
 **Detailed description**:

-*Broadcast* takes the first tensor `data` and, following broadcasting rules that are specified by `mode` attribute and the 3rd input `axes_mapping`, builds a new tensor with shape matching the 2nd input tensor `target_shape`. `target_shape` input is a 1D integer tensor that represents required shape of the output.
+*Broadcast* takes the first tensor ``data`` and, following broadcasting rules that are specified by ``mode`` attribute and the 3rd input ``axes_mapping``, builds a new tensor with shape matching the 2nd input tensor ``target_shape``. ``target_shape`` input is a 1D integer tensor that represents required shape of the output.

-Attribute `mode` and the 3rd input `axes_mapping` are relevant for cases when rank of the input `data` tensor doesn't match the size of the `target_shape` input. They both define how axes from `data` shape are mapped to the output axes. If `mode` is set to `numpy`, it means that the standard one-directional numpy broadcasting rules are applied. These rules are described in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md), when only one-directional broadcasting is applied: input tensor `data` is broadcasted to `target_shape` but not vice-versa.
+Attribute ``mode`` and the 3rd input ``axes_mapping`` are relevant for cases when rank of the input ``data`` tensor doesn't match the size of the ``target_shape`` input. They both define how axes from ``data`` shape are mapped to the output axes. If ``mode`` is set to ``numpy``, it means that the standard one-directional numpy broadcasting rules are applied. These rules are described in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`, when only one-directional broadcasting is applied: input tensor ``data`` is broadcasted to ``target_shape`` but not vice-versa.

-In case if `mode` is set to `bidirectional`, then the broadcast rule is similar to `numpy.array(input) * numpy.ones(target_shape)`. Dimensions are right alignment. Two corresponding dimension must have the same value, or one of them is equal to 1. If this attribute value is used, then the 3rd input for the operation shouldn't be provided. The behaviour is described in [Bidirectional Broadcast Rules](../broadcast_rules.md).
+In case if ``mode`` is set to ``bidirectional``, then the broadcast rule is similar to ``numpy.array(input) * numpy.ones(target_shape)``. Dimensions are right alignment. Two corresponding dimension must have the same value, or one of them is equal to 1. If this attribute value is used, then the 3rd input for the operation shouldn't be provided. The behaviour is described in :doc:`Bidirectional Broadcast Rules <openvino_docs_ops_broadcast_rules>`.

-In case if `mode` is set to `explicit`, then 3rd input `axes_mapping` comes to play. It contains a list of axis indices, each index maps an axis from the 1st input tensor `data` to axis in the output. The size of `axis_mapping` should match the rank of input `data` tensor, so all axes from `data` tensor should be mapped to axes of the output.
+In case if ``mode`` is set to ``explicit``, then 3rd input ``axes_mapping`` comes to play. It contains a list of axis indices, each index maps an axis from the 1st input tensor ``data`` to axis in the output. The size of ``axis_mapping`` should match the rank of input ``data`` tensor, so all axes from ``data`` tensor should be mapped to axes of the output.

-For example, `axes_mapping = [1]` enables broadcasting of a tensor with shape `[C]` to shape `[N,C,H,W]` by replication of initial tensor along dimensions 0, 2 and 3. Another example is broadcasting of tensor with shape `[H,W]` to shape `[N,H,W,C]` with `axes_mapping = [1, 2]`. Both examples requires `mode` set to `explicit` and providing mentioned `axes_mapping` input, because such operations cannot be expressed with `axes_mapping` set to `numpy`.
+For example, ``axes_mapping = [1]`` enables broadcasting of a tensor with shape ``[C]`` to shape ``[N,C,H,W]`` by replication of initial tensor along dimensions 0, 2 and 3. Another example is broadcasting of tensor with shape ``[H,W]`` to shape ``[N,H,W,C]`` with ``axes_mapping = [1, 2]``. Both examples requires ``mode`` set to ``explicit`` and providing mentioned ``axes_mapping`` input, because such operations cannot be expressed with ``axes_mapping`` set to ``numpy``.

 **Attributes**:

 * *mode*

-  * **Description**: specifies rules used for mapping of `input` tensor axes to output shape axes.
+  * **Description**: specifies rules used for mapping of ``input`` tensor axes to output shape axes.
   * **Range of values**:

-    * *numpy* - numpy broadcasting rules, aligned with ONNX Broadcasting. Description is available in <a href="https://github.com/onnx/onnx/blob/master/docs/Broadcasting.md">ONNX docs</a>.; only one-directional broadcasting is applied from `data` to `target_shape`. If this attribute value is used, then the 3rd input for the operation shouldn't be provided.
-    * *explicit* - mapping of the input `data` shape axes to output shape is provided as an explicit 3rd input.
-    * *bidirectional* - the broadcast rule is similar to `numpy.array(input) * numpy.ones(target_shape)`. Dimensions are right alignment. Two corresponding dimension must have the same value, or one of them is equal to 1. If this attribute value is used, then the 3rd input for the operation shouldn't be provided.
-  * **Type**: string
+    * *numpy* - numpy broadcasting rules, aligned with ONNX Broadcasting. Description is available in `ONNX docs <https://github.com/onnx/onnx/blob/master/docs/Broadcasting.md>`__ .; only one-directional broadcasting is applied from ``data`` to ``target_shape``. If this attribute value is used, then the 3rd input for the operation shouldn't be provided.
+    * *explicit* - mapping of the input ``data`` shape axes to output shape is provided as an explicit 3rd input.
+    * *bidirectional* - the broadcast rule is similar to ``numpy.array(input) * numpy.ones(target_shape)``. Dimensions are right alignment. Two corresponding dimension must have the same value, or one of them is equal to 1. If this attribute value is used, then the 3rd input for the operation shouldn't be provided.
+  * **Type**: ``string``
   * **Default value**: "numpy"
   * **Required**: *no*

 **Inputs**:

-* **1**: `data` - source tensor of type *T* and shape that is being broadcasted. **Required.**
-* **2**: `target_shape` - 1D tensor of type *T_SHAPE* describing output shape. **Required.**
-* **3**: `axes_mapping` - 1D tensor of type *T_SHAPE* describing a list of axis indices, each index maps an axis from the 1st input tensor `data` to axis in the output. The index values in this tensor should be sorted, that disables on-the-fly transpositions of input `data` tensor while the broadcasting. `axes_mapping` input is needed for `mode` equal to *explicit* only.
+* **1**: ``data`` - source tensor of type *T* and shape that is being broadcasted. **Required.**
+* **2**: ``target_shape`` - 1D tensor of type *T_SHAPE* describing output shape. **Required.**
+* **3**: ``axes_mapping`` - 1D tensor of type *T_SHAPE* describing a list of axis indices, each index maps an axis from the 1st input tensor ``data`` to axis in the output. The index values in this tensor should be sorted, that disables on-the-fly transpositions of input ``data`` tensor while the broadcasting. ``axes_mapping`` input is needed for ``mode`` equal to *explicit* only.

 **Outputs**:

-* **1**: Output tensor of `data` tensor type with replicated content from the 1st tensor `data` and with shape matched `target_shape`.
+* **1**: Output tensor of ``data`` tensor type with replicated content from the 1st tensor ``data`` and with shape matched ``target_shape``.

 **Types**

 * *T*: any numeric type.

 * *T_SHAPE*: any integer type.

 **Example**

-```xml
-<layer ... type="Broadcast" ...>
-    <data mode="numpy"/>
-    <input>
-        <port id="0">
-            <dim>16</dim>
-            <dim>1</dim>
-            <dim>1</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 16, 50, 50] -->
-        </port>
-        <!-- the 3rd input shouldn't be provided with mode="numpy" -->
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>16</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-    </output>
-</layer>
-
-<layer ... type="Broadcast" ...>
-    <data mode="explicit"/>
-    <input>
-        <port id="0">
-            <dim>16</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 16, 50, 50] -->
-        </port>
-        <port id="1">
-            <dim>1</dim> <!--The tensor contains 1 elements: [1] -->
-        </port>
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>16</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-    </output>
-</layer>
-
-<layer ... type="Broadcast" ...>
-    <data mode="explicit"/>
-    <input>
-        <port id="0">
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 50, 50, 16] -->
-        </port>
-        <port id="1">
-            <dim>2</dim> <!--The tensor contains 2 elements: [1, 2] -->
-        </port>
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-            <dim>16</dim>
-        </port>
-    </output>
-</layer>
-
-<layer ... type="Broadcast" ...>
-    <data mode="bidirectional"/>
-    <input>
-        <port id="0">
-            <dim>16</dim>
-            <dim>1</dim>
-            <dim>1</dim>
-        </port>
-        <port id="1">
-            <dim>4</dim> <!--The tensor contains 4 elements: [1, 1, 50, 50] -->
-        </port>
-        <!-- the 3rd input shouldn't be provided with mode="bidirectional" -->
-    </input>
-    <output>
-        <port id="2">
-            <dim>1</dim>
-            <dim>16</dim>
-            <dim>50</dim>
-            <dim>50</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: cpp
+
+   <layer ... type="Broadcast" ...>
+       <data mode="numpy"/>
+       <input>
+           <port id="0">
+               <dim>16</dim>
+               <dim>1</dim>
+               <dim>1</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 16, 50, 50] -->
+           </port>
+           < !-- the 3rd input shouldn't be provided with mode="numpy" -->
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>16</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+       </output>
+   </layer>
+
+   <layer ... type="Broadcast" ...>
+       <data mode="explicit"/>
+       <input>
+           <port id="0">
+               <dim>16</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 16, 50, 50] -->
+           </port>
+           <port id="1">
+               <dim>1</dim> < !--The tensor contains 1 elements: [1] -->
+           </port>
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>16</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+       </output>
+   </layer>
+
+   <layer ... type="Broadcast" ...>
+       <data mode="explicit"/>
+       <input>
+           <port id="0">
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 50, 50, 16] -->
+           </port>
+           <port id="1">
+               <dim>2</dim> < !--The tensor contains 2 elements: [1, 2] -->
+           </port>
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+               <dim>16</dim>
+           </port>
+       </output>
+   </layer>
+
+   <layer ... type="Broadcast" ...>
+       <data mode="bidirectional"/>
+       <input>
+           <port id="0">
+               <dim>16</dim>
+               <dim>1</dim>
+               <dim>1</dim>
+           </port>
+           <port id="1">
+               <dim>4</dim> < !--The tensor contains 4 elements: [1, 1, 50, 50] -->
+           </port>
+           < !-- the 3rd input shouldn't be provided with mode="bidirectional" -->
+       </input>
+       <output>
+           <port id="2">
+               <dim>1</dim>
+               <dim>16</dim>
+               <dim>50</dim>
+               <dim>50</dim>
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective
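
The *bidirectional* mode added in Broadcast-3 is documented as behaving like `numpy.array(input) * numpy.ones(target_shape)`; a quick check of the shape arithmetic from the last example:

```python
import numpy as np

data = np.arange(16, dtype=np.float32).reshape(16, 1, 1)
target_shape = (1, 1, 50, 50)

# Both sides may be expanded: data contributes the 16, target_shape the
# 50x50, giving [1, 16, 50, 50] exactly as in the bidirectional example.
out = data * np.ones(target_shape, dtype=np.float32)
assert out.shape == (1, 16, 50, 50)
```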
@ -1,78 +1,96 @@
|
||||
# BatchNormInference {#openvino_docs_ops_normalization_BatchNormInference_1}
|
||||
|
||||
**Versioned name**: *BatchNormInference-1*
|
||||
@sphinxdirective
|
||||
|
||||
**Versioned name**: *BatchNormInference-5*
|
||||
|
||||
**Category**: *Normalization*
|
||||
|
||||
**Short description**: *BatchNormInference* performs Batch Normalization operation described in the [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167v2) article.
|
||||
**Short description**: *BatchNormInference* performs Batch Normalization operation described in the `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167v2>`__ article.
|
||||
|
||||
**Detailed Description**
|
||||
|
||||
*BatchNormInference* performs the following operations on a given data batch input tensor `data`:
|
||||
*BatchNormInference* performs the following operations on a given data batch input tensor ``data``:
|
||||
|
||||
* Normalizes each activation \f$x^{(k)}\f$ by the mean and variance.
|
||||
\f[
|
||||
\hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
|
||||
\f]
|
||||
where \f$E[x^{(k)}]\f$ and \f$Var(x^{(k)})\f$ are the mean and variance, calculated per channel axis of `data` input, and correspond to `mean` and `variance` inputs, respectively. Additionally, \f$\epsilon\f$ is a value added to the variance for numerical stability and corresponds to `epsilon` attribute.
|
||||
* Normalizes each activation :math:`x^{(k)}` by the mean and variance.
|
||||
|
||||
.. math::
|
||||
|
||||
\hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
|
||||
|
||||
* Performs linear transformation of each normalized activation based on `gamma` and `beta` input, representing the scaling factor and shift, respectively.
|
||||
\f[
|
||||
\hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
|
||||
\f]
|
||||
where \f$\gamma^{(k)}\f$ and \f$\beta^{(k)}\f$ are learnable parameters, calculated per channel axis, and correspond to `gamma` and `beta` inputs.
|
||||
where :math:`E[x^{(k)}]` and :math:`Var(x^{(k)})` are the mean and variance, calculated per channel axis of ``data`` input, and correspond to ``mean`` and ``variance`` inputs, respectively. Additionally, :math:`\epsilon` is a value added to the variance for numerical stability and corresponds to ``epsilon`` attribute.
|
||||
|
||||
* Performs linear transformation of each normalized activation based on ``gamma`` and ``beta`` input, representing the scaling factor and shift, respectively.
|
||||
|
||||
.. math::
|
||||
|
||||
\hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
|
||||
|
||||
where :math:`\gamma^{(k)}` and :math:`\beta^{(k)}` are learnable parameters, calculated per channel axis, and correspond to ``gamma`` and ``beta`` inputs.
|
||||
|
||||
**Mathematical Formulation**
|
||||
|
||||
Let `x` be a *d*-dimensional input, \f$x=(x_{1}\dotsc x_{d})\f$. Since normalization is applied to each activation \f$E[x^{(k)}]\f$, you can focus on a particular activation and omit k.
|
||||
Let ``x`` be a *d*-dimensional input, :math:`x=(x_{1}\dotsc x_{d})`. Since normalization is applied to each activation :math:`E[x^{(k)}]`, you can focus on a particular activation and omit k.
|
||||
|
||||
For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values. *BatchNormInference* performs Batch Normalization algorithm as follows:
|
||||
For a particular activation, consider a mini-batch :math:`\mathcal{B}` of m values. *BatchNormInference* performs Batch Normalization algorithm as follows:
|
||||
|
||||
* **Input**: Values of \f$x\f$ over a mini-batch:
|
||||
\f[
|
||||
\mathcal{B} = \{ x_{1...m} \}
|
||||
\f]
|
||||
* **Parameters to learn**: \f$ \gamma, \beta\f$
|
||||
* **Output**:
|
||||
\f[
|
||||
\{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}
|
||||
\f]
|
||||
* **Mini-batch mean**:
|
||||
\f[
|
||||
\mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
|
||||
\f]
|
||||
* **Mini-batch variance**:
|
||||
\f[
|
||||
\sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
|
||||
\f]
|
||||
* **Normalize**:
|
||||
\f[
|
||||
\hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
|
||||
\f]
|
||||
* **Scale and shift**:
|
||||
\f[
|
||||
o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )
|
||||
\f]
|
||||
* **Input**: Values of :math:`x` over a mini-batch:
|
||||
|
||||
.. math::
|
||||
|
||||
\mathcal{B} = {x_{1...m}}
|
||||
|
||||
* **Parameters to learn**: :math:`\gamma, \beta`
|
||||
* **Output**:
|
||||
|
||||
.. math::
|
||||
|
||||
{o_{i} = BN_{\gamma, \beta} ( b_{i} )}
|
||||
|
||||
* **Mini-batch mean**:
|
||||
|
||||
.. math::
|
||||
|
||||
\mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
|
||||
|
||||
* **Mini-batch variance**:
|
||||
|
||||
.. math::
|
||||
|
||||
\sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
|
||||
|
||||
* **Normalize**:
|
||||
|
||||
.. math::
|
||||
|
||||
\hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
|
||||
|
||||
* **Scale and shift**:
|
||||
|
||||
.. math::
|
||||
|
||||
o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )
|
||||
|
||||
**Attributes**:
|
||||
|
||||
* *epsilon*
|
||||
|
||||
* **Description**: *epsilon* is a constant added to the variance for numerical stability.
|
||||
* **Range of values**: a floating-point number greater than or equal to zero
|
||||
* **Type**: `float`
|
||||
* **Type**: ``float``
|
||||
* **Required**: *yes*
|
||||
|
||||
**Inputs**

* **1**: `data` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
* **2**: `gamma` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **3**: `beta` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **4**: `mean` - Value for mean normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **5**: `variance` - Value for variance normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**

* **1**: ``data`` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
* **2**: ``gamma`` - Scaling factor for the normalized value. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **3**: ``beta`` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **4**: ``mean`` - Value for mean normalization. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **5**: ``variance`` - Value for variance normalization. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.** (How these 1D inputs are applied along the channel axis is shown in the sketch after this list.)
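A minimal NumPy sketch of the channel-wise application of the 1D inputs above, assuming a rank-2 ``data`` tensor of shape ``[N, C]`` as in the 2D example below; the helper name is hypothetical and this is not the OpenVINO implementation.

.. code-block:: python

   import numpy as np

   def batch_norm_inference_2d(data, gamma, beta, mean, variance, epsilon=9.99e-6):
       # data: [N, C]; gamma, beta, mean, variance: [C] (one value per channel)
       x_hat = (data - mean) / np.sqrt(variance + epsilon)  # normalize per channel
       return gamma * x_hat + beta                          # scale and shift

   data = np.random.rand(10, 128).astype(np.float32)
   out = batch_norm_inference_2d(
       data,
       gamma=np.ones(128, np.float32), beta=np.zeros(128, np.float32),
       mean=np.zeros(128, np.float32), variance=np.ones(128, np.float32),
   )
   assert out.shape == data.shape  # output shape equals the data shape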
**Outputs**

* **1**: The result of element-wise Batch Normalization operation applied to the input tensor `data`. A tensor of type *T* and the same shape as `data` input tensor.

* **1**: The result of the element-wise Batch Normalization operation applied to the input tensor ``data``. A tensor of type *T* with the same shape as the ``data`` input tensor.

**Types**

@ -80,70 +98,73 @@ For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values
**Examples**

*Example: 2D input tensor `data`*
Example: 2D input tensor ``data``
```xml
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0"> <!-- input -->
            <dim>10</dim>
            <dim>128</dim>
        </port>
        <port id="1"> <!-- gamma -->
            <dim>128</dim>
        </port>
        <port id="2"> <!-- beta -->
            <dim>128</dim>
        </port>
        <port id="3"> <!-- mean -->
            <dim>128</dim>
        </port>
        <port id="4"> <!-- variance -->
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>10</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
```
.. code-block:: xml

   <layer ... type="BatchNormInference" ...>
       <data epsilon="9.99e-06" />
       <input>
           <port id="0"> <!-- input -->
               <dim>10</dim>
               <dim>128</dim>
           </port>
           <port id="1"> <!-- gamma -->
               <dim>128</dim>
           </port>
           <port id="2"> <!-- beta -->
               <dim>128</dim>
           </port>
           <port id="3"> <!-- mean -->
               <dim>128</dim>
           </port>
           <port id="4"> <!-- variance -->
               <dim>128</dim>
           </port>
       </input>
       <output>
           <port id="5">
               <dim>10</dim>
               <dim>128</dim>
           </port>
       </output>
   </layer>
*Example: 4D input tensor `data`*
Example: 4D input tensor ``data``
.. code-block:: xml

   <layer ... type="BatchNormInference" ...>
       <data epsilon="9.99e-06" />
       <input>
           <port id="0"> <!-- input -->
               <dim>1</dim>
               <dim>3</dim>
               <dim>224</dim>
               <dim>224</dim>
           </port>
           <port id="1"> <!-- gamma -->
               <dim>3</dim>
           </port>
           <port id="2"> <!-- beta -->
               <dim>3</dim>
           </port>
           <port id="3"> <!-- mean -->
               <dim>3</dim>
           </port>
           <port id="4"> <!-- variance -->
               <dim>3</dim>
           </port>
       </input>
       <output>
           <port id="5">
               <dim>1</dim>
               <dim>3</dim>
               <dim>224</dim>
               <dim>224</dim>
           </port>
       </output>
   </layer>
@endsphinxdirective
```xml
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0"> <!-- input -->
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
        <port id="1"> <!-- gamma -->
            <dim>3</dim>
        </port>
        <port id="2"> <!-- beta -->
            <dim>3</dim>
        </port>
        <port id="3"> <!-- mean -->
            <dim>3</dim>
        </port>
        <port id="4"> <!-- variance -->
            <dim>3</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
    </output>
</layer>
```

@ -1,78 +1,97 @@
# BatchNormInference {#openvino_docs_ops_normalization_BatchNormInference_5}

@sphinxdirective

**Versioned name**: *BatchNormInference-5*

**Category**: *Normalization*

**Short description**: *BatchNormInference* performs Batch Normalization operation described in the [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167v2) article.
**Short description**: *BatchNormInference* performs the Batch Normalization operation described in the `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167v2>`__ article.

**Detailed Description**
*BatchNormInference* performs the following operations on a given data batch input tensor `data`:
*BatchNormInference* performs the following operations on a given data batch input tensor ``data``:

* Normalizes each activation \f$x^{(k)}\f$ by the mean and variance.
\f[
\hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
\f]
where \f$E[x^{(k)}]\f$ and \f$Var(x^{(k)})\f$ are the mean and variance, calculated per channel axis of `data` input, and correspond to `mean` and `variance` inputs, respectively. Additionally, \f$\epsilon\f$ is a value added to the variance for numerical stability and corresponds to `epsilon` attribute.

* Performs linear transformation of each normalized activation based on `gamma` and `beta` input, representing the scaling factor and shift, respectively.
\f[
\hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
\f]
where \f$\gamma^{(k)}\f$ and \f$\beta^{(k)}\f$ are learnable parameters, calculated per channel axis, and correspond to `gamma` and `beta` inputs.

* Normalizes each activation :math:`x^{(k)}` by the mean and variance.

  .. math::

     \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}

  where :math:`E[x^{(k)}]` and :math:`Var(x^{(k)})` are the mean and variance calculated per channel axis of the ``data`` input, and correspond to the ``mean`` and ``variance`` inputs, respectively. Additionally, :math:`\epsilon` is a value added to the variance for numerical stability and corresponds to the ``epsilon`` attribute.

* Performs a linear transformation of each normalized activation based on the ``gamma`` and ``beta`` inputs, representing the scaling factor and shift, respectively.

  .. math::

     \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}

  where :math:`\gamma^{(k)}` and :math:`\beta^{(k)}` are learnable parameters, calculated per channel axis, and correspond to the ``gamma`` and ``beta`` inputs.
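As a quick numeric check of the two formulas above (illustrative, made-up values): for a single activation with stored statistics :math:`E[x]=10`, :math:`Var(x)=4`, and :math:`\epsilon=0`, an input value :math:`x=14` normalizes to :math:`\hat{x}=(14-10)/\sqrt{4}=2`; with :math:`\gamma=2` and :math:`\beta=1`, the linear transformation gives :math:`\hat{y}=2\cdot 2+1=5`.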
**Mathematical Formulation**

Let `x` be a *d*-dimensional input, \f$x=(x_{1}\dotsc x_{d})\f$. Since normalization is applied to each activation \f$E[x^{(k)}]\f$, you can focus on a particular activation and omit k.
Let ``x`` be a *d*-dimensional input, :math:`x=(x_{1}\dotsc x_{d})`. Since normalization is applied to each activation :math:`x^{(k)}`, you can focus on a particular activation and omit :math:`k`.

For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values. *BatchNormInference* performs Batch Normalization algorithm as follows:
For a particular activation, consider a mini-batch :math:`\mathcal{B}` of :math:`m` values. *BatchNormInference* performs the Batch Normalization algorithm as follows:
* **Input**: Values of :math:`x` over a mini-batch:

.. math::

   \mathcal{B} = \{ x_{1...m} \}

* **Parameters to learn**: :math:`\gamma, \beta`

* **Output**:

.. math::

   \{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}

* **Mini-batch mean**:

.. math::

   \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}

* **Mini-batch variance**:

.. math::

   \sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}

* **Normalize**:

.. math::

   \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}

* **Scale and shift**:

.. math::

   o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )

* **Input**: Values of \f$x\f$ over a mini-batch:

\f[
\mathcal{B} = \{ x_{1...m} \}
\f]

* **Parameters to learn**: \f$ \gamma, \beta\f$

* **Output**:

\f[
\{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}
\f]

* **Mini-batch mean**:

\f[
\mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
\f]

* **Mini-batch variance**:

\f[
\sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
\f]

* **Normalize**:

\f[
\hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
\f]

* **Scale and shift**:

\f[
o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )
\f]
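As a small worked instance of these steps (illustrative values only): for a mini-batch :math:`\mathcal{B} = \{4, 8\}` with :math:`\epsilon = 0`, the mean is :math:`\mu_{\mathcal{B}} = 6`, the variance is :math:`\sigma_{\mathcal{B}}^{2} = 4`, normalization yields :math:`\hat{b} = \{-1, 1\}`, and with :math:`\gamma = 3`, :math:`\beta = 2` the outputs are :math:`o = \{-1, 5\}`.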
**Attributes**:

* *epsilon*

  * **Description**: *epsilon* is a constant added to the variance for numerical stability.
  * **Range of values**: a floating-point number greater than or equal to zero
  * **Type**: `float`
  * **Type**: ``float``
  * **Required**: *yes*
**Inputs**

* **1**: `data` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
* **2**: `gamma` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **3**: `beta` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **4**: `mean` - Value for mean normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **5**: `variance` - Value for variance normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**

* **1**: ``data`` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
* **2**: ``gamma`` - Scaling factor for the normalized value. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **3**: ``beta`` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **4**: ``mean`` - Value for mean normalization. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **5**: ``variance`` - Value for variance normalization. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.** (How these 1D inputs broadcast along the channel axis is shown in the sketch after this list.)
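A minimal NumPy sketch of how the 1D inputs above broadcast along the channel axis for a 4D ``data`` tensor of shape ``[N, C, H, W]``, matching the 4D example below; the helper name is hypothetical and this is not the OpenVINO implementation.

.. code-block:: python

   import numpy as np

   def batch_norm_inference_nchw(data, gamma, beta, mean, variance, epsilon=9.99e-6):
       # data: [N, C, H, W]; gamma, beta, mean, variance: [C]
       shape = (1, -1, 1, 1)  # align the 1D per-channel inputs with the channel axis
       x_hat = (data - mean.reshape(shape)) / np.sqrt(variance.reshape(shape) + epsilon)
       return gamma.reshape(shape) * x_hat + beta.reshape(shape)

   data = np.random.rand(1, 3, 224, 224).astype(np.float32)
   out = batch_norm_inference_nchw(
       data,
       gamma=np.ones(3, np.float32), beta=np.zeros(3, np.float32),
       mean=np.zeros(3, np.float32), variance=np.ones(3, np.float32),
   )
   assert out.shape == data.shape  # output shape equals the data shape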
**Outputs**

* **1**: The result of element-wise Batch Normalization operation applied to the input tensor `data`. A tensor of type *T* and the same shape as `data` input tensor.

* **1**: The result of the element-wise Batch Normalization operation applied to the input tensor ``data``. A tensor of type *T* with the same shape as the ``data`` input tensor.

**Types**

@ -80,70 +99,73 @@ For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values
**Examples**

*Example: 2D input tensor `data`*
Example: 2D input tensor ``data``
```xml
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0"> <!-- input -->
            <dim>10</dim>
            <dim>128</dim>
        </port>
        <port id="1"> <!-- gamma -->
            <dim>128</dim>
        </port>
        <port id="2"> <!-- beta -->
            <dim>128</dim>
        </port>
        <port id="3"> <!-- mean -->
            <dim>128</dim>
        </port>
        <port id="4"> <!-- variance -->
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>10</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
```
.. code-block:: xml

   <layer ... type="BatchNormInference" ...>
       <data epsilon="9.99e-06" />
       <input>
           <port id="0"> <!-- input -->
               <dim>10</dim>
               <dim>128</dim>
           </port>
           <port id="1"> <!-- gamma -->
               <dim>128</dim>
           </port>
           <port id="2"> <!-- beta -->
               <dim>128</dim>
           </port>
           <port id="3"> <!-- mean -->
               <dim>128</dim>
           </port>
           <port id="4"> <!-- variance -->
               <dim>128</dim>
           </port>
       </input>
       <output>
           <port id="5">
               <dim>10</dim>
               <dim>128</dim>
           </port>
       </output>
   </layer>
*Example: 4D input tensor `data`*
Example: 4D input tensor ``data``
.. code-block:: xml

   <layer ... type="BatchNormInference" ...>
       <data epsilon="9.99e-06" />
       <input>
           <port id="0"> <!-- input -->
               <dim>1</dim>
               <dim>3</dim>
               <dim>224</dim>
               <dim>224</dim>
           </port>
           <port id="1"> <!-- gamma -->
               <dim>3</dim>
           </port>
           <port id="2"> <!-- beta -->
               <dim>3</dim>
           </port>
           <port id="3"> <!-- mean -->
               <dim>3</dim>
           </port>
           <port id="4"> <!-- variance -->
               <dim>3</dim>
           </port>
       </input>
       <output>
           <port id="5">
               <dim>1</dim>
               <dim>3</dim>
               <dim>224</dim>
               <dim>224</dim>
           </port>
       </output>
   </layer>
@endsphinxdirective
```xml
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0"> <!-- input -->
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
        <port id="1"> <!-- gamma -->
            <dim>3</dim>
        </port>
        <port id="2"> <!-- beta -->
            <dim>3</dim>
        </port>
        <port id="3"> <!-- mean -->
            <dim>3</dim>
        </port>
        <port id="4"> <!-- variance -->
            <dim>3</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>1</dim>
            <dim>3</dim>
            <dim>224</dim>
            <dim>224</dim>
        </port>
    </output>
</layer>
```