[DOCS] shift to rst - opsets F,G (#17253)

Karol Blaszczak 2023-04-28 08:47:50 +02:00 committed by GitHub
parent 100d56261a
commit 94cf2f8321
22 changed files with 1887 additions and 1587 deletions


@ -1,5 +1,9 @@
# GELU {#openvino_docs_ops_activation_GELU_2}
@sphinxdirective
**Versioned name**: *Gelu-2*
**Category**: *Activation function*
@ -8,30 +12,31 @@
**Detailed description**
*Gelu* operation is introduced in this [article](https://arxiv.org/abs/1606.08415).
*Gelu* operation is introduced in this `article <https://arxiv.org/abs/1606.08415>`__.
It applies an element-wise activation function to a given input tensor, based on the following mathematical formula:
\f[
Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
\f]
.. math::
Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
where Φ(x) is the cumulative distribution function of the Gaussian distribution.
Additionally, *Gelu* function may be approximated as follows:
Additionally, the *Gelu* function may be approximated as follows:
.. math::
Gelu(x) \approx 0.5\cdot x\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
\f[
Gelu(x) \approx 0.5\cdot x\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
\f]
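Both formulas can be cross-checked numerically. Below is a minimal NumPy sketch (an illustration added here, not part of the operation specification; it assumes ``scipy.special.erf`` for the error function):

.. code-block:: py

    import numpy as np
    from scipy.special import erf  # Gauss error function

    def gelu_exact(x):
        # Gelu(x) = x * Phi(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
        return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

    def gelu_tanh(x):
        # Gelu(x) ~ 0.5 * x * (1 + tanh(sqrt(2 / pi) * (x + 0.044715 * x**3)))
        return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

    x = np.linspace(-3.0, 3.0, 7)
    # The approximation stays close to the exact form on this range.
    print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))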
**Attributes**: *Gelu* operation has no attributes.
**Inputs**:
* **1**: A tensor of type *T* and arbitrary shape. **Required.**
**Outputs**:
* **1**: The result of element-wise *Gelu* function applied to the input tensor. A tensor of type *T* and the same shape as input tensor.
**Types**
@ -39,20 +44,23 @@ Additionally, *Gelu* function may be approximated as follows:
**Example**
```xml
<layer ... type="Gelu">
<input>
<port id="0">
<dim>1</dim>
<dim>128</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
.. code-block:: xml
<layer ... type="Gelu">
<input>
<port id="0">
<dim>1</dim>
<dim>128</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
@endsphinxdirective
```


@ -1,5 +1,8 @@
# GELU {#openvino_docs_ops_activation_GELU_7}
@sphinxdirective
**Versioned name**: *Gelu-7*
**Category**: *Activation function*
@ -8,28 +11,30 @@
**Detailed description**:
*Gelu* operation is introduced in this [article](https://arxiv.org/abs/1606.08415).
*Gelu* operation is introduced in this `article <https://arxiv.org/abs/1606.08415>`__.
It applies an element-wise activation function to a given input tensor, based on the following mathematical formula:
\f[
Gelu(x) = x\cdot\Phi(x)
\f]
.. math::
where `Φ(x)` is the Cumulative Distribution Function for Gaussian Distribution.
Gelu(x) = x\cdot\Phi(x)
where ``Φ(x)`` is the cumulative distribution function of the Gaussian distribution.
The *Gelu* function may be approximated in two different ways, based on the *approximation_mode* attribute.
For `erf` approximation mode, *Gelu* function is represented as:
For ``erf`` approximation mode, *Gelu* function is represented as:
\f[
Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
\f]
.. math::
For `tanh` approximation mode, *Gelu* function is represented as:
Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
For ``tanh`` approximation mode, *Gelu* function is represented as:
.. math::
Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
\f[
Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
\f]
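As a hedged sketch of how *approximation_mode* selects between the two formulas (NumPy, illustrative only, not the reference implementation):

.. code-block:: py

    import numpy as np
    from scipy.special import erf

    def gelu(x, approximation_mode="erf"):
        # "erf" uses the exact CDF-based formula, "tanh" the approximation.
        if approximation_mode == "erf":
            return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))
        if approximation_mode == "tanh":
            return x * 0.5 * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))
        raise ValueError("approximation_mode must be 'erf' or 'tanh'")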
**Attributes**
@ -37,10 +42,12 @@ For `tanh` approximation mode, *Gelu* function is represented as:
* **Description**: Specifies the formula used to calculate the *Gelu* function.
* **Range of values**:
* `erf` - calculate output using the Gauss error function
* `tanh` - calculate output using tanh approximation
* **Type**: `string`
* **Default value**: `erf`
* ``erf`` - calculate output using the Gauss error function
* ``tanh`` - calculate output using tanh approximation
* **Type**: ``string``
* **Default value**: ``erf``
* **Required**: *no*
**Inputs**:
@ -57,45 +64,49 @@ For `tanh` approximation mode, *Gelu* function is represented as:
**Examples**
*Example: `tanh` approximation mode*
*Example*: ``tanh`` approximation mode
```xml
<layer ... type="Gelu">
<data approximation_mode="tanh"/>
<input>
<port id="0">
<dim>1</dim>
<dim>128</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
*Example: `erf` approximation mode*
<layer ... type="Gelu">
<data approximation_mode="tanh"/>
<input>
<port id="0">
<dim>1</dim>
<dim>128</dim>
</port>
</input>
<output>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
```xml
<layer ... type="Gelu">
<data approximation_mode="erf"/>
<input>
<port id="0">
<dim>3</dim>
<dim>7</dim>
<dim>9</dim>
</port>
</input>
<output>
<port id="1">
<dim>3</dim>
<dim>7</dim>
<dim>9</dim>
</port>
</output>
</layer>
```
*Example:* ``erf`` approximation mode
.. code-block:: xml
<layer ... type="Gelu">
<data approximation_mode="erf"/>
<input>
<port id="0">
<dim>3</dim>
<dim>7</dim>
<dim>9</dim>
</port>
</input>
<output>
<port id="1">
<dim>3</dim>
<dim>7</dim>
<dim>9</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,19 +1,23 @@
# FloorMod {#openvino_docs_ops_arithmetic_FloorMod_1}
@sphinxdirective
**Versioned name**: *FloorMod-1*
**Category**: *Arithmetic binary*
**Short description**: *FloorMod* performs an element-wise floor modulo operation with two given tensors applying broadcasting rule specified in the *auto_broadcast* attribute.
**Detailed description**
**Detailed description**:
As a first step, input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to the `auto_broadcast` attribute specification. As a second step, the *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
o_{i} = a_{i} \mod b_{i}
\f]
.. math::
*FloorMod* operation computes a reminder of a floored division. It is the same behaviour like in Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to a sign of a divisor. The result of division by zero is undefined.
o_{i} = a_{i} \mod b_{i}
The *FloorMod* operation computes the remainder of a floored division. The behavior is the same as in the
Python programming language: :math:`floor(x / y) * y + floor\_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
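For illustration, NumPy's ``np.mod`` follows the same floored-modulo convention, so the identity and the sign rule can be checked directly (a sketch, not part of the specification):

.. code-block:: py

    import numpy as np

    a = np.array([ 7, -7,  7, -7])
    b = np.array([ 3,  3, -3, -3])

    r = np.mod(a, b)                           # floored modulo, like FloorMod
    print(r)                                   # [ 1  2 -2 -1] - sign follows the divisor
    print(np.floor_divide(a, b) * b + r == a)  # identity holds: all True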
**Attributes**:
@ -21,8 +25,10 @@ o_{i} = a_{i} \mod b_{i}
* **Description**: specifies rules used for auto-broadcasting of input tensors.
* **Range of values**:
* *none* - no auto-broadcasting is allowed, all input shapes must match
* *numpy* - numpy broadcasting rules, description is available in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md)
* *numpy* - numpy broadcasting rules, description is available in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`
* **Type**: string
* **Default value**: "numpy"
* **Required**: *no*
@ -44,52 +50,56 @@ o_{i} = a_{i} \mod b_{i}
*Example 1: no broadcasting*
```xml
<layer ... type="FloorMod">
<data auto_broadcast="none"/>
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="FloorMod">
<data auto_broadcast="none"/>
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
*Example 2: numpy broadcasting*
```xml
<layer ... type="FloorMod">
<data auto_broadcast="numpy"/>
<input>
<port id="0">
<dim>8</dim>
<dim>1</dim>
<dim>6</dim>
<dim>1</dim>
</port>
<port id="1">
<dim>7</dim>
<dim>1</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>8</dim>
<dim>7</dim>
<dim>6</dim>
<dim>5</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="FloorMod">
<data auto_broadcast="numpy"/>
<input>
<port id="0">
<dim>8</dim>
<dim>1</dim>
<dim>6</dim>
<dim>1</dim>
</port>
<port id="1">
<dim>7</dim>
<dim>1</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>8</dim>
<dim>7</dim>
<dim>6</dim>
<dim>5</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,5 +1,8 @@
# Floor {#openvino_docs_ops_arithmetic_Floor_1}
@sphinxdirective
**Versioned name**: *Floor-1*
**Category**: *Arithmetic unary*
@ -9,9 +12,9 @@
**Detailed description**: For each element of the input tensor, *Floor* calculates the corresponding
element in the output tensor using the following formula:
\f[
a_{i} = \lfloor a_{i} \rfloor
\f]
.. math::
a_{i} = \lfloor a_{i} \rfloor
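For reference, the element-wise behavior matches ``numpy.floor`` (illustrative sketch only):

.. code-block:: py

    import numpy as np

    x = np.array([-1.5, -0.5, 0.5, 1.5])
    print(np.floor(x))  # [-2. -1.  0.  1.]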
**Attributes**: *Floor* operation has no attributes.
@ -32,19 +35,22 @@ a_{i} = \lfloor a_{i} \rfloor
*Example 1*
```xml
<layer ... type="Floor">
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="Floor">
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,19 +1,25 @@
# GreaterEqual {#openvino_docs_ops_comparison_GreaterEqual_1}
@sphinxdirective
**Versioned name**: *GreaterEqual-1*
**Category**: *Comparison binary*
**Short description**: *GreaterEqual* performs element-wise comparison operation with two given tensors applying broadcast rules specified in the `auto_broadcast` attribute.
**Short description**: *GreaterEqual* performs element-wise comparison operation with two given
tensors applying broadcast rules specified in the ``auto_broadcast`` attribute.
**Detailed description**
Before performing arithmetic operation, input tensors *a* and *b* are broadcasted if their shapes are different and `auto_broadcast` attribute is not `none`. Broadcasting is performed according to `auto_broadcast` value.
Before performing arithmetic operation, input tensors *a* and *b* are broadcasted if their shapes are
different and ``auto_broadcast`` attribute is not ``none``. Broadcasting is performed according to ``auto_broadcast`` value.
After broadcasting, *GreaterEqual* does the following with the input tensors *a* and *b*:
\f[
o_{i} = a_{i} \geq b_{i}
\f]
.. math::
o_{i} = a_{i} \geq b_{i}
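NumPy's default broadcasting corresponds to the ``numpy`` value of *auto_broadcast*, so the operation can be sketched as follows (illustration only):

.. code-block:: py

    import numpy as np

    a = np.arange(6).reshape(2, 3)  # shape [2, 3]
    b = np.array([1, 2, 3])         # shape [3], broadcast to [2, 3]

    out = a >= b                    # element-wise GreaterEqual, boolean result
    print(out.shape)                # (2, 3) - the broadcasted shape of the two inputs
    print(out)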
**Attributes**:
@ -21,74 +27,85 @@ o_{i} = a_{i} \geq b_{i}
* **Description**: specifies rules used for auto-broadcasting of input tensors.
* **Range of values**:
* *none* - no auto-broadcasting is allowed, all input shapes should match
* *numpy* - numpy broadcasting rules, description is available in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md)
* *pdpd* - PaddlePaddle-style implicit broadcasting, description is available in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md)
* *numpy* - numpy broadcasting rules, description is available in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`
* *pdpd* - PaddlePaddle-style implicit broadcasting, description is available in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`
* **Type**: string
* **Default value**: "numpy"
* **Required**: *no*
**Inputs**
* **1**: A tensor of type *T* and arbitrary shape. **Required.**
* **2**: A tensor of type *T* and arbitrary shape. **Required.**
**Outputs**
* **1**: The result of element-wise *GreaterEqual* operation applied to the input tensors. A tensor of type *T_BOOL* and shape equal to broadcasted shape of two inputs.
* **1**: The result of element-wise *GreaterEqual* operation applied to the input tensors.
A tensor of type *T_BOOL* and shape equal to broadcasted shape of two inputs.
**Types**
* *T*: arbitrary supported type.
* *T_BOOL*: `boolean`.
* *T_BOOL*: ``boolean``.
**Examples**
*Example 1: no broadcast*
```xml
<layer ... type="GreaterEqual">
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="GreaterEqual">
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
*Example 2: numpy broadcast*
```xml
<layer ... type="GreaterEqual">
<input>
<port id="0">
<dim>8</dim>
<dim>1</dim>
<dim>6</dim>
<dim>1</dim>
</port>
<port id="1">
<dim>7</dim>
<dim>1</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>8</dim>
<dim>7</dim>
<dim>6</dim>
<dim>5</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="GreaterEqual">
<input>
<port id="0">
<dim>8</dim>
<dim>1</dim>
<dim>6</dim>
<dim>1</dim>
</port>
<port id="1">
<dim>7</dim>
<dim>1</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>8</dim>
<dim>7</dim>
<dim>6</dim>
<dim>5</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,19 +1,26 @@
# Greater {#openvino_docs_ops_comparison_Greater_1}
@sphinxdirective
**Versioned name**: *Greater-1*
**Category**: *Comparison binary*
**Short description**: *Greater* performs element-wise comparison operation with two given tensors applying broadcast rules specified in the `auto_broadcast` attribute.
**Short description**: *Greater* performs element-wise comparison operation with two
given tensors applying broadcast rules specified in the ``auto_broadcast`` attribute.
**Detailed description**
Before performing arithmetic operation, input tensors *a* and *b* are broadcasted if their shapes are different and `auto_broadcast` attribute is not `none`. Broadcasting is performed according to `auto_broadcast` value.
Before performing arithmetic operation, input tensors *a* and *b* are broadcasted if
their shapes are different and ``auto_broadcast`` attribute is not ``none``.
Broadcasting is performed according to ``auto_broadcast`` value.
After broadcasting, *Greater* does the following with the input tensors *a* and *b*:
\f[
o_{i} = a_{i} > b_{i}
\f]
.. math::
o_{i} = a_{i} > b_{i}
**Attributes**:
@ -21,9 +28,11 @@ o_{i} = a_{i} > b_{i}
* **Description**: specifies rules used for auto-broadcasting of input tensors.
* **Range of values**:
* *none* - no auto-broadcasting is allowed, all input shapes should match
* *numpy* - numpy broadcasting rules, description is available in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md),
* *pdpd* - PaddlePaddle-style implicit broadcasting, description is available in [Broadcast Rules For Elementwise Operations](../broadcast_rules.md).
* *numpy* - numpy broadcasting rules, description is available in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`
* *pdpd* - PaddlePaddle-style implicit broadcasting, description is available in :doc:`Broadcast Rules For Elementwise Operations <openvino_docs_ops_broadcast_rules>`.
* **Type**: string
* **Default value**: "numpy"
* **Required**: *no*
@ -35,63 +44,72 @@ o_{i} = a_{i} > b_{i}
**Outputs**
* **1**: The result of element-wise *Greater* operation applied to the input tensors. A tensor of type *T_BOOL* and shape equal to broadcasted shape of two inputs.
* **1**: The result of element-wise *Greater* operation applied to the input tensors.
A tensor of type *T_BOOL* and shape equal to broadcasted shape of two inputs.
**Types**
* *T*: arbitrary supported type.
* *T_BOOL*: `boolean`.
* *T_BOOL*: ``boolean``.
**Examples**
*Example 1: no broadcast*
```xml
<layer ... type="Greater">
<data auto_broadcast="none"/>
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="Greater">
<data auto_broadcast="none"/>
<input>
<port id="0">
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>256</dim>
<dim>56</dim>
</port>
</input>
<output>
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
*Example 2: numpy broadcast*
```xml
<layer ... type="Greater">
<data auto_broadcast="numpy"/>
<input>
<port id="0">
<dim>8</dim>
<dim>1</dim>
<dim>6</dim>
<dim>1</dim>
</port>
<port id="1">
<dim>7</dim>
<dim>1</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>8</dim>
<dim>7</dim>
<dim>6</dim>
<dim>5</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="Greater">
<data auto_broadcast="numpy"/>
<input>
<port id="0">
<dim>8</dim>
<dim>1</dim>
<dim>6</dim>
<dim>1</dim>
</port>
<port id="1">
<dim>7</dim>
<dim>1</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>8</dim>
<dim>7</dim>
<dim>6</dim>
<dim>5</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,81 +1,101 @@
# GroupConvolutionBackpropData {#openvino_docs_ops_convolution_GroupConvolutionBackpropData_1}
@sphinxdirective
**Versioned name**: *GroupConvolutionBackpropData-1*
**Category**: *Convolution*
**Short description**: Computes 1D, 2D or 3D *GroupConvolutionBackpropData* of input and kernel tensors.
**Detailed description**: Splits input and filters into multiple groups, computes *ConvolutionBackpropData* on them and concatenates the results. It is equivalent to GroupConvolution and Convolution relationship.
**Detailed description**: Splits input and filters into multiple groups, computes *ConvolutionBackpropData*
on them and concatenates the results. It is equivalent to GroupConvolution and Convolution relationship.
**Attributes**: The operation has the same attributes as a *ConvolutionBackpropData*. Number of groups
is derived from the kernel shape.
**Attributes**: The operation has the same attributes as a *ConvolutionBackpropData*. Number of groups is derived from the kernel shape.
* *strides*
* **Description**: *strides* has the same definition as *strides* for a regular Convolution but applied in the backward way, for the output tensor.
* **Description**: *strides* has the same definition as *strides* for a regular Convolution but applied in
the backward way, for the output tensor.
* **Range of values**: positive integers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* *pads_begin*
* **Description**: *pads_begin* has the same definition as *pads_begin* for a regular Convolution but applied in the backward way, for the output tensor. May be omitted, in which case pads are calculated automatically.
* **Description**: *pads_begin* has the same definition as *pads_begin* for a regular Convolution but applied in
the backward way, for the output tensor. May be omitted, in which case pads are calculated automatically.
* **Range of values**: non-negative integers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* **Note**: the attribute is ignored when *auto_pad* attribute is specified.
* *pads_end*
* **Description**: *pads_end* has the same definition as *pads_end* for a regular Convolution but applied in the backward way, for the output tensor. May be omitted, in which case pads are calculated automatically.
* **Description**: *pads_end* has the same definition as *pads_end* for a regular Convolution but applied
in the backward way, for the output tensor. May be omitted, in which case pads are calculated automatically.
* **Range of values**: non-negative integers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* **Note**: the attribute is ignored when *auto_pad* attribute is specified.
* *dilations*
* **Description**: *dilations* has the same definition as *dilations* for a regular Convolution but applied in the backward way, for the output tensor.
* **Description**: *dilations* has the same definition as *dilations* for a regular Convolution but applied
in the backward way, for the output tensor.
* **Range of values**: positive integers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* *auto_pad*
* **Description**: *auto_pad* has the same definition as *auto_pad* for a regular Convolution but applied in the backward way, for the output tensor.
* **Description**: *auto_pad* has the same definition as *auto_pad* for a regular Convolution but applied
in the backward way, for the output tensor.
* *explicit* - use explicit padding values from *pads_begin* and *pads_end*.
* *same_upper* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the end.
* *same_lower* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the beginning.
* *valid* - do not use padding.
* **Type**: `string`
* **Type**: ``string``
* **Default value**: explicit
* **Required**: *no*
* **Note**: *pads_begin* and *pads_end* attributes are ignored when *auto_pad* is specified.
* *output_padding*
* **Description**: *output_padding* adds additional amount of paddings per each spatial axis in the output tensor. It unlocks more elements in the output allowing them to be computed. Elements are added at the higher coordinate indices for the spatial dimensions. Number of elements in *output_padding* list matches the number of spatial dimensions in input and output tensors.
* **Description**: *output_padding* adds an additional amount of padding per spatial axis in the output tensor.
It unlocks more elements in the output, allowing them to be computed. Elements are added at the higher coordinate
indices for the spatial dimensions. The number of elements in the *output_padding* list matches the number of spatial
dimensions in the input and output tensors.
* **Range of values**: non-negative integer values
* **Type**: `int[]`
* **Type**: ``int[]``
* **Default value**: all zeros
* **Required**: *no*
**Inputs**:
* **1**: Input tensor of type `T1` and rank 3, 4 or 5. Layout is `[N, GROUPS * C_IN, Z, Y, X]` (number of batches, number of channels, spatial axes Z, Y, X). **Required.**
* **1**: Input tensor of type ``T1`` and rank 3, 4 or 5. Layout is ``[N, GROUPS * C_IN, Z, Y, X]``
(number of batches, number of channels, spatial axes Z, Y, X). **Required.**
* **2**: Kernel tensor of type ``T1`` and rank 4, 5 or 6. Layout is ``[GROUPS, C_IN, C_OUT, X, Y, Z]``
(number of groups, number of input channels, number of output channels, spatial axes X, Y, Z). **Required.**
* **2**: Kernel tensor of type `T1` and rank 4, 5 or 6. Layout is `[GROUPS, C_IN, C_OUT, X, Y, Z]` (number of groups, number of input channels, number of output channels, spatial axes X, Y, Z). **Required.**
* **3**: Output shape tensor of type ``T2`` and rank 1. It specifies spatial shape of the output. **Optional.**
* **Note** Number of groups is derived from the shape of the kernel and not specified by any attribute.
* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute:
* **3**: Output shape tensor of type `T2` and rank 1. It specifies spatial shape of the output. **Optional.**
* **Note** Number of groups is derived from the shape of the kernel and not specified by any attribute.
* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute:
* 1D convolution (input tensors rank 3) means that there is only one spatial axis X
* 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X
* 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X
**Outputs**:
* **1**: Output tensor of type `T1` and rank 3, 4 or 5 (the same as input *1*). Layout is `[N, GROUPS * C_OUT, Z, Y, X]` (number of batches, number of kernel output channels, spatial axes Z, Y, X).
* **1**: Output tensor of type ``T1`` and rank 3, 4 or 5 (the same as input *1*). Layout is ``[N, GROUPS * C_OUT, Z, Y, X]``
(number of batches, number of kernel output channels, spatial axes Z, Y, X).
**Types**:
@ -85,91 +105,100 @@
**Example**
1D GroupConvolutionBackpropData
```xml
<layer id="5" name="upsampling_node" type="GroupConvolutionBackpropData">
<data dilations="1" pads_begin="1" pads_end="1" strides="2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>5</dim>
<dim>2</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>8</dim>
<dim>447</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer id="5" name="upsampling_node" type="GroupConvolutionBackpropData">
<data dilations="1" pads_begin="1" pads_end="1" strides="2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>5</dim>
<dim>2</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>8</dim>
<dim>447</dim>
</port>
</output>
</layer>
2D GroupConvolutionBackpropData
```xml
<layer id="5" name="upsampling_node" type="GroupConvolutionBackpropData">
<data dilations="1,1" pads_begin="1,1" pads_end="1,1" strides="2,2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>5</dim>
<dim>2</dim>
<dim>3</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>8</dim>
<dim>447</dim>
<dim>447</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer id="5" name="upsampling_node" type="GroupConvolutionBackpropData">
<data dilations="1,1" pads_begin="1,1" pads_end="1,1" strides="2,2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>5</dim>
<dim>2</dim>
<dim>3</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>8</dim>
<dim>447</dim>
<dim>447</dim>
</port>
</output>
</layer>
3D GroupConvolutionBackpropData
```xml
<layer id="5" name="upsampling_node" type="GroupConvolutionBackpropData">
<data dilations="1,1,1" pads_begin="1,1,1" pads_end="1,1,1" strides="2,2,2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>5</dim>
<dim>2</dim>
<dim>3</dim>
<dim>3</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>8</dim>
<dim>447</dim>
<dim>447</dim>
<dim>447</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer id="5" name="upsampling_node" type="GroupConvolutionBackpropData">
<data dilations="1,1,1" pads_begin="1,1,1" pads_end="1,1,1" strides="2,2,2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>5</dim>
<dim>2</dim>
<dim>3</dim>
<dim>3</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>8</dim>
<dim>447</dim>
<dim>447</dim>
<dim>447</dim>
</port>
</output>
</layer>
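The spatial size 447 in the examples above follows from the usual transposed-convolution shape formula. A hedged sketch (assuming explicit pads and zero *output_padding*; the helper name is illustrative):

.. code-block:: py

    def backprop_out_dim(in_dim, stride, kernel, dilation=1,
                         pad_begin=0, pad_end=0, output_padding=0):
        # Output size of ConvolutionBackpropData per spatial axis.
        dilated_kernel = dilation * (kernel - 1) + 1
        return stride * (in_dim - 1) + dilated_kernel - pad_begin - pad_end + output_padding

    # 1D example above: input 224, stride 2, kernel 3, dilation 1, pads 1/1
    print(backprop_out_dim(224, 2, 3, 1, 1, 1))  # 447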
@endsphinxdirective


@ -1,160 +1,187 @@
# GroupConvolution {#openvino_docs_ops_convolution_GroupConvolution_1}
@sphinxdirective
**Versioned name**: *GroupConvolution-1*
**Category**: *Convolution*
**Short description**: Computes 1D, 2D or 3D GroupConvolution of input and kernel tensors.
**Detailed description**: Splits input into multiple groups, convolves them with group filters as in regular convolution and concatenates the results. More thorough explanation can be found in [ImageNet Classification with Deep Convolutional
Neural Networks](https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
**Detailed description**: Splits input into multiple groups, convolves them with group filters
as in regular convolution and concatenates the results. More thorough explanation can be found in
`ImageNet Classification with Deep Convolutional Neural Networks <https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf>`__
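As a hedged illustration of the split-convolve-concatenate semantics, here is a naive NumPy sketch of a grouped 1D convolution (stride 1, no padding or dilation; illustration only, not the reference implementation):

.. code-block:: py

    import numpy as np

    def group_conv1d(x, w):
        # x: [N, GROUPS * C_IN, X], w: [GROUPS, C_OUT, C_IN, K]
        n, _, size = x.shape
        groups, c_out, c_in, k = w.shape
        out = np.zeros((n, groups * c_out, size - k + 1))
        for g in range(groups):
            xg = x[:, g * c_in:(g + 1) * c_in, :]   # input slice for this group
            for o in range(c_out):
                for i in range(out.shape[2]):       # slide the filter, stride 1
                    out[:, g * c_out + o, i] = np.sum(
                        xg[:, :, i:i + k] * w[g, o], axis=(1, 2))
        return out

    x = np.random.rand(1, 12, 10)    # GROUPS=4, C_IN=3
    w = np.random.rand(4, 2, 3, 5)   # C_OUT=2 per group
    print(group_conv1d(x, w).shape)  # (1, 8, 6) - [N, GROUPS * C_OUT, X_out]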
**Attributes**: The operation has the same attributes as a regular _Convolution_. Number of groups is derived from the kernel shape.
* *strides*
* **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the `(z, y, x)` axes for 3D convolutions and `(y, x)` axes for 2D convolutions. For example, *strides* equal `4,2,1` means sliding the filter 4 pixel at a time over depth dimension, 2 over height dimension and 1 over width dimension.
* **Description**: *strides* is a distance (in pixels) to slide the filter on the feature map over the ``(z, y, x)``
axes for 3D convolutions and ``(y, x)`` axes for 2D convolutions. For example, *strides* equal ``4,2,1`` means sliding
the filter 4 pixels at a time over the depth dimension, 2 over the height dimension and 1 over the width dimension.
* **Range of values**: positive integer numbers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* *pads_begin*
* **Description**: *pads_begin* is a number of pixels to add to the beginning along each axis. For example, *pads_begin* equal `1,2` means adding 1 pixel to the top of the input and 2 to the left of the input.
* **Description**: *pads_begin* is a number of pixels to add to the beginning along each axis. For example,
*pads_begin* equal ``1,2`` means adding 1 pixel to the top of the input and 2 to the left of the input.
* **Range of values**: positive integer numbers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* **Note**: the attribute is ignored when *auto_pad* attribute is specified.
* *pads_end*
* **Description**: *pads_end* is a number of pixels to add to the ending along each axis. For example, *pads_end* equal `1,2` means adding 1 pixel to the bottom of the input and 2 to the right of the input.
* **Description**: *pads_end* is a number of pixels to add to the ending along each axis. For example,
*pads_end* equal ``1,2`` means adding 1 pixel to the bottom of the input and 2 to the right of the input.
* **Range of values**: positive integer numbers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* **Note**: the attribute is ignored when *auto_pad* attribute is specified.
* *dilations*
* **Description**: *dilations* denotes the distance in width and height between elements (weights) in the filter. For example, *dilation* equal `1,1` means that all the elements in the filter are neighbors, so it is the same as for the usual convolution. *dilation* equal `2,2` means that all the elements in the filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1.
* **Description**: *dilations* denotes the distance in width and height between elements (weights) in the filter.
For example, *dilation* equal ``1,1`` means that all the elements in the filter are neighbors,
so it is the same as for the usual convolution. *dilation* equal ``2,2`` means that all the elements in the
filter are matched not to adjacent elements in the input matrix, but to those that are adjacent with distance 1.
* **Range of values**: positive integer numbers
* **Type**: `int[]`
* **Type**: ``int[]``
* **Required**: *yes*
* *auto_pad*
* **Description**: *auto_pad* specifies how the padding is calculated. Possible values:
* *explicit* - use explicit padding values from *pads_begin* and *pads_end*.
* *same_upper* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the end.
* *same_lower* - the input is padded to match the output size. In case of odd padding value an extra padding is added at the beginning.
* *valid* - do not use padding.
* **Type**: `string`
* **Type**: ``string``
* **Default value**: explicit
* **Required**: *no*
* **Note**: *pads_begin* and *pads_end* attributes are ignored when *auto_pad* is specified.
**Inputs**:
* **1**: Input tensor of type *T* and rank 3, 4 or 5. Layout is `[N, GROUPS * C_IN, Z, Y, X]` (number of batches, number of channels, spatial axes Z, Y, X). **Required.**
* **2**: Convolution kernel tensor of type *T* and rank 4, 5 or 6. Layout is `[GROUPS, C_OUT, C_IN, Z, Y, X]` (number of groups, number of output channels, number of input channels, spatial axes Z, Y, X),
* **Note** Number of groups is derived from the shape of the kernel and not specified by any attribute.
* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute:
* 1D convolution (input tensors rank 3) means that there is only one spatial axis X
* 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X
* 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X
* **1**: Input tensor of type *T* and rank 3, 4 or 5. Layout is ``[N, GROUPS * C_IN, Z, Y, X]``
(number of batches, number of channels, spatial axes Z, Y, X). **Required.**
* **2**: Convolution kernel tensor of type *T* and rank 4, 5 or 6. Layout is ``[GROUPS, C_OUT, C_IN, Z, Y, X]``
(number of groups, number of output channels, number of input channels, spatial axes Z, Y, X),
* **Note** Number of groups is derived from the shape of the kernel and not specified by any attribute.
* **Note**: Type of the convolution (1D, 2D or 3D) is derived from the rank of the input tensors and not specified by any attribute:
* 1D convolution (input tensors rank 3) means that there is only one spatial axis X
* 2D convolution (input tensors rank 4) means that there are two spatial axes Y, X
* 3D convolution (input tensors rank 5) means that there are three spatial axes Z, Y, X
**Outputs**:
* **1**: Output tensor of type *T* and rank 3, 4 or 5. Layout is `[N, GROUPS * C_OUT, Z, Y, X]` (number of batches, number of output channels, spatial axes Z, Y, X).
* **1**: Output tensor of type *T* and rank 3, 4 or 5. Layout is ``[N, GROUPS * C_OUT, Z, Y, X]``
(number of batches, number of output channels, spatial axes Z, Y, X).
**Types**:
* *T*: any numeric type.
**Example**:
1D GroupConvolution
```xml
<layer type="GroupConvolution" ...>
<data dilations="1" pads_begin="2" pads_end="2" strides="1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>12</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>1</dim>
<dim>3</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>4</dim>
<dim>224</dim>
</port>
</output>
```
.. code-block:: xml
<layer type="GroupConvolution" ...>
<data dilations="1" pads_begin="2" pads_end="2" strides="1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>12</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>1</dim>
<dim>3</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>4</dim>
<dim>224</dim>
</port>
</output>
2D GroupConvolution
```xml
<layer type="GroupConvolution" ...>
<data dilations="1,1" pads_begin="2,2" pads_end="2,2" strides="1,1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>12</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>1</dim>
<dim>3</dim>
<dim>5</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>4</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
```
.. code-block:: xml
<layer type="GroupConvolution" ...>
<data dilations="1,1" pads_begin="2,2" pads_end="2,2" strides="1,1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>12</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>1</dim>
<dim>3</dim>
<dim>5</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>4</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
3D GroupConvolution
```xml
<layer type="GroupConvolution" ...>
<data dilations="1,1,1" pads_begin="2,2,2" pads_end="2,2,2" strides="1,1,1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>12</dim>
<dim>224</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>1</dim>
<dim>3</dim>
<dim>5</dim>
<dim>5</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>4</dim>
<dim>224</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
```
.. code-block:: xml
<layer type="GroupConvolution" ...>
<data dilations="1,1,1" pads_begin="2,2,2" pads_end="2,2,2" strides="1,1,1" auto_pad="explicit"/>
<input>
<port id="0">
<dim>1</dim>
<dim>12</dim>
<dim>224</dim>
<dim>224</dim>
<dim>224</dim>
</port>
<port id="1">
<dim>4</dim>
<dim>1</dim>
<dim>3</dim>
<dim>5</dim>
<dim>5</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2" precision="FP32">
<dim>1</dim>
<dim>4</dim>
<dim>224</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
@endsphinxdirective


@ -1,5 +1,7 @@
# GenerateProposals {#openvino_docs_ops_detection_GenerateProposals_9}
@sphinxdirective
**Versioned name**: *GenerateProposals-9*
**Category**: *Object detection*
@ -9,18 +11,20 @@ based on input data for each image in the batch.
**Detailed description**: The operation performs the following steps for each image:
1. Transposes and reshapes predicted bounding boxes deltas and scores to get them into the same dimension order as the
anchors.
2. Transforms anchors and deltas into proposal bboxes and clips proposal bboxes to an image. The attribute *normalized*
indicates whether the proposal bboxes are normalized or not.
3. Sorts all `(proposal, score)` pairs by score from highest to lowest; order of pairs with equal scores is undefined.
4. Takes top *pre_nms_count* proposals, if total number of proposals is less than *pre_nms_count* takes all proposals.
5. Removes predicted boxes with either height or width < *min_size*.
6. Applies non-maximum suppression with *adaptive_nms_threshold*. The initial value of *adaptive_nms_threshold* is
*nms_threshold*. If `nms_eta < 1` and `adaptive_threshold > 0.5`, update `adaptive_threshold *= nms_eta`.
7. Takes and returns top proposals after nms operation. The number of returned proposals in each image is dynamic and is specified by output port 3 `rpnroisnum`. And the max number of proposals in each image is specified by attribute *post_nms_count*.
1. Transposes and reshapes predicted bounding boxes deltas and scores to get them into the same dimension order as the
anchors.
2. Transforms anchors and deltas into proposal bboxes and clips proposal bboxes to an image. The attribute *normalized*
indicates whether the proposal bboxes are normalized or not.
3. Sorts all ``(proposal, score)`` pairs by score from highest to lowest; order of pairs with equal scores is undefined.
4. Takes top *pre_nms_count* proposals, if total number of proposals is less than *pre_nms_count* takes all proposals.
5. Removes predicted boxes with either height or width < *min_size*.
6. Applies non-maximum suppression with *adaptive_nms_threshold*. The initial value of *adaptive_nms_threshold* is
*nms_threshold*. If ``nms_eta < 1`` and ``adaptive_threshold > 0.5``, update ``adaptive_threshold *= nms_eta``.
7. Takes and returns the top proposals after the NMS operation. The number of returned proposals in each image is dynamic
and is specified by output port 3 ``rpnroisnum``, while the maximum number of proposals in each image is specified
by the attribute *post_nms_count*.
All proposals of the whole batch are concated image by image, and distinguishable through outputs.
All proposals of the whole batch are concatenated image by image, and can be distinguished through the outputs.
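A minimal sketch of the adaptive threshold update from step 6 (the surrounding NMS loop is omitted; illustration only):

.. code-block:: py

    def next_adaptive_threshold(adaptive_threshold, nms_eta):
        # Starts at nms_threshold; decays only while above 0.5 and nms_eta < 1.
        if nms_eta < 1.0 and adaptive_threshold > 0.5:
            adaptive_threshold *= nms_eta
        return adaptive_threshold

    t = 0.7
    for _ in range(3):
        t = next_adaptive_threshold(t, nms_eta=0.9)
        print(round(t, 4))  # 0.63, 0.567, 0.5103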
**Attributes**:
@ -65,38 +69,40 @@ All proposals of the whole batch are concated image by image, and distinguishabl
* *nms_eta*
* **Description**: eta parameter for adaptive NMS.
* **Range of values**: a floating-point number in closed range `[0, 1.0]`.
* **Range of values**: a floating-point number in closed range ``[0, 1.0]``.
* **Type**: float
* **Default value**: `1.0`
* **Default value**: ``1.0``
* **Required**: *no*
* *roi_num_type*
* **Description**: the type of element of output 3 `rpnroisnum`.
* **Description**: the type of element of output 3 ``rpnroisnum``.
* **Range of values**: i32, i64
* **Type**: string
* **Default value**: `i64`
* **Default value**: ``i64``
* **Required**: *no*
**Inputs**
* **1**: `im_info` - tensor of type *T* and shape `[num_batches, 3]` or `[num_batches, 4]` providing input image info. The image info is layout as `[image_height, image_width, scale_height_and_width]` or as `[image_height, image_width, scale_height, scale_width]`. **Required.**
* **1**: ``im_info`` - tensor of type *T* and shape ``[num_batches, 3]`` or ``[num_batches, 4]`` providing
input image info. The image info is laid out as ``[image_height, image_width, scale_height_and_width]`` or as
``[image_height, image_width, scale_height, scale_width]``. **Required.**
* **2**: ``anchors`` - tensor of type *T* with shape ``[height, width, number_of_anchors, 4]`` providing anchors.
Each anchor is laid out as ``[xmin, ymin, xmax, ymax]``. **Required.**
* **3**: ``boxesdeltas`` - tensor of type *T* with shape ``[num_batches, number_of_anchors * 4, height, width]``
providing deltas for anchors. The delta consists of 4 element tuples with layout ``[dx, dy, log(dw), log(dh)]``. **Required.**
* **4**: ``scores`` - tensor of type *T* with shape ``[num_batches, number_of_anchors, height, width]`` providing proposals scores. **Required.**
* **2**: `anchors` - tensor of type *T* with shape `[height, width, number_of_anchors, 4]` providing anchors. Each anchor is layouted as `[xmin, ymin, xmax, ymax]`. **Required.**
* **3**: `boxesdeltas` - tensor of type *T* with shape `[num_batches, number_of_anchors * 4, height, width]` providing deltas for anchors. The delta consists of 4 element tuples with layout `[dx, dy, log(dw), log(dh)]`. **Required.**
* **4**: `scores` - tensor of type *T* with shape `[num_batches, number_of_anchors, height, width]` providing proposals scores. **Required.**
The `height` and `width` from inputs `anchors`, `boxesdeltas` and `scores` are the height and width of feature maps.
The ``height`` and ``width`` from inputs ``anchors``, ``boxesdeltas`` and ``scores`` are the height and width of feature maps.
**Outputs**
* **1**: `rpnrois` - tensor of type *T* with shape `[num_rois, 4]` providing proposed ROIs. The proposals are layouted as `[xmin, ymin, xmax, ymax]`. The `num_rois` means the total proposals number of all the images in one batch. `num_rois` is a dynamic dimension.
* **2**: `rpnscores` - tensor of type *T* with shape `[num_rois]` providing proposed ROIs scores.
* **3**: `rpnroisnum` - tensor of type *roi_num_type* with shape `[num_batches]` providing the number of proposed ROIs in each image.
* **1**: ``rpnrois`` - tensor of type *T* with shape ``[num_rois, 4]`` providing proposed ROIs.
The proposals are laid out as ``[xmin, ymin, xmax, ymax]``. ``num_rois`` is the total number of
proposals of all the images in one batch. ``num_rois`` is a dynamic dimension.
* **2**: ``rpnscores`` - tensor of type *T* with shape ``[num_rois]`` providing proposed ROIs scores.
* **3**: ``rpnroisnum`` - tensor of type *roi_num_type* with shape ``[num_batches]`` providing the number
of proposed ROIs in each image.
**Types**
@ -104,44 +110,48 @@ The `height` and `width` from inputs `anchors`, `boxesdeltas` and `scores` are t
**Example**
```xml
<layer ... type="GenerateProposals" version="opset9">
<data min_size="0.0" nms_threshold="0.699999988079071" post_nms_count="1000" pre_nms_count="1000" roi_num_type="i32"/>
<input>
<port id="0">
<dim>8</dim>
<dim>3</dim>
</port>
<port id="1">
<dim>50</dim>
<dim>84</dim>
<dim>3</dim>
<dim>4</dim>
</port>
<port id="2">
<dim>8</dim>
<dim>12</dim>
<dim>50</dim>
<dim>84</dim>
</port>
<port id="3">
<dim>8</dim>
<dim>3</dim>
<dim>50</dim>
<dim>84</dim>
</port>
</input>
<output>
<port id="4" precision="FP32">
<dim>-1</dim>
<dim>4</dim>
</port>
<port id="5" precision="FP32">
<dim>-1</dim>
</port>
<port id="6" precision="I32">
<dim>8</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="GenerateProposals" version="opset9">
<data min_size="0.0" nms_threshold="0.699999988079071" post_nms_count="1000" pre_nms_count="1000" roi_num_type="i32"/>
<input>
<port id="0">
<dim>8</dim>
<dim>3</dim>
</port>
<port id="1">
<dim>50</dim>
<dim>84</dim>
<dim>3</dim>
<dim>4</dim>
</port>
<port id="2">
<dim>8</dim>
<dim>12</dim>
<dim>50</dim>
<dim>84</dim>
</port>
<port id="3">
<dim>8</dim>
<dim>3</dim>
<dim>50</dim>
<dim>84</dim>
</port>
</input>
<output>
<port id="4" precision="FP32">
<dim>-1</dim>
<dim>4</dim>
</port>
<port id="5" precision="FP32">
<dim>-1</dim>
</port>
<port id="6" precision="I32">
<dim>8</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,30 +1,34 @@
# GridSample {#openvino_docs_ops_image_GridSample_9}
**Versioned name**: *GridSample-9*
@sphinxdirective
**Category**: *Image processing*
**Versioned name**: *GridSample-9*
**Short description**: *GridSample* performs interpolated sampling of pixels from the input image using normalized, non-integer coordinates passed in as one of its inputs.
**Category**: *Image processing*
**Detailed description**: *GridSample* operates on a 4D input tensor representing an image. It calculates the output by selecting a location in the input image based on the values of the `grid` input. The latter contains a pair of float numbers for each output element that the operator is supposed to produce. Conceptually the operator behaves like *Gather* or *GatherND* but the difference is that the pixels to be selected are denoted by pairs of floats which belong to the range `[-1, 1]`. Those values have to be denormalized first (mapped to the integer coordinates of the input tensor) and then the output value is calculated according to the interpolation `mode`.
**Short description**: *GridSample* performs interpolated sampling of pixels from the input image, using normalized, non-integer coordinates passed in as one of its inputs.
**Detailed description**: *GridSample* operates on a 4D input tensor representing an image. It calculates the output by selecting a location in the input image based on the values of the ``grid`` input. The latter contains a pair of float numbers for each output element that the operator is supposed to produce. Conceptually the operator behaves like *Gather* or *GatherND* but the difference is that the pixels to be selected are denoted by pairs of floats which belong to the range ``[-1, 1]``. Those values have to be denormalized first (mapped to the integer coordinates of the input tensor) and then the output value is calculated according to the interpolation ``mode``.
**Attributes**
* *align_corners*
* **Description**: controls how the extrema values in the `grid` input tensor map to the border pixels of the input image. The value of -1 for both width and height can map either to the center of the border pixels or their left/top border. Similarily the value of 1 can also map to the center of the border pixels or their right/bottom border. Inherently this means that the `GridSample` operation treats pixels as squares rather than infinitely small points.
* **Range of values**:
* `false` - map extrema values to the center of pixels
* `true` - map extrema values to the borders of pixels
* **Type**: `boolean`
* **Description**: controls how the extrema values in the ``grid`` input tensor map to the border pixels of the input image. The value of -1 for both width and height can map either to the center of the border pixels or their left/top border. Similarly, the value of 1 can also map to the center of the border pixels or their right/bottom border. Inherently this means that the ``GridSample`` operation treats pixels as squares rather than infinitely small points.
* **Range of values**:
* ``false`` - map extrema values to the center of pixels
* ``true`` - map extrema values to the borders of pixels
* **Type**: ``boolean``
* **Default value**: false
* **Required**: *no*
* *mode*
* **Description**: specifies the interpolation type used to calculate the output elements
* **Range of values**: one of: `bilinear`, `bicubic` or `nearest`
* **Type**: `string`
* **Range of values**: one of: ``bilinear``, ``bicubic`` or ``nearest``
* **Type**: ``string``
* **Default value**: bilinear
* **Required**: *no*
@ -32,54 +36,61 @@
* **Description**: controls the handling of out-of-bounds coordinates. The denormalized coordinates might fall outside of the input tensor's area (values outside the grid).
* **Range of values**:
* `zeros` - consider values in the padding to be zeros
* `border` - the operator is supposed to select the nearest in-bounds pixel
* `reflection` - repeatedly reflect the out-of bounds value until it points to a pixel that belongs to the image
* **Type**: `string`
* ``zeros`` - consider values in the padding to be zeros
* ``border`` - the operator is supposed to select the nearest in-bounds pixel
* ``reflection`` - repeatedly reflect the out-of-bounds value until it points to a pixel that belongs to the image
* **Type**: ``string``
* **Default value**: zeros
* **Required**: *no*
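The denormalization step depends on *align_corners*. A hedged sketch of the commonly used mapping from a ``[-1, 1]`` coordinate to a pixel coordinate (an assumption for illustration; the exact mapping is not spelled out above):

.. code-block:: py

    def denormalize(coord, size, align_corners):
        # coord is in [-1, 1]; size is the input width or height.
        if align_corners:
            # -1 and 1 map to the centers of the border pixels.
            return (coord + 1.0) / 2.0 * (size - 1)
        # -1 and 1 map to the outer edges of the border pixels.
        return ((coord + 1.0) * size - 1.0) / 2.0

    print(denormalize(-1.0, 100, True))   # 0.0  (center of the first pixel)
    print(denormalize(-1.0, 100, False))  # -0.5 (left edge of the first pixel)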
**Inputs**
* **1**: `data` - Input tensor of type `T` with data to be sampled. This input is expected to be a 4-dimensional tensor with NCHW layout. **Required.**
* **2**: `grid` - A 4-dimensional tensor containing normalized sampling coordinates(pairs of floats). The shape of this tensor is `[N, H_out, W_out, 2]` and the data type is `T1`. **Required.**
* **1**: ``data`` - Input tensor of type ``T`` with data to be sampled. This input is expected to
be a 4-dimensional tensor with NCHW layout. **Required.**
* **2**: ``grid`` - A 4-dimensional tensor containing normalized sampling coordinates (pairs of floats).
The shape of this tensor is ``[N, H_out, W_out, 2]`` and the data type is ``T1``. **Required.**
**Outputs**
* **1**: A 4-dimensional tensor of type `T` with `[N, C, H_out, W_out]` shape. It contains the interpolated values calculated by this operator.
* **1**: A 4-dimensional tensor of type ``T`` with ``[N, C, H_out, W_out]`` shape.
It contains the interpolated values calculated by this operator.
**Types**
* **T**: any type supported by OpenVINO.
* **T1**: any supported floating-point type.
**Example**
```xml
<layer ... type="GridSample" ...>
<data align_corners="true" mode="nearest" padding_mode="border"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>100</dim>
<dim>100</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>10</dim>
<dim>10</dim>
<dim>2</dim>
</port>
</input>
<output>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>10</dim>
<dim>10</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<layer ... type="GridSample" ...>
<data align_corners="true" mode="nearest" padding_mode="border"/>
<input>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>100</dim>
<dim>100</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>10</dim>
<dim>10</dim>
<dim>2</dim>
</port>
</input>
<output>
<port id="0">
<dim>1</dim>
<dim>3</dim>
<dim>10</dim>
<dim>10</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,122 +1,138 @@
# GatherElements {#openvino_docs_ops_movement_GatherElements_6}
@sphinxdirective
**Versioned name**: *GatherElements-6*
**Category**: *Data movement*
**Short description**: *GatherElements* takes elements from the input `data` tensor at positions specified in the `indices` tensor.
**Short description**: *GatherElements* takes elements from the input ``data`` tensor at positions specified in the ``indices`` tensor.
**Detailed description** *GatherElements* takes elements from the `data` tensor at positions specified in the `indices` tensor.
The `data` and `indices` tensors have the same rank `r >= 1`. Optional attribute `axis` determines
along which axis elements with indices specified in `indices` are taken. The `indices` tensor has the same shape as
the `data` tensor except for the `axis` dimension. Output consists of values (gathered from the `data` tensor) for each
element in the `indices` tensor and has the same shape as `indices`.
**Detailed description** *GatherElements* takes elements from the ``data`` tensor at positions specified
in the ``indices`` tensor. The ``data`` and ``indices`` tensors have the same rank ``r >= 1``. Optional
attribute ``axis`` determines along which axis elements with indices specified in ``indices`` are taken.
The ``indices`` tensor has the same shape as the ``data`` tensor except for the ``axis`` dimension.
Output consists of values (gathered from the ``data`` tensor) for each element in the ``indices`` tensor
and has the same shape as ``indices``.
For instance, in the 3D case (``r = 3``), the output is determined by the following equations:
.. code-block::
out[i][j][k] = data[indices[i][j][k]][j][k] if axis = 0
out[i][j][k] = data[i][indices[i][j][k]][k] if axis = 1
out[i][j][k] = data[i][j][indices[i][j][k]] if axis = 2
For instance, in the 3D case (`r = 3`), the output is determined by the following equations:
```
out[i][j][k] = data[indices[i][j][k]][j][k] if axis = 0
out[i][j][k] = data[i][indices[i][j][k]][k] if axis = 1
out[i][j][k] = data[i][j][indices[i][j][k]] if axis = 2
```
Example 1 with concrete values:
```
data = [
[1, 2],
[3, 4],
]
indices = [
[0, 1],
[0, 0],
]
axis = 0
output = [
[1, 4],
[1, 2],
]
```
Example 2 with `axis` = 1 and `indices` having greater (than `data`) shape:
```
data = [
[1, 7],
[4, 3],
]
indices = [
[1, 1, 0],
[1, 0, 1],
]
axis = 1
output = [
[7, 7, 1],
[3, 4, 3],
]
```
Example 3 `indices` has lesser (than `data`) shape:
```
data = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
]
indices = [
[1, 0, 1],
[1, 2, 0],
]
axis = 0
output = [
[4, 2, 6],
[4, 8, 3],
]
```
.. code-block::
data = [
[1, 2],
[3, 4],
]
indices = [
[0, 1],
[0, 0],
]
axis = 0
output = [
[1, 4],
[1, 2],
]
Example 2, with ``axis`` = 1 and ``indices`` having a greater shape than ``data``:
.. code-block::
data = [
[1, 7],
[4, 3],
]
indices = [
[1, 1, 0],
[1, 0, 1],
]
axis = 1
output = [
[7, 7, 1],
[3, 4, 3],
]
Example 3, where ``indices`` has a lesser shape than ``data``:
.. code-block::
data = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
]
indices = [
[1, 0, 1],
[1, 2, 0],
]
axis = 0
output = [
[4, 2, 6],
[4, 8, 3],
]
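The gathering rule matches ``numpy.take_along_axis``, so Example 1 can be reproduced as follows (illustrative sketch):

.. code-block:: py

    import numpy as np

    data = np.array([[1, 2], [3, 4]])
    indices = np.array([[0, 1], [0, 0]])

    # out[i][j] = data[indices[i][j]][j], i.e. GatherElements with axis = 0
    print(np.take_along_axis(data, indices, axis=0))
    # [[1 4]
    #  [1 2]]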
**Attributes**:
* *axis*
* **Description**: Which axis to gather on. Negative value means counting dimensions from the back.
* **Range of values**: `[-r, r-1]` where `r = rank(data)`.
* **Range of values**: ``[-r, r-1]`` where ``r = rank(data)``.
* **Type**: int
* **Required**: *yes*
**Inputs**:
* **1**: Tensor of type *T*. This is a tensor of a `rank >= 1`. **Required.**
* **2**: Tensor of type *T_IND* with the same rank as the input. All index values are expected to be within
bounds `[0, s-1]`, where `s` is size along `axis` dimension of the `data` tensor. **Required.**
* **1**: Tensor of type *T*. This is a tensor of ``rank >= 1``. **Required.**
* **2**: Tensor of type *T_IND* with the same rank as the input. All index values are expected to be
within bounds ``[0, s-1]``, where ``s`` is size along ``axis`` dimension of the ``data`` tensor. **Required.**
**Outputs**:
* **1**: Tensor with gathered values of type *T*. Tensor has the same shape as `indices`.
* **1**: Tensor with gathered values of type *T*. Tensor has the same shape as ``indices``.
**Types**
* *T*: any supported type.
* *T_IND*: `int32` or `int64`.
* *T_IND*: ``int32`` or ``int64``.
**Example**
```xml
<... type="GatherElements" ...>
<data axis="1" />
<input>
<port id="0">
<dim>3</dim>
<dim>7</dim>
<dim>5</dim>
</port>
<port id="1">
<dim>3</dim>
<dim>10</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>3</dim>
<dim>10</dim>
<dim>5</dim>
</port>
</output>
</layer>
```
.. code-block:: xml
<... type="GatherElements" ...>
<data axis="1" />
<input>
<port id="0">
<dim>3</dim>
<dim>7</dim>
<dim>5</dim>
</port>
<port id="1">
<dim>3</dim>
<dim>10</dim>
<dim>5</dim>
</port>
</input>
<output>
<port id="2">
<dim>3</dim>
<dim>10</dim>
<dim>5</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,17 +1,21 @@
# GatherND {#openvino_docs_ops_movement_GatherND_5}
@sphinxdirective
**Versioned name**: *GatherND-5*
**Category**: *Data movement*
**Short description**: *GatherND* gathers slices from input tensor into a tensor of a shape specified by indices.
**Detailed description**: *GatherND* gathers slices from `data` by `indices` and forms a tensor of a shape specified by `indices`.
**Detailed description**: *GatherND* gathers slices from ``data`` by ``indices`` and forms a tensor of a shape specified by ``indices``.
``indices`` is a ``K``-dimensional integer tensor or a ``K-1``-dimensional tensor of tuples with indices by which
the operation gathers elements or slices from the ``data`` tensor. A position ``i_0, ..., i_{K-2}`` in the ``indices``
tensor corresponds to a tuple with indices ``indices[i_0, ..., i_{K-2}]`` of a length equal to ``indices.shape[-1]``.
By this tuple with indices the operation gathers a slice or an element from the ``data`` tensor and inserts it into
the output at position ``i_0, ..., i_{K-2}``, as in the following formula:
``output[i_0, ..., i_{K-2},:,...,:] = data[indices[i_0, ..., i_{K-2}],:,...,:]``
The shape of the output can be computed as ``indices.shape[:-1] + data.shape[indices.shape[-1]:]``.
Example 1 shows how *GatherND* operates with elements from the ``data`` tensor:

.. code-block::

   indices = [[0, 0],
              [1, 0]]
   data    = [[1, 2],
              [3, 4]]
   output  = [1, 3]

Example 2 shows how *GatherND* operates with slices from the ``data`` tensor:

.. code-block::

   indices = [[1], [0]]
   data    = [[1, 2],
              [3, 4]]
   output  = [[3, 4],
              [1, 2]]

Example 3 shows how *GatherND* operates when the ``indices`` tensor has leading dimensions:

.. code-block::

   indices = [[[1]], [[0]]]
   data    = [[1, 2],
              [3, 4]]
   output  = [[[3, 4]],
              [[1, 2]]]
**Attributes**:
* *batch_dims*
* **Description**: *batch_dims* (denoted as ``b``) is a leading number of dimensions of ``data`` tensor
and ``indices`` representing the batches, and *GatherND* starts to gather from the ``b+1`` dimension.
It requires the first ``b`` dimensions in ``data`` and ``indices`` tensors to be equal.
In case of non-default value for *batch_dims*, the output shape is calculated as
``(multiplication of indices.shape[:b]) + indices.shape[b:-1] + data.shape[(indices.shape[-1] + b):]``.
.. note::
The calculation of output shape is incorrect for non-default *batch_dims* value greater than one.
   For correct calculations, use the :doc:`GatherND_8 <openvino_docs_ops_movement_GatherND_8>` operation.
* **Range of values**: integer number and belongs to ``[0; min(data.rank, indices.rank))``
* **Type**: int
* **Default value**: 0
* **Required**: *no*
Example 4 shows how *GatherND* gathers elements for a non-default *batch_dims* value:
.. code-block::
batch_dims = 1
indices = [[1], <--- this is applied to the first batch
[0]] <--- this is applied to the second batch, shape = (2, 1)
data = [[1, 2], <--- the first batch
[3, 4]] <--- the second batch, shape = (2, 2)
output = [2, 3], shape = (2)
Example 5 shows how *GatherND* gathers slices for a non-default *batch_dims* value:
.. code-block::
batch_dims = 1
indices = [[1], <--- this is applied to the first batch
[0]] <--- this is applied to the second batch, shape = (2, 1)
data = [[[1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]] <--- the first batch
[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]] <--- the second batch, shape = (2, 3, 4)
output = [[ 5, 6, 7, 8], [13, 14, 15, 16]], shape = (2, 4)
A more complex example 6 shows how *GatherND* gathers slices with leading dimensions
for a non-default *batch_dims* value:
.. code-block::
batch_dims = 2
indices = [[[[1]], <--- this is applied to the first batch
[[0]],
[[2]]],
[[[0]],
[[2]],
[[2]]] <--- this is applied to the sixth batch
], shape = (2, 3, 1, 1)
data = [[[1, 2, 3, 4], <--- this is the first batch
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]]
[[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24]] <--- this is the sixth batch
] <--- the second batch, shape = (2, 3, 4)
output = [[2], [5], [11], [13], [19], [23]], shape = (6, 1)
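
For the default ``batch_dims = 0``, the behavior can be illustrated with a short NumPy sketch (an illustration only; the helper name ``gather_nd`` is ours, not an OpenVINO API):

.. code-block:: python

   import numpy as np

   def gather_nd(data, indices):
       # batch_dims = 0: the last axis of `indices` holds coordinates into `data`
       data, indices = np.asarray(data), np.asarray(indices)
       out_shape = indices.shape[:-1] + data.shape[indices.shape[-1]:]
       flat = indices.reshape(-1, indices.shape[-1])
       return np.stack([data[tuple(i)] for i in flat]).reshape(out_shape)

   print(gather_nd([[1, 2], [3, 4]], [[0, 0], [1, 0]]))  # [1 3]     (Example 1)
   print(gather_nd([[1, 2], [3, 4]], [[1], [0]]))        # [[3 4]
                                                         #  [1 2]]   (Example 2)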
**Inputs**:
* **1**: ``data`` tensor of type *T*. This is a tensor of a rank not lower than 1. **Required.**
* **2**: ``indices`` tensor of type *T_IND*. This is a tensor of a rank not lower than 1.
  All indices from this tensor are expected to be in the range ``[0, s-1]``, where ``s`` is
  the corresponding dimension to which the index is applied. **Required.**
**Outputs**:
* **1**: Tensor with gathered values of type *T*.
**Types**
* *T*: any supported type.
* *T_IND*: any supported integer types.
**Examples**
.. code-block:: cpp
<layer id="1" type="GatherND">
<data batch_dims="0" />
<input>
<port id="0">
<dim>1000</dim>
<dim>256</dim>
<dim>10</dim>
<dim>15</dim>
</port>
<port id="1">
<dim>25</dim>
<dim>125</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="3">
<dim>25</dim>
<dim>125</dim>
<dim>15</dim>
</port>
</output>
</layer>
.. code-block:: cpp
<layer id="1" type="GatherND">
<data batch_dims="2" />
<input>
<port id="0">
<dim>30</dim>
<dim>2</dim>
<dim>100</dim>
<dim>35</dim>
</port>
<port id="1">
<dim>30</dim>
<dim>2</dim>
<dim>3</dim>
<dim>1</dim>
</port>
</input>
<output>
<port id="3">
<dim>60</dim>
<dim>3</dim>
<dim>35</dim>
</port>
</output>
</layer>
@endsphinxdirective
# GatherND {#openvino_docs_ops_movement_GatherND_8}
@sphinxdirective
**Versioned name**: *GatherND-8*
**Category**: *Data movement*
**Short description**: *GatherND* gathers slices from input tensor into a tensor of the shape specified by indices.
**Detailed description**: *GatherND* gathers slices from ``data`` by ``indices`` and forms a tensor of the shape specified by ``indices``.
``indices`` is a ``K``-dimensional integer tensor or a ``K-1``-dimensional tensor of tuples with indices by which the operation
gathers elements or slices from the ``data`` tensor. A position ``i_0, ..., i_{K-2}`` in the ``indices`` tensor corresponds to
a tuple with indices ``indices[i_0, ..., i_{K-2}]`` of a length equal to ``indices.shape[-1]``. By this tuple with indices
the operation gathers a slice or an element from the ``data`` tensor and inserts it into the output at the position
``i_0, ..., i_{K-2}``, as described in the following formula:
``output[i_0, ..., i_{K-2},:,...,:] = data[indices[i_0, ..., i_{K-2}],:,...,:]``
The last dimension of the ``indices`` tensor must not be greater than the rank of the ``data`` tensor, meaning
``indices.shape[-1] <= data.rank``.
The shape of the output is calculated as ``indices.shape[:batch_dims] + indices.shape[batch_dims:-1]``
if ``indices.shape[-1] == data.rank - batch_dims``, else
``indices.shape[:batch_dims] + list(indices.shape)[batch_dims:-1] + list(data.shape)[batch_dims + indices.shape[-1]:]``.
**Attributes**:
* *batch_dims*
* **Description**: *batch_dims* (denoted as ``b``) is a leading number of dimensions of ``data`` tensor and ``indices``
representing the batches, and *GatherND* starts to gather from the ``b+1`` dimension. It requires the first ``b``
dimensions in ``data`` and ``indices`` tensors to be equal.
* **Range of values**: integer number that belongs to ``[0; min(data.rank, indices.rank))``
* **Type**: int
* **Default value**: 0
* **Required**: *no*
**Inputs**:
* **1**: ``data`` tensor of type *T*. A tensor of a rank not less than 1. **Required.**
* **2**: ``indices`` tensor of type *T_IND*. A tensor of a rank not less than 1.
It requires all indices from this tensor to be in the range ``[0, s-1]`` where ``s`` is the corresponding dimension to
which this index is applied. **Required.**
**Outputs**:
**Types**
* *T*: any supported type.
* *T_IND*: any supported integer types.
**Examples**
Example 1 shows how *GatherND* operates with elements from the ``data`` tensor:

.. code-block::

   indices = [[0, 0],
              [1, 0]]
   data    = [[1, 2],
              [3, 4]]
   output  = [1, 3]

Example 2 shows how *GatherND* operates with slices from the ``data`` tensor:

.. code-block::

   indices = [[1], [0]]
   data    = [[1, 2],
              [3, 4]]
   output  = [[3, 4],
              [1, 2]]

Example 3 shows how *GatherND* operates when the ``indices`` tensor has leading dimensions:

.. code-block::

   indices = [[[1]], [[0]]]
   data    = [[1, 2],
              [3, 4]]
   output  = [[[3, 4]],
              [[1, 2]]]
Example 4 shows how *GatherND* gathers elements for a non-default *batch_dims* value:
.. code-block::
batch_dims = 1
indices = [[1], <--- this is applied to the first batch
[0]] <--- this is applied to the second batch, shape = (2, 1)
data = [[1, 2], <--- the first batch
[3, 4]] <--- the second batch, shape = (2, 2)
output = [2, 3], shape = (2)
Example 5 shows how *GatherND* gathers slices for a non-default *batch_dims* value:
.. code-block::
batch_dims = 1
indices = [[1], <--- this is applied to the first batch
[0]] <--- this is applied to the second batch, shape = (2, 1)
data = [[[1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12]] <--- the first batch
[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]] <--- the second batch, shape = (2, 3, 4)
output = [[ 5, 6, 7, 8], [13, 14, 15, 16]], shape = (2, 4)
More complex examples 6 and 7 show how *GatherND* gathers slices with leading dimensions
for a non-default *batch_dims* value:
.. code-block::
batch_dims = 2
indices = [[[[1]], <--- this is applied to the first batch
[[0]],
[[2]]],
[[[0]],
[[2]],
[[2]]] <--- this is applied to the sixth batch
], shape = (2, 3, 1, 1)
data = [[[ 1, 2, 3, 4], <--- this is the first batch
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]]
[[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24]] <--- this is the sixth batch
] <--- the second batch, shape = (2, 3, 4)
output = [[[ 2], [ 5], [11]], [[13], [19], [23]]], shape = (2, 3, 1)
.. code-block::
batch_dims = 3
indices = [[[[1],
[0]],
[[3],
[2]]]
], shape = (1, 2, 2, 1)
data = [[[[ 1 2 3 4],
[ 5 6 7 8]],
[[ 9 10 11 12],
[13 14 15 16]]]
], shape = (1, 2, 2, 4)
output = [[[ 2 5],
[12 15]]
], shape = (1, 2, 2)
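
The *batch_dims* handling can be sketched in NumPy by flattening the batch dimensions into a single leading axis and gathering per batch (an illustration only; the helper ``gather_nd`` below is ours, not an OpenVINO API):

.. code-block:: python

   import numpy as np

   def gather_nd(data, indices, batch_dims=0):
       data, indices = np.asarray(data), np.asarray(indices)
       if batch_dims == 0:
           # the last axis of `indices` holds coordinates into `data`
           out_shape = indices.shape[:-1] + data.shape[indices.shape[-1]:]
           flat = indices.reshape(-1, indices.shape[-1])
           return np.stack([data[tuple(i)] for i in flat]).reshape(out_shape)
       # flatten the first `batch_dims` axes into one batch axis, gather per batch
       batch = int(np.prod(data.shape[:batch_dims]))
       d = data.reshape((batch,) + data.shape[batch_dims:])
       i = indices.reshape((batch,) + indices.shape[batch_dims:])
       out = np.stack([gather_nd(d[b], i[b]) for b in range(batch)])
       return out.reshape(indices.shape[:batch_dims] + out.shape[1:])

   # Example 4 from above: batch_dims = 1
   print(gather_nd([[1, 2], [3, 4]], [[1], [0]], batch_dims=1))  # [2 3]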
.. code-block:: cpp
<layer id="1" type="GatherND" version="opset8">
<data batch_dims="0" />
<input>
<port id="0">
<dim>1000</dim>
<dim>256</dim>
<dim>10</dim>
<dim>15</dim>
</port>
<port id="1">
<dim>25</dim>
<dim>125</dim>
<dim>3</dim>
</port>
</input>
<output>
<port id="3">
<dim>25</dim>
<dim>125</dim>
<dim>15</dim>
</port>
</output>
</layer>
.. code-block:: cpp
<layer id="1" type="GatherND" version="opset8">
<data batch_dims="2" />
<input>
<port id="0">
<dim>30</dim>
<dim>2</dim>
<dim>100</dim>
<dim>35</dim>
</port>
<port id="1">
<dim>30</dim>
<dim>2</dim>
<dim>3</dim>
<dim>1</dim>
</port>
</input>
<output>
<port id="3">
<dim>30</dim>
<dim>2</dim>
<dim>3</dim>
<dim>35</dim>
</port>
</output>
</layer>
.. code-block:: cpp
<layer id="1" type="GatherND" version="opset8">
<data batch_dims="3" />
<input>
<port id="0">
<dim>1</dim>
<dim>64</dim>
<dim>64</dim>
<dim>320</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>64</dim>
<dim>64</dim>
<dim>1</dim>
<dim>1</dim>
</port>
</input>
<output>
<port id="3">
<dim>1</dim>
<dim>64</dim>
<dim>64</dim>
<dim>1</dim>
</port>
</output>
</layer>
@endsphinxdirective
# GatherTree {#openvino_docs_ops_movement_GatherTree_1}
@sphinxdirective
**Versioned name**: *GatherTree-1*
**Category**: *Data movement*
**Detailed description**
*GatherTree* operation reorders token IDs of a given input tensor ``step_ids``, representing IDs per each step of beam search,
based on the input tensor ``parent_ids``, representing the parent beam IDs. For a given beam, past the time step containing the
first decoded ``end_token``, all values are filled in with ``end_token``.
The algorithm in pseudocode is as follows:
.. code-block:: python

   final_ids[:, :, :] = end_token
   for batch in range(BATCH_SIZE):
       for beam in range(BEAM_WIDTH):
           max_sequence_in_beam = min(MAX_TIME, max_seq_len[batch])

           parent = parent_ids[max_sequence_in_beam - 1, batch, beam]
           final_ids[max_sequence_in_beam - 1, batch, beam] = step_ids[max_sequence_in_beam - 1, batch, beam]

           for level in reversed(range(max_sequence_in_beam - 1)):
               final_ids[level, batch, beam] = step_ids[level, batch, parent]
               parent = parent_ids[level, batch, parent]

           # For a given beam, past the time step containing the first decoded end_token
           # all values are filled in with end_token.
           finished = False
           for time in range(max_sequence_in_beam):
               if finished:
                   final_ids[time, batch, beam] = end_token
               elif final_ids[time, batch, beam] == end_token:
                   finished = True

*GatherTree* operation is equivalent to the `GatherTree operation in TensorFlow <https://www.tensorflow.org/addons/api_docs/python/tfa/seq2seq/gather_tree>`__.
**Attributes**: *GatherTree* operation has no attributes.
**Inputs**
* **1**: ``step_ids`` - Indices per each step. A tensor of type *T* and rank 3.
Layout is ``[MAX_TIME, BATCH_SIZE, BEAM_WIDTH]``. **Required.**
* **2**: ``parent_ids`` - Parent beam indices. A tensor of type *T* and rank 3.
Layout is ``[MAX_TIME, BATCH_SIZE, BEAM_WIDTH]``. **Required.**
* **3**: ``max_seq_len`` - Maximum lengths for each sequence in the batch.
A tensor of type *T* and rank 1. Layout is ``[BATCH_SIZE]``. **Required.**
* **4**: ``end_token`` - Value of the end marker in a sequence.
A scalar of type *T*. **Required.**
* **Note**: Inputs should have integer values only.
**Outputs**
* **1**: ``final_ids`` - The reordered token IDs based on ``parent_ids`` input.
A tensor of type *T* and rank 3. Layout is ``[MAX_TIME, BATCH_SIZE, BEAM_WIDTH]``.
**Types**
**Example**
.. code-block:: cpp
<layer type="GatherTree" ...>
<input>
<port id="0">
<dim>100</dim>
<dim>1</dim>
<dim>10</dim>
</port>
<port id="1">
<dim>100</dim>
<dim>1</dim>
<dim>10</dim>
</port>
<port id="2">
<dim>1</dim>
</port>
<port id="3">
</port>
</input>
<output>
<port id="0">
<dim>100</dim>
<dim>1</dim>
<dim>10</dim>
</port>
</output>
</layer>
@endsphinxdirective
# Gather {#openvino_docs_ops_movement_Gather_1}
@sphinxdirective

**Versioned name**: *Gather-1*

**Category**: *Data movement*

**Short description**: *Gather* operation takes slices of data in the first input tensor according
to the indices specified in the second input tensor and axis from the third input.
**Detailed description**
.. code-block::
output[p_0, p_1, ..., p_{axis-1}, i, ..., j, ...] =
input1[p_0, p_1, ..., p_{axis-1}, input2[i, ..., j], ...]
Where ``axis`` is the value from the third input.
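
For a fixed ``axis``, this indexing rule matches NumPy's ``take``, which the following sketch uses for illustration (the variable names are ours, not part of the specification):

.. code-block:: python

   import numpy as np

   data = np.arange(24).reshape(6, 4)    # input 1
   indices = np.array([[0, 5], [2, 3]])  # input 2
   axis = 0                              # input 3
   output = np.take(data, indices, axis=axis)
   print(output.shape)  # (2, 2, 4) == data.shape[:axis] + indices.shape + data.shape[axis+1:]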
**Attributes**: *Gather* has no attributes.
**Inputs**
* **1**: Tensor with arbitrary data. **Required.**
* **2**: Tensor with indices to gather. The values for indices are in the range ``[0, input1[axis] - 1]``. **Required.**
* **3**: Scalar or 1D tensor *axis* is a dimension index to gather data from. For example, *axis* equal
to 1 means that gathering is performed over the first dimension. Negative value means reverse indexing.
Allowed values are from ``[-len(input1.shape), len(input1.shape) - 1]``. **Required.**
**Outputs**
* **1**: The resulting tensor that consists of elements from the first input tensor gathered by indices
  from the second input tensor. The shape of the tensor is ``[input1.shape[:axis], input2.shape, input1.shape[axis + 1:]]``.
**Example**
.. code-block:: cpp
<layer id="1" type="Gather" ...>
<input>
<port id="0">
<dim>6</dim>
<dim>12</dim>
<dim>10</dim>
<dim>24</dim>
</port>
<port id="1">
<dim>15</dim>
<dim>4</dim>
<dim>20</dim>
<dim>28</dim>
</port>
<port id="2"/> <!-- axis = 1 -->
</input>
<output>
<port id="2">
<dim>6</dim> <!-- embedded dimension from the 1st input -->
<dim>15</dim> <!-- embedded dimension from the 2nd input -->
<dim>4</dim> <!-- embedded dimension from the 2nd input -->
<dim>20</dim> <!-- embedded dimension from the 2nd input -->
<dim>28</dim> <!-- embedded dimension from the 2nd input -->
<dim>10</dim> <!-- embedded dimension from the 1st input -->
<dim>24</dim> <!-- embedded dimension from the 1st input -->
</port>
</output>
</layer>
@endsphinxdirective
# Gather {#openvino_docs_ops_movement_Gather_7}
@sphinxdirective
**Versioned name**: *Gather-7*
**Category**: *Data movement*
**Short description**: *Gather* operation takes slices of data of the first input tensor according to the indices
specified with the second input tensor and axis from the third input. The semantics of this operation are identical to the
TensorFlow `Gather <https://www.tensorflow.org/api_docs/python/tf/gather>`__ operation.
**Detailed description**
.. code-block::
output[p_0, p_1, ..., p_{axis-1}, i_b, ..., i_{M-1}, p_{axis+1}, ..., p_{N-1}] =
data[p_0, p_1, ..., p_{axis-1}, indices[p_0, p_1, ..., p_{b-1}, i_b, ..., i_{M-1}], p_{axis+1}, ..., p_{N-1}]
Where ``data``, ``indices`` and ``axis`` are tensors from first, second and third inputs correspondingly, ``b`` is
the number of batch dimensions. ``N`` and ``M`` are numbers of dimensions of ``data`` and ``indices`` tensors, respectively.
**Attributes**:
* *batch_dims*
* **Description**: *batch_dims* (also denoted as ``b``) is a leading number of dimensions of ``data``
tensor and ``indices`` representing the batches, and *Gather* starts to gather from the ``b``
   dimension. It requires the first ``b`` dimensions in ``data`` and ``indices`` tensors to be equal.
If ``batch_dims`` is less than zero, the normalized value is used ``batch_dims = indices.rank + batch_dims``.
* **Range of values**: ``[-min(data.rank, indices.rank); min(data.rank, indices.rank)]`` and
``batch_dims' <= axis'``. Where ``batch_dims'`` and ``axis'`` stand for normalized ``batch_dims`` and ``axis`` values.
* **Type**: *T_AXIS*
* **Default value**: 0
* **Required**: *no*
Example 1 with default *batch_dims* value:
.. code-block::
batch_dims = 0
axis = 0
indices = [0, 0, 4]
data = [1, 2, 3, 4, 5]
output = [1, 1, 5]
Example 2 with non-default *batch_dims* value:
.. code-block::

   batch_dims = 1
   axis = 1

   indices = [[0, 0, 4], <-- this is applied to the first batch
              [4, 0, 0]] <-- this is applied to the second batch
   indices_shape = (2, 3)

   data = [[1, 2, 3, 4, 5],  <-- the first batch
           [6, 7, 8, 9, 10]] <-- the second batch
   data_shape = (2, 5)

   output = [[ 1, 1, 5],
             [10, 6, 6]]
   output_shape = (2, 3)
Example 3 with non-default *batch_dims* value:
.. code-block::

   batch_dims = 2
   axis = 2

   indices = [[[0, 0, 4],  <-- this is applied to the first batch, index = (0, 0)
               [4, 0, 0]], <-- this is applied to the second batch, index = (0, 1)
              [[1, 2, 4],  <-- this is applied to the third batch, index = (1, 0)
               [4, 3, 2]]] <-- this is applied to the fourth batch, index = (1, 1)
   indices_shape = (2, 2, 3)

   data = [[[1, 2, 3, 4, 5],       <-- the first batch, index = (0, 0)
            [6, 7, 8, 9, 10]],     <-- the second batch, index = (0, 1)
           [[11, 12, 13, 14, 15],  <-- the third batch, index = (1, 0)
            [16, 17, 18, 19, 20]]] <-- the fourth batch, index = (1, 1)
   data_shape = (2, 2, 5)

   output = [[[ 1, 1, 5],
              [10, 6, 6]],
             [[12, 13, 15],
              [20, 19, 18]]]
   output_shape = (2, 2, 3)
Example 4 with *axis* > *batch_dims*:
.. code-block::

   batch_dims = 1
   axis = 2

   indices = [[1, 2, 4], <-- this is applied to the first batch
              [4, 3, 2]] <-- this is applied to the second batch
   indices_shape = (2, 3)

   data = [[[[ 1,  2,  3,  4], <-- first batch
             [ 5,  6,  7,  8],
             [ 9, 10, 11, 12],
             [13, 14, 15, 16],
             [17, 18, 19, 20]]],
           [[[21, 22, 23, 24], <-- second batch
             [25, 26, 27, 28],
             [29, 30, 31, 32],
             [33, 34, 35, 36],
             [37, 38, 39, 40]]]]
   data_shape = (2, 1, 5, 4)

   output = [[[[ 5,  6,  7,  8],
               [ 9, 10, 11, 12],
               [17, 18, 19, 20]]],
             [[[37, 38, 39, 40],
               [33, 34, 35, 36],
               [29, 30, 31, 32]]]]
   output_shape = (2, 1, 3, 4)
Example 5 with negative *batch_dims* value:
.. code-block::

   batch_dims = -1 <-- normalized value will be indices.rank + batch_dims = 2 - 1 = 1
   axis = 1

   indices = [[0, 0, 4], <-- this is applied to the first batch
              [4, 0, 0]] <-- this is applied to the second batch
   indices_shape = (2, 3)

   data = [[1, 2, 3, 4, 5],  <-- the first batch
           [6, 7, 8, 9, 10]] <-- the second batch
   data_shape = (2, 5)

   output = [[ 1, 1, 5],
             [10, 6, 6]]
   output_shape = (2, 3)
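
A short NumPy sketch of the gathering rule (an illustration assuming a non-negative ``batch_dims`` with matching leading dimensions; the helper name ``gather`` is ours, not an OpenVINO API):

.. code-block:: python

   import numpy as np

   def gather(data, indices, axis, batch_dims=0):
       if batch_dims == 0:
           return np.take(data, indices, axis=axis)
       # peel one batch dimension and recurse; axis is relative to the full tensor
       return np.stack([gather(d, i, axis - 1, batch_dims - 1)
                        for d, i in zip(data, indices)])

   # Example 2 from above
   data = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
   indices = np.array([[0, 0, 4], [4, 0, 0]])
   print(gather(data, indices, axis=1, batch_dims=1))  # [[ 1  1  5]
                                                       #  [10  6  6]]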
**Inputs**
* **1**: ``data`` tensor of type *T* with arbitrary data. **Required.**
* **2**: ``indices`` tensor of type *T_IND* with indices to gather. 0D tensor (scalar) for indices is also allowed.
The values for indices are in the range ``[0, data[axis] - 1]``. **Required.**
* **3**: Scalar or 1D tensor ``axis`` of *T_AXIS* type is a dimension index to gather data from. For example,
*axis* equal to 1 means that gathering is performed over the first dimension. Negative ``axis`` means reverse indexing and
will be normalized to value ``axis = data.rank + axis``. Allowed values are from ``[-len(data.shape), len(data.shape) - 1]``
and ``axis' >= batch_dims'``. Where ``axis'`` and ``batch_dims'`` stand for normalized ``batch_dims`` and ``axis`` values.
**Required.**
**Outputs**
* **1**: The resulting tensor of type *T* that consists of elements from ``data`` tensor gathered by ``indices``.
The shape of the output tensor is ``data.shape[:axis] + indices.shape[batch_dims:] + data.shape[axis + 1:]``
**Types**
* *T*: any supported type.
* *T_IND*: any supported integer types.
* *T_AXIS*: any supported integer types.
**Example**
.. code-block:: cpp
<layer ... type="Gather" version="opset7">
<data batch_dims="1" />
<input>
<port id="0">
<dim>2</dim>
<dim>64</dim>
<dim>128</dim>
</port>
<port id="1">
<dim>2</dim>
<dim>32</dim>
<dim>21</dim>
</port>
<port id="2"/> <!-- axis = 1 -->
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>32</dim>
<dim>21</dim>
<dim>128</dim>
</port>
</output>
</layer>
@endsphinxdirective
# Gather {#openvino_docs_ops_movement_Gather_8}
@sphinxdirective
**Versioned name**: *Gather-8*
**Category**: *Data movement*
**Short description**: *Gather* operation takes slices of data of the first input tensor according to the indices
specified with the second input tensor and axis from the third input. The semantics of this operation are identical to the
TensorFlow `Gather <https://www.tensorflow.org/api_docs/python/tf/gather>`__ operation, but also include
support of negative indices.
**Detailed description**
.. code-block::
output[p_0, p_1, ..., p_{axis-1}, i_b, ..., i_{M-1}, p_{axis+1}, ..., p_{N-1}] =
data[p_0, p_1, ..., p_{axis-1}, indices[p_0, p_1, ..., p_{b-1}, i_b, ..., i_{M-1}], p_{axis+1}, ..., p_{N-1}]
Where ``data``, ``indices`` and ``axis`` are tensors from first, second and third inputs correspondingly, ``b`` is
the number of batch dimensions. ``N`` and ``M`` are numbers of dimensions of ``data`` and ``indices`` tensors, respectively.
Allowed values for indices are in the range ``[-data.shape[axis], data.shape[axis] - 1]``. If an index value exceeds the allowed
range, the output data for the corresponding index will be filled with zeros (Example 7).
**Attributes**:
* *batch_dims*
* **Description**: *batch_dims* (also denoted as ``b``) is a leading number of dimensions of ``data`` tensor
and ``indices`` representing the batches, and *Gather* starts to gather from the ``b`` dimension.
It requires the first ``b`` dimensions in ``data`` and ``indices`` tensors to be equal.
If ``batch_dims`` is less than zero, normalized value is used ``batch_dims = indices.rank + batch_dims``.
* **Range of values**: ``[-min(data.rank, indices.rank); min(data.rank, indices.rank)]`` and ``batch_dims' <= axis'``.
Where ``batch_dims'`` and ``axis'`` stand for normalized ``batch_dims`` and ``axis`` values.
* **Type**: *T_AXIS*
* **Default value**: 0
* **Required**: *no*
Example 1 with default *batch_dims* value:
.. code-block::
batch_dims = 0
axis = 0
indices = [0, 0, 4]
data = [1, 2, 3, 4, 5]
output = [1, 1, 5]
Example 2 with non-default *batch_dims* value:
.. code-block::

   batch_dims = 1
   axis = 1

   indices = [[0, 0, 4], <-- this is applied to the first batch
              [4, 0, 0]] <-- this is applied to the second batch
   indices_shape = (2, 3)

   data = [[1, 2, 3, 4, 5],  <-- the first batch
           [6, 7, 8, 9, 10]] <-- the second batch
   data_shape = (2, 5)

   output = [[ 1, 1, 5],
             [10, 6, 6]]
   output_shape = (2, 3)
Example 3 with non-default *batch_dims* value:
.. code-block::

   batch_dims = 2
   axis = 2

   indices = [[[0, 0, 4],  <-- this is applied to the first batch, index = (0, 0)
               [4, 0, 0]], <-- this is applied to the second batch, index = (0, 1)
              [[1, 2, 4],  <-- this is applied to the third batch, index = (1, 0)
               [4, 3, 2]]] <-- this is applied to the fourth batch, index = (1, 1)
   indices_shape = (2, 2, 3)

   data = [[[1, 2, 3, 4, 5],       <-- the first batch, index = (0, 0)
            [6, 7, 8, 9, 10]],     <-- the second batch, index = (0, 1)
           [[11, 12, 13, 14, 15],  <-- the third batch, index = (1, 0)
            [16, 17, 18, 19, 20]]] <-- the fourth batch, index = (1, 1)
   data_shape = (2, 2, 5)

   output = [[[ 1, 1, 5],
              [10, 6, 6]],
             [[12, 13, 15],
              [20, 19, 18]]]
   output_shape = (2, 2, 3)
Example 4 with *axis* > *batch_dims*:
.. code-block::

   batch_dims = 1
   axis = 2

   indices = [[1, 2, 4], <-- this is applied to the first batch
              [4, 3, 2]] <-- this is applied to the second batch
   indices_shape = (2, 3)

   data = [[[[ 1,  2,  3,  4], <-- first batch
             [ 5,  6,  7,  8],
             [ 9, 10, 11, 12],
             [13, 14, 15, 16],
             [17, 18, 19, 20]]],
           [[[21, 22, 23, 24], <-- second batch
             [25, 26, 27, 28],
             [29, 30, 31, 32],
             [33, 34, 35, 36],
             [37, 38, 39, 40]]]]
   data_shape = (2, 1, 5, 4)

   output = [[[[ 5,  6,  7,  8],
               [ 9, 10, 11, 12],
               [17, 18, 19, 20]]],
             [[[37, 38, 39, 40],
               [33, 34, 35, 36],
               [29, 30, 31, 32]]]]
   output_shape = (2, 1, 3, 4)
Example 5 with negative *batch_dims* value:
.. code-block::

   batch_dims = -1 <-- normalized value will be indices.rank + batch_dims = 2 - 1 = 1
   axis = 1

   indices = [[0, 0, 4], <-- this is applied to the first batch
              [4, 0, 0]] <-- this is applied to the second batch
   indices_shape = (2, 3)

   data = [[1, 2, 3, 4, 5],  <-- the first batch
           [6, 7, 8, 9, 10]] <-- the second batch
   data_shape = (2, 5)

   output = [[ 1, 1, 5],
             [10, 6, 6]]
   output_shape = (2, 3)
Example 6 with negative indices:
.. code-block::
batch_dims = 0
axis = 0
indices = [0, -2, -1]
data = [1, 2, 3, 4, 5]
output = [1, 4, 5]
Example 7 with indices out of the range:
.. code-block::
batch_dims = 0
axis = 0
indices = [3, 10, -20]
data = [1, 2, 3, 4, 5]
output = [4, 0, 0]
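
Examples 6 and 7 can be reproduced with a small NumPy sketch of the index handling (an illustration restricted to 1-D ``data`` and ``axis = 0``; the helper name is ours):

.. code-block:: python

   import numpy as np

   def gather8(data, indices):
       # negative indices count back from the end of the axis;
       # indices outside the allowed range produce zeros
       data, indices = np.asarray(data), np.asarray(indices)
       size = data.shape[0]
       norm = np.where(indices < 0, indices + size, indices)
       valid = (norm >= 0) & (norm < size)
       return np.where(valid, data[np.clip(norm, 0, size - 1)], 0)

   data = np.array([1, 2, 3, 4, 5])
   print(gather8(data, np.array([0, -2, -1])))   # [1 4 5]
   print(gather8(data, np.array([3, 10, -20])))  # [4 0 0]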
**Inputs**
* **1**: ``data`` tensor of type *T* with arbitrary data. **Required.**
* **2**: ``indices`` tensor of type *T_IND* with indices to gather. 0D tensor (scalar) for indices is also allowed.
The values for indices are in the range ``[-data.shape[axis], data.shape[axis] - 1]``.
Negative values of indices indicate reverse indexing from ``data.shape[axis]``. **Required.**
* **3**: Scalar or 1D tensor ``axis`` of *T_AXIS* type is a dimension index to gather data from. For example,
*axis* equal to 1 means that gathering is performed over the first dimension. Negative ``axis`` means reverse indexing and
will be normalized to value ``axis = data.rank + axis``. Allowed values are from ``[-len(data.shape), len(data.shape) - 1]``
and ``axis' >= batch_dims'``. Where ``axis'`` and ``batch_dims'`` stand for normalized ``batch_dims`` and ``axis`` values. **Required.**
**Outputs**
* **1**: The resulting tensor of type *T* that consists of elements from ``data`` tensor gathered by ``indices``. The shape
of the output tensor is ``data.shape[:axis] + indices.shape[batch_dims:] + data.shape[axis + 1:]``
**Types**
* *T*: any supported type.
* *T_IND*: any supported integer types.
* *T_AXIS*: any supported integer types.
**Example**
.. code-block:: cpp
<layer ... type="Gather" version="opset8">
<data batch_dims="1" />
<input>
<port id="0">
<dim>2</dim>
<dim>64</dim>
<dim>128</dim>
</port>
<port id="1">
<dim>2</dim>
<dim>32</dim>
<dim>21</dim>
</port>
<port id="2"/> <!-- axis = 1 -->
</input>
<output>
<port id="2">
<dim>2</dim>
<dim>32</dim>
<dim>21</dim>
<dim>128</dim>
</port>
</output>
</layer>
@endsphinxdirective
# GRN {#openvino_docs_ops_normalization_GRN_1}
@sphinxdirective
**Versioned name**: *GRN-1*
**Category**: *Normalization*
**Detailed description**:
*GRN* computes the L2 norm across channels for an input tensor with shape ``[N, C, ...]``. *GRN* does the following with the input tensor:
.. math::
   output_{i_0, i_1, \ldots, i_N} = \frac{x_{i_0, i_1, \ldots, i_N}}{\sqrt{\sum_{j=0}^{C-1} x_{i_0, j, \ldots, i_N}^{2} + bias}}
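
In NumPy terms this is an L2 normalization over the channel axis, as in the following sketch (an illustration, not an OpenVINO API):

.. code-block:: python

   import numpy as np

   def grn(x, bias):
       # normalize across channels (axis 1 of an [N, C, ...] tensor)
       return x / np.sqrt(np.sum(x ** 2, axis=1, keepdims=True) + bias)

   x = np.random.rand(1, 20, 224, 224).astype(np.float32)
   print(grn(x, bias=1e-4).shape)  # (1, 20, 224, 224)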
**Attributes**:
* **Description**: *bias* is added to the sum of squares.
* **Range of values**: a positive floating-point number
* **Type**: ``float``
* **Required**: *yes*
**Inputs**
* **1**: ``data`` - A tensor of type *T* and ``2 <= rank <= 4``. **Required.**
**Outputs**
* **1**: The result of *GRN* function applied to ``data`` input tensor. Normalized tensor of the same type and shape as the data input.
**Types**
**Example**
.. code-block:: cpp
<layer ... type="GRN">
<data bias="1e-4"/>
<input>
<port id="0">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</input>
<output>
<port id="0" precision="f32">
<dim>1</dim>
<dim>20</dim>
<dim>224</dim>
<dim>224</dim>
</port>
</output>
</layer>
@endsphinxdirective
# FakeQuantize {#openvino_docs_ops_quantization_FakeQuantize_1}
@sphinxdirective
**Versioned name**: *FakeQuantize-1*
**Category**: *Quantization*
**Short description**: *FakeQuantize* is element-wise linear quantization of floating-point input values into a discrete set of floating-point values.
**Detailed description**: Input and output ranges as well as the number of levels of quantization
are specified by dedicated inputs and attributes. There can be different limits for each element or
groups of elements (channels) of the input tensors. Otherwise, one limit applies to all elements.
This depends on the shape of the inputs that specify the limits, with regular broadcasting rules applied to the input tensors.
The output of the operator is a floating-point number of the same type as the input tensor.
In general, there are four values that specify quantization for each element: *input_low*, *input_high*, *output_low*, *output_high*.
*input_low* and *input_high* attributes specify the input range of quantization. All input values that are
outside this range are clipped to the range before actual quantization. *output_low* and *output_high*
specify minimum and maximum quantized values at the output.
*Fake* in *FakeQuantize* means the output tensor is of the same floating-point type as the input tensor, not an integer type.
Each element of the output is defined as the result of the following expression:
.. code-block:: python
if x <= min(input_low, input_high):
output = output_low
elif x > max(input_low, input_high):
output = output_high
else:
# input_low < x <= input_high
output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low
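
A vectorized NumPy sketch of the same expression, relying on broadcasting for per-channel ranges (an illustration assuming ``input_low < input_high``; not an OpenVINO API):

.. code-block:: python

   import numpy as np

   def fake_quantize(x, input_low, input_high, output_low, output_high, levels):
       q = np.round((x - input_low) / (input_high - input_low) * (levels - 1)) \
           / (levels - 1) * (output_high - output_low) + output_low
       return np.where(x <= np.minimum(input_low, input_high), output_low,
                       np.where(x > np.maximum(input_low, input_high), output_high, q))

   x = np.array([-0.2, 0.1, 0.4, 0.8, 1.5], dtype=np.float32)
   print(fake_quantize(x, 0.0, 1.0, 0.0, 1.0, levels=2))  # [0. 0. 0. 1. 1.]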
**Attributes**
**Inputs**:
* **1**: ``X`` - tensor of type *T_F* and arbitrary shape. **Required.**
* **2**: ``input_low`` - tensor of type *T_F* with the minimum limit for input value. The shape must be broadcastable to the shape of *X*. **Required.**
* **3**: ``input_high`` - tensor of type *T_F* with the maximum limit for input value. It can be the same as ``input_low`` for binarization.
  The shape must be broadcastable to the shape of *X*. **Required.**
* **4**: ``output_low`` - tensor of type *T_F* with the minimum quantized value. The shape must be broadcastable to the shape of *X*. **Required.**
* **5**: ``output_high`` - tensor of type *T_F* with the maximum quantized value. The shape must be broadcastable to the shape of *X*. **Required.**
**Outputs**:
* **1**: output tensor of type *T_F* with shape and type matching the 1st input tensor *X*.
**Types**
* *T_F*: any supported floating point type.
**Example**
.. code-block:: cpp
<layer type="FakeQuantize">
<data levels="2"/>
<input>
<port id="0">
<dim>1</dim>
<dim>64</dim>
<dim>56</dim>
<dim>56</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>64</dim>
<dim>1</dim>
<dim>1</dim>
</port>
<port id="2">
<dim>1</dim>
<dim>64</dim>
<dim>1</dim>
<dim>1</dim>
</port>
<port id="3">
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
</port>
<port id="4">
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
<dim>1</dim>
</port>
</input>
<output>
<port id="5">
<dim>1</dim>
<dim>64</dim>
<dim>56</dim>
<dim>56</dim>
</port>
</output>
</layer>
@endsphinxdirective
# GRUCell {#openvino_docs_ops_sequence_GRUCell_3}
@sphinxdirective
**Versioned name**: *GRUCell-3*
**Category**: *Sequence processing*
**Short description**: *GRUCell* represents a single GRU Cell that computes the output using the formula described in the `paper <https://arxiv.org/abs/1406.1078>`__.
**Detailed description**: *GRUCell* computes the output *Ht* for the current time step based on the following formula:
```
Formula:
  *   - matrix multiplication
  (.) - Hadamard product (element-wise)
  [,] - concatenation
  f, g - activation functions

  zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
  rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
  ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
  ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
  Ht = (1 - zt) (.) ht + zt (.) Ht-1
```
.. code-block::
    Formula:
      *   - matrix multiplication
      (.) - Hadamard product (element-wise)
      [,] - concatenation
      f, g - activation functions

      zt = f(Xt*(Wz^T) + Ht-1*(Rz^T) + Wbz + Rbz)
      rt = f(Xt*(Wr^T) + Ht-1*(Rr^T) + Wbr + Rbr)
      ht = g(Xt*(Wh^T) + (rt (.) Ht-1)*(Rh^T) + Rbh + Wbh) # default, when linear_before_reset = 0
      ht = g(Xt*(Wh^T) + (rt (.) (Ht-1*(Rh^T) + Rbh)) + Wbh) # when linear_before_reset != 0
      Ht = (1 - zt) (.) ht + zt (.) Ht-1
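As a rough illustration of the formula above, here is a minimal NumPy sketch of a single cell step, assuming the default *sigmoid*/*tanh* activations and no clipping; ``gru_cell`` is a hypothetical helper, not part of the OpenVINO API:

.. code-block:: python

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gru_cell(X, H, W, R, B, linear_before_reset=False):
        # Gate order in W, R and B is z, r, h (zrh), as required by the spec.
        Wz, Wr, Wh = np.split(W, 3)          # each [hidden_size, input_size]
        Rz, Rr, Rh = np.split(R, 3)          # each [hidden_size, hidden_size]
        if linear_before_reset:
            # B is [4 * hidden_size]: fused z and r biases, separate Wbh and Rbh.
            Bz, Br, Wbh, Rbh = np.split(B, 4)
        else:
            Bz, Br, Bh = np.split(B, 3)      # fused weight + recurrence biases
        z = sigmoid(X @ Wz.T + H @ Rz.T + Bz)
        r = sigmoid(X @ Wr.T + H @ Rr.T + Br)
        if linear_before_reset:
            h = np.tanh(X @ Wh.T + r * (H @ Rh.T + Rbh) + Wbh)
        else:
            h = np.tanh(X @ Wh.T + (r * H) @ Rh.T + Bh)
        return (1 - z) * h + z * H           # new hidden state Ht

Note how *linear_before_reset* only changes where the reset gate ``r`` is applied relative to the recurrence term and how the biases are laid out, which is why input ``B`` grows from ``3 * hidden_size`` to ``4 * hidden_size``.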
**Attributes**
@ -27,7 +30,7 @@ Formula:
* **Description**: *hidden_size* specifies hidden state size.
* **Range of values**: a positive integer
* **Type**: `int`
* **Type**: ``int``
* **Required**: *yes*
* *activations*
@ -42,7 +45,7 @@ Formula:
* **Description**: *activations_alpha, activations_beta* attributes of the activation functions
* **Range of values**: a list of floating-point numbers
* **Type**: `float[]`
* **Type**: ``float[]``
* **Default value**: None
* **Required**: *no*
@ -50,68 +53,73 @@ Formula:
* **Description**: *clip* specifies value for tensor clipping to be in *[-C, C]* before activations
* **Range of values**: a positive floating-point number
* **Type**: `float`
* **Type**: ``float``
* **Default value**: *infinity*, which means that clipping is not applied
* **Required**: *no*
* *linear_before_reset*
* **Description**: *linear_before_reset* flag denotes if the layer behaves according to the modification of *GRUCell* described in the formula in the [ONNX documentation](https://github.com/onnx/onnx/blob/master/docs/Operators.md#GRU).
* **Description**: *linear_before_reset* flag denotes if the layer behaves according to the modification
  of *GRUCell* described in the formula in the `ONNX documentation <https://github.com/onnx/onnx/blob/master/docs/Operators.md#GRU>`__.
* **Range of values**: true or false
* **Type**: `boolean`
* **Type**: ``boolean``
* **Default value**: false
* **Required**: *no*
**Inputs**
* **1**: `X` - 2D tensor of type *T* `[batch_size, input_size]`, input data. **Required.**
* **2**: `initial_hidden_state` - 2D tensor of type *T* `[batch_size, hidden_size]`. **Required.**
* **3**: `W` - 2D tensor of type *T* `[3 * hidden_size, input_size]`, the weights for matrix multiplication, gate order: zrh. **Required.**
* **4**: `R` - 2D tensor of type *T* `[3 * hidden_size, hidden_size]`, the recurrence weights for matrix multiplication, gate order: zrh. **Required.**
* **5**: `B` - 1D tensor of type *T*. If *linear_before_reset* is set to 1, then the shape is `[4 * hidden_size]` - the sum of biases for z and r gates (weights and recurrence weights), the biases for h gate are placed separately. Otherwise the shape is `[3 * hidden_size]`, the sum of biases (weights and recurrence weights). **Optional.**
* **1**: ``X`` - 2D tensor of type *T* ``[batch_size, input_size]``, input data. **Required.**
* **2**: ``initial_hidden_state`` - 2D tensor of type *T* ``[batch_size, hidden_size]``. **Required.**
* **3**: ``W`` - 2D tensor of type *T* ``[3 * hidden_size, input_size]``, the weights for matrix multiplication, gate order: zrh. **Required.**
* **4**: ``R`` - 2D tensor of type *T* ``[3 * hidden_size, hidden_size]``, the recurrence weights for matrix multiplication, gate order: zrh. **Required.**
* **5**: ``B`` - 1D tensor of type *T*. If *linear_before_reset* is set to 1, then the shape is ``[4 * hidden_size]`` -
  the sum of biases for z and r gates (weights and recurrence weights), the biases for h gate are placed separately.
  Otherwise the shape is ``[3 * hidden_size]``, the sum of biases (weights and recurrence weights). **Optional.**
**Outputs**
* **1**: `Ho` - 2D tensor of type *T* `[batch_size, hidden_size]`, the last output value of hidden state.
* **1**: ``Ho`` - 2D tensor of type *T* ``[batch_size, hidden_size]``, the last output value of hidden state.
**Types**
* *T*: any supported floating-point type.
**Example**
```xml
<layer ... type="GRUCell" ...>
<data hidden_size="128" linear_before_reset="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>16</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
<port id="2">
<dim>384</dim>
<dim>16</dim>
</port>
<port id="3">
<dim>384</dim>
<dim>128</dim>
</port>
<port id="4">
<dim>768</dim>
</port>
</input>
<output>
<port id="5">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
```
.. code-block:: cpp
<layer ... type="GRUCell" ...>
<data hidden_size="128" linear_before_reset="1"/>
<input>
<port id="0">
<dim>1</dim>
<dim>16</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>128</dim>
</port>
<port id="2">
<dim>384</dim>
<dim>16</dim>
</port>
<port id="3">
<dim>384</dim>
<dim>128</dim>
</port>
<port id="4">
<dim>768</dim>
</port>
</input>
<output>
<port id="5">
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -1,14 +1,21 @@
# GRUSequence {#openvino_docs_ops_sequence_GRUSequence_5}
@sphinxdirective
**Versioned name**: *GRUSequence-5*
**Category**: *Sequence processing*
**Short description**: *GRUSequence* operation represents a series of GRU cells. Each cell is implemented as a <a href="#GRUCell">GRUCell</a> operation.
**Short description**: *GRUSequence* operation represents a series of GRU cells. Each cell is implemented as a GRUCell operation.
**Detailed description**
A single cell in the sequence is implemented in the same way as in the <a href="#GRUCell">GRUCell</a> operation. *GRUSequence* represents a sequence of GRU cells. The sequence can be connected differently depending on the `direction` attribute, which specifies the direction of traversing the input data along the sequence dimension, or whether the sequence should be bidirectional. Most of the attributes are in sync with the specification of the ONNX GRU operator defined in the <a href="https://github.com/onnx/onnx/blob/master/docs/Operators.md#gru">ONNX documentation</a>.
A single cell in the sequence is implemented in the same way as in the *GRUCell*
operation. *GRUSequence* represents a sequence of GRU cells. The sequence can be
connected differently depending on the ``direction`` attribute, which specifies the
direction of traversing the input data along the sequence dimension, or whether the
sequence should be bidirectional. Most of the attributes are in sync with the
specification of the ONNX GRU operator defined in the
`ONNX documentation <https://github.com/onnx/onnx/blob/master/docs/Operators.md#gru>`__.
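To make the unrolling concrete, here is a minimal single-direction NumPy sketch that reuses the hypothetical ``gru_cell`` helper from the *GRUCell* section; it assumes the ``num_directions`` axis of ``W``, ``R`` and ``B`` has already been squeezed out, and the handling of padded steps is simplified:

.. code-block:: python

    import numpy as np

    def gru_sequence(X, H0, seq_lengths, W, R, B, reverse=False):
        # X: [batch, seq_length, input_size]; H0: [batch, hidden_size].
        # gru_cell is the single-step sketch from the GRUCell section.
        batch, seq_length, _ = X.shape
        H = H0.copy()
        Y = np.zeros((batch, seq_length, H0.shape[-1]), dtype=X.dtype)
        steps = reversed(range(seq_length)) if reverse else range(seq_length)
        for t in steps:
            active = (t < seq_lengths)[:, None]   # freeze finished sequences
            H = np.where(active, gru_cell(X[:, t], H, W, R, B), H)
            Y[:, t] = H
        return Y, H   # all intermediate hidden states, and the last state

For the *bidirectional* case, the sequence would be run twice, once forward and once with ``reverse=True`` using the second direction's weights, with the results stacked along the ``num_directions`` axis.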
**Attributes**
@ -22,7 +29,8 @@ A single cell in the sequence is implemented in the same way as in <a href="#GRU
* *activations*
* **Description**: *activations* specifies activation functions for gates; there are two gates, so two activation functions should be specified as a value for this attribute
* **Description**: *activations* specifies activation functions for gates; there are two gates,
  so two activation functions should be specified as a value for this attribute
* **Range of values**: any combination of *relu*, *sigmoid*, *tanh*
* **Type**: a list of strings
* **Default value**: *sigmoid,tanh*
@ -30,9 +38,10 @@ A single cell in the sequence is implemented in the same way as in <a href="#GRU
* *activations_alpha, activations_beta*
* **Description**: *activations_alpha, activations_beta* attributes of functions; applicability and meaning of these attributes depend on the chosen activation functions
* **Description**: *activations_alpha, activations_beta* attributes of functions;
  applicability and meaning of these attributes depend on the chosen activation functions
* **Range of values**: a list of floating-point numbers
* **Type**: `float[]`
* **Type**: ``float[]``
* **Default value**: None
* **Required**: *no*
@ -46,38 +55,43 @@ A single cell in the sequence is implemented in the same way as in <a href="#GRU
* *direction*
* **Description**: Specify if the RNN is forward, reverse, or bidirectional. If it is one of *forward* or *reverse*, then `num_directions = 1`; if it is *bidirectional*, then `num_directions = 2`. This `num_directions` value specifies input/output shape requirements.
* **Description**: Specify if the RNN is forward, reverse, or bidirectional. If it is one of *forward* or *reverse*,
  then ``num_directions = 1``; if it is *bidirectional*, then ``num_directions = 2``. This ``num_directions``
  value specifies input/output shape requirements.
* **Range of values**: *forward*, *reverse*, *bidirectional*
* **Type**: `string`
* **Type**: ``string``
* **Required**: *yes*
* *linear_before_reset*
* **Description**: *linear_before_reset* flag denotes if the layer behaves according to the modification of *GRUCell* described in the formula in the [ONNX documentation](https://github.com/onnx/onnx/blob/master/docs/Operators.md#GRU).
* **Description**: *linear_before_reset* flag denotes if the layer behaves according to the modification
  of *GRUCell* described in the formula in the `ONNX documentation <https://github.com/onnx/onnx/blob/master/docs/Operators.md#GRU>`__.
* **Range of values**: True or False
* **Type**: `boolean`
* **Type**: ``boolean``
* **Default value**: False
* **Required**: *no*
**Inputs**
* **1**: `X` - 3D tensor of type *T1* `[batch_size, seq_length, input_size]`, input data. It differs from GRUCell 1st input only by an additional axis with size `seq_length`. **Required.**
* **2**: `initial_hidden_state` - 3D tensor of type *T1* `[batch_size, num_directions, hidden_size]`, input hidden state data. **Required.**
* **3**: `sequence_lengths` - 1D tensor of type *T2* `[batch_size]`, specifies real sequence lengths for each batch element. In case of negative values in this input, the operation behavior is undefined. **Required.**
* **4**: `W` - 3D tensor of type *T1* `[num_directions, 3 * hidden_size, input_size]`, the weights for matrix multiplication, gate order: zrh. **Required.**
* **5**: `R` - 3D tensor of type *T1* `[num_directions, 3 * hidden_size, hidden_size]`, the recurrence weights for matrix multiplication, gate order: zrh. **Required.**
* **6**: `B` - 2D tensor of type *T1*. If *linear_before_reset* is set to 1, then the shape is `[num_directions, 4 * hidden_size]` - the sum of biases for z and r gates (weights and recurrence weights), the biases for h gate are placed separately. Otherwise the shape is `[num_directions, 3 * hidden_size]`, the sum of biases (weights and recurrence weights). **Required.**
* **1**: ``X`` - 3D tensor of type *T1* ``[batch_size, seq_length, input_size]``, input data.
  It differs from GRUCell 1st input only by an additional axis with size ``seq_length``. **Required.**
* **2**: ``initial_hidden_state`` - 3D tensor of type *T1* ``[batch_size, num_directions, hidden_size]``,
  input hidden state data. **Required.**
* **3**: ``sequence_lengths`` - 1D tensor of type *T2* ``[batch_size]``, specifies real sequence lengths
  for each batch element. In case of negative values in this input, the operation behavior is undefined. **Required.**
* **4**: ``W`` - 3D tensor of type *T1* ``[num_directions, 3 * hidden_size, input_size]``,
  the weights for matrix multiplication, gate order: zrh. **Required.**
* **5**: ``R`` - 3D tensor of type *T1* ``[num_directions, 3 * hidden_size, hidden_size]``,
  the recurrence weights for matrix multiplication, gate order: zrh. **Required.**
* **6**: ``B`` - 2D tensor of type *T1*. If *linear_before_reset* is set to 1, then the shape
  is ``[num_directions, 4 * hidden_size]`` - the sum of biases for z and r gates (weights and recurrence weights),
  the biases for h gate are placed separately. Otherwise the shape is ``[num_directions, 3 * hidden_size]``,
  the sum of biases (weights and recurrence weights). **Required.**
**Outputs**
* **1**: `Y` - 4D tensor of type *T1* `[batch_size, num_directions, seq_len, hidden_size]`, concatenation of all the intermediate output values of the hidden state.
* **2**: `Ho` - 3D tensor of type *T1* `[batch_size, num_directions, hidden_size]`, the last output value of hidden state.
* **1**: ``Y`` - 4D tensor of type *T1* ``[batch_size, num_directions, seq_len, hidden_size]``, concatenation of all the intermediate output values of the hidden state.
* **2**: ``Ho`` - 3D tensor of type *T1* ``[batch_size, num_directions, hidden_size]``, the last output value of hidden state.
**Types**
@ -85,50 +99,55 @@ A single cell in the sequence is implemented in the same way as in <a href="#GRU
* *T2*: any supported integer type.
**Example**
```xml
<layer ... type="GRUSequence" ...>
<data hidden_size="128"/>
<input>
<port id="0">
<dim>1</dim>
<dim>4</dim>
<dim>16</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>1</dim>
<dim>128</dim>
</port>
<port id="2">
<dim>1</dim>
</port>
<port id="3">
<dim>1</dim>
<dim>384</dim>
<dim>16</dim>
</port>
<port id="4">
<dim>1</dim>
<dim>384</dim>
<dim>128</dim>
</port>
<port id="5">
<dim>1</dim>
<dim>384</dim>
</port>
</input>
<output>
<port id="6">
<dim>1</dim>
<dim>1</dim>
<dim>4</dim>
<dim>128</dim>
</port>
<port id="7">
<dim>1</dim>
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
```
.. code-block:: cpp
<layer ... type="GRUSequence" ...>
<data hidden_size="128"/>
<input>
<port id="0">
<dim>1</dim>
<dim>4</dim>
<dim>16</dim>
</port>
<port id="1">
<dim>1</dim>
<dim>1</dim>
<dim>128</dim>
</port>
<port id="2">
<dim>1</dim>
</port>
<port id="3">
<dim>1</dim>
<dim>384</dim>
<dim>16</dim>
</port>
<port id="4">
<dim>1</dim>
<dim>384</dim>
<dim>128</dim>
</port>
<port id="5">
<dim>1</dim>
<dim>384</dim>
</port>
</input>
<output>
<port id="6">
<dim>1</dim>
<dim>1</dim>
<dim>4</dim>
<dim>128</dim>
</port>
<port id="7">
<dim>1</dim>
<dim>1</dim>
<dim>128</dim>
</port>
</output>
</layer>
@endsphinxdirective


@ -19,7 +19,7 @@ mistune==2.0.3
packaging==23.0
pluggy==0.13.1
pydata-sphinx-theme==0.7.2
Pygments==2.14.0
Pygments==2.15.1
pyparsing==3.0.9
pytest==6.2.5
pytest-html==3.1.1