DOCS shift to rst - Opsets B (#17169)

* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_1.md
* Update BatchNormInference_5.md
* Update BatchToSpace_2.md
* Update BinaryConvolution_1.md
* Update Broadcast_1.md
* Update Broadcast_3.md
* Update Bucketize_3.md
* fix
* fix-2
2023-04-25 16:06:17 +02:00
parent acd424bb5e
commit 49b5d039db
7 changed files with 647 additions and 565 deletions
--- a/docs/ops/normalization/BatchNormInference_1.md
+++ b/docs/ops/normalization/BatchNormInference_1.md
@@ -1,78 +1,96 @@
 # BatchNormInference {#openvino_docs_ops_normalization_BatchNormInference_1}

-**Versioned name**: *BatchNormInference-1*
+@sphinxdirective
+
+**Versioned name**: *BatchNormInference-1*

 **Category**: *Normalization*

-**Short description**: *BatchNormInference* performs Batch Normalization operation described in the [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167v2) article.
+**Short description**: *BatchNormInference* performs the Batch Normalization operation described in the `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167v2>`__ article.

 **Detailed Description**

-*BatchNormInference* performs the following operations on a given data batch input tensor `data`:
+*BatchNormInference* performs the following operations on a given data batch input tensor ``data``:

-* Normalizes each activation \f$x^{(k)}\f$ by the mean and variance.
-\f[
-   \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
-\f]
-where \f$E[x^{(k)}]\f$ and \f$Var(x^{(k)})\f$ are the mean and variance, calculated per channel axis of `data` input, and correspond to `mean` and `variance` inputs, respectively. Additionally, \f$\epsilon\f$ is a value added to the variance for numerical stability and corresponds to `epsilon` attribute.
+* Normalizes each activation :math:`x^{(k)}` by the mean and variance.
+  
+  .. math::
+     
+     \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}

-* Performs linear transformation of each normalized activation based on `gamma` and `beta` input, representing the scaling factor and shift, respectively.
-\f[
-   \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
-\f]
-where \f$\gamma^{(k)}\f$ and \f$\beta^{(k)}\f$ are learnable parameters, calculated per channel axis, and correspond to `gamma` and `beta` inputs.
+  where :math:`E[x^{(k)}]` and :math:`Var(x^{(k)})` are the mean and variance, calculated per channel axis of ``data`` input, and correspond to ``mean`` and ``variance`` inputs, respectively. Additionally, :math:`\epsilon` is a value added to the variance for numerical stability and corresponds to ``epsilon`` attribute.
+
+* Performs linear transformation of each normalized activation based on ``gamma`` and ``beta`` input, representing the scaling factor and shift, respectively.
+  
+  .. math::
+     
+     \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
+  
+  where :math:`\gamma^{(k)}` and :math:`\beta^{(k)}` are learnable parameters, calculated per channel axis, and correspond to ``gamma`` and ``beta`` inputs.

 **Mathematical Formulation**

-Let `x` be a *d*-dimensional input, \f$x=(x_{1}\dotsc x_{d})\f$. Since normalization is applied to each activation \f$E[x^{(k)}]\f$, you can focus on a particular activation and omit k.
+Let ``x`` be a *d*-dimensional input, :math:`x=(x_{1}\dotsc x_{d})`. Since normalization is applied to each activation :math:`x^{(k)}` independently, you can focus on a particular activation and omit the index *k*.

-For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values. *BatchNormInference* performs Batch Normalization algorithm as follows:
+For a particular activation, consider a mini-batch :math:`\mathcal{B}` of :math:`m` values. *BatchNormInference* performs the Batch Normalization algorithm as follows:

-*   **Input**: Values of \f$x\f$ over a mini-batch:
-    \f[
-    \mathcal{B} = \{ x_{1...m} \}
-    \f]
-*   **Parameters to learn**: \f$ \gamma, \beta\f$
-*   **Output**:
-    \f[
-    \{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}
-    \f]
-*   **Mini-batch mean**:
-    \f[
-    \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
-    \f]
-*   **Mini-batch variance**:
-    \f[
-    \sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
-    \f]
-*   **Normalize**:
-    \f[
-    \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
-    \f]
-*   **Scale and shift**:
-    \f[
-    o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )
-    \f]
+* **Input**: Values of :math:`x` over a mini-batch:
+  
+  .. math::
+     
+     \mathcal{B} = \{ x_{1...m} \}
+
+* **Parameters to learn**: :math:`\gamma, \beta`
+* **Output**:
+  
+  .. math::
+     
+     \{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}
+
+* **Mini-batch mean**:
+  
+  .. math::
+     
+     \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
+
+* **Mini-batch variance**:
+  
+  .. math::
+     
+     \sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
+
+* **Normalize**:
+  
+  .. math::
+     
+     \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
+
+* **Scale and shift**:
+  
+  .. math::
+     
+     o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )

 **Attributes**:

 * *epsilon*
+  
  * **Description**: *epsilon* is a constant added to the variance for numerical stability.
  * **Range of values**: a floating-point number greater than or equal to zero
-  * **Type**: `float`
+  * **Type**: ``float``
  * **Required**: *yes*

 **Inputs**

-* **1**: `data` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
-* **2**: `gamma` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
-* **3**: `beta` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
-* **4**: `mean` - Value for mean normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
-* **5**: `variance` - Value for variance normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
+* **1**: ``data`` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
+* **2**: ``gamma`` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**
+* **3**: ``beta`` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**
+* **4**: ``mean`` - Value for mean normalization. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**
+* **5**: ``variance`` - Value for variance normalization. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**

 **Outputs**

-* **1**: The result of element-wise Batch Normalization operation applied to the input tensor `data`. A tensor of type *T* and the same shape as `data` input tensor.
+* **1**: The result of element-wise Batch Normalization operation applied to the input tensor ``data``. A tensor of type *T* and the same shape as ``data`` input tensor.

 **Types**

@@ -80,70 +98,73 @@ For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values

 **Examples**

-*Example: 2D input tensor `data`*
+Example: 2D input tensor ``data`` 

-```xml
-<layer ... type="BatchNormInference" ...>
-    <data epsilon="9.99e-06" />
-    <input>
-        <port id="0">  <!-- input -->
-            <dim>10</dim>
-            <dim>128</dim>
-        </port>
-        <port id="1">  <!-- gamma -->
-            <dim>128</dim>
-        </port>
-        <port id="2">  <!-- beta -->
-            <dim>128</dim>
-        </port>
-        <port id="3">  <!-- mean -->
-            <dim>128</dim>
-        </port>
-        <port id="4">  <!-- variance -->
-            <dim>128</dim>
-        </port>
-    </input>
-    <output>
-        <port id="5">
-            <dim>10</dim>
-            <dim>128</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: xml
+   
+   <layer ... type="BatchNormInference" ...>
+       <data epsilon="9.99e-06" />
+       <input>
+           <port id="0">  <!-- input -->
+               <dim>10</dim>
+               <dim>128</dim>
+           </port>
+           <port id="1">  <!-- gamma -->
+               <dim>128</dim>
+           </port>
+           <port id="2">  <!-- beta -->
+               <dim>128</dim>
+           </port>
+           <port id="3">  <!-- mean -->
+               <dim>128</dim>
+           </port>
+           <port id="4">  <!-- variance -->
+               <dim>128</dim>
+           </port>
+       </input>
+       <output>
+           <port id="5">
+               <dim>10</dim>
+               <dim>128</dim>
+           </port>
+       </output>
+   </layer>

-*Example: 4D input tensor `data`*
+Example: 4D input tensor ``data``
+
+.. code-block:: xml
+   
+   <layer ... type="BatchNormInference" ...>
+       <data epsilon="9.99e-06" />
+       <input>
+           <port id="0">  <!-- input -->
+               <dim>1</dim>
+               <dim>3</dim>
+               <dim>224</dim>
+               <dim>224</dim>
+           </port>
+           <port id="1">  <!-- gamma -->
+               <dim>3</dim>
+           </port>
+           <port id="2">  <!-- beta -->
+               <dim>3</dim>
+           </port>
+           <port id="3">  <!-- mean -->
+               <dim>3</dim>
+           </port>
+           <port id="4">  <!-- variance -->
+               <dim>3</dim>
+           </port>
+       </input>
+       <output>
+           <port id="5">
+               <dim>1</dim>
+               <dim>3</dim>
+               <dim>224</dim>
+               <dim>224</dim>
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective

-```xml
-<layer ... type="BatchNormInference" ...>
-    <data epsilon="9.99e-06" />
-    <input>
-        <port id="0">  <!-- input -->
-            <dim>1</dim>
-            <dim>3</dim>
-            <dim>224</dim>
-            <dim>224</dim>
-        </port>
-        <port id="1">  <!-- gamma -->
-            <dim>3</dim>
-        </port>
-        <port id="2">  <!-- beta -->
-            <dim>3</dim>
-        </port>
-        <port id="3">  <!-- mean -->
-            <dim>3</dim>
-        </port>
-        <port id="4">  <!-- variance -->
-            <dim>3</dim>
-        </port>
-    </input>
-    <output>
-        <port id="5">
-            <dim>1</dim>
-            <dim>3</dim>
-            <dim>224</dim>
-            <dim>224</dim>
-        </port>
-    </output>
-</layer>
-```
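
Because ``mean`` and ``variance`` are fixed inputs at inference time, the two steps documented above (normalize, then scale and shift) can be folded into a single per-channel affine map. The derivation below is a sketch of that algebra only; the symbols :math:`a^{(k)}` and :math:`b^{(k)}` are introduced here for illustration and do not appear in the specification.

```latex
\begin{aligned}
\hat{y}^{(k)} &= \gamma^{(k)}\,\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}} + \beta^{(k)} \\
              &= a^{(k)} x^{(k)} + b^{(k)}, \\
a^{(k)} &= \frac{\gamma^{(k)}}{\sqrt{Var(x^{(k)}) + \epsilon}}, \qquad
b^{(k)} = \beta^{(k)} - a^{(k)}\,E[x^{(k)}]
\end{aligned}
```

Expanding :math:`a^{(k)} x^{(k)} + b^{(k)}` gives :math:`a^{(k)}(x^{(k)} - E[x^{(k)}]) + \beta^{(k)}`, which matches the two-step form term by term.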
--- a/docs/ops/normalization/BatchNormInference_5.md
+++ b/docs/ops/normalization/BatchNormInference_5.md
@@ -1,78 +1,97 @@
 # BatchNormInference {#openvino_docs_ops_normalization_BatchNormInference_5}

+@sphinxdirective
+
 **Versioned name**: *BatchNormInference-5*

 **Category**: *Normalization*

-**Short description**: *BatchNormInference* performs Batch Normalization operation described in the [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167v2) article.
+**Short description**: *BatchNormInference* performs the Batch Normalization operation described in the `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167v2>`__ article.

 **Detailed Description**

-*BatchNormInference* performs the following operations on a given data batch input tensor `data`:
+*BatchNormInference* performs the following operations on a given data batch input tensor ``data``:

-* Normalizes each activation \f$x^{(k)}\f$ by the mean and variance.
-\f[
-   \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
-\f]
-where \f$E[x^{(k)}]\f$ and \f$Var(x^{(k)})\f$ are the mean and variance, calculated per channel axis of `data` input, and correspond to `mean` and `variance` inputs, respectively. Additionally, \f$\epsilon\f$ is a value added to the variance for numerical stability and corresponds to `epsilon` attribute.
+* Normalizes each activation :math:`x^{(k)}` by the mean and variance.

-* Performs linear transformation of each normalized activation based on `gamma` and `beta` input, representing the scaling factor and shift, respectively.
-\f[
-   \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
-\f]
-where \f$\gamma^{(k)}\f$ and \f$\beta^{(k)}\f$ are learnable parameters, calculated per channel axis, and correspond to `gamma` and `beta` inputs.
+  .. math::
+     
+     \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
+  
+  where :math:`E[x^{(k)}]` and :math:`Var(x^{(k)})` are the mean and variance, calculated per channel axis of ``data`` input, and correspond to ``mean`` and ``variance`` inputs, respectively. Additionally, :math:`\epsilon` is a value added to the variance for numerical stability and corresponds to ``epsilon`` attribute.
+
+* Performs linear transformation of each normalized activation based on ``gamma`` and ``beta`` input, representing the scaling factor and shift, respectively.
+
+  .. math::
+     
+     \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
+  
+  where :math:`\gamma^{(k)}` and :math:`\beta^{(k)}` are learnable parameters, calculated per channel axis, and correspond to ``gamma`` and ``beta`` inputs.

 **Mathematical Formulation**

-Let `x` be a *d*-dimensional input, \f$x=(x_{1}\dotsc x_{d})\f$. Since normalization is applied to each activation \f$E[x^{(k)}]\f$, you can focus on a particular activation and omit k.
+Let ``x`` be a *d*-dimensional input, :math:`x=(x_{1}\dotsc x_{d})`. Since normalization is applied to each activation :math:`x^{(k)}` independently, you can focus on a particular activation and omit the index *k*.

-For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values. *BatchNormInference* performs Batch Normalization algorithm as follows:
+For a particular activation, consider a mini-batch :math:`\mathcal{B}` of :math:`m` values. *BatchNormInference* performs the Batch Normalization algorithm as follows:
+
+* **Input**: Values of :math:`x` over a mini-batch:
+  
+  .. math::
+     
+     \mathcal{B} = \{ x_{1...m} \}
+    
+* **Parameters to learn**: :math:`\gamma, \beta`
+* **Output**:
+  
+  .. math::
+     
+     \{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}
+    
+* **Mini-batch mean**:
+  
+  .. math::
+     
+     \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
+
+* **Mini-batch variance**:
+  
+  .. math::
+     
+     \sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
+
+* **Normalize**:
+  
+  .. math::
+     
+     \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
+
+* **Scale and shift**:
+  
+  .. math::
+     
+     o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )

-*   **Input**: Values of \f$x\f$ over a mini-batch:
-    \f[
-    \mathcal{B} = \{ x_{1...m} \}
-    \f]
-*   **Parameters to learn**: \f$ \gamma, \beta\f$
-*   **Output**:
-    \f[
-    \{ o_{i} = BN_{\gamma, \beta} ( b_{i} ) \}
-    \f]
-*   **Mini-batch mean**:
-    \f[
-    \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
-    \f]
-*   **Mini-batch variance**:
-    \f[
-    \sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
-    \f]
-*   **Normalize**:
-    \f[
-    \hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
-    \f]
-*   **Scale and shift**:
-    \f[
-    o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )
-    \f]

 **Attributes**:

 * *epsilon*
+  
  * **Description**: *epsilon* is a constant added to the variance for numerical stability.
  * **Range of values**: a floating-point number greater than or equal to zero
-  * **Type**: `float`
+  * **Type**: ``float``
  * **Required**: *yes*

 **Inputs**

-* **1**: `data` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
-* **2**: `gamma` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
-* **3**: `beta` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
-* **4**: `mean` - Value for mean normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
-* **5**: `variance` - Value for variance normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
+* **1**: ``data`` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
+* **2**: ``gamma`` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**
+* **3**: ``beta`` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**
+* **4**: ``mean`` - Value for mean normalization. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**
+* **5**: ``variance`` - Value for variance normalization. A 1D tensor of type *T* with the same span as ``data`` channel axis. **Required.**

 **Outputs**

-* **1**: The result of element-wise Batch Normalization operation applied to the input tensor `data`. A tensor of type *T* and the same shape as `data` input tensor.
+* **1**: The result of element-wise Batch Normalization operation applied to the input tensor ``data``. A tensor of type *T* and the same shape as ``data`` input tensor.

 **Types**

@@ -80,70 +99,73 @@ For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of m values

 **Examples**

-*Example: 2D input tensor `data`*
+Example: 2D input tensor ``data``

-```xml
-<layer ... type="BatchNormInference" ...>
-    <data epsilon="9.99e-06" />
-    <input>
-        <port id="0">  <!-- input -->
-            <dim>10</dim>
-            <dim>128</dim>
-        </port>
-        <port id="1">  <!-- gamma -->
-            <dim>128</dim>
-        </port>
-        <port id="2">  <!-- beta -->
-            <dim>128</dim>
-        </port>
-        <port id="3">  <!-- mean -->
-            <dim>128</dim>
-        </port>
-        <port id="4">  <!-- variance -->
-            <dim>128</dim>
-        </port>
-    </input>
-    <output>
-        <port id="5">
-            <dim>10</dim>
-            <dim>128</dim>
-        </port>
-    </output>
-</layer>
-```
+.. code-block:: xml
+   
+   <layer ... type="BatchNormInference" ...>
+       <data epsilon="9.99e-06" />
+       <input>
+           <port id="0">  <!-- input -->
+               <dim>10</dim>
+               <dim>128</dim>
+           </port>
+           <port id="1">  <!-- gamma -->
+               <dim>128</dim>
+           </port>
+           <port id="2">  <!-- beta -->
+               <dim>128</dim>
+           </port>
+           <port id="3">  <!-- mean -->
+               <dim>128</dim>
+           </port>
+           <port id="4">  <!-- variance -->
+               <dim>128</dim>
+           </port>
+       </input>
+       <output>
+           <port id="5">
+               <dim>10</dim>
+               <dim>128</dim>
+           </port>
+       </output>
+   </layer>

-*Example: 4D input tensor `data`*
+Example: 4D input tensor ``data``
+
+.. code-block:: xml
+   
+   <layer ... type="BatchNormInference" ...>
+       <data epsilon="9.99e-06" />
+       <input>
+           <port id="0">  <!-- input -->
+               <dim>1</dim>
+               <dim>3</dim>
+               <dim>224</dim>
+               <dim>224</dim>
+           </port>
+           <port id="1">  <!-- gamma -->
+               <dim>3</dim>
+           </port>
+           <port id="2">  <!-- beta -->
+               <dim>3</dim>
+           </port>
+           <port id="3">  <!-- mean -->
+               <dim>3</dim>
+           </port>
+           <port id="4">  <!-- variance -->
+               <dim>3</dim>
+           </port>
+       </input>
+       <output>
+           <port id="5">
+               <dim>1</dim>
+               <dim>3</dim>
+               <dim>224</dim>
+               <dim>224</dim>
+           </port>
+       </output>
+   </layer>
+
+@endsphinxdirective

-```xml
-<layer ... type="BatchNormInference" ...>
-    <data epsilon="9.99e-06" />
-    <input>
-        <port id="0">  <!-- input -->
-            <dim>1</dim>
-            <dim>3</dim>
-            <dim>224</dim>
-            <dim>224</dim>
-        </port>
-        <port id="1">  <!-- gamma -->
-            <dim>3</dim>
-        </port>
-        <port id="2">  <!-- beta -->
-            <dim>3</dim>
-        </port>
-        <port id="3">  <!-- mean -->
-            <dim>3</dim>
-        </port>
-        <port id="4">  <!-- variance -->
-            <dim>3</dim>
-        </port>
-    </input>
-    <output>
-        <port id="5">
-            <dim>1</dim>
-            <dim>3</dim>
-            <dim>224</dim>
-            <dim>224</dim>
-        </port>
-    </output>
-</layer>
-```
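
For readers checking the formulas in these two pages, the per-channel computation can be sketched in plain Python. This is only an illustrative reference, not OpenVINO code: the function name `batch_norm_inference` is hypothetical, and the sketch assumes a 2D ``data`` input laid out as `[N, C]` (as in the first XML example), with ``gamma``, ``beta``, ``mean`` and ``variance`` given as length-`C` lists.

```python
import math


def batch_norm_inference(data, gamma, beta, mean, variance, epsilon):
    """Apply the documented normalize + scale-and-shift steps to a 2D input.

    data is a list of N rows, each with C values (channel axis is the
    second dimension). gamma, beta, mean and variance are lists of length C.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be greater than or equal to zero")
    channels = len(gamma)
    # Per-channel factor gamma / sqrt(variance + epsilon), computed once.
    scale = [
        gamma[c] / math.sqrt(variance[c] + epsilon) for c in range(channels)
    ]
    # Normalize each activation by the channel mean, then scale and shift.
    return [
        [scale[c] * (row[c] - mean[c]) + beta[c] for c in range(channels)]
        for row in data
    ]


# Small hand-checkable input: N = 2 rows, C = 2 channels.
data = [[1.0, 2.0], [3.0, 6.0]]
out = batch_norm_inference(
    data,
    gamma=[1.0, 2.0],
    beta=[0.0, 1.0],
    mean=[2.0, 4.0],
    variance=[1.0, 4.0],
    epsilon=0.0,
)
```

The output has the same shape as ``data``, which agrees with the **Outputs** section. Precomputing the per-channel factor avoids repeating the square root for every element; for inputs of higher rank the same per-channel values would be applied along the second dimension.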