BatchNormInference specification refactoring (#5489)
* BatchNormInference specification refactoring
* Address review comments
* Remove the term "Transform" from the definition
* Add the title of the paper where this operation is introduced
* Add missing backticks
* Remove redundant information in the *epsilon* attribute's range of values
* Refine the spec: remove more mentions of "transformation" to avoid confusion
* Correct typos and improve readability
* Use third person to express operation steps
This commit is contained in:
parent
102e95f7f5
commit
b9fe465cf0

## BatchNormInference <a name="BatchNormInference"></a> {#openvino_docs_ops_normalization_BatchNormInference_5}

**Versioned name**: *BatchNormInference-5*

**Category**: *Normalization*

**Short description**: *BatchNormInference* performs the Batch Normalization operation described in the [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167v2) article.

**Detailed Description**

*BatchNormInference* performs the following operations on a given data batch input tensor `data`:

* Normalizes each activation \f$x^{(k)}\f$ by the mean and variance.
\f[
   \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}
\f]
where \f$E[x^{(k)}]\f$ and \f$Var(x^{(k)})\f$ are the mean and variance calculated per channel axis of the `data` input, and correspond to the `mean` and `variance` inputs, respectively. Additionally, \f$\epsilon\f$ is a value added to the variance for numerical stability and corresponds to the `epsilon` attribute.

* Performs a linear transformation of each normalized activation based on the `gamma` and `beta` inputs, representing the scaling factor and shift, respectively.
\f[
   \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}
\f]
where \f$\gamma^{(k)}\f$ and \f$\beta^{(k)}\f$ are learnable parameters, calculated per channel axis, and correspond to the `gamma` and `beta` inputs.
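
Putting the two steps together, the computation can be sketched in plain NumPy. This is an illustration only, not part of the specification; it assumes the channel axis is the second dimension of `data`, that `gamma`, `beta`, `mean`, and `variance` are 1D per-channel vectors, and the function name is made up for this example.

```python
import numpy as np

def batch_norm_inference_reference(data, gamma, beta, mean, variance, epsilon):
    """Sketch of the two BatchNormInference steps for a tensor of rank >= 2."""
    # Reshape the 1D per-channel vectors so they broadcast along the channel
    # axis (dimension 1) of `data`.
    bcast = (1, -1) + (1,) * (data.ndim - 2)
    mean = mean.reshape(bcast)
    variance = variance.reshape(bcast)
    gamma = gamma.reshape(bcast)
    beta = beta.reshape(bcast)

    # Step 1: normalize each activation by the provided mean and variance.
    x_hat = (data - mean) / np.sqrt(variance + epsilon)
    # Step 2: linear transformation with the per-channel scale and shift.
    return gamma * x_hat + beta

# Example: a 4D NCHW tensor with 3 channels.
data = np.random.rand(2, 3, 4, 4).astype(np.float32)
out = batch_norm_inference_reference(
    data,
    gamma=np.ones(3, np.float32),
    beta=np.zeros(3, np.float32),
    mean=np.full(3, 0.5, np.float32),
    variance=np.full(3, 0.25, np.float32),
    epsilon=1e-5,
)
assert out.shape == data.shape
```
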
**Mathematical Formulation**

Let `x` be a *d*-dimensional input, \f$x=(x_{1}\dotsc x_{d})\f$. Since normalization is applied independently to each activation \f$x^{(k)}\f$, you can focus on a particular activation and omit \f$k\f$.

For a particular activation, consider a mini-batch \f$\mathcal{B}\f$ of \f$m\f$ values. *BatchNormInference* performs the Batch Normalization algorithm as follows:

* **Input**: Values of \f$x\f$ over a mini-batch:
\f[
\mathcal{B} = \{ x_{1...m} \}
\f]
* **Parameters to learn**: \f$ \gamma, \beta\f$
* **Output**:
\f[
\{ o_{i} = BN_{\gamma ,\beta } ( b_{i} ) \}
\f]
* **Mini-batch mean**:
\f[
\mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}b_{i}
\f]
* **Mini-batch variance**:
\f[
\sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( b_{i} - \mu_{\mathcal{B}})^{2}
\f]
* **Normalize**:
\f[
\hat{b_{i}} \leftarrow \frac{b_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}
\f]
* **Scale and shift**:
\f[
o_{i} \leftarrow \gamma\hat{b_{i}} + \beta = BN_{\gamma ,\beta } ( b_{i} )
\f]
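
As a small worked illustration of the formulation above (an informal sketch, not part of the specification), the snippet below computes the mini-batch statistics for a single activation and then applies the normalize and scale-and-shift steps. At inference time the `mean` and `variance` inputs take the place of \f$\mu_{\mathcal{B}}\f$ and \f$\sigma_{\mathcal{B}}^{2}\f$.

```python
import numpy as np

# Mini-batch of m = 4 values for one activation, plus learnable gamma and beta.
b = np.array([0.2, 0.4, 0.6, 0.8], dtype=np.float32)
gamma, beta, epsilon = 1.5, 0.1, 1e-5

mu = b.mean()                                  # mini-batch mean -> 0.5
sigma2 = ((b - mu) ** 2).mean()                # mini-batch variance -> 0.05
b_hat = (b - mu) / np.sqrt(sigma2 + epsilon)   # normalize
o = gamma * b_hat + beta                       # scale and shift: BN_{gamma,beta}(b_i)
print(o)
```
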
**Attributes**:

* *epsilon*
  * **Description**: *epsilon* is a constant added to the variance for numerical stability.
  * **Range of values**: a positive floating-point number
  * **Type**: `float`
  * **Default value**: none
  * **Required**: *yes*

**Inputs**

* **1**: `data` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
* **2**: `gamma` - Scaling factor for normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **3**: `beta` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **4**: `mean` - Value for mean normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
* **5**: `variance` - Value for variance normalization. A 1D tensor of type *T* with the same span as `data` channel axis. **Required.**
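
These shape rules can be summarized in a short validation sketch; the helper name and error messages below are hypothetical and not part of any OpenVINO API.

```python
def check_batch_norm_inference_shapes(data_shape, gamma, beta, mean, variance):
    """Check the shape constraints from the input list above (sketch only)."""
    if len(data_shape) < 2:
        raise ValueError("`data` must have rank of at least 2")
    channels = data_shape[1]
    if channels < 1:
        raise ValueError("the channel axis of `data` must have a span of at least 1")
    for name, shape in {"gamma": gamma, "beta": beta,
                        "mean": mean, "variance": variance}.items():
        if len(shape) != 1 or shape[0] != channels:
            raise ValueError(f"`{name}` must be a 1D tensor with {channels} elements")

# Matches the 2D example below: `data` of shape [10, 128] and 1D inputs of span 128.
check_batch_norm_inference_shapes((10, 128), (128,), (128,), (128,), (128,))
```
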
**Outputs**

* **1**: The result of the element-wise Batch Normalization operation applied to the input tensor `data`. A tensor of type *T* and the same shape as the `data` input tensor.

**Types**

* *T*: any supported floating-point type.

**Examples**

*Example: 2D input tensor `data`*

```xml
<layer ... type="BatchNormInference" ...>
    <data epsilon="9.99e-06" />
    <input>
        <port id="0"> <!-- input -->
            <dim>10</dim>
            <dim>128</dim>
        </port>
        <port id="1"> <!-- gamma -->
            <dim>128</dim>
        </port>
        <port id="2"> <!-- beta -->
            <dim>128</dim>
        </port>
        <port id="3"> <!-- mean -->
            <dim>128</dim>
        </port>
        <port id="4"> <!-- variance -->
            <dim>128</dim>
        </port>
    </input>
    <output>
        <port id="5">
            <dim>10</dim>
            <dim>128</dim>
        </port>
    </output>
</layer>
```
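
The port shapes in this example map directly onto the computation sketched earlier; a hypothetical NumPy run with the same shapes could look like this.

```python
import numpy as np

# Shapes from the 2D example above: `data` is [10, 128], per-channel inputs are [128].
data = np.random.rand(10, 128).astype(np.float32)
gamma = np.random.rand(128).astype(np.float32)
beta = np.random.rand(128).astype(np.float32)
mean = np.random.rand(128).astype(np.float32)
variance = np.random.rand(128).astype(np.float32) + 0.1  # keep variance positive
epsilon = 9.99e-06

# Same computation as the earlier reference sketch, specialized to rank 2.
out = gamma * (data - mean) / np.sqrt(variance + epsilon) + beta
assert out.shape == (10, 128)  # matches port id="5" in the example
```
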
*Example: 4D input tensor `data`*

```xml
<layer ... type="BatchNormInference" ...>
    ...
    </output>
</layer>
```