# BatchNormInference {#openvino_docs_ops_normalization_BatchNormInference_5}

@sphinxdirective

**Versioned name**: *BatchNormInference-5*

**Category**: *Normalization*

**Short description**: *BatchNormInference* performs the Batch Normalization operation described in the `Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift <https://arxiv.org/abs/1502.03167>`__ article.

**Detailed Description**

*BatchNormInference* performs the following operations on a given data batch input tensor ``data``:

* Normalizes each activation :math:`x^{(k)}` by the mean and variance.

  .. math::

     \hat{x}^{(k)}=\frac{x^{(k)} - E[x^{(k)}]}{\sqrt{Var(x^{(k)}) + \epsilon}}

  where :math:`E[x^{(k)}]` and :math:`Var(x^{(k)})` are the mean and variance, calculated per channel axis of the ``data`` input, and correspond to the ``mean`` and ``variance`` inputs, respectively. Additionally, :math:`\epsilon` is a value added to the variance for numerical stability and corresponds to the ``epsilon`` attribute.

* Performs a linear transformation of each normalized activation based on the ``gamma`` and ``beta`` inputs, representing the scaling factor and shift, respectively.

  .. math::

     \hat{y}^{(k)}=\gamma^{(k)}\hat{x}^{(k)} + \beta^{(k)}

  where :math:`\gamma^{(k)}` and :math:`\beta^{(k)}` are learnable parameters, calculated per channel axis, and correspond to the ``gamma`` and ``beta`` inputs.

**Mathematical Formulation**

Let ``x`` be a *d*-dimensional input, :math:`x=(x_{1}\dotsc x_{d})`. Since normalization is applied to each activation :math:`x^{(k)}` independently, you can focus on a particular activation and omit :math:`k`. For a particular activation, consider a mini-batch :math:`\mathcal{B}` of :math:`m` values. *BatchNormInference* performs the Batch Normalization algorithm as follows:

* **Input**: Values of :math:`x` over a mini-batch:

  .. math::

     \mathcal{B} = \{x_{1 \dots m}\}

* **Parameters to learn**: :math:`\gamma, \beta`

* **Output**:

  .. math::

     \{o_{i} = BN_{\gamma, \beta} ( x_{i} )\}

* **Mini-batch mean**:

  .. math::

     \mu_{\mathcal{B}} \leftarrow \frac{1}{m}\sum_{i=1}^{m}x_{i}

* **Mini-batch variance**:

  .. math::

     \sigma_{\mathcal{B}}^{2}\leftarrow \frac{1}{m}\sum_{i=1}^{m} ( x_{i} - \mu_{\mathcal{B}})^{2}

* **Normalize**:

  .. math::

     \hat{x_{i}} \leftarrow \frac{x_{i} - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon }}

* **Scale and shift**:

  .. math::

     o_{i} \leftarrow \gamma\hat{x_{i}} + \beta = BN_{\gamma ,\beta } ( x_{i} )

**Attributes**:

* *epsilon*

  * **Description**: *epsilon* is a constant added to the variance for numerical stability.
  * **Range of values**: a floating-point number greater than or equal to zero
  * **Type**: ``float``
  * **Required**: *yes*

**Inputs**

* **1**: ``data`` - A tensor of type *T* and at least rank 2. The second dimension represents the channel axis and must have a span of at least 1. **Required.**
* **2**: ``gamma`` - Scaling factor for the normalized value. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **3**: ``beta`` - Bias added to the scaled normalized value. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **4**: ``mean`` - Value for mean normalization. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**
* **5**: ``variance`` - Value for variance normalization. A 1D tensor of type *T* with the same span as the ``data`` channel axis. **Required.**

**Outputs**

* **1**: The result of the element-wise Batch Normalization operation applied to the input tensor ``data``. A tensor of type *T* with the same shape as the ``data`` input tensor.
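For illustration only, the following NumPy sketch shows the computation described above for an input whose channel axis is the second dimension. The function name, shapes, and ``epsilon`` value are example choices made for this sketch, not part of the specification.

.. code-block:: python

   import numpy as np

   def batch_norm_inference(data, gamma, beta, mean, variance, epsilon):
       # Reshape the 1D per-channel parameters so they broadcast over the
       # channel axis (the second dimension of ``data``).
       shape = [1, -1] + [1] * (data.ndim - 2)
       gamma = gamma.reshape(shape)
       beta = beta.reshape(shape)
       mean = mean.reshape(shape)
       variance = variance.reshape(shape)
       # Normalize, then apply the linear transformation (scale and shift).
       normalized = (data - mean) / np.sqrt(variance + epsilon)
       return gamma * normalized + beta

   # Example: 4D input of shape [1, 3, 224, 224] with per-channel parameters of shape [3].
   data = np.random.rand(1, 3, 224, 224).astype(np.float32)
   gamma = np.ones(3, dtype=np.float32)
   beta = np.zeros(3, dtype=np.float32)
   mean = np.zeros(3, dtype=np.float32)
   variance = np.ones(3, dtype=np.float32)
   out = batch_norm_inference(data, gamma, beta, mean, variance, epsilon=1e-5)
   print(out.shape)  # (1, 3, 224, 224)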
**Types**

* *T*: any supported floating-point type.

**Examples**

Example: 2D input tensor ``data``

.. code-block:: xml

   <layer ... type="BatchNormInference" ...>
       <input>
           <port id="0">  <!-- input -->
               <dim>10</dim>
               <dim>128</dim>
           </port>
           <port id="1">  <!-- gamma -->
               <dim>128</dim>
           </port>
           <port id="2">  <!-- beta -->
               <dim>128</dim>
           </port>
           <port id="3">  <!-- mean -->
               <dim>128</dim>
           </port>
           <port id="4">  <!-- variance -->
               <dim>128</dim>
           </port>
       </input>
       <output>
           <port id="5">
               <dim>10</dim>
               <dim>128</dim>
           </port>
       </output>
   </layer>

Example: 4D input tensor ``data``

.. code-block:: xml

   <layer ... type="BatchNormInference" ...>
       <input>
           <port id="0">  <!-- input -->
               <dim>1</dim>
               <dim>3</dim>
               <dim>224</dim>
               <dim>224</dim>
           </port>
           <port id="1">  <!-- gamma -->
               <dim>3</dim>
           </port>
           <port id="2">  <!-- beta -->
               <dim>3</dim>
           </port>
           <port id="3">  <!-- mean -->
               <dim>3</dim>
           </port>
           <port id="4">  <!-- variance -->
               <dim>3</dim>
           </port>
       </input>
       <output>
           <port id="5">
               <dim>1</dim>
               <dim>3</dim>
               <dim>224</dim>
               <dim>224</dim>
           </port>
       </output>
   </layer>

@endsphinxdirective