diff --git a/docs/ops/activation/Clamp_1.md b/docs/ops/activation/Clamp_1.md
index d168ae8ce57..bc6b7edd3c9 100644
--- a/docs/ops/activation/Clamp_1.md
+++ b/docs/ops/activation/Clamp_1.md
@@ -15,7 +15,7 @@ Let *min_value* and *max_value* be *min* and *max*, respectively.
 The mathematical formula of *Clamp* is as follows:
 
 \f[
-clamp( x_{i} )=\min\big( \max\left( x_{i}, min\_value \right), max\_value \big)
+clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big)
 \f]
 
 **Attributes**:
diff --git a/docs/ops/activation/GELU_2.md b/docs/ops/activation/GELU_2.md
index c61905191a4..3d2adaa14de 100644
--- a/docs/ops/activation/GELU_2.md
+++ b/docs/ops/activation/GELU_2.md
@@ -12,7 +12,7 @@
 It performs an element-wise activation function on a given input tensor, based on the following mathematical formula:
 
 \f[
-    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(\frac{x}{\sqrt{2}}\right)\right]
 \f]
 
 where Φ(x) is the cumulative distribution function of the Gaussian distribution.
diff --git a/docs/ops/activation/GELU_7.md b/docs/ops/activation/GELU_7.md
index 44f182a9ab3..f11a4813a07 100644
--- a/docs/ops/activation/GELU_7.md
+++ b/docs/ops/activation/GELU_7.md
@@ -22,13 +22,13 @@ The *Gelu* function may be approximated in two different ways based on *approxim
 For `erf` approximation mode, *Gelu* function is represented as:
 
 \f[
-    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(\frac{x}{\sqrt{2}}\right)\right]
 \f]
 
 For `tanh` approximation mode, *Gelu* function is represented as:
 
 \f[
-    Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
+    Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
 \f]
 
 **Attributes**
diff --git a/docs/ops/activation/HSigmoid_5.md b/docs/ops/activation/HSigmoid_5.md
index 2470ccb00da..367327a4f85 100644
--- a/docs/ops/activation/HSigmoid_5.md
+++ b/docs/ops/activation/HSigmoid_5.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-HSigmoid(x) = \frac{min(max(x + 3, 0), 6)}{6}
+HSigmoid(x) = \frac{\min(\max(x + 3,\ 0),\ 6)}{6}
 \f]
 
 The HSigmoid operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
diff --git a/docs/ops/activation/HSwish_4.md b/docs/ops/activation/HSwish_4.md
index a9ae8168a1d..3f27517a44b 100644
--- a/docs/ops/activation/HSwish_4.md
+++ b/docs/ops/activation/HSwish_4.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-HSwish(x) = x \frac{min(max(x + 3, 0), 6)}{6}
+HSwish(x) = x \cdot \frac{\min(\max(x + 3,\ 0),\ 6)}{6}
 \f]
 
 The HSwish operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
diff --git a/docs/ops/activation/HardSigmoid_1.md b/docs/ops/activation/HardSigmoid_1.md
index 03c5c11606e..8403ca8a1ec 100644
--- a/docs/ops/activation/HardSigmoid_1.md
+++ b/docs/ops/activation/HardSigmoid_1.md
@@ -12,10 +12,13 @@
 For each element from the input tensor calculates corresponding
 element in the output tensor with the following formula:
 
+
 \f[
-    y = max(0, min(1, alpha * x + beta))
+    y = \max(0,\ \min(1,\ \alpha x + \beta))
 \f]
 
+where α corresponds to the `alpha` scalar input and β corresponds to the `beta` scalar input.
+
 **Inputs**
 
 * **1**: A tensor of type *T*. **Required.**
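The activation formulas above are easy to sanity-check numerically. Below is a minimal NumPy/SciPy sketch of *Clamp*, both *Gelu* modes, *HSigmoid*, *HSwish*, and *HardSigmoid*; it is illustrative only, and the function names and the `alpha`/`beta` example values are ours, not part of the specification.

```python
# Illustrative sketch of the activation formulas above (not part of the spec).
import numpy as np
from scipy.special import erf

def clamp(x, min_value, max_value):
    return np.minimum(np.maximum(x, min_value), max_value)

def gelu_erf(x):
    # Gelu(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Gelu(x) ~= x * 0.5 * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return x * 0.5 * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def hsigmoid(x):
    return np.minimum(np.maximum(x + 3.0, 0.0), 6.0) / 6.0

def hswish(x):
    # HSwish(x) = x * HSigmoid(x)
    return x * hsigmoid(x)

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    # alpha and beta arrive as scalar inputs; 0.2 / 0.5 are example values only
    return np.maximum(0.0, np.minimum(1.0, alpha * x + beta))

x = np.linspace(-4.0, 4.0, 81)
assert np.allclose(gelu_erf(x), gelu_tanh(x), atol=1e-3)  # the two modes agree closely
assert np.allclose(clamp(x, -1.0, 1.0), np.clip(x, -1.0, 1.0))
```

The first assertion illustrates why the `tanh` mode is a usable approximation: over a typical input range it stays within about 1e-3 of the `erf` form.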
diff --git a/docs/ops/activation/LogSoftmax_5.md b/docs/ops/activation/LogSoftmax_5.md
index 60035120417..d26488fa968 100644
--- a/docs/ops/activation/LogSoftmax_5.md
+++ b/docs/ops/activation/LogSoftmax_5.md
@@ -8,8 +8,8 @@
 **Note**: It is recommended not to compute LogSoftmax directly as Log(Softmax(x, axis)); it is more numerically stable to compute LogSoftmax as:
 
 \f[
-t = (x - ReduceMax(x, axis)) \\
-LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t), axis))
+t = (x - ReduceMax(x,\ axis)) \\
+LogSoftmax(x,\ axis) = t - Log(ReduceSum(Exp(t),\ axis))
 \f]
 
 **Attributes**
diff --git a/docs/ops/activation/ReLU_1.md b/docs/ops/activation/ReLU_1.md
index b3edf994e01..5b401dbc908 100644
--- a/docs/ops/activation/ReLU_1.md
+++ b/docs/ops/activation/ReLU_1.md
@@ -15,7 +15,7 @@ For each element from the input tensor calculates corresponding
 element in the output tensor with the following formula:
 
 \f[
-    Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )})
+    Y_{i}^{( l )} = \max(0,\ Y_{i}^{( l - 1 )})
 \f]
 
 **Inputs**:
diff --git a/docs/ops/arithmetic/Abs_1.md b/docs/ops/arithmetic/Abs_1.md
index 426daee3806..1dc73dee933 100644
--- a/docs/ops/arithmetic/Abs_1.md
+++ b/docs/ops/arithmetic/Abs_1.md
@@ -25,7 +25,7 @@
 *Abs* does the following with the input tensor *a*:
 
 \f[
-a_{i} = abs(a_{i})
+a_{i} = \vert a_{i} \vert
 \f]
 
 **Examples**
diff --git a/docs/ops/arithmetic/Ceiling_1.md b/docs/ops/arithmetic/Ceiling_1.md
index 4d4cfeb9450..e091824c96d 100644
--- a/docs/ops/arithmetic/Ceiling_1.md
+++ b/docs/ops/arithmetic/Ceiling_1.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-a_{i} = ceiling(a_{i})
+a_{i} = \lceil a_{i} \rceil
 \f]
 
 **Attributes**: *Ceiling* operation has no attributes.
diff --git a/docs/ops/arithmetic/Divide_1.md b/docs/ops/arithmetic/Divide_1.md
index b16198a05ad..b69a07454a1 100644
--- a/docs/ops/arithmetic/Divide_1.md
+++ b/docs/ops/arithmetic/Divide_1.md
@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *Divide* performs division operation for the input tensors *a* and *b* using the formula below:
 
 \f[
-o_{i} = a_{i} / b_{i}
+o_{i} = \frac{a_{i}}{b_{i}}
 \f]
 
 The result of division by zero is undefined.
diff --git a/docs/ops/arithmetic/FloorMod_1.md b/docs/ops/arithmetic/FloorMod_1.md
index 27c77ade3fa..c573dee8304 100644
--- a/docs/ops/arithmetic/FloorMod_1.md
+++ b/docs/ops/arithmetic/FloorMod_1.md
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 
 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]
 
 *FloorMod* operation computes a remainder of a floored division. It is the same behaviour as in the Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
diff --git a/docs/ops/arithmetic/Floor_1.md b/docs/ops/arithmetic/Floor_1.md
index 910ce43d590..06690f06df8 100644
--- a/docs/ops/arithmetic/Floor_1.md
+++ b/docs/ops/arithmetic/Floor_1.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-a_{i} = floor(a_{i})
+a_{i} = \lfloor a_{i} \rfloor
 \f]
 
 **Attributes**: *Floor* operation has no attributes.
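The stability note in `LogSoftmax_5.md` is easy to demonstrate. Here is a minimal NumPy sketch (illustrative only, not part of the diff) of the stable formulation next to the naive `Log(Softmax(x, axis))`:

```python
# Illustrative sketch of the numerically stable LogSoftmax above.
import numpy as np

def log_softmax(x, axis):
    # t = x - ReduceMax(x, axis); subtracting the max keeps Exp(t) <= 1,
    # so the reduction cannot overflow.
    t = x - np.max(x, axis=axis, keepdims=True)
    return t - np.log(np.sum(np.exp(t), axis=axis, keepdims=True))

x = np.array([[1000.0, 1001.0, 1002.0]])
print(log_softmax(x, axis=1))  # finite: [[-2.4076 -1.4076 -0.4076]]

# The naive form overflows: exp(1000) is inf, inf/inf is nan.
print(np.log(np.exp(x) / np.exp(x).sum(axis=1, keepdims=True)))  # [[nan nan nan]]
```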
diff --git a/docs/ops/arithmetic/Maximum_1.md b/docs/ops/arithmetic/Maximum_1.md
index d16db0e0d77..18eb0e757b9 100644
--- a/docs/ops/arithmetic/Maximum_1.md
+++ b/docs/ops/arithmetic/Maximum_1.md
@@ -12,7 +12,7 @@ As a first step input tensors *a* and *b* are broadcasted if their shapes differ
 After broadcasting *Maximum* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = max(a_{i}, b_{i})
+o_{i} = \max(a_{i},\ b_{i})
 \f]
 
 **Attributes**:
diff --git a/docs/ops/arithmetic/Minimum_1.md b/docs/ops/arithmetic/Minimum_1.md
index 69d5e8d85ef..30204e136dc 100644
--- a/docs/ops/arithmetic/Minimum_1.md
+++ b/docs/ops/arithmetic/Minimum_1.md
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Minimum* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 
 \f[
-o_{i} = min(a_{i}, b_{i})
+o_{i} = \min(a_{i},\ b_{i})
 \f]
 
 **Attributes**:
diff --git a/docs/ops/arithmetic/Mod_1.md b/docs/ops/arithmetic/Mod_1.md
index 7daf20d565c..df414c0f4fe 100644
--- a/docs/ops/arithmetic/Mod_1.md
+++ b/docs/ops/arithmetic/Mod_1.md
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Mod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 
 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]
 
 *Mod* operation computes a remainder of a truncated division. It is the same behaviour as in the C programming language: `truncated(x / y) * y + truncated_mod(x, y) = x`. The sign of the result is equal to the sign of the dividend. The result of division by zero is undefined.
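The *FloorMod*/*Mod* distinction above is subtle enough to deserve a worked example. A small Python sketch (illustrative only; the helper names are ours): *FloorMod* follows floored division, so the result takes the divisor's sign, while *Mod* follows truncated division, so the result takes the dividend's sign.

```python
# Illustrative sketch of the two remainder definitions above.
import math

def floor_mod(x, y):
    # floor(x / y) * y + floor_mod(x, y) = x  (Python's own % operator)
    return x - math.floor(x / y) * y

def trunc_mod(x, y):
    # truncated(x / y) * y + truncated_mod(x, y) = x  (C's % / math.fmod)
    return x - math.trunc(x / y) * y

print(floor_mod(-7, 3), trunc_mod(-7, 3))   # 2 -1  (divisor's vs dividend's sign)
print(floor_mod(7, -3), trunc_mod(7, -3))   # -2 1
assert floor_mod(-7, 3) == -7 % 3           # matches Python semantics
assert trunc_mod(-7, 3) == math.fmod(-7, 3) # matches C semantics
```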
diff --git a/docs/ops/arithmetic/Multiply_1.md b/docs/ops/arithmetic/Multiply_1.md
index 6b8273922f5..a713c9c0eac 100644
--- a/docs/ops/arithmetic/Multiply_1.md
+++ b/docs/ops/arithmetic/Multiply_1.md
@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *Multiply* performs multiplication operation for the input tensors *a* and *b* using the formula below:
 
 \f[
-o_{i} = a_{i} * b_{i}
+o_{i} = a_{i} \cdot b_{i}
 \f]
 
 **Attributes**:
diff --git a/docs/ops/comparison/GreaterEqual_1.md b/docs/ops/comparison/GreaterEqual_1.md
index 5acf4cbe6d6..f4a29c667fe 100644
--- a/docs/ops/comparison/GreaterEqual_1.md
+++ b/docs/ops/comparison/GreaterEqual_1.md
@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *GreaterEqual* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} >= b_{i}
+o_{i} = a_{i} \geq b_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/comparison/LessEqual_1.md b/docs/ops/comparison/LessEqual_1.md
index a8b7c810181..bb7eed13793 100644
--- a/docs/ops/comparison/LessEqual_1.md
+++ b/docs/ops/comparison/LessEqual_1.md
@@ -12,7 +12,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *LessEqual* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} <= b_{i}
+o_{i} = a_{i} \leq b_{i}
 \f]
 
 **Attributes**:
diff --git a/docs/ops/comparison/NotEqual_1.md b/docs/ops/comparison/NotEqual_1.md
index 456aeb7a785..448f4bcb66a 100644
--- a/docs/ops/comparison/NotEqual_1.md
+++ b/docs/ops/comparison/NotEqual_1.md
@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *NotEqual* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} \neq b_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/convolution/Convolution_1.md b/docs/ops/convolution/Convolution_1.md
index e77967e4130..431575b99c3 100644
--- a/docs/ops/convolution/Convolution_1.md
+++ b/docs/ops/convolution/Convolution_1.md
@@ -16,15 +16,15 @@ n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1
 The receptive field in each layer is calculated using the formulas:
 * Jump in the output feature map:
 \f[
-    j_{out} = j_{in} * s
+    j_{out} = j_{in} \cdot s
 \f]
 * Size of the receptive field of output feature:
 \f[
-    r_{out} = r_{in} + ( k - 1 ) * j_{in}
+    r_{out} = r_{in} + ( k - 1 ) \cdot j_{in}
 \f]
 * Center position of the receptive field of the first output feature:
 \f[
-    start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in}
+    start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in}
 \f]
 * Output is calculated using the following formula:
 \f[
diff --git a/docs/ops/convolution/DeformableConvolution_1.md b/docs/ops/convolution/DeformableConvolution_1.md
index 77140cb30c7..6c73e202be5 100644
--- a/docs/ops/convolution/DeformableConvolution_1.md
+++ b/docs/ops/convolution/DeformableConvolution_1.md
@@ -12,7 +12,7 @@
 Output is calculated using the following formula:
 
 \f[
-    y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k})
+    y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k})
 \f]
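The receptive-field recurrences in `Convolution_1.md` compose layer by layer. The Python sketch below makes the arithmetic concrete; it is illustrative only, the helper names are ours, and the 0.5-pixel starting center for the input feature map is a common convention we assume, not something the spec states.

```python
# Illustrative sketch of the output-size and receptive-field formulas above,
# for a stack of 1-D convolution layers described by (k, s, p) = (kernel,
# stride, padding). Assumes the first input pixel is centered at 0.5.
def receptive_field(layers):
    j, r, start = 1, 1, 0.5
    for k, s, p in layers:
        start += ((k - 1) / 2 - p) * j  # start_out = start_in + ((k-1)/2 - p) * j_in
        r += (k - 1) * j                # r_out = r_in + (k - 1) * j_in
        j *= s                          # j_out = j_in * s
    return j, r, start

def out_size(n_in, k, s, p):
    # n_out = (n_in + 2p - k) / s + 1; integer division assumes an exact fit
    return (n_in + 2 * p - k) // s + 1

# Two 3x3 stride-2 layers: each output feature sees a 7-pixel input window.
print(receptive_field([(3, 2, 1), (3, 2, 1)]))  # (4, 7, 0.5)
print(out_size(224, 3, 2, 1))                   # 112
```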
diff --git a/docs/ops/convolution/DeformableConvolution_8.md b/docs/ops/convolution/DeformableConvolution_8.md
index 0474a71193d..fc7c05a235c 100644
--- a/docs/ops/convolution/DeformableConvolution_8.md
+++ b/docs/ops/convolution/DeformableConvolution_8.md
@@ -14,7 +14,7 @@
 Output is calculated using the following formula:
 
 \f[
-    y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k}) * {\Delta}m_{k}
+    y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k}) \cdot {\Delta}m_{k}
 \f]
 
 Where
diff --git a/docs/ops/logical/LogicalNot_1.md b/docs/ops/logical/LogicalNot_1.md
index 9dd9132383f..97c41ddb14c 100644
--- a/docs/ops/logical/LogicalNot_1.md
+++ b/docs/ops/logical/LogicalNot_1.md
@@ -25,7 +25,7 @@
 *LogicalNot* does the following with the input tensor *a*:
 
 \f[
-a_{i} = not(a_{i})
+a_{i} = \lnot a_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/logical/LogicalXor_1.md b/docs/ops/logical/LogicalXor_1.md
index 61bfa9bc25c..16072f01183 100644
--- a/docs/ops/logical/LogicalXor_1.md
+++ b/docs/ops/logical/LogicalXor_1.md
@@ -37,7 +37,7 @@ Before performing logical operation, input tensors *a* and *b* are broadcasted i
 After broadcasting *LogicalXor* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} xor b_{i}
+o_{i} = a_{i} \oplus b_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/pooling/AdaptiveAvgPool_8.md b/docs/ops/pooling/AdaptiveAvgPool_8.md
index cff1e91e92c..3c6193045ca 100644
--- a/docs/ops/pooling/AdaptiveAvgPool_8.md
+++ b/docs/ops/pooling/AdaptiveAvgPool_8.md
@@ -11,19 +11,19 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
 
 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]
 
 The output is calculated with the following formula:
 
 \f[
-Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start})*(h_{end}-h_{start})*(w_{end}-w_{start})}
+Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start}) \cdot (h_{end}-h_{start}) \cdot (w_{end}-w_{start})}
 \f]
 
 **Inputs**:
diff --git a/docs/ops/pooling/AdaptiveMaxPool_8.md b/docs/ops/pooling/AdaptiveMaxPool_8.md
index a86c3f67ac0..c34629351b8 100644
--- a/docs/ops/pooling/AdaptiveMaxPool_8.md
+++ b/docs/ops/pooling/AdaptiveMaxPool_8.md
@@ -11,12 +11,12 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
 
 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]
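The window arithmetic shared by `AdaptiveAvgPool_8.md` and `AdaptiveMaxPool_8.md` reduces, per spatial dimension, to the floor/ceil bounds above: the windows tile the whole input and adjacent windows may overlap when the output size does not divide the input size. A minimal NumPy sketch (illustrative only; the helper name is ours), shown for one dimension:

```python
# Illustrative sketch of the adaptive-pooling window bounds above.
import math
import numpy as np

def adaptive_windows(in_size, out_size):
    # Window i spans [floor(i * in / out), ceil((i + 1) * in / out)),
    # matching the d/h/w start and end formulas in the spec.
    return [(math.floor(i * in_size / out_size),
             math.ceil((i + 1) * in_size / out_size))
            for i in range(out_size)]

print(adaptive_windows(10, 4))  # [(0, 3), (2, 5), (5, 8), (7, 10)]

x = np.arange(10.0)
avg = [x[s:e].mean() for s, e in adaptive_windows(10, 4)]  # AdaptiveAvgPool
mx  = [x[s:e].max()  for s, e in adaptive_windows(10, 4)]  # AdaptiveMaxPool
print(avg)  # [1.0, 3.0, 6.0, 8.0]
print(mx)   # [2.0, 4.0, 7.0, 9.0]
```

Dividing the window sum by `(end - start)` per axis, as the *AdaptiveAvgPool* output formula does, is exactly the `mean()` over the half-open slice here.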