Beautify operator specifications (#6958)

* beautify operator specifications

* further update ops specs

* update FloorMod spec

* update adaptive pool spec

* update HSwish spec

* bring back old erf version
This commit is contained in:
Dawid Kożykowski 2021-08-12 12:11:30 +02:00 committed by GitHub
parent f26ecdd53f
commit 273c7188a4
27 changed files with 45 additions and 42 deletions

View File

@ -15,7 +15,7 @@
Let *min_value* and *max_value* be *min* and *max*, respectively. The mathematical formula of *Clamp* is as follows:
\f[
-clamp( x_{i} )=\min\big( \max\left( x_{i}, min\_value \right), max\_value \big)
+clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big)
\f]
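The formula above can be sketched in plain Python (an illustration only, not the OpenVINO implementation):

```python
def clamp(x, min_value, max_value):
    # saturate x into [min_value, max_value]: min(max(x, min_value), max_value)
    return min(max(x, min_value), max_value)

# applied element-wise over a tensor
print([clamp(v, -1.0, 1.0) for v in [-5.0, 0.5, 3.0]])  # [-1.0, 0.5, 1.0]
```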
**Attributes**:

View File

@ -12,7 +12,7 @@
It performs element-wise activation function on a given input tensor, based on the following mathematical formula:
\f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
\f]
where Φ(x) is the Cumulative Distribution Function of the Gaussian distribution.
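A minimal Python sketch of the erf-based formula (illustrative only, not the OpenVINO implementation):

```python
import math

def gelu(x):
    # x * Phi(x), with the Gaussian CDF Phi expressed through erf
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large positive inputs Gelu approaches the identity; for large negative inputs it approaches zero.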

View File

@ -22,13 +22,13 @@ The *Gelu* function may be approximated in two different ways based on *approxim
For `erf` approximation mode, *Gelu* function is represented as:
\f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
\f]
For `tanh` approximation mode, *Gelu* function is represented as:
\f[
-Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
+Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
\f]
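The two approximation modes can be compared with a small Python sketch (illustrative only, not the OpenVINO implementation):

```python
import math

def gelu_erf(x):
    # exact form via the Gaussian CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh-based approximation of the same function
    return x * 0.5 * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# the two modes agree closely for moderate inputs
print(abs(gelu_erf(1.0) - gelu_tanh(1.0)))
```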
**Attributes**

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-HSigmoid(x) = \frac{min(max(x + 3, 0), 6)}{6}
+HSigmoid(x) = \frac{min(max(x + 3,\ 0),\ 6)}{6}
\f]
The HSigmoid operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
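The formula is simple enough to sketch in plain Python (illustrative only):

```python
def hsigmoid(x):
    # piecewise-linear approximation of the sigmoid: min(max(x + 3, 0), 6) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0
```

The function saturates at 0 for x ≤ -3 and at 1 for x ≥ 3, passing through 0.5 at x = 0.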

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-HSwish(x) = x \frac{min(max(x + 3, 0), 6)}{6}
+HSwish(x) = x \cdot \frac{min(max(x + 3,\ 0),\ 6)}{6}
\f]
The HSwish operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
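As a rough Python sketch of the formula (illustrative only):

```python
def hswish(x):
    # x times the hard-sigmoid of x: x * min(max(x + 3, 0), 6) / 6
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0
```

The function is zero for x ≤ -3 and approaches the identity for x ≥ 3.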

View File

@ -12,10 +12,13 @@
For each element in the input tensor, the operation calculates the corresponding
element in the output tensor with the following formula:
\f[
-y = max(0, min(1, alpha * x + beta))
+y = max(0,\ min(1,\ \alpha x + \beta))
\f]
where α corresponds to the `alpha` scalar input and β corresponds to the `beta` scalar input.
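A Python sketch of the formula (illustrative only; the `alpha = 0.2`, `beta = 0.5` values below are arbitrary examples, not defaults from the spec):

```python
def hard_sigmoid(x, alpha, beta):
    # clip the affine map alpha * x + beta into [0, 1]
    return max(0.0, min(1.0, alpha * x + beta))

print(hard_sigmoid(0.0, 0.2, 0.5))  # 0.5
```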
**Inputs**
* **1**: An tensor of type *T*. **Required.**

View File

@ -8,8 +8,8 @@
**Note**: It is recommended not to compute LogSoftmax directly as Log(Softmax(x, axis)); it is more numerically stable to compute LogSoftmax as:
\f[
-t = (x - ReduceMax(x, axis)) \\
-LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t), axis))
+t = (x - ReduceMax(x,\ axis)) \\
+LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t),\ axis))
\f]
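The stable two-step computation can be sketched in Python for a 1-D input (illustrative only):

```python
import math

def log_softmax(xs):
    # subtracting the maximum first keeps exp() from overflowing
    t = [x - max(xs) for x in xs]
    log_sum = math.log(sum(math.exp(v) for v in t))
    return [v - log_sum for v in t]
```

Even for large inputs such as `[1000.0, 1000.0]`, where a naive `log(softmax(x))` would overflow in the intermediate `exp`, this form stays finite.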
**Attributes**

View File

@ -15,7 +15,7 @@
For each element in the input tensor, the operation calculates the corresponding
element in the output tensor with the following formula:
\f[
-Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )})
+Y_{i}^{( l )} = max(0,\ Y_{i}^{( l - 1 )})
\f]
**Inputs**:

View File

@ -25,7 +25,7 @@
*Abs* does the following with the input tensor *a*:
\f[
-a_{i} = abs(a_{i})
+a_{i} = \vert a_{i} \vert
\f]
**Examples**

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-a_{i} = ceiling(a_{i})
+a_{i} = \lceil a_{i} \rceil
\f]
**Attributes**: *Ceiling* operation has no attributes.

View File

@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *Divide* performs division operation for the input tensors *a* and *b* using the formula below:
\f[
-o_{i} = a_{i} / b_{i}
+o_{i} = \frac{a_{i}}{b_{i}}
\f]
The result of division by zero is undefined.

View File

@ -10,7 +10,7 @@
As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
\f]
*FloorMod* operation computes the remainder of a floored division. It has the same behaviour as the `%` operator in the Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
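The floored-division identity can be checked in Python, whose built-in `%` operator already behaves this way (illustrative sketch):

```python
import math

def floor_mod(x, y):
    # remainder of floored division; the sign follows the divisor
    return x - math.floor(x / y) * y

# identity from the spec: floor(x / y) * y + floor_mod(x, y) == x
for x, y in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    assert math.floor(x / y) * y + floor_mod(x, y) == x
```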

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-a_{i} = floor(a_{i})
+a_{i} = \lfloor a_{i} \rfloor
\f]
**Attributes**: *Floor* operation has no attributes.

View File

@ -12,7 +12,7 @@ As a first step input tensors *a* and *b* are broadcasted if their shapes differ
After broadcasting *Maximum* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = max(a_{i}, b_{i})
+o_{i} = max(a_{i},\ b_{i})
\f]
**Attributes**:

View File

@ -10,7 +10,7 @@
As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Minimum* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
-o_{i} = min(a_{i}, b_{i})
+o_{i} = min(a_{i},\ b_{i})
\f]
**Attributes**:

View File

@ -10,7 +10,7 @@
As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Mod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
\f]
*Mod* operation computes the remainder of a truncated division. It has the same behaviour as the `%` operator in the C programming language: `truncated(x / y) * y + truncated_mod(x, y) = x`. The sign of the result is equal to the sign of the dividend. The result of division by zero is undefined.
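The truncated-division behaviour can be sketched in Python (illustrative only; note that Python's own `%` is floored, so `math.trunc` is used instead):

```python
import math

def trunc_mod(x, y):
    # remainder of truncated division; the sign follows the dividend, as in C
    return x - math.trunc(x / y) * y

# unlike Python's %, the result keeps the dividend's sign
print(trunc_mod(-7, 3), (-7) % 3)  # -1 2
```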

View File

@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *Multiply* performs multiplication operation for the input tensors *a* and *b* using the formula below:
\f[
-o_{i} = a_{i} * b_{i}
+o_{i} = a_{i} \cdot b_{i}
\f]
**Attributes**:

View File

@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *GreaterEqual* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} >= b_{i}
+o_{i} = a_{i} \geq b_{i}
\f]
**Examples**

View File

@ -12,7 +12,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *LessEqual* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} <= b_{i}
+o_{i} = a_{i} \leq b_{i}
\f]
**Attributes**:

View File

@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *NotEqual* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} != b_{i}
+o_{i} = a_{i} \neq b_{i}
\f]
**Examples**

View File

@ -16,15 +16,15 @@ n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1
The receptive field in each layer is calculated using the formulas:
* Jump in the output feature map:
\f[
-j_{out} = j_{in} * s
+j_{out} = j_{in} \cdot s
\f]
* Size of the receptive field of output feature:
\f[
-r_{out} = r_{in} + ( k - 1 ) * j_{in}
+r_{out} = r_{in} + ( k - 1 ) \cdot j_{in}
\f]
* Center position of the receptive field of the first output feature:
\f[
-start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in}
+start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in}
\f]
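The three recurrences above can be sketched together in Python (illustrative only; the parameter names `k`, `s`, `p` for kernel size, stride, and padding follow the formulas):

```python
def receptive_field_step(j_in, r_in, start_in, k, s, p):
    # k: kernel size, s: stride, p: padding
    j_out = j_in * s                                 # jump in the output feature map
    r_out = r_in + (k - 1) * j_in                    # size of the receptive field
    start_out = start_in + ((k - 1) / 2 - p) * j_in  # center of the first output feature
    return j_out, r_out, start_out

# a 3x3 convolution with stride 1 and padding 1 keeps the jump and center position
print(receptive_field_step(1, 1, 0.5, k=3, s=1, p=1))  # (1, 3, 0.5)
```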
* Output is calculated using the following formula:
\f[

View File

@ -12,7 +12,7 @@ Output is calculated using the following formula:
\f[
-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k})
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k})
\f]

View File

@ -14,7 +14,7 @@ Output is calculated using the following formula:
\f[
-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k}) * {\Delta}m_{k}
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k}) \cdot {\Delta}m_{k}
\f]
Where

View File

@ -25,7 +25,7 @@
*LogicalNot* does the following with the input tensor *a*:
\f[
-a_{i} = not(a_{i})
+a_{i} = \lnot a_{i}
\f]
**Examples**

View File

@ -37,7 +37,7 @@ Before performing logical operation, input tensors *a* and *b* are broadcasted i
After broadcasting *LogicalXor* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} xor b_{i}
+o_{i} = a_{i} \oplus b_{i}
\f]
**Examples**

View File

@ -11,19 +11,19 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
\f[
\begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
\end{array}
\f]
The output is calculated with the following formula:
\f[
-Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start})*(h_{end}-h_{start})*(w_{end}-w_{start})}
+Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start}) \cdot (h_{end}-h_{start}) \cdot (w_{end}-w_{start})}
\f]
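The index arithmetic and averaging can be sketched in Python for the 1-D case (illustrative only; the real operation applies the same scheme per spatial dimension of an N-D tensor):

```python
import math

def adaptive_avg_pool_1d(x, out_size):
    n = len(x)
    out = []
    for i in range(out_size):
        # per-output window bounds from the floor/ceil formulas above
        start = math.floor(i * n / out_size)
        end = math.ceil((i + 1) * n / out_size)
        out.append(sum(x[start:end]) / (end - start))
    return out

print(adaptive_avg_pool_1d([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```

When `out_size` equals the input length, every window covers exactly one element and the input is returned unchanged.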
**Inputs**:

View File

@ -11,12 +11,12 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
\f[
\begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
\end{array}
\f]