diff --git a/docs/ops/activation/Clamp_1.md b/docs/ops/activation/Clamp_1.md
index d168ae8ce57..bc6b7edd3c9 100644
--- a/docs/ops/activation/Clamp_1.md
+++ b/docs/ops/activation/Clamp_1.md
@@ -15,7 +15,7 @@ Let *min_value* and *max_value* be *min* and *max*, respectively.
 The mathematical formula of *Clamp* is as follows:
 
 \f[
-clamp( x_{i} )=\min\big( \max\left( x_{i}, min\_value \right), max\_value \big)
+clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big)
 \f]
 
 **Attributes**:
diff --git a/docs/ops/activation/GELU_2.md b/docs/ops/activation/GELU_2.md
index c61905191a4..3d2adaa14de 100644
--- a/docs/ops/activation/GELU_2.md
+++ b/docs/ops/activation/GELU_2.md
@@ -12,7 +12,7 @@
 It performs an element-wise activation function on a given input tensor, based on the following mathematical formula:
 
 \f[
-    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(\frac{x}{\sqrt{2}}\right)\right]
 \f]
 
 where Φ(x) is the cumulative distribution function of the Gaussian distribution.
diff --git a/docs/ops/activation/GELU_7.md b/docs/ops/activation/GELU_7.md
index 44f182a9ab3..f11a4813a07 100644
--- a/docs/ops/activation/GELU_7.md
+++ b/docs/ops/activation/GELU_7.md
@@ -22,13 +22,13 @@ The *Gelu* function may be approximated in two different ways based on *approxim
 For `erf` approximation mode, *Gelu* function is represented as:
 
 \f[
-    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+    Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(\frac{x}{\sqrt{2}}\right)\right]
 \f]
 
 For `tanh` approximation mode, *Gelu* function is represented as:
 
 \f[
-    Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
+    Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
 \f]
 
 **Attributes**
diff --git a/docs/ops/activation/HSigmoid_5.md b/docs/ops/activation/HSigmoid_5.md
index 2470ccb00da..367327a4f85 100644
--- a/docs/ops/activation/HSigmoid_5.md
+++ b/docs/ops/activation/HSigmoid_5.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-HSigmoid(x) = \frac{min(max(x + 3, 0), 6)}{6}
+HSigmoid(x) = \frac{\min(\max(x + 3,\ 0),\ 6)}{6}
 \f]
 
 The HSigmoid operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
diff --git a/docs/ops/activation/HSwish_4.md b/docs/ops/activation/HSwish_4.md
index a9ae8168a1d..3f27517a44b 100644
--- a/docs/ops/activation/HSwish_4.md
+++ b/docs/ops/activation/HSwish_4.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-HSwish(x) = x \frac{min(max(x + 3, 0), 6)}{6}
+HSwish(x) = x \cdot \frac{\min(\max(x + 3,\ 0),\ 6)}{6}
 \f]
 
 The HSwish operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
diff --git a/docs/ops/activation/HardSigmoid_1.md b/docs/ops/activation/HardSigmoid_1.md
index 03c5c11606e..8403ca8a1ec 100644
--- a/docs/ops/activation/HardSigmoid_1.md
+++ b/docs/ops/activation/HardSigmoid_1.md
@@ -12,10 +12,13 @@
 For each element from the input tensor calculates corresponding
 element in the output tensor with the following formula:
 
+
 \f[
-    y = max(0, min(1, alpha * x + beta))
+    y = \max(0,\ \min(1,\ \alpha x + \beta))
 \f]
 
+where α corresponds to the `alpha` scalar input and β corresponds to the `beta` scalar input.
+
 **Inputs**
 
 * **1**: A tensor of type *T*. **Required.**
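The activation formulas above are easy to sanity-check numerically. Below is a minimal NumPy/SciPy sketch of *Clamp*, both *Gelu* modes, *HSigmoid*, *HSwish*, and *HardSigmoid*; it is illustrative only, and the function names and the `alpha`/`beta` example values are ours, not part of the specification.

```python
# Illustrative sketch of the activation formulas above (not part of the spec).
import numpy as np
from scipy.special import erf

def clamp(x, min_value, max_value):
    return np.minimum(np.maximum(x, min_value), max_value)

def gelu_erf(x):
    # Gelu(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Gelu(x) ~= x * 0.5 * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return x * 0.5 * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def hsigmoid(x):
    return np.minimum(np.maximum(x + 3.0, 0.0), 6.0) / 6.0

def hswish(x):
    # HSwish(x) = x * HSigmoid(x)
    return x * hsigmoid(x)

def hard_sigmoid(x, alpha=0.2, beta=0.5):
    # alpha and beta arrive as scalar inputs; 0.2 / 0.5 are example values only
    return np.maximum(0.0, np.minimum(1.0, alpha * x + beta))

x = np.linspace(-4.0, 4.0, 81)
assert np.allclose(gelu_erf(x), gelu_tanh(x), atol=1e-3)  # the two modes agree closely
assert np.allclose(clamp(x, -1.0, 1.0), np.clip(x, -1.0, 1.0))
```

The first assertion illustrates why the `tanh` mode is a usable approximation: over a typical input range it stays within about 1e-3 of the `erf` form.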
diff --git a/docs/ops/activation/LogSoftmax_5.md b/docs/ops/activation/LogSoftmax_5.md
index 60035120417..d26488fa968 100644
--- a/docs/ops/activation/LogSoftmax_5.md
+++ b/docs/ops/activation/LogSoftmax_5.md
@@ -8,8 +8,8 @@
 **Note**: It is recommended not to compute LogSoftmax directly as Log(Softmax(x, axis)); it is more numerically stable to compute LogSoftmax as:
 
 \f[
-t = (x - ReduceMax(x, axis)) \\
-LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t), axis))
+t = (x - ReduceMax(x,\ axis)) \\
+LogSoftmax(x,\ axis) = t - Log(ReduceSum(Exp(t),\ axis))
 \f]
 
 **Attributes**
diff --git a/docs/ops/activation/ReLU_1.md b/docs/ops/activation/ReLU_1.md
index b3edf994e01..5b401dbc908 100644
--- a/docs/ops/activation/ReLU_1.md
+++ b/docs/ops/activation/ReLU_1.md
@@ -15,7 +15,7 @@ For each element from the input tensor calculates corresponding
 element in the output tensor with the following formula:
 
 \f[
-    Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )})
+    Y_{i}^{( l )} = \max(0,\ Y_{i}^{( l - 1 )})
 \f]
 
 **Inputs**:
diff --git a/docs/ops/arithmetic/Abs_1.md b/docs/ops/arithmetic/Abs_1.md
index 426daee3806..1dc73dee933 100644
--- a/docs/ops/arithmetic/Abs_1.md
+++ b/docs/ops/arithmetic/Abs_1.md
@@ -25,7 +25,7 @@
 *Abs* does the following with the input tensor *a*:
 
 \f[
-a_{i} = abs(a_{i})
+a_{i} = \vert a_{i} \vert
 \f]
 
 **Examples**
diff --git a/docs/ops/arithmetic/Ceiling_1.md b/docs/ops/arithmetic/Ceiling_1.md
index 4d4cfeb9450..e091824c96d 100644
--- a/docs/ops/arithmetic/Ceiling_1.md
+++ b/docs/ops/arithmetic/Ceiling_1.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-a_{i} = ceiling(a_{i})
+a_{i} = \lceil a_{i} \rceil
 \f]
 
 **Attributes**: *Ceiling* operation has no attributes.
diff --git a/docs/ops/arithmetic/Divide_1.md b/docs/ops/arithmetic/Divide_1.md
index b16198a05ad..b69a07454a1 100644
--- a/docs/ops/arithmetic/Divide_1.md
+++ b/docs/ops/arithmetic/Divide_1.md
@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *Divide* performs division operation for the input tensors *a* and *b* using the formula below:
 
 \f[
-o_{i} = a_{i} / b_{i}
+o_{i} = \frac{a_{i}}{b_{i}}
 \f]
 
 The result of division by zero is undefined.
diff --git a/docs/ops/arithmetic/FloorMod_1.md b/docs/ops/arithmetic/FloorMod_1.md
index 27c77ade3fa..c573dee8304 100644
--- a/docs/ops/arithmetic/FloorMod_1.md
+++ b/docs/ops/arithmetic/FloorMod_1.md
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 
 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]
 
 *FloorMod* operation computes a remainder of a floored division. It is the same behaviour as in the Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
diff --git a/docs/ops/arithmetic/Floor_1.md b/docs/ops/arithmetic/Floor_1.md
index 910ce43d590..06690f06df8 100644
--- a/docs/ops/arithmetic/Floor_1.md
+++ b/docs/ops/arithmetic/Floor_1.md
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 
 \f[
-a_{i} = floor(a_{i})
+a_{i} = \lfloor a_{i} \rfloor
 \f]
 
 **Attributes**: *Floor* operation has no attributes.
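The stability note in `LogSoftmax_5.md` is easy to demonstrate. Here is a minimal NumPy sketch (illustrative only, not part of the diff) of the stable formulation next to the naive `Log(Softmax(x, axis))`:

```python
# Illustrative sketch of the numerically stable LogSoftmax above.
import numpy as np

def log_softmax(x, axis):
    # t = x - ReduceMax(x, axis); subtracting the max keeps Exp(t) <= 1,
    # so the reduction cannot overflow.
    t = x - np.max(x, axis=axis, keepdims=True)
    return t - np.log(np.sum(np.exp(t), axis=axis, keepdims=True))

x = np.array([[1000.0, 1001.0, 1002.0]])
print(log_softmax(x, axis=1))  # finite: [[-2.4076 -1.4076 -0.4076]]

# The naive form overflows: exp(1000) is inf, inf/inf is nan.
print(np.log(np.exp(x) / np.exp(x).sum(axis=1, keepdims=True)))  # [[nan nan nan]]
```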
diff --git a/docs/ops/arithmetic/Maximum_1.md b/docs/ops/arithmetic/Maximum_1.md
index d16db0e0d77..18eb0e757b9 100644
--- a/docs/ops/arithmetic/Maximum_1.md
+++ b/docs/ops/arithmetic/Maximum_1.md
@@ -12,7 +12,7 @@ As a first step input tensors *a* and *b* are broadcasted if their shapes differ
 After broadcasting *Maximum* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = max(a_{i}, b_{i})
+o_{i} = \max(a_{i},\ b_{i})
 \f]
 
 **Attributes**:
diff --git a/docs/ops/arithmetic/Minimum_1.md b/docs/ops/arithmetic/Minimum_1.md
index 69d5e8d85ef..30204e136dc 100644
--- a/docs/ops/arithmetic/Minimum_1.md
+++ b/docs/ops/arithmetic/Minimum_1.md
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Minimum* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 
 \f[
-o_{i} = min(a_{i}, b_{i})
+o_{i} = \min(a_{i},\ b_{i})
 \f]
 
 **Attributes**:
diff --git a/docs/ops/arithmetic/Mod_1.md b/docs/ops/arithmetic/Mod_1.md
index 7daf20d565c..df414c0f4fe 100644
--- a/docs/ops/arithmetic/Mod_1.md
+++ b/docs/ops/arithmetic/Mod_1.md
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Mod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 
 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]
 
 *Mod* operation computes a remainder of a truncated division. It is the same behaviour as in the C programming language: `truncated(x / y) * y + truncated_mod(x, y) = x`. The sign of the result is equal to the sign of the dividend. The result of division by zero is undefined.
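The *FloorMod*/*Mod* distinction above is subtle enough to deserve a worked example. A small Python sketch (illustrative only; the helper names are ours): *FloorMod* follows floored division, so the result takes the divisor's sign, while *Mod* follows truncated division, so the result takes the dividend's sign.

```python
# Illustrative sketch of the two remainder definitions above.
import math

def floor_mod(x, y):
    # floor(x / y) * y + floor_mod(x, y) = x  (Python's own % operator)
    return x - math.floor(x / y) * y

def trunc_mod(x, y):
    # truncated(x / y) * y + truncated_mod(x, y) = x  (C's % / math.fmod)
    return x - math.trunc(x / y) * y

print(floor_mod(-7, 3), trunc_mod(-7, 3))   # 2 -1  (divisor's vs dividend's sign)
print(floor_mod(7, -3), trunc_mod(7, -3))   # -2 1
assert floor_mod(-7, 3) == -7 % 3           # matches Python semantics
assert trunc_mod(-7, 3) == math.fmod(-7, 3) # matches C semantics
```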
diff --git a/docs/ops/arithmetic/Multiply_1.md b/docs/ops/arithmetic/Multiply_1.md
index 6b8273922f5..a713c9c0eac 100644
--- a/docs/ops/arithmetic/Multiply_1.md
+++ b/docs/ops/arithmetic/Multiply_1.md
@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *Multiply* performs multiplication operation for the input tensors *a* and *b* using the formula below:
 
 \f[
-o_{i} = a_{i} * b_{i}
+o_{i} = a_{i} \cdot b_{i}
 \f]
 
 **Attributes**:
diff --git a/docs/ops/comparison/GreaterEqual_1.md b/docs/ops/comparison/GreaterEqual_1.md
index 5acf4cbe6d6..f4a29c667fe 100644
--- a/docs/ops/comparison/GreaterEqual_1.md
+++ b/docs/ops/comparison/GreaterEqual_1.md
@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *GreaterEqual* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} >= b_{i}
+o_{i} = a_{i} \geq b_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/comparison/LessEqual_1.md b/docs/ops/comparison/LessEqual_1.md
index a8b7c810181..bb7eed13793 100644
--- a/docs/ops/comparison/LessEqual_1.md
+++ b/docs/ops/comparison/LessEqual_1.md
@@ -12,7 +12,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *LessEqual* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} <= b_{i}
+o_{i} = a_{i} \leq b_{i}
 \f]
 
 **Attributes**:
diff --git a/docs/ops/comparison/NotEqual_1.md b/docs/ops/comparison/NotEqual_1.md
index 456aeb7a785..448f4bcb66a 100644
--- a/docs/ops/comparison/NotEqual_1.md
+++ b/docs/ops/comparison/NotEqual_1.md
@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *NotEqual* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} \neq b_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/convolution/Convolution_1.md b/docs/ops/convolution/Convolution_1.md
index e77967e4130..431575b99c3 100644
--- a/docs/ops/convolution/Convolution_1.md
+++ b/docs/ops/convolution/Convolution_1.md
@@ -16,15 +16,15 @@ n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1
 The receptive field in each layer is calculated using the formulas:
 * Jump in the output feature map:
 \f[
-    j_{out} = j_{in} * s
+    j_{out} = j_{in} \cdot s
 \f]
 * Size of the receptive field of output feature:
 \f[
-    r_{out} = r_{in} + ( k - 1 ) * j_{in}
+    r_{out} = r_{in} + ( k - 1 ) \cdot j_{in}
 \f]
 * Center position of the receptive field of the first output feature:
 \f[
-    start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in}
+    start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in}
 \f]
 * Output is calculated using the following formula:
 \f[
diff --git a/docs/ops/convolution/DeformableConvolution_1.md b/docs/ops/convolution/DeformableConvolution_1.md
index 77140cb30c7..6c73e202be5 100644
--- a/docs/ops/convolution/DeformableConvolution_1.md
+++ b/docs/ops/convolution/DeformableConvolution_1.md
@@ -12,7 +12,7 @@
 Output is calculated using the following formula:
 
 \f[
-    y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k})
+    y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k})
 \f]
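The receptive-field recurrences in `Convolution_1.md` compose layer by layer. The Python sketch below makes the arithmetic concrete; it is illustrative only, the helper names are ours, and the 0.5-pixel starting center for the input feature map is a common convention we assume, not something the spec states.

```python
# Illustrative sketch of the output-size and receptive-field formulas above,
# for a stack of 1-D convolution layers described by (k, s, p) = (kernel,
# stride, padding). Assumes the first input pixel is centered at 0.5.
def receptive_field(layers):
    j, r, start = 1, 1, 0.5
    for k, s, p in layers:
        start += ((k - 1) / 2 - p) * j  # start_out = start_in + ((k-1)/2 - p) * j_in
        r += (k - 1) * j                # r_out = r_in + (k - 1) * j_in
        j *= s                          # j_out = j_in * s
    return j, r, start

def out_size(n_in, k, s, p):
    # n_out = (n_in + 2p - k) / s + 1; integer division assumes an exact fit
    return (n_in + 2 * p - k) // s + 1

# Two 3x3 stride-2 layers: each output feature sees a 7-pixel input window.
print(receptive_field([(3, 2, 1), (3, 2, 1)]))  # (4, 7, 0.5)
print(out_size(224, 3, 2, 1))                   # 112
```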
diff --git a/docs/ops/convolution/DeformableConvolution_8.md b/docs/ops/convolution/DeformableConvolution_8.md
index 0474a71193d..fc7c05a235c 100644
--- a/docs/ops/convolution/DeformableConvolution_8.md
+++ b/docs/ops/convolution/DeformableConvolution_8.md
@@ -14,7 +14,7 @@
 Output is calculated using the following formula:
 
 \f[
-    y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k}) * {\Delta}m_{k}
+    y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k}) \cdot {\Delta}m_{k}
 \f]
 
 Where
diff --git a/docs/ops/logical/LogicalNot_1.md b/docs/ops/logical/LogicalNot_1.md
index 9dd9132383f..97c41ddb14c 100644
--- a/docs/ops/logical/LogicalNot_1.md
+++ b/docs/ops/logical/LogicalNot_1.md
@@ -25,7 +25,7 @@
 *LogicalNot* does the following with the input tensor *a*:
 
 \f[
-a_{i} = not(a_{i})
+a_{i} = \lnot a_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/logical/LogicalXor_1.md b/docs/ops/logical/LogicalXor_1.md
index 61bfa9bc25c..16072f01183 100644
--- a/docs/ops/logical/LogicalXor_1.md
+++ b/docs/ops/logical/LogicalXor_1.md
@@ -37,7 +37,7 @@ Before performing logical operation, input tensors *a* and *b* are broadcasted i
 After broadcasting *LogicalXor* does the following with the input tensors *a* and *b*:
 
 \f[
-o_{i} = a_{i} xor b_{i}
+o_{i} = a_{i} \oplus b_{i}
 \f]
 
 **Examples**
diff --git a/docs/ops/pooling/AdaptiveAvgPool_8.md b/docs/ops/pooling/AdaptiveAvgPool_8.md
index cff1e91e92c..3c6193045ca 100644
--- a/docs/ops/pooling/AdaptiveAvgPool_8.md
+++ b/docs/ops/pooling/AdaptiveAvgPool_8.md
@@ -11,19 +11,19 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
 
 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]
 
 The output is calculated with the following formula:
 
 \f[
-Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start})*(h_{end}-h_{start})*(w_{end}-w_{start})}
+Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start}) \cdot (h_{end}-h_{start}) \cdot (w_{end}-w_{start})}
 \f]
 
 **Inputs**:
diff --git a/docs/ops/pooling/AdaptiveMaxPool_8.md b/docs/ops/pooling/AdaptiveMaxPool_8.md
index a86c3f67ac0..c34629351b8 100644
--- a/docs/ops/pooling/AdaptiveMaxPool_8.md
+++ b/docs/ops/pooling/AdaptiveMaxPool_8.md
@@ -11,12 +11,12 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
 
 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]
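The window arithmetic shared by `AdaptiveAvgPool_8.md` and `AdaptiveMaxPool_8.md` reduces, per spatial dimension, to the floor/ceil bounds above: the windows tile the whole input and adjacent windows may overlap when the output size does not divide the input size. A minimal NumPy sketch (illustrative only; the helper name is ours), shown for one dimension:

```python
# Illustrative sketch of the adaptive-pooling window bounds above.
import math
import numpy as np

def adaptive_windows(in_size, out_size):
    # Window i spans [floor(i * in / out), ceil((i + 1) * in / out)),
    # matching the d/h/w start and end formulas in the spec.
    return [(math.floor(i * in_size / out_size),
             math.ceil((i + 1) * in_size / out_size))
            for i in range(out_size)]

print(adaptive_windows(10, 4))  # [(0, 3), (2, 5), (5, 8), (7, 10)]

x = np.arange(10.0)
avg = [x[s:e].mean() for s, e in adaptive_windows(10, 4)]  # AdaptiveAvgPool
mx  = [x[s:e].max()  for s, e in adaptive_windows(10, 4)]  # AdaptiveMaxPool
print(avg)  # [1.0, 3.0, 6.0, 8.0]
print(mx)   # [2.0, 4.0, 7.0, 9.0]
```

Dividing the window sum by `(end - start)` per axis, as the *AdaptiveAvgPool* output formula does, is exactly the `mean()` over the half-open slice here.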