Beautify operator specifications (#6958)

* beautify operator specifications

* further update ops specs

* update FloorMod spec

* update adaptive pool spec

* update HSwish spec

* bring back old erf version
This commit is contained in:
Dawid Kożykowski 2021-08-12 12:11:30 +02:00 committed by GitHub
parent f26ecdd53f
commit 273c7188a4
27 changed files with 45 additions and 42 deletions

View File

@ -15,7 +15,7 @@
Let *min_value* and *max_value* be *min* and *max*, respectively. The mathematical formula of *Clamp* is as follows:
\f[
-clamp( x_{i} )=\min\big( \max\left( x_{i}, min\_value \right), max\_value \big)
+clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big)
\f]
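The formula above can be sketched in plain Python (an illustration only, not the OpenVINO implementation):

```python
def clamp(x, min_value, max_value):
    # saturate x into [min_value, max_value]: min(max(x, min_value), max_value)
    return min(max(x, min_value), max_value)

# applied element-wise over a tensor
print([clamp(v, -1.0, 1.0) for v in [-5.0, 0.5, 3.0]])  # [-1.0, 0.5, 1.0]
```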
**Attributes**:

View File

@ -12,7 +12,7 @@
It performs element-wise activation function on a given input tensor, based on the following mathematical formula:
\f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
\f]
where Φ(x) is the Cumulative Distribution Function of the Gaussian distribution.
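A minimal Python sketch of the erf-based formula (illustrative only, not the OpenVINO implementation):

```python
import math

def gelu(x):
    # x * Phi(x), with the Gaussian CDF Phi expressed through erf
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
```

For large positive inputs Gelu approaches the identity; for large negative inputs it approaches zero.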

View File

@ -22,13 +22,13 @@ The *Gelu* function may be approximated in two different ways based on *approxim
For `erf` approximation mode, *Gelu* function is represented as:
\f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
\f]
For `tanh` approximation mode, *Gelu* function is represented as:
\f[
-Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
+Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
\f]
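The two approximation modes can be compared with a small Python sketch (illustrative only, not the OpenVINO implementation):

```python
import math

def gelu_erf(x):
    # exact form via the Gaussian CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # tanh-based approximation of the same function
    return x * 0.5 * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# the two modes agree closely for moderate inputs
print(abs(gelu_erf(1.0) - gelu_tanh(1.0)))
```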
**Attributes**

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-HSigmoid(x) = \frac{min(max(x + 3, 0), 6)}{6}
+HSigmoid(x) = \frac{min(max(x + 3,\ 0),\ 6)}{6}
\f]
The HSigmoid operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
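The formula is simple enough to sketch in plain Python (illustrative only):

```python
def hsigmoid(x):
    # piecewise-linear approximation of the sigmoid: min(max(x + 3, 0), 6) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0
```

The function saturates at 0 for x ≤ -3 and at 1 for x ≥ 3, passing through 0.5 at x = 0.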

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-HSwish(x) = x \frac{min(max(x + 3, 0), 6)}{6}
+HSwish(x) = x \cdot \frac{min(max(x + 3,\ 0),\ 6)}{6}
\f]
The HSwish operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
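As a rough Python sketch of the formula (illustrative only):

```python
def hswish(x):
    # x times the hard-sigmoid of x: x * min(max(x + 3, 0), 6) / 6
    return x * min(max(x + 3.0, 0.0), 6.0) / 6.0
```

The function is zero for x ≤ -3 and approaches the identity for x ≥ 3.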

View File

@ -12,10 +12,13 @@
For each element in the input tensor, the operation calculates the corresponding
element in the output tensor with the following formula:
\f[
-y = max(0, min(1, alpha * x + beta))
+y = max(0,\ min(1,\ \alpha x + \beta))
\f]
where α corresponds to the `alpha` scalar input and β corresponds to the `beta` scalar input.
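A Python sketch of the formula (illustrative only; the `alpha = 0.2`, `beta = 0.5` values below are arbitrary examples, not defaults from the spec):

```python
def hard_sigmoid(x, alpha, beta):
    # clip the affine map alpha * x + beta into [0, 1]
    return max(0.0, min(1.0, alpha * x + beta))

print(hard_sigmoid(0.0, 0.2, 0.5))  # 0.5
```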
**Inputs**
* **1**: An tensor of type *T*. **Required.**

View File

@ -8,8 +8,8 @@
**Note**: It is recommended not to compute LogSoftmax directly as Log(Softmax(x, axis)); it is more numerically stable to compute LogSoftmax as:
\f[
-t = (x - ReduceMax(x, axis)) \\
-LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t), axis))
+t = (x - ReduceMax(x,\ axis)) \\
+LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t),\ axis))
\f]
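The stable two-step computation can be sketched in Python for a 1-D input (illustrative only):

```python
import math

def log_softmax(xs):
    # subtracting the maximum first keeps exp() from overflowing
    t = [x - max(xs) for x in xs]
    log_sum = math.log(sum(math.exp(v) for v in t))
    return [v - log_sum for v in t]
```

Even for large inputs such as `[1000.0, 1000.0]`, where a naive `log(softmax(x))` would overflow in the intermediate `exp`, this form stays finite.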
**Attributes**

View File

@ -15,7 +15,7 @@
For each element in the input tensor, the operation calculates the corresponding
element in the output tensor with the following formula:
\f[
-Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )})
+Y_{i}^{( l )} = max(0,\ Y_{i}^{( l - 1 )})
\f]
**Inputs**:

View File

@ -25,7 +25,7 @@
*Abs* does the following with the input tensor *a*:
\f[
-a_{i} = abs(a_{i})
+a_{i} = \vert a_{i} \vert
\f]
**Examples**

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-a_{i} = ceiling(a_{i})
+a_{i} = \lceil a_{i} \rceil
\f]
**Attributes**: *Ceiling* operation has no attributes.

View File

@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *Divide* performs division operation for the input tensors *a* and *b* using the formula below:
\f[
-o_{i} = a_{i} / b_{i}
+o_{i} = \frac{a_{i}}{b_{i}}
\f]
The result of division by zero is undefined.

View File

@ -10,7 +10,7 @@
As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
\f]
*FloorMod* operation computes the remainder of a floored division. It has the same behaviour as the `%` operator in the Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
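The floored-division identity can be checked in Python, whose built-in `%` operator already behaves this way (illustrative sketch):

```python
import math

def floor_mod(x, y):
    # remainder of floored division; the sign follows the divisor
    return x - math.floor(x / y) * y

# identity from the spec: floor(x / y) * y + floor_mod(x, y) == x
for x, y in [(7, 3), (-7, 3), (7, -3), (-7, -3)]:
    assert math.floor(x / y) * y + floor_mod(x, y) == x
```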

View File

@ -10,7 +10,7 @@
element in the output tensor with the following formula:
\f[
-a_{i} = floor(a_{i})
+a_{i} = \lfloor a_{i} \rfloor
\f]
**Attributes**: *Floor* operation has no attributes.

View File

@ -12,7 +12,7 @@ As a first step input tensors *a* and *b* are broadcasted if their shapes differ
After broadcasting *Maximum* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = max(a_{i}, b_{i})
+o_{i} = max(a_{i},\ b_{i})
\f]
**Attributes**:

View File

@ -10,7 +10,7 @@
As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Minimum* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
-o_{i} = min(a_{i}, b_{i})
+o_{i} = min(a_{i},\ b_{i})
\f]
**Attributes**:

View File

@ -10,7 +10,7 @@
As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Mod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
\f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
\f]
*Mod* operation computes the remainder of a truncated division. It has the same behaviour as the `%` operator in the C programming language: `truncated(x / y) * y + truncated_mod(x, y) = x`. The sign of the result is equal to the sign of the dividend. The result of division by zero is undefined.
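The truncated-division behaviour can be sketched in Python (illustrative only; note that Python's own `%` is floored, so `math.trunc` is used instead):

```python
import math

def trunc_mod(x, y):
    # remainder of truncated division; the sign follows the dividend, as in C
    return x - math.trunc(x / y) * y

# unlike Python's %, the result keeps the dividend's sign
print(trunc_mod(-7, 3), (-7) % 3)  # -1 2
```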

View File

@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *Multiply* performs multiplication operation for the input tensors *a* and *b* using the formula below:
\f[
-o_{i} = a_{i} * b_{i}
+o_{i} = a_{i} \cdot b_{i}
\f]
**Attributes**:

View File

@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *GreaterEqual* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} >= b_{i}
+o_{i} = a_{i} \geq b_{i}
\f]
**Examples**

View File

@ -12,7 +12,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *LessEqual* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} <= b_{i}
+o_{i} = a_{i} \leq b_{i}
\f]
**Attributes**:

View File

@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
After broadcasting *NotEqual* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} != b_{i}
+o_{i} = a_{i} \neq b_{i}
\f]
**Examples**

View File

@ -16,15 +16,15 @@ n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1
The receptive field in each layer is calculated using the formulas:
* Jump in the output feature map:
\f[
-j_{out} = j_{in} * s
+j_{out} = j_{in} \cdot s
\f]
* Size of the receptive field of output feature:
\f[
-r_{out} = r_{in} + ( k - 1 ) * j_{in}
+r_{out} = r_{in} + ( k - 1 ) \cdot j_{in}
\f]
* Center position of the receptive field of the first output feature:
\f[
-start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in}
+start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in}
\f]
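The three recurrences above can be sketched together in Python (illustrative only; the parameter names `k`, `s`, `p` for kernel size, stride, and padding follow the formulas):

```python
def receptive_field_step(j_in, r_in, start_in, k, s, p):
    # k: kernel size, s: stride, p: padding
    j_out = j_in * s                                 # jump in the output feature map
    r_out = r_in + (k - 1) * j_in                    # size of the receptive field
    start_out = start_in + ((k - 1) / 2 - p) * j_in  # center of the first output feature
    return j_out, r_out, start_out

# a 3x3 convolution with stride 1 and padding 1 keeps the jump and center position
print(receptive_field_step(1, 1, 0.5, k=3, s=1, p=1))  # (1, 3, 0.5)
```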
* Output is calculated using the following formula:
\f[

View File

@ -12,7 +12,7 @@ Output is calculated using the following formula:
\f[
-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k})
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k})
\f]

View File

@ -14,7 +14,7 @@ Output is calculated using the following formula:
\f[
-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k}) * {\Delta}m_{k}
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k}) \cdot {\Delta}m_{k}
\f]
Where

View File

@ -25,7 +25,7 @@
*LogicalNot* does the following with the input tensor *a*:
\f[
-a_{i} = not(a_{i})
+a_{i} = \lnot a_{i}
\f]
**Examples**

View File

@ -37,7 +37,7 @@ Before performing logical operation, input tensors *a* and *b* are broadcasted i
After broadcasting *LogicalXor* does the following with the input tensors *a* and *b*:
\f[
-o_{i} = a_{i} xor b_{i}
+o_{i} = a_{i} \oplus b_{i}
\f]
**Examples**

View File

@ -11,19 +11,19 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
\f[
\begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
\end{array}
\f]
The output is calculated with the following formula:
\f[
-Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start})*(h_{end}-h_{start})*(w_{end}-w_{start})}
+Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start}) \cdot (h_{end}-h_{start}) \cdot (w_{end}-w_{start})}
\f]
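The index arithmetic and averaging can be sketched in Python for the 1-D case (illustrative only; the real operation applies the same scheme per spatial dimension of an N-D tensor):

```python
import math

def adaptive_avg_pool_1d(x, out_size):
    n = len(x)
    out = []
    for i in range(out_size):
        # per-output window bounds from the floor/ceil formulas above
        start = math.floor(i * n / out_size)
        end = math.ceil((i + 1) * n / out_size)
        out.append(sum(x[start:end]) / (end - start))
    return out

print(adaptive_avg_pool_1d([1.0, 2.0, 3.0, 4.0], 2))  # [1.5, 3.5]
```

When `out_size` equals the input length, every window covers exactly one element and the input is returned unchanged.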
**Inputs**:

View File

@ -11,12 +11,12 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
\f[
\begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
\end{array}
\f]