Beautify operator specifications (#6958)

* beautify operator specifications

* further update ops specs

* update FloorMod spec

* update adaptive pool spec

* update HSwish spec

* bring back old erf version
Dawid Kożykowski 2021-08-12 12:11:30 +02:00 committed by GitHub
parent f26ecdd53f
commit 273c7188a4
27 changed files with 45 additions and 42 deletions


@@ -15,7 +15,7 @@
 Let *min_value* and *max_value* be *min* and *max*, respectively. The mathematical formula of *Clamp* is as follows:
 \f[
-clamp( x_{i} )=\min\big( \max\left( x_{i}, min\_value \right), max\_value \big)
+clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big)
 \f]
 **Attributes**:


@@ -12,7 +12,7 @@
 It performs an element-wise activation function on a given input tensor, based on the following mathematical formula:
 \f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
 \f]
 where Φ(x) is the Cumulative Distribution Function for the Gaussian Distribution.


@@ -22,13 +22,13 @@ The *Gelu* function may be approximated in two different ways based on *approxim
 For the `erf` approximation mode, the *Gelu* function is represented as:
 \f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
 \f]
 For the `tanh` approximation mode, the *Gelu* function is represented as:
 \f[
-Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
+Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
 \f]
 **Attributes**
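
As a quick numerical cross-check of the two modes, here is a minimal Python sketch (illustrative only, not part of the changed files):

```python
import math

def gelu_erf(x):
    # Exact Gelu: x * Phi(x), with Phi written via the error function.
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh-based approximation from the spec.
    return x * 0.5 * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"{x:+.1f}: erf={gelu_erf(x):.6f} tanh={gelu_tanh(x):.6f}")
```

The two curves agree to a few decimal places over this range.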


@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 \f[
-HSigmoid(x) = \frac{min(max(x + 3, 0), 6)}{6}
+HSigmoid(x) = \frac{min(max(x + 3,\ 0),\ 6)}{6}
 \f]
 The HSigmoid operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).


@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 \f[
-HSwish(x) = x \frac{min(max(x + 3, 0), 6)}{6}
+HSwish(x) = x \cdot \frac{min(max(x + 3,\ 0),\ 6)}{6}
 \f]
 The HSwish operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
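
The two activations above are related: HSwish is the input multiplied by HSigmoid. A minimal Python sketch of both formulas (illustrative only, not from the changed files):

```python
def hsigmoid(x):
    # min(max(x + 3, 0), 6) / 6
    return min(max(x + 3.0, 0.0), 6.0) / 6.0

def hswish(x):
    # x * min(max(x + 3, 0), 6) / 6, i.e. x * hsigmoid(x)
    return x * hsigmoid(x)

print(hsigmoid(1.0))  # 4/6 ≈ 0.666667
print(hswish(1.0))    # 1.0 * 4/6 ≈ 0.666667
```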


@@ -12,10 +12,13 @@
 For each element from the input tensor, calculates the corresponding
 element in the output tensor with the following formula:
 \f[
-y = max(0, min(1, alpha * x + beta))
+y = max(0,\ min(1,\ \alpha x + \beta))
 \f]
+where α corresponds to the `alpha` scalar input and β corresponds to the `beta` scalar input.
 **Inputs**
 * **1**: A tensor of type *T*. **Required.**
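
A minimal scalar sketch of the formula; `alpha = 0.2` and `beta = 0.5` below are illustrative values, not defaults mandated by the spec:

```python
def hard_sigmoid(x, alpha, beta):
    # Clamp the affine map alpha * x + beta into [0, 1].
    return max(0.0, min(1.0, alpha * x + beta))

print(hard_sigmoid(1.0, 0.2, 0.5))   # 0.7
print(hard_sigmoid(-4.0, 0.2, 0.5))  # 0.0 (clamped from -0.3)
```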


@@ -8,8 +8,8 @@
 **Note**: It is recommended not to compute LogSoftmax directly as Log(Softmax(x, axis)); it is more numerically stable to compute LogSoftmax as:
 \f[
-t = (x - ReduceMax(x, axis)) \\
-LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t), axis))
+t = (x - ReduceMax(x,\ axis)) \\
+LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t),\ axis))
 \f]
 **Attributes**
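
A minimal 1-D Python sketch of the stable recipe from the note (a plain list stands in for a single tensor axis, as an assumption):

```python
import math

def log_softmax(x):
    # Numerically stable LogSoftmax: subtract the max first,
    # then subtract log-sum-exp of the shifted values.
    t = [v - max(x) for v in x]
    lse = math.log(sum(math.exp(v) for v in t))
    return [v - lse for v in t]

print(log_softmax([1.0, 2.0, 3.0]))
# large inputs stay finite thanks to the max subtraction
print(log_softmax([1000.0, 1001.0, 1002.0]))
```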


@@ -15,7 +15,7 @@
 For each element from the input tensor, calculates the corresponding
 element in the output tensor with the following formula:
 \f[
-Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )})
+Y_{i}^{( l )} = max(0,\ Y_{i}^{( l - 1 )})
 \f]
 **Inputs**:


@@ -25,7 +25,7 @@
 *Abs* does the following with the input tensor *a*:
 \f[
-a_{i} = abs(a_{i})
+a_{i} = \vert a_{i} \vert
 \f]
 **Examples**


@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 \f[
-a_{i} = ceiling(a_{i})
+a_{i} = \lceil a_{i} \rceil
 \f]
 **Attributes**: *Ceiling* operation has no attributes.


@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting, *Divide* performs the division operation for the input tensors *a* and *b* using the formula below:
 \f[
-o_{i} = a_{i} / b_{i}
+o_{i} = \frac{a_{i}}{b_{i}}
 \f]
 The result of division by zero is undefined.


@@ -10,7 +10,7 @@
 As a first step, input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to the `auto_broadcast` attribute specification. As a second step, the *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]
 The *FloorMod* operation computes the remainder of a floored division. The behaviour is the same as in the Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
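
A minimal Python sketch of the floored-modulo identity above (illustrative, not an implementation):

```python
import math

def floor_mod(x, y):
    # Remainder of floored division: x - floor(x / y) * y.
    return x - math.floor(x / y) * y

# the sign of the result follows the divisor
print(floor_mod(7, 3))    # 1
print(floor_mod(7, -3))   # -2
print(7 % -3)             # -2: Python's % is already a floored mod
```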


@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:
 \f[
-a_{i} = floor(a_{i})
+a_{i} = \lfloor a_{i} \rfloor
 \f]
 **Attributes**: *Floor* operation has no attributes.


@@ -12,7 +12,7 @@ As a first step input tensors *a* and *b* are broadcasted if their shapes differ
 After broadcasting *Maximum* does the following with the input tensors *a* and *b*:
 \f[
-o_{i} = max(a_{i}, b_{i})
+o_{i} = max(a_{i},\ b_{i})
 \f]
 **Attributes**:


@@ -10,7 +10,7 @@
 As a first step, input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to the `auto_broadcast` attribute specification. As a second step, the *Minimum* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 \f[
-o_{i} = min(a_{i}, b_{i})
+o_{i} = min(a_{i},\ b_{i})
 \f]
 **Attributes**:


@@ -10,7 +10,7 @@
 As a first step, input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to the `auto_broadcast` attribute specification. As a second step, the *Mod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:
 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]
 The *Mod* operation computes the remainder of a truncated division. The behaviour is the same as in the C programming language: `truncated(x / y) * y + truncated_mod(x, y) = x`. The sign of the result is equal to the sign of the dividend. The result of division by zero is undefined.
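
For contrast with *FloorMod*, a minimal Python sketch of the truncated modulo (illustrative; `math.trunc` stands in for C's truncated division):

```python
import math

def trunc_mod(x, y):
    # Remainder of truncated division: x - trunc(x / y) * y.
    return x - math.trunc(x / y) * y

# the sign of the result follows the dividend, as with C's % operator
print(trunc_mod(7, -3))   # 1
print(trunc_mod(-7, 3))   # -1
```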


@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting, *Multiply* performs the multiplication operation for the input tensors *a* and *b* using the formula below:
 \f[
-o_{i} = a_{i} * b_{i}
+o_{i} = a_{i} \cdot b_{i}
 \f]
 **Attributes**:


@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *GreaterEqual* does the following with the input tensors *a* and *b*:
 \f[
-o_{i} = a_{i} >= b_{i}
+o_{i} = a_{i} \geq b_{i}
 \f]
 **Examples**


@@ -12,7 +12,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *LessEqual* does the following with the input tensors *a* and *b*:
 \f[
-o_{i} = a_{i} <= b_{i}
+o_{i} = a_{i} \leq b_{i}
 \f]
 **Attributes**:


@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *NotEqual* does the following with the input tensors *a* and *b*:
 \f[
-o_{i} = a_{i} != b_{i}
+o_{i} = a_{i} \neq b_{i}
 \f]
 **Examples**


@@ -16,15 +16,15 @@ n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1
 The receptive field in each layer is calculated using the formulas:
 * Jump in the output feature map:
 \f[
-j_{out} = j_{in} * s
+j_{out} = j_{in} \cdot s
 \f]
 * Size of the receptive field of output feature:
 \f[
-r_{out} = r_{in} + ( k - 1 ) * j_{in}
+r_{out} = r_{in} + ( k - 1 ) \cdot j_{in}
 \f]
 * Center position of the receptive field of the first output feature:
 \f[
-start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in}
+start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in}
 \f]
 * Output is calculated using the following formula:
 \f[

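A minimal Python sketch chaining the three receptive-field formulas above for one layer (the initial values j = 1, r = 1, start = 0.5 for the raw input are a common convention, assumed here):

```python
def receptive_field_step(j_in, r_in, start_in, k, s, p):
    # One layer's update of jump, receptive-field size, and center of the
    # first output feature, following the three formulas above.
    j_out = j_in * s
    r_out = r_in + (k - 1) * j_in
    start_out = start_in + ((k - 1) / 2 - p) * j_in
    return j_out, r_out, start_out

# hypothetical 3x3 conv with stride 2 and padding 1 applied to the raw input
print(receptive_field_step(1, 1, 0.5, k=3, s=2, p=1))  # (2, 3, 0.5)
```
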

@@ -12,7 +12,7 @@ Output is calculated using the following formula:
 \f[
-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k})
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k})
 \f]


@@ -14,7 +14,7 @@ Output is calculated using the following formula:
 \f[
-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k}) * {\Delta}m_{k}
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k}) \cdot {\Delta}m_{k}
 \f]
 Where


@@ -25,7 +25,7 @@
 *LogicalNot* does the following with the input tensor *a*:
 \f[
-a_{i} = not(a_{i})
+a_{i} = \lnot a_{i}
 \f]
 **Examples**


@@ -37,7 +37,7 @@ Before performing logical operation, input tensors *a* and *b* are broadcasted i
 After broadcasting *LogicalXor* does the following with the input tensors *a* and *b*:
 \f[
-o_{i} = a_{i} xor b_{i}
+o_{i} = a_{i} \oplus b_{i}
 \f]
 **Examples**


@@ -11,19 +11,19 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]
 The output is calculated with the following formula:
 \f[
-Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start})*(h_{end}-h_{start})*(w_{end}-w_{start})}
+Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start}) \cdot (h_{end}-h_{start}) \cdot (w_{end}-w_{start})}
 \f]
 **Inputs**:
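
A minimal Python sketch of the per-axis window computation above (the input/output sizes are illustrative):

```python
import math

def adaptive_window(i, in_size, out_size):
    # Window bounds along one axis, per the floor/ceil formulas above.
    start = math.floor(i * in_size / out_size)
    end = math.ceil((i + 1) * in_size / out_size)
    return start, end

# hypothetical example: pooling an axis of length 5 down to 3 outputs
for i in range(3):
    print(i, adaptive_window(i, 5, 3))  # (0, 2), (1, 4), (3, 5)
```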


@@ -11,12 +11,12 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW
 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]