Beautify operator specifications (#6958)
* beautify operator specifications
* further update ops specs
* update FloorMod spec
* update adaptive pool spec
* update HSwish spec
* bring back old erf version
This commit is contained in:
parent f26ecdd53f
commit 273c7188a4
@@ -15,7 +15,7 @@

 Let *min_value* and *max_value* be *min* and *max*, respectively. The mathematical formula of *Clamp* is as follows:
 \f[
-clamp( x_{i} )=\min\big( \max\left( x_{i}, min\_value \right), max\_value \big)
+clamp( x_{i} )=\min\big( \max\left( x_{i},\ min\_value \right),\ max\_value \big)
 \f]

 **Attributes**:
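For reference, a minimal NumPy sketch of the *Clamp* formula above; the standalone function and its names are illustrative, not the OpenVINO API:

```python
import numpy as np

def clamp(x, min_value, max_value):
    # clamp(x_i) = min(max(x_i, min_value), max_value), element-wise
    return np.minimum(np.maximum(x, min_value), max_value)

print(clamp(np.array([-2.0, 0.5, 9.0]), min_value=0.0, max_value=6.0))  # [0.  0.5 6. ]
```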
@@ -12,7 +12,7 @@
 It performs element-wise activation function on a given input tensor, based on the following mathematical formula:

 \f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
 \f]

 where Φ(x) is the Cumulative Distribution Function for Gaussian Distribution.
@@ -22,13 +22,13 @@ The *Gelu* function may be approximated in two different ways based on *approxim
 For `erf` approximation mode, *Gelu* function is represented as:

 \f[
-Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
+Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\frac{x}{\sqrt{2}}\right]
 \f]

 For `tanh` approximation mode, *Gelu* function is represented as:

 \f[
-Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
+Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot (x + 0.044715 \cdot x^3)\right]\right)
 \f]

 **Attributes**
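For reference, a minimal NumPy/SciPy sketch of both *Gelu* modes above (illustrative helper names, not the OpenVINO API); the `tanh` mode closely approximates the `erf` mode:

```python
import numpy as np
from scipy.special import erf

def gelu_erf(x):
    # Gelu(x) = x * Phi(x) = x * 0.5 * (1 + erf(x / sqrt(2)))
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Gelu(x) ~= x * 0.5 * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return x * 0.5 * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

x = np.linspace(-3.0, 3.0, 7)
print(np.max(np.abs(gelu_erf(x) - gelu_tanh(x))))  # small approximation error
```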
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:

 \f[
-HSigmoid(x) = \frac{min(max(x + 3, 0), 6)}{6}
+HSigmoid(x) = \frac{min(max(x + 3,\ 0),\ 6)}{6}
 \f]

 The HSigmoid operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
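A minimal NumPy sketch of the *HSigmoid* formula above (illustrative, not the OpenVINO API):

```python
import numpy as np

def hsigmoid(x):
    # HSigmoid(x) = min(max(x + 3, 0), 6) / 6
    return np.minimum(np.maximum(x + 3.0, 0.0), 6.0) / 6.0

print(hsigmoid(np.array([-4.0, 0.0, 4.0])))  # [0.  0.5 1. ]
```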
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:

 \f[
-HSwish(x) = x \frac{min(max(x + 3, 0), 6)}{6}
+HSwish(x) = x \cdot \frac{min(max(x + 3,\ 0),\ 6)}{6}
 \f]

 The HSwish operation is introduced in the following [article](https://arxiv.org/pdf/1905.02244.pdf).
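And the corresponding *HSwish* sketch, which is simply the input multiplied by its hard sigmoid (again illustrative code):

```python
import numpy as np

def hswish(x):
    # HSwish(x) = x * min(max(x + 3, 0), 6) / 6, i.e. x * HSigmoid(x)
    return x * np.minimum(np.maximum(x + 3.0, 0.0), 6.0) / 6.0

print(hswish(np.array([-4.0, 0.0, 4.0])))  # [-0.  0.  4.]
```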
@@ -12,10 +12,13 @@
 For each element from the input tensor calculates corresponding
 element in the output tensor with the following formula:

 \f[
-y = max(0, min(1, alpha * x + beta))
+y = max(0,\ min(1,\ \alpha x + \beta))
 \f]

+where α corresponds to `alpha` scalar input and β corresponds to `beta` scalar input.
+
 **Inputs**

 * **1**: A tensor of type *T*. **Required.**
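A minimal NumPy sketch of the *HardSigmoid* formula above, with `alpha` and `beta` passed as plain scalars for illustration (not the OpenVINO API):

```python
import numpy as np

def hard_sigmoid(x, alpha, beta):
    # y = max(0, min(1, alpha * x + beta)); alpha and beta come from the
    # scalar inputs of the same names
    return np.maximum(0.0, np.minimum(1.0, alpha * x + beta))

print(hard_sigmoid(np.array([-3.0, 0.0, 3.0]), alpha=0.2, beta=0.5))  # [0.  0.5 1. ]
```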
@@ -8,8 +8,8 @@

 **Note**: It is recommended not to compute LogSoftmax directly as Log(Softmax(x, axis)); it is more numerically stable to compute it as:
 \f[
-t = (x - ReduceMax(x, axis)) \\
-LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t), axis))
+t = (x - ReduceMax(x,\ axis)) \\
+LogSoftmax(x, axis) = t - Log(ReduceSum(Exp(t),\ axis))
 \f]

 **Attributes**
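A quick NumPy sketch of why the note's formulation is preferred: with large inputs the naive `log(softmax(x))` overflows, while the rewritten form stays finite (illustrative code, not the OpenVINO API):

```python
import numpy as np

def log_softmax(x, axis=-1):
    # t = x - ReduceMax(x, axis); LogSoftmax = t - Log(ReduceSum(Exp(t), axis))
    t = x - np.max(x, axis=axis, keepdims=True)
    return t - np.log(np.sum(np.exp(t), axis=axis, keepdims=True))

x = np.array([1000.0, 1001.0, 1002.0])  # naive exp(x) overflows here
print(log_softmax(x))                   # [-2.4076 -1.4076 -0.4076] (approx.)
```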
@@ -15,7 +15,7 @@
 For each element from the input tensor calculates corresponding
 element in the output tensor with the following formula:
 \f[
-Y_{i}^{( l )} = max(0, Y_{i}^{( l - 1 )})
+Y_{i}^{( l )} = max(0,\ Y_{i}^{( l - 1 )})
 \f]

 **Inputs**:
@@ -25,7 +25,7 @@
 *Abs* does the following with the input tensor *a*:

 \f[
-a_{i} = abs(a_{i})
+a_{i} = \vert a_{i} \vert
 \f]

 **Examples**
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:

 \f[
-a_{i} = ceiling(a_{i})
+a_{i} = \lceil a_{i} \rceil
 \f]

 **Attributes**: *Ceiling* operation has no attributes.
@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *Divide* performs division operation for the input tensors *a* and *b* using the formula below:

 \f[
-o_{i} = a_{i} / b_{i}
+o_{i} = \frac{a_{i}}{b_{i}}
 \f]

 The result of division by zero is undefined.
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *FloorMod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:

 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]

 *FloorMod* operation computes the remainder of a floored division. It is the same behaviour as in the Python programming language: `floor(x / y) * y + floor_mod(x, y) = x`. The sign of the result is equal to the sign of the divisor. The result of division by zero is undefined.
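A small NumPy illustration of the floored-remainder semantics described above; note that the sign follows the divisor, as with Python's `%`:

```python
import numpy as np

a = np.array([ 7.0, -7.0,  7.0, -7.0])
b = np.array([ 3.0,  3.0, -3.0, -3.0])

# identity from the spec: floor(a / b) * b + floor_mod(a, b) = a
floor_mod = a - np.floor(a / b) * b
print(floor_mod)        # [ 1.  2. -2. -1.]  (sign of the divisor)
print(np.mod(a, b))     # np.mod uses the same floored convention
```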
@@ -10,7 +10,7 @@
 element in the output tensor with the following formula:

 \f[
-a_{i} = floor(a_{i})
+a_{i} = \lfloor a_{i} \rfloor
 \f]

 **Attributes**: *Floor* operation has no attributes.
@@ -12,7 +12,7 @@ As a first step input tensors *a* and *b* are broadcasted if their shapes differ
 After broadcasting *Maximum* does the following with the input tensors *a* and *b*:

 \f[
-o_{i} = max(a_{i}, b_{i})
+o_{i} = max(a_{i},\ b_{i})
 \f]

 **Attributes**:
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Minimum* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:

 \f[
-o_{i} = min(a_{i}, b_{i})
+o_{i} = min(a_{i},\ b_{i})
 \f]

 **Attributes**:
@@ -10,7 +10,7 @@
 As a first step input tensors *a* and *b* are broadcasted if their shapes differ. Broadcasting is performed according to `auto_broadcast` attribute specification. As a second step *Mod* operation is computed element-wise on the input tensors *a* and *b* according to the formula below:

 \f[
-o_{i} = a_{i} % b_{i}
+o_{i} = a_{i} \mod b_{i}
 \f]

 *Mod* operation computes the remainder of a truncated division. It is the same behaviour as in the C programming language: `truncated(x / y) * y + truncated_mod(x, y) = x`. The sign of the result is equal to the sign of the dividend. The result of division by zero is undefined.
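And the truncated-remainder counterpart, contrasting with *FloorMod* above; here the sign follows the dividend, as with C's `%` (illustrative NumPy code):

```python
import numpy as np

a = np.array([ 7.0, -7.0,  7.0, -7.0])
b = np.array([ 3.0,  3.0, -3.0, -3.0])

# identity from the spec: truncated(a / b) * b + truncated_mod(a, b) = a
trunc_mod = a - np.trunc(a / b) * b
print(trunc_mod)        # [ 1. -1.  1. -1.]  (sign of the dividend)
print(np.fmod(a, b))    # np.fmod uses the same truncated convention
```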
@@ -11,7 +11,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *Multiply* performs multiplication operation for the input tensors *a* and *b* using the formula below:

 \f[
-o_{i} = a_{i} * b_{i}
+o_{i} = a_{i} \cdot b_{i}
 \f]

 **Attributes**:
@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *GreaterEqual* does the following with the input tensors *a* and *b*:

 \f[
-o_{i} = a_{i} >= b_{i}
+o_{i} = a_{i} \geq b_{i}
 \f]

 **Examples**
@@ -12,7 +12,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *LessEqual* does the following with the input tensors *a* and *b*:

 \f[
-o_{i} = a_{i} <= b_{i}
+o_{i} = a_{i} \leq b_{i}
 \f]

 **Attributes**:
@@ -37,7 +37,7 @@ Before performing arithmetic operation, input tensors *a* and *b* are broadcaste
 After broadcasting *NotEqual* does the following with the input tensors *a* and *b*:

 \f[
-o_{i} = a_{i} != b_{i}
+o_{i} = a_{i} \neq b_{i}
 \f]

 **Examples**
@@ -16,15 +16,15 @@ n_{out} = \left ( \frac{n_{in} + 2p - k}{s} \right ) + 1
 The receptive field in each layer is calculated using the formulas:
 * Jump in the output feature map:
 \f[
-j_{out} = j_{in} * s
+j_{out} = j_{in} \cdot s
 \f]
 * Size of the receptive field of output feature:
 \f[
-r_{out} = r_{in} + ( k - 1 ) * j_{in}
+r_{out} = r_{in} + ( k - 1 ) \cdot j_{in}
 \f]
 * Center position of the receptive field of the first output feature:
 \f[
-start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) * j_{in}
+start_{out} = start_{in} + ( \frac{k - 1}{2} - p ) \cdot j_{in}
 \f]
 * Output is calculated using the following formula:
 \f[
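A worked example of the receptive-field recurrences above, for two stacked convolution layers with hypothetical parameters k=3, s=2, p=1 (illustrative code, not part of the spec):

```python
def receptive_field_step(j_in, r_in, start_in, k, s, p):
    j_out = j_in * s                                   # jump in the output feature map
    r_out = r_in + (k - 1) * j_in                      # receptive-field size
    start_out = start_in + ((k - 1) / 2 - p) * j_in    # center of the first output feature
    return j_out, r_out, start_out

j, r, start = 1, 1, 0.5   # the input image: jump 1, field 1, first pixel centered at 0.5
for _ in range(2):
    j, r, start = receptive_field_step(j, r, start, k=3, s=2, p=1)
print(j, r, start)        # 4 7 0.5: each output feature sees a 7-pixel window
```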
@@ -12,7 +12,7 @@ Output is calculated using the following formula:

 \f[

-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k})
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k})

 \f]

@@ -14,7 +14,7 @@ Output is calculated using the following formula:

 \f[

-y(p) = \sum_{k = 1}^{K}w_{k}x(p + p_{k} + {\Delta}p_{k}) * {\Delta}m_{k}
+y(p) = \displaystyle{\sum_{k = 1}^{K}}w_{k}x(p + p_{k} + {\Delta}p_{k}) \cdot {\Delta}m_{k}

 \f]
 Where
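A deliberately simplified 1-D, single-channel sketch of the sampling sums in the two hunks above; it uses nearest-neighbour sampling where a real implementation would interpolate bilinearly, and every name here is illustrative rather than OpenVINO API:

```python
import numpy as np

def deformable_conv_1d(x, w, offsets, mask=None):
    # y(p) = sum_k w_k * x(p + p_k + dp_k) * dm_k, with dm_k = 1 when no
    # modulation mask is given; out-of-range samples contribute zero.
    K = len(w)
    p_k = np.arange(K) - K // 2                  # regular kernel grid around p
    y = np.zeros_like(x)
    for p in range(len(x)):
        for k in range(K):
            q = int(round(p + p_k[k] + offsets[p, k]))
            if 0 <= q < len(x):
                m = 1.0 if mask is None else mask[p, k]
                y[p] += w[k] * x[q] * m
    return y

x = np.arange(8, dtype=float)
w = np.array([0.25, 0.5, 0.25])
print(deformable_conv_1d(x, w, np.zeros((8, 3))))  # zero offsets: plain convolution
```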
@@ -25,7 +25,7 @@
 *LogicalNot* does the following with the input tensor *a*:

 \f[
-a_{i} = not(a_{i})
+a_{i} = \lnot a_{i}
 \f]

 **Examples**
@@ -37,7 +37,7 @@ Before performing logical operation, input tensors *a* and *b* are broadcasted i
 After broadcasting *LogicalXor* does the following with the input tensors *a* and *b*:

 \f[
-o_{i} = a_{i} xor b_{i}
+o_{i} = a_{i} \oplus b_{i}
 \f]

 **Examples**
@@ -11,19 +11,19 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW

 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]

 The output is calculated with the following formula:

 \f[
-Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start})*(h_{end}-h_{start})*(w_{end}-w_{start})}
+Output(i,j,k) = \frac{Input[d_{start}:d_{end}, h_{start}:h_{end}, w_{start}:w_{end}]}{(d_{end}-d_{start}) \cdot (h_{end}-h_{start}) \cdot (w_{end}-w_{start})}
 \f]

 **Inputs**:
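A 1-D NumPy sketch of the bin arithmetic above, using one spatial axis instead of three and illustrative names: each output index i averages the input slice between the floored start and the ceiled end:

```python
import numpy as np

def adaptive_avg_pool_1d(x, out_size):
    # start = floor(i * L_in / L_out), end = ceil((i + 1) * L_in / L_out)
    L_in = len(x)
    out = np.empty(out_size)
    for i in range(out_size):
        start = int(np.floor(i * L_in / out_size))
        end = int(np.ceil((i + 1) * L_in / out_size))
        out[i] = x[start:end].mean()   # divide by (end - start), per the formula
    return out

print(adaptive_avg_pool_1d(np.arange(10, dtype=float), 4))  # [1. 3. 6. 8.]
```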
@@ -11,12 +11,12 @@ The kernel dimensions are calculated using the following formulae for the `NCDHW

 \f[
 \begin{array}{lcl}
-d_{start} &=& floor(i*D_{in}/D_{out})\\
-d_{end} &=& ceil((i+1)*D_{in}/D_{out})\\
-h_{start} &=& floor(j*H_{in}/H_{out})\\
-h_{end} &=& ceil((j+1)*H_{in}/H_{out})\\
-w_{start} &=& floor(k*W_{in}/W_{out})\\
-w_{end} &=& ceil((k+1)*W_{in}/W_{out})
+d_{start} &=& \lfloor i \cdot \frac{D_{in}}{D_{out}}\rfloor\\
+d_{end} &=& \lceil(i+1) \cdot \frac{D_{in}}{D_{out}}\rceil\\
+h_{start} &=& \lfloor j \cdot \frac{H_{in}}{H_{out}}\rfloor\\
+h_{end} &=& \lceil(j+1) \cdot \frac{H_{in}}{H_{out}}\rceil\\
+w_{start} &=& \lfloor k \cdot \frac{W_{in}}{W_{out}}\rfloor\\
+w_{end} &=& \lceil(k+1) \cdot \frac{W_{in}}{W_{out}}\rceil
 \end{array}
 \f]

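The max-pool variant uses the same bin boundaries as the average variant above; only the reduction changes (again a 1-D illustrative sketch):

```python
import numpy as np

def adaptive_max_pool_1d(x, out_size):
    # identical start/end bins to the average variant, reduced with max
    L_in = len(x)
    out = np.empty(out_size)
    for i in range(out_size):
        start = int(np.floor(i * L_in / out_size))
        end = int(np.ceil((i + 1) * L_in / out_size))
        out[i] = x[start:end].max()
    return out

print(adaptive_max_pool_1d(np.arange(10, dtype=float), 4))  # [2. 4. 7. 9.]
```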