Gelu-7 specification refactoring (#5439)
* Review spec of Gelu-7 operation
* Address review comments
* Modified formulas
* Changed type from `T` to *T*
parent 05dc0c8cf7
commit 2062a648a7
@@ -2,53 +2,63 @@
|
|||||||
|
|
||||||
**Versioned name**: *Gelu-7*
|
**Versioned name**: *Gelu-7*
|
||||||
|
|
||||||
**Category**: *Activation*
|
**Category**: *Activation function*
|
||||||
|
|
||||||
**Short description**: Calculates Gaussian error linear.
|
**Short description**: Gaussian error linear unit element-wise activation function.
|
||||||
|
|
||||||
**Detailed description**: `Gelu(x) = x * Φ(x)`, where `Φ(x)` is the Cumulative Distribution Function for Gaussian Distribution.
|
**Detailed description**:
|
||||||
The Gelu operation is introduced in the [paper](https://arxiv.org/abs/1606.08415).
|
|
||||||
|
*Gelu* operation is introduced in this [article](https://arxiv.org/abs/1606.08415).
|
||||||
|
It performs element-wise activation function on a given input tensor, based on the following mathematical formula:
|
||||||
|
|
||||||
|
\f[
|
||||||
|
Gelu(x) = x\cdot\Phi(x)
|
||||||
|
\f]
|
||||||
|
|
||||||
|
where `Φ(x)` is the Cumulative Distribution Function for Gaussian Distribution.
|
||||||
|
|
||||||
|
The *Gelu* function may be approximated in two different ways based on *approximation_mode* attribute.
|
||||||
|
|
||||||
|
For `erf` approximation mode, *Gelu* function is represented as:
|
||||||
|
|
||||||
|
\f[
|
||||||
|
Gelu(x) = x\cdot\Phi(x) = x\cdot\frac{1}{2}\cdot\left[1 + erf\left(x/\sqrt{2}\right)\right]
|
||||||
|
\f]
|
||||||
|
|
||||||
|
For `tanh` approximation mode, *Gelu* function is represented as:
|
||||||
|
|
||||||
|
\f[
|
||||||
|
Gelu(x) \approx x\cdot\frac{1}{2}\cdot \left(1 + \tanh\left[\sqrt{2/\pi} \cdot (x + 0.044715 \cdot x^3)\right]\right)
|
||||||
|
\f]
|
||||||
|
|
||||||
**Attributes**
|
**Attributes**
|
||||||
|
|
||||||
* *approximation_mode*
|
* *approximation_mode*
|
||||||
|
|
||||||
* **Description**: Specifies the formulae to calculate the output.
|
* **Description**: Specifies the formulae to calculate the *Gelu* function.
|
||||||
* **Range of values**:
|
* **Range of values**:
|
||||||
* `erf` -- calculate output using the Gauss error function.
|
* `erf` - calculate output using the Gauss error function
|
||||||
* `tanh` -- calculate output using tanh approximation
|
* `tanh` - calculate output using tanh approximation
|
||||||
* **Type**: `string`
|
* **Type**: `string`
|
||||||
* **Default value**: `erf`
|
* **Default value**: `erf`
|
||||||
* **Required**: *no*
|
* **Required**: *no*
|
||||||
|
|
||||||
|
|
||||||
**Mathematical Formulation**
|
|
||||||
|
|
||||||
For the `erf` approximation mode:
|
|
||||||
\f[
|
|
||||||
Gelu(x) = 0.5 \cdot x \cdot (1.0 + erf((x) / \sqrt{2})
|
|
||||||
\f]
|
|
||||||
|
|
||||||
For the `tanh` approximation mode:
|
|
||||||
|
|
||||||
\f[
|
|
||||||
Gelu(x) \approx 0.5 \cdot x \cdot (1.0 + tanh(\sqrt{2.0/pi} \cdot (x + 0.044715 \cdot x ^ 3))
|
|
||||||
\f]
|
|
||||||
|
|
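The two formulas and the *approximation_mode* attribute shown in this diff can be sketched in plain Python. This is a minimal reference sketch, not the implementation behind the spec: the function names `gelu` and `gelu_elementwise` are hypothetical, and a flat list of floats stands in for a tensor of type *T*.

```python
import math


def gelu(x: float, approximation_mode: str = "erf") -> float:
    """Scalar Gelu following the formulas in the spec (hypothetical helper)."""
    if approximation_mode == "erf":
        # Gelu(x) = x * 1/2 * [1 + erf(x / sqrt(2))]
        return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
    if approximation_mode == "tanh":
        # Gelu(x) ~ x * 1/2 * (1 + tanh[sqrt(2/pi) * (x + 0.044715 * x^3)])
        inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
        return 0.5 * x * (1.0 + math.tanh(inner))
    raise ValueError(f"unsupported approximation_mode: {approximation_mode!r}")


def gelu_elementwise(values, approximation_mode="erf"):
    """Apply Gelu to every element; a list stands in for the input tensor."""
    return [gelu(v, approximation_mode) for v in values]
```

Calling `gelu_elementwise([-1.0, 0.0, 1.0], "tanh")` returns a list of the same length as the input, mirroring the spec's statement that the output has the same shape as the input tensor. The default of `"erf"` follows the attribute's documented default value.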
||||||
**Inputs**:
|
**Inputs**:
|
||||||
|
|
||||||
* **1**: Multidimensional input tensor of type *T*. Required.
|
* **1**: A tensor of type *T* and arbitrary shape. **Required**.
|
||||||
|
|
||||||
**Outputs**:
|
**Outputs**:
|
||||||
|
|
||||||
* **1**: Floating point tensor with shape and type *T* matching the input tensor.
|
* **1**: The result of element-wise *Gelu* function applied to the input tensor. A tensor of type *T* and the same shape as input tensor.
|
||||||
|
|
||||||
**Types**
|
**Types**
|
||||||
|
|
||||||
* *T*: any floating point type.
|
* *T*: arbitrary supported floating-point type.
|
||||||
|
|
||||||
**Examples**
|
**Examples**
|
||||||
|
|
||||||
|
*Example: `tanh` approximation mode*
|
||||||
|
|
||||||
```xml
|
```xml
|
||||||
<layer ... type="Gelu">
|
<layer ... type="Gelu">
|
||||||
<data approximation_mode="tanh"/>
|
<data approximation_mode="tanh"/>
|
||||||
@@ -67,6 +77,8 @@ For the `tanh` approximation mode:
|
|||||||
</layer>
|
</layer>
|
||||||
```
|
```
|
||||||
|
|
||||||
|
*Example: `erf` approximation mode*
|
||||||
|
|
||||||
```xml
|
```xml
|
||||||
<layer ... type="Gelu">
|
<layer ... type="Gelu">
|
||||||
<data approximation_mode="erf"/>
|
<data approximation_mode="erf"/>
|
||||||
|