Swish specification refactoring (#5015)
* Review spec of Swish operation
* Change reference link to abstract
* Minor change in example section
* Fix minor wording issues
parent 80acd27096
commit f7863847ad

````diff
@@ -2,38 +2,40 @@
 
 **Versioned name**: *Swish-4*
 
-**Category**: *Activation*
+**Category**: *Activation function*
 
-**Short description**: Swish takes one input tensor and produces output tensor where the Swish function is applied to the tensor elementwise.
+**Short description**: *Swish* performs element-wise activation function on a given input tensor.
 
-**Detailed description**: For each element from the input tensor calculates corresponding
-element in the output tensor with the following formula:
+**Detailed description**
+*Swish* operation is introduced in this [article](https://arxiv.org/abs/1710.05941).
+It performs element-wise activation function on a given input tensor, based on the following mathematical formula:
 
 \f[
-Swish(x) = x / (1.0 + e^{-(beta * x)})
+Swish(x) = x\cdot \sigma(\beta x) = x \left(1 + e^{-(\beta x)}\right)^{-1}
 \f]
 
-The Swish operation is introduced in the [article](https://arxiv.org/pdf/1710.05941.pdf).
+where β corresponds to `beta` scalar input.
 
-**Attributes**:
+**Attributes**: *Swish* operation has no attributes.
 
 **Inputs**:
 
-* **1**: Multidimensional input tensor of type *T*. **Required**.
+* **1**: `data`. A tensor of type `T` and arbitrary shape. **Required**.
 
-* **2**: Scalar with non-negative value of type *T*. Multiplication parameter *beta* for the sigmoid. If the input is not connected then the default value 1.0 is used. **Optional**
+* **2**: `beta`. A non-negative scalar value of type `T`. Multiplication parameter for the sigmoid. Default value 1.0 is used. **Optional**.
 
 **Outputs**:
 
-* **1**: The resulting tensor of the same shape and type as input tensor.
+* **1**: The result of element-wise *Swish* function applied to the input tensor `data`. A tensor of type `T` and the same shape as `data` input tensor.
 
 **Types**
 
-* *T*: arbitrary supported floating point type.
+* *T*: arbitrary supported floating-point type.
 
+**Examples**
 
-**Example**
+*Example: Second input `beta` provided*
 
 ```xml
 <layer ... type="Swish">
     <input>
@@ -41,13 +43,30 @@ The Swish operation is introduced in the [article](https://arxiv.org/pdf/1710.05
             <dim>256</dim>
             <dim>56</dim>
         </port>
-        <port id="1"/>
+        <port id="1"> <!-- beta value: 2.0 -->
+        </port>
     </input>
     <output>
-        <port id="1">
+        <port id="2">
             <dim>256</dim>
             <dim>56</dim>
         </port>
     </output>
 </layer>
 ```
+
+*Example: Second input `beta` not provided*
+```xml
+<layer ... type="Swish">
+    <input>
+        <port id="0">
+            <dim>128</dim>
+        </port>
+    </input>
+    <output>
+        <port id="1">
+            <dim>128</dim>
+        </port>
+    </output>
+</layer>
+```
````
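
The commit replaces the formula `x / (1.0 + e^{-(beta * x)})` with sigmoid notation. Reading σ as the standard logistic sigmoid (an assumption, though the `beta` input is described as the "multiplication parameter for the sigmoid"), the old and new expressions are algebraically identical, so this part of the refactoring is notational only. The last line is a worked value for the default β = 1:

```latex
\begin{aligned}
\sigma(z) &= \frac{1}{1 + e^{-z}} \\
Swish(x) &= x\cdot \sigma(\beta x) = \frac{x}{1 + e^{-(\beta x)}} = x \left(1 + e^{-(\beta x)}\right)^{-1} \\
Swish(1)\big|_{\beta = 1} &= \frac{1}{1 + e^{-1}} \approx \frac{1}{1.3679} \approx 0.7311
\end{aligned}
```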
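
As a rough cross-check of the revised input and output descriptions, here is a minimal pure-Python sketch. It is not OpenVINO code: the names `swish`, `swish_tensor` and `shape`, and the use of nested lists as a stand-in for tensors, are illustrative assumptions. It applies the formula element-wise, falls back to β = 1.0 when `beta` is not provided, rejects a negative β as the input spec requires, and keeps the output shape equal to the `data` shape, as in both XML examples ([256, 56] with `beta` = 2.0, and [128] without `beta`).

```python
import math
from typing import List, Optional, Union

# Illustrative stand-in for a tensor of arbitrary shape: a scalar or nested lists.
Tensor = Union[float, List["Tensor"]]


def _sigmoid(z: float) -> float:
    """Logistic sigmoid; the branch keeps math.exp from overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish(x) = x * sigmoid(beta * x) for a single element."""
    return x * _sigmoid(beta * x)


def swish_tensor(data: Tensor, beta: Optional[float] = None) -> Tensor:
    """Apply Swish element-wise, returning a result with the same shape as `data`.

    `beta` models the optional second input: None means "not provided",
    in which case the default 1.0 is used.
    """
    b = 1.0 if beta is None else beta
    if b < 0:
        raise ValueError("beta must be non-negative")

    def apply(t: Tensor) -> Tensor:
        if isinstance(t, list):
            return [apply(item) for item in t]
        return swish(t, b)

    return apply(data)


def shape(t: Tensor) -> tuple:
    """Shape of a rectangular nested list, e.g. (256, 56)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return tuple(dims)


# Like the first XML example: `data` of shape [256, 56] with beta = 2.0.
data = [[0.01 * (i - j) for j in range(56)] for i in range(256)]
out = swish_tensor(data, beta=2.0)
print(shape(out))  # (256, 56)

# Like the second example: `data` of shape [128], beta not provided (defaults to 1.0).
print(round(swish_tensor([1.0] * 128)[0], 4))  # 0.7311
```

Using `None` rather than `1.0` as the Python default for `beta` keeps "input not provided" distinct from an explicitly supplied value, so an explicit `beta=0.0` is still honored (it reduces Swish to x/2).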