Swish specification refactoring (#5015)

* Review spec of Swish operation

* Change reference link to abstract

* Minor change in example section

* Fix minor wording issues
Gabriele Galiero Casay 2021-04-01 15:07:20 +02:00 committed by GitHub
parent 80acd27096
commit f7863847ad


@ -2,38 +2,40 @@
**Versioned name**: *Swish-4*
**Category**: *Activation*
**Category**: *Activation function*
**Short description**: Swish takes one input tensor and produces output tensor where the Swish function is applied to the tensor elementwise.
**Short description**: *Swish* performs an element-wise activation function on a given input tensor.
**Detailed description**: For each element from the input tensor calculates corresponding
element in the output tensor with the following formula:
**Detailed description**
The *Swish* operation was introduced in this [article](https://arxiv.org/abs/1710.05941).
It applies an element-wise activation function to a given input tensor, based on the following mathematical formula:
\f[
Swish(x) = x / (1.0 + e^{-(beta * x)})
Swish(x) = x\cdot \sigma(\beta x) = x \left(1 + e^{-(\beta x)}\right)^{-1}
\f]
The Swish operation is introduced in the [article](https://arxiv.org/pdf/1710.05941.pdf).
where β corresponds to the `beta` scalar input.
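The formula above can be sketched in plain Python as a reference for checking values by hand. This is a minimal illustration, not the implementation used by any runtime: the function names `swish` and `swish_elementwise` are hypothetical, and the input is assumed to be a flat list of floats rather than a tensor of arbitrary shape.

```python
import math


def swish(x: float, beta: float = 1.0) -> float:
    """Swish(x) = x * sigmoid(beta * x) for a single element (illustrative helper)."""
    z = beta * x
    # Branch on the sign of z so exp() never receives a large positive argument.
    if z >= 0:
        return x / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return x * ez / (1.0 + ez)


def swish_elementwise(data, beta=1.0):
    """Apply swish to each element of a flat list; the output has the same length."""
    return [swish(v, beta) for v in data]
```

With `beta` omitted the sketch falls back to 1.0, mirroring the default described for the optional second input. With `beta = 0` the sigmoid term is 0.5 everywhere, so the function reduces to `x / 2`.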
**Attributes**:
**Attributes**: The *Swish* operation has no attributes.
**Inputs**:
* **1**: Multidimensional input tensor of type *T*. **Required**.
* **1**: `data`. A tensor of type `T` and arbitrary shape. **Required**.
* **2**: Scalar with non-negative value of type *T*. Multiplication parameter *beta* for the sigmoid. If the input is not connected then the default value 1.0 is used. **Optional**
* **2**: `beta`. A non-negative scalar value of type `T`. Multiplication parameter for the sigmoid. If this input is not provided, the default value 1.0 is used. **Optional**.
**Outputs**:
* **1**: The resulting tensor of the same shape and type as input tensor.
* **1**: The result of the element-wise *Swish* function applied to the input tensor `data`. A tensor of type `T` and the same shape as the `data` input tensor.
**Types**
* *T*: arbitrary supported floating point type.
* *T*: arbitrary supported floating-point type.
**Examples**
**Example**
*Example: Second input `beta` provided*
```xml
<layer ... type="Swish">
<input>
@ -41,13 +43,30 @@ The Swish operation is introduced in the [article](https://arxiv.org/pdf/1710.05
<dim>256</dim>
<dim>56</dim>
</port>
<port id="1"/>
<port id="1"> <!-- beta value: 2.0 -->
</port>
</input>
<output>
<port id="1">
<port id="2">
<dim>256</dim>
<dim>56</dim>
</port>
</output>
</layer>
```
*Example: Second input `beta` not provided*
```xml
<layer ... type="Swish">
<input>
<port id="0">
<dim>128</dim>
</port>
</input>
<output>
<port id="1">
<dim>128</dim>
</port>
</output>
</layer>
```