[Opset13][FP8][Spec] FakeConvert op specification (#21039)

* FakeConvert spec init

* Add spec files to the opset docs

* Remove apply_scale attribute and update inputs description

* Experimental op notice

* Update short description

* Update detailed description.

* Add BF16 type to supported inputs

* Update  note about "Fake"

* Add more details

* Add formula of the operation
Katarzyna Mitrus 2023-11-28 09:14:06 +01:00 committed by GitHub
parent be5c755c32
commit 37bac6ebcd
3 changed files with 100 additions and 0 deletions


@@ -73,6 +73,7 @@ Table of Contents
* :doc:`ExperimentalDetectronTopKROIs_6 <openvino_docs_ops_sort_ExperimentalDetectronTopKROIs_6>`
* :doc:`ExtractImagePatches <openvino_docs_ops_movement_ExtractImagePatches_3>`
* :doc:`Eye <openvino_docs_ops_generation_Eye_9>`
* :doc:`FakeConvert <openvino_docs_ops_quantization_FakeConvert_13>`
* :doc:`FakeQuantize <openvino_docs_ops_quantization_FakeQuantize_1>`
* :doc:`Floor <openvino_docs_ops_arithmetic_Floor_1>`
* :doc:`FloorMod <openvino_docs_ops_arithmetic_FloorMod_1>`


@@ -68,6 +68,7 @@
ExperimentalDetectronTopKROIs-6 <openvino_docs_ops_sort_ExperimentalDetectronTopKROIs_6>
ExtractImagePatches-3 <openvino_docs_ops_movement_ExtractImagePatches_3>
Eye-9 <openvino_docs_ops_generation_Eye_9>
FakeConvert-13 <openvino_docs_ops_quantization_FakeConvert_13>
FakeQuantize-1 <openvino_docs_ops_quantization_FakeQuantize_1>
FloorMod-1 <openvino_docs_ops_arithmetic_FloorMod_1>
Floor-1 <openvino_docs_ops_arithmetic_Floor_1>


@@ -0,0 +1,98 @@
# FakeConvert {#openvino_docs_ops_quantization_FakeConvert_13}

**Note**: FakeConvert is an experimental operation and subject to change.

@sphinxdirective

.. meta::
   :description: Learn about FakeConvert-13 - a quantization operation.

**Versioned name**: *FakeConvert-13*

**Category**: *Quantization*

**Short description**: *FakeConvert* performs element-wise quantization of floating-point input values into a set of values corresponding to a target low-precision floating-point type.
**Detailed description**: *FakeConvert* converts the input tensor to a specified target low-precision floating-point type and then converts it back to the source precision. Before the conversion step it applies an affine transformation defined by the ``scale`` and ``shift`` inputs, and after the backward conversion it applies the reverse transformation.

It emulates the types defined by the ``destination_type`` attribute on the original type of the ``data`` input.
Possible destination types are "f8e4m3" and "f8e5m2". The "f8e4m3" type is an 8-bit floating-point format with 1 sign bit, 4 exponent bits and 3 mantissa bits. The "f8e5m2" type is an 8-bit floating-point format with 1 sign bit, 5 exponent bits and 2 mantissa bits.

The FP8 types were introduced in the paper `FP8 Formats for Deep Learning <https://arxiv.org/abs/2209.05433>`__.
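The bit layouts above determine each format's dynamic range. Under the conventions of the FP8 paper (bias 7 for "f8e4m3", where only the all-ones exponent/mantissa pattern is reserved for NaN, and an IEEE-like reserved top exponent for "f8e5m2"), the largest normal values work out to 448 and 57344. A hedged sketch deriving them from the bit widths (the helper name and its ``ieee_specials`` flag are illustrative, not part of the spec):

```python
def fp8_max(exp_bits: int, man_bits: int, ieee_specials: bool) -> float:
    """Largest finite normal value of an FP8 format, per the FP8 paper's conventions."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_specials:
        # f8e5m2: the all-ones exponent is reserved for inf/NaN (IEEE-like),
        # so the top usable exponent is one below it; full mantissa is usable.
        max_exp = (2 ** exp_bits - 2) - bias
        max_man = 1 + (2 ** man_bits - 1) / 2 ** man_bits
    else:
        # f8e4m3: only the all-ones mantissa at the top exponent encodes NaN,
        # so the top exponent is usable with mantissa up to all-ones minus one.
        max_exp = (2 ** exp_bits - 1) - bias
        max_man = 1 + (2 ** man_bits - 2) / 2 ** man_bits
    return max_man * 2 ** max_exp

print(fp8_max(4, 3, False))  # 448.0  (f8e4m3)
print(fp8_max(5, 2, True))   # 57344.0 (f8e5m2)
```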
*Fake* in *FakeConvert* means that the output tensor preserves the original element type of the input tensor rather than the ``destination_type``.
Each element of the output is defined as the result of the following expression:

.. code-block:: py
   :force:

   data = (data + shift) / scale
   data = ConvertLike(Convert(data, destination_type), data)
   data = data * scale - shift
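The expression above can be illustrated with a minimal NumPy sketch for ``destination_type="f8e5m2"``. This is not the OpenVINO implementation: it exploits the fact that f8e5m2 shares the sign/exponent layout of IEEE float16 (e5m10), so rounding float16 values down to 2 mantissa bits approximates the down-conversion; saturation and inf/NaN handling are deliberately omitted.

```python
import numpy as np

def fake_convert_e5m2(data, scale, shift):
    """Illustrative FakeConvert emulation for destination_type="f8e5m2" (sketch only)."""
    x = (data + shift) / scale                    # affine transform before conversion
    bits = x.astype(np.float16).view(np.uint16)
    bits = (bits + 0x0080) & 0xFF00               # round, then drop the low 8 mantissa bits
    x = bits.view(np.float16).astype(data.dtype)  # backward conversion to the source type
    return x * scale - shift                      # reverse affine transform
```

For example, with ``scale = 1`` and ``shift = 0``, an input of 0.3 rounds to 0.3125, while the output keeps the input's float32 element type, as the *Fake* note describes.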
**Attributes**

* *destination_type*

  * **Description**: *destination_type* is the emulated type
  * **Range of values**: "f8e4m3", "f8e5m2"
  * **Type**: `string`
  * **Required**: *yes*
**Inputs**:

* **1**: `data` - tensor of type *T_F* and arbitrary shape. **Required.**
* **2**: `scale` - tensor of type *T_F* with the scale factor for the *data* input value. The shape must be numpy-broadcastable to the shape of *data*. **Required.**
* **3**: `shift` - tensor of type *T_F* with the value to add before and subtract after the conversion of the *data* input value, as in the formula above. The shape must be numpy-broadcastable to the shape of *data* and match the shape of the *scale* input. **Optional.**
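The numpy-broadcastable requirement means ``scale`` and ``shift`` can carry, for instance, one value per channel. A small NumPy check, with hypothetical shapes matching those used in the IR example:

```python
import numpy as np

# Hypothetical per-channel parameters: data is [1, 64, 56, 56],
# scale and shift are [1, 64, 1, 1].
data = np.random.rand(1, 64, 56, 56).astype(np.float32)
scale = np.full((1, 64, 1, 1), 2.0, dtype=np.float32)
shift = np.zeros((1, 64, 1, 1), dtype=np.float32)

# NumPy broadcasting stretches the singleton spatial dims of `scale` and
# `shift` across `data`, so the result keeps the shape of `data`,
# matching the output specification.
out = (data + shift) / scale
assert out.shape == data.shape
```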
**Outputs**:

* **1**: Output tensor of type *T_F* with shape and type matching the 1st input tensor *data*.

**Types**

* *T_F*: supported floating-point type (`FP16`, `BF16`, `FP32`).
**Example**

.. code-block:: xml
   :force:

   <layer type="FakeConvert">
       <data destination_type="f8e4m3"/>
       <input>
           <port id="0">
               <dim>1</dim>
               <dim>64</dim>
               <dim>56</dim>
               <dim>56</dim>
           </port>
           <port id="1">
               <dim>1</dim>
               <dim>64</dim>
               <dim>1</dim>
               <dim>1</dim>
           </port>
           <port id="2">
               <dim>1</dim>
               <dim>64</dim>
               <dim>1</dim>
               <dim>1</dim>
           </port>
       </input>
       <output>
           <port id="3">
               <dim>1</dim>
               <dim>64</dim>
               <dim>56</dim>
               <dim>56</dim>
           </port>
       </output>
   </layer>
@endsphinxdirective