DOCS shift to rst Advanced topics (#16454)
Commit 9e5be9ad24 (parent 9b38e5168f)
@ -1,15 +1,20 @@
# Quantized models compute and restrictions {#openvino_docs_ov_plugin_dg_quantized_models}

@sphinxdirective

One of the features of OpenVINO is the support of quantized models with different precisions: INT8, INT4, etc.
However, it is up to the plugin to define which exact precisions are supported by the particular HW.
All quantized models that can be expressed in IR have a unified representation by means of the *FakeQuantize* operation.
For more details about low-precision model representation, refer to this :doc:`document <openvino_docs_ie_plugin_dg_lp_representation>`.

Interpreting FakeQuantize at runtime
####################################

During model loading, each plugin can interpret the quantization rules expressed in *FakeQuantize* operations:

* Independently, based on the definition of the *FakeQuantize* operation.
* Using a special library of low-precision transformations (LPT), which applies common rules for generic operations such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into models with low-precision operations.

Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
At runtime, each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
@ -17,33 +22,47 @@ The former one is aimed to transform the input data into the target precision wh
In practice, *Dequantize* operations can be propagated forward through linear operations, such as *Convolution* or *Fully-Connected*,
and in some cases fused with the following *Quantize* operation for the next layer into the so-called *Requantize* operation (see Fig. 1).

.. image:: _static/images/qdq_propagation.png

Figure 1. Quantization operations propagation at runtime. Q, DQ, RQ stand for Quantize, Dequantize, and Requantize, respectively.

From the calculation standpoint, the FakeQuantize formula is also split into two parts accordingly:

``output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low``

The first part of this formula represents the *Quantize* operation:

``q = round((x - input_low) / (input_high - input_low) * (levels-1))``

The second part is responsible for dequantization:

``r = q / (levels-1) * (output_high - output_low) + output_low``

In scale/zero-point notation, the latter formula can be written as follows:

``r = (output_high - output_low) / (levels-1) * (q + output_low / (output_high - output_low) * (levels-1))``

Thus we can define:

* **Scale** as ``(output_high - output_low) / (levels-1)``
* **Zero-point** as ``-output_low / (output_high - output_low) * (levels-1)``

With these definitions, the dequantization formula takes the form ``r = Scale * (q - Zero-point)``.

.. note::

   During the quantization process, the values ``input_low``, ``input_high``, ``output_low``, ``output_high`` are selected so as to map a floating-point zero exactly to an integer value (the zero-point) and vice versa.
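The formulas above can be checked with a small scalar sketch in Python. It assumes ``input_low < input_high`` and clamps out-of-range inputs, which for that case reproduces the saturation to ``output_low`` / ``output_high``. The function names are illustrative only and are not part of any OpenVINO API.

```python
def quantize(x, input_low, input_high, levels):
    """Quantize part: map x to an integer in [0, levels-1]."""
    # Clamp into the input interval; values outside it saturate to 0 or levels-1.
    x = min(max(x, input_low), input_high)
    # Python's round() rounds halves to even; a plugin may use another mode.
    return round((x - input_low) / (input_high - input_low) * (levels - 1))


def dequantize(q, output_low, output_high, levels):
    """Dequantize part: map the integer back to the output interval."""
    return q / (levels - 1) * (output_high - output_low) + output_low


def fake_quantize(x, input_low, input_high, output_low, output_high, levels):
    """Full FakeQuantize formula as the composition of the two parts."""
    q = quantize(x, input_low, input_high, levels)
    return dequantize(q, output_low, output_high, levels)


def scale_and_zero_point(output_low, output_high, levels):
    scale = (output_high - output_low) / (levels - 1)
    zero_point = -output_low / (output_high - output_low) * (levels - 1)
    return scale, zero_point


if __name__ == "__main__":
    # Asymmetric 8-bit example: [-1.0, 1.55] is chosen so that 0.0 maps to an integer.
    lo, hi, levels = -1.0, 1.55, 256
    scale, zp = scale_and_zero_point(lo, hi, levels)
    for x in (-2.0, -0.5, 0.0, 0.3, 1.55):
        q = quantize(x, lo, hi, levels)
        r = dequantize(q, lo, hi, levels)
        # The dequantized value equals scale * (q - zero_point).
        assert abs(r - scale * (q - zp)) < 1e-9
        print(f"x={x:+.2f}  q={q:3d}  r={r:+.4f}")
```

Here the input and output intervals coincide, so ``fake_quantize`` only introduces the rounding error of at most half a quantization step.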
Quantization specifics and restrictions
#######################################

In general, OpenVINO can represent and execute quantized models from different sources. However, the Post-training Optimization Tool (POT)
is considered the default way to get optimized models. Since the POT supports HW-aware quantization, specific rules can be implemented in it for
the particular HW. However, it is reasonable to maintain compatibility with general-purpose HW, such as CPU and GPU, and to support their quantization schemes.
Below we define these rules:

* Support of mixed-precision models where some layers can be kept in the floating-point precision.
* Per-channel quantization of weights of Convolutional and Fully-Connected layers.
* Per-channel quantization of activations for channel-wise and element-wise operations, e.g. Depthwise Convolution, Eltwise Add/Mul, ScaleShift.
* Symmetric and asymmetric quantization of weights and activations with the support of per-channel scales and zero-points.
* Non-unified quantization parameters for Eltwise and Concat operations.
* Non-quantized network output, i.e. there are no quantization parameters for it.
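The per-channel and symmetric/asymmetric items above can be illustrated with a small Python sketch. For brevity it derives each channel's range from the weights' own min/max. This is an assumption: tools such as the POT choose ranges in their own way. The sketch uses 255 levels for the symmetric case and 256 for the asymmetric one. The function names are hypothetical.

```python
def symmetric_per_channel(weights, levels=255):
    """Symmetric per-output-channel quantization to [-(levels-1)//2, (levels-1)//2].

    `weights` is a list of rows, one row per output channel.
    Returns the quantized rows and one scale per channel (zero-point is always 0).
    """
    qmax = (levels - 1) // 2  # 127 for 255 levels
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row) or 1.0  # all-zero channel: any scale works
        scale = max_abs / qmax
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales


def asymmetric_per_channel(weights, levels=256):
    """Asymmetric per-output-channel quantization to [0, levels-1].

    Returns the quantized rows and one (scale, zero_point) pair per channel.
    """
    q_rows, params = [], []
    for row in weights:
        lo, hi = min(min(row), 0.0), max(max(row), 0.0)  # keep 0.0 inside the range
        scale = (hi - lo) / (levels - 1) or 1.0
        zero_point = round(-lo / scale)  # integer, so 0.0 is represented exactly
        q_rows.append([min(max(round(w / scale) + zero_point, 0), levels - 1) for w in row])
        params.append((scale, zero_point))
    return q_rows, params


weights = [[0.5, -1.0, 0.25], [2.0, 0.0, -0.5]]  # two output channels
sym_q, sym_scales = symmetric_per_channel(weights)
asym_q, asym_params = asymmetric_per_channel(weights)
for q_row, s in zip(sym_q, sym_scales):
    print("sym ", q_row, [round(q * s, 4) for q in q_row])
for q_row, (s, zp) in zip(asym_q, asym_params):
    print("asym", q_row, [round((q - zp) * s, 4) for q in q_row])
```

Each channel gets its own scale, so a channel with small weights does not lose resolution because another channel has large ones.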
@endsphinxdirective
@ -9,10 +9,11 @@
   openvino_docs_ov_plugin_dg_quantized_models
   openvino_docs_OV_UG_lpt

The guides below provide extra information about specific OpenVINO features that are useful to understand when developing an OpenVINO plugin:

* :doc:`Quantized networks <openvino_docs_ov_plugin_dg_quantized_models>`
* :doc:`Low precision transformations guide <openvino_docs_OV_UG_lpt>`
* :doc:`Writing OpenVINO™ transformations guide <openvino_docs_transformations>`

@endsphinxdirective
@ -1,11 +1,21 @@
# AvgPoolPrecisionPreserved Attribute {#openvino_docs_OV_UG_lpt_AvgPoolPrecisionPreserved}

@sphinxdirective

The :ref:`ngraph::AvgPoolPrecisionPreservedAttribute <doxid-classngraph_1_1_avg_pool_precision_preserved_attribute>` class represents the ``AvgPoolPrecisionPreserved`` attribute.

A utility attribute used only when defining the precision preserved property of the ``AvgPool`` operation.

.. list-table::
   :header-rows: 1

   * - Property name
     - Values
   * - Required
     - Yes
   * - Defined
     - Operation
   * - Properties
     - value (boolean)

@endsphinxdirective
@ -1,11 +1,21 @@
# IntervalsAlignment Attribute {#openvino_docs_OV_UG_lpt_IntervalsAlignment}

@sphinxdirective

The :ref:`ngraph::IntervalsAlignmentAttribute <doxid-classngraph_1_1_intervals_alignment_attribute>` class represents the ``IntervalsAlignment`` attribute.

The attribute defines a subgraph with the same quantization intervals alignment. ``FakeQuantize`` operations are included. The attribute is used by quantization operations.

.. list-table::
   :header-rows: 1

   * - Property name
     - Values
   * - Required
     - Yes
   * - Defined
     - Operation
   * - Properties
     - combined interval, minimal interval, minimal levels, preferable precisions

@endsphinxdirective
@ -1,11 +1,21 @@
# PrecisionPreserved Attribute {#openvino_docs_OV_UG_lpt_PrecisionPreserved}

@sphinxdirective

The :ref:`ngraph::PrecisionPreservedAttribute <doxid-classngraph_1_1_precision_preserved_attribute>` class represents the ``PrecisionPreserved`` attribute.

The attribute defines a precision preserved operation. If the attribute is absent, then an operation is not precision preserved.

.. list-table::
   :header-rows: 1

   * - Property name
     - Values
   * - Required
     - Yes
   * - Defined
     - Operation
   * - Properties
     - value (boolean)
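The following Python sketch uses made-up values to show why an operation such as MaxPool can be treated as precision preserved. MaxPool only selects existing input values, so uint8 input gives uint8 output. A per-tensor dequantization with a positive scale also gives the same result whether it is applied before or after pooling. The sketch illustrates the property only. It is not how LPT decides when to set the attribute.

```python
def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


def max_pool_1d(values, window):
    """Non-overlapping 1D max pooling (stride == window)."""
    return [max(values[i:i + window]) for i in range(0, len(values) - window + 1, window)]


q = [12, 200, 7, 255, 0, 64]  # uint8 activations
scale, zero_point = 0.05, 128

pooled_q = max_pool_1d(q, 2)
assert all(0 <= v <= 255 for v in pooled_q)  # still uint8: precision is preserved

# With scale > 0, dequantizing after pooling equals pooling dequantized values.
after = [dequantize(v, scale, zero_point) for v in pooled_q]
before = max_pool_1d([dequantize(v, scale, zero_point) for v in q], 2)
assert after == before

# With a negative scale the order flips, so max pooling no longer commutes.
neg = [dequantize(v, -scale, zero_point) for v in pooled_q]
ref = max_pool_1d([dequantize(v, -scale, zero_point) for v in q], 2)
assert neg != ref
print(pooled_q, after)
```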
@endsphinxdirective
@ -1,11 +1,21 @@
# Precisions Attribute {#openvino_docs_OV_UG_lpt_Precisions}

@sphinxdirective

The :ref:`ngraph::PrecisionsAttribute <doxid-classngraph_1_1_precisions_attribute>` class represents the ``Precisions`` attribute.

The attribute defines the precision required for an input/output port or an operation.

.. list-table::
   :header-rows: 1

   * - Property name
     - Values
   * - Required
     - Yes
   * - Defined
     - Operation, input port, output port
   * - Properties
     - precisions

@endsphinxdirective
@ -1,11 +1,21 @@
# QuantizationAlignment Attribute {#openvino_docs_OV_UG_lpt_QuantizationAlignment}

@sphinxdirective

The :ref:`ngraph::QuantizationAlignmentAttribute <doxid-classngraph_1_1_quantization_alignment_attribute>` class represents the ``QuantizationAlignment`` attribute.

The attribute defines a subgraph with the same quantization alignment. ``FakeQuantize`` operations are not included. The attribute is used by quantization operations.

.. list-table::
   :header-rows: 1

   * - Property name
     - Values
   * - Required
     - Yes
   * - Defined
     - Operation
   * - Properties
     - value (boolean)

@endsphinxdirective
@ -1,11 +1,21 @@
# QuantizationGranularity Attribute {#openvino_docs_OV_UG_lpt_QuantizationGranularity}

@sphinxdirective

The ``ngraph::QuantizationAttribute`` class represents the ``QuantizationGranularity`` attribute.

The attribute defines the quantization granularity of operation inputs.

.. list-table::
   :header-rows: 1

   * - Property name
     - Values
   * - Required
     - No
   * - Defined
     - Input ports
   * - Properties
     - Quantization granularity

@endsphinxdirective
@ -15,305 +15,454 @@
   Step 3. Main transformations <openvino_docs_OV_UG_lpt_step3_main>
   Step 4. Cleanup transformations <openvino_docs_OV_UG_lpt_step4_cleanup>

Introduction
############

Low precision transformations (LPT) are a set of nGraph transformations combined in one library. The library is a mandatory part of OpenVINO for inferring quantized models in low precision with maximum performance on Intel CPU, GPU, and ARM platforms. The library includes more than 45 transformations and supports more than 30 operations. Some transformations are mandatory, while others are optional and developed for a specific device.

The goal of Low Precision Transformations (LPT) is to transform a quantized model from its original precision (FP16 or FP32) to a low precision (INT8: ``signed int8`` or ``unsigned int8``), so that it is prepared for low precision inference in the OpenVINO™ plugin. This is achieved by two main principles:

1. ``FakeQuantize`` operation decomposition into two parts:

   * part 1: quantize operation - a new ``FakeQuantize`` operation with output quantization intervals in the low precision range (signed int8: [-128, 127] or [-127, 127], unsigned int8: [0, 255] or [0, 256]) and with low precision output (``signed int8`` or ``unsigned int8``).
   * part 2: dequantization operations with low precision input and original precision output.

2. Propagation of the dequantization operations through the original model's operations. This is done to avoid dequantization operations before the original model operations, so that the quantize operations with low precision output remain before the original model operations.

As a result, operation input tensor precisions are changed from the original to low precision, and operations can be inferred by the OpenVINO™ plugin in low precision.
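The following Python sketch shows the decomposition in principle 1. It assumes ``levels = 256`` and an unsigned int8 target, so the low-precision ``FakeQuantize`` outputs ``[0, 255]``. The function names are hypothetical. The actual LPT transformation covers more cases, such as the signed int8 ranges mentioned above.

```python
def fake_quantize(x, input_low, input_high, output_low, output_high, levels):
    """Scalar FakeQuantize formula, assuming input_low < input_high."""
    x = min(max(x, input_low), input_high)
    q = round((x - input_low) / (input_high - input_low) * (levels - 1))
    return q / (levels - 1) * (output_high - output_low) + output_low


def decompose(input_low, input_high, output_low, output_high, levels=256):
    """Split one FakeQuantize into a low-precision FakeQuantize plus dequantization.

    Part 1 keeps the input interval but emits values in [0, levels-1]
    (the unsigned int8 range for 256 levels). Part 2 is the dequantization
    (q - shift) * scale, which restores the original output interval.
    """
    part1 = (input_low, input_high, 0.0, float(levels - 1), levels)
    scale = (output_high - output_low) / (levels - 1)
    shift = -output_low / scale  # zero-point; integer only if the interval was chosen so
    return part1, scale, shift


il, ih, ol, oh = -0.8, 1.2, -0.8, 1.2
part1, scale, shift = decompose(il, ih, ol, oh)
for x in (-1.0, -0.31, 0.0, 0.57, 2.0):
    q = fake_quantize(x, *part1)
    assert abs(q - round(q)) < 1e-9 and 0 <= round(q) <= 255  # fits in uint8
    restored = (q - shift) * scale  # Subtract, then Multiply
    assert abs(restored - fake_quantize(x, il, ih, ol, oh, 256)) < 1e-9
print(f"scale={scale:.6f} shift={shift:.3f}")
```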
For a more detailed description of how to quantize a model, see the `Low precision tools <#low-precision-tools>`__ section below. For more information about model quantization, refer to the **Brief History of Lower Precision in Deep Learning** section in `this whitepaper <https://software.intel.com/en-us/articles/lower-numerical-precision-deep-learning-inference-and-training>`__.

Input model requirements
########################

LPT transformations propagate dequantization operations through the following operations:

* :doc:`Add-1 <openvino_docs_ops_arithmetic_Add_1>`
* :doc:`AvgPool-1 <openvino_docs_ops_pooling_AvgPool_1>`
* :doc:`Clamp-1 <openvino_docs_ops_activation_Clamp_1>`
* :doc:`Concat-1 <openvino_docs_ops_movement_Concat_1>`
* :doc:`Convolution-1 <openvino_docs_ops_convolution_Convolution_1>`
* :doc:`ConvolutionBackpropData-1 <openvino_docs_ops_convolution_ConvolutionBackpropData_1>`
* :doc:`DepthToSpace-1 <openvino_docs_ops_movement_DepthToSpace_1>`
* :doc:`FakeQuantize-1 <openvino_docs_ops_quantization_FakeQuantize_1>`
* :doc:`GroupConvolution-1 <openvino_docs_ops_convolution_GroupConvolution_1>`
* :doc:`Interpolate-1 <openvino_docs_ops_image_Interpolate_1>`
* :doc:`Interpolate-4 <openvino_docs_ops_image_Interpolate_4>`
* :doc:`MatMul-1 <openvino_docs_ops_matrix_MatMul_1>`
* :doc:`MaxPool-1 <openvino_docs_ops_pooling_MaxPool_1>`
* :doc:`Multiply-1 <openvino_docs_ops_arithmetic_Multiply_1>`
* :doc:`MVN-1 <openvino_docs_ops_normalization_MVN_1>`
* :doc:`NormalizeL2-1 <openvino_docs_ops_normalization_NormalizeL2_1>`
* :doc:`PRelu-1 <openvino_docs_ops_activation_PReLU_1>`
* :doc:`ReduceMax-1 <openvino_docs_ops_reduction_ReduceMax_1>`
* :doc:`ReduceMean-1 <openvino_docs_ops_reduction_ReduceMean_1>`
* :doc:`ReduceMin-1 <openvino_docs_ops_reduction_ReduceMin_1>`
* :doc:`ReduceSum-1 <openvino_docs_ops_reduction_ReduceSum_1>`
* :doc:`Relu-1 <openvino_docs_ops_activation_ReLU_1>`
* :doc:`Reshape-1 <openvino_docs_ops_shape_Reshape_1>`
* :doc:`Split-1 <openvino_docs_ops_movement_Split_1>`
* :doc:`Squeeze-1 <openvino_docs_ops_shape_Squeeze_1>`
* :doc:`StridedSlice-1 <openvino_docs_ops_movement_StridedSlice_1>`
* :doc:`Transpose-1 <openvino_docs_ops_movement_Transpose_1>`
* :doc:`Gather-7 <openvino_docs_ops_movement_Gather_7>`
* :doc:`Gather-8 <openvino_docs_ops_movement_Gather_8>`
* :doc:`Unsqueeze-1 <openvino_docs_ops_shape_Unsqueeze_1>`
* :doc:`VariadicSplit-1 <openvino_docs_ops_movement_VariadicSplit_1>`

If an operation is not supported by LPT, the dequantization operation is not propagated, the input tensor precisions are not changed to low precision, and the operation is executed in the original precision.
For example, if you would like to infer a model with a ``Convolution`` operation in low precision, the model can look like the picture below:

.. image:: _static/images/model_fq_and_convolution.common.svg
   :alt: Quantized Convolution

There are several supported quantization approaches for activations and weights. All supported approaches are described in the `Quantization approaches <#quantization-approaches>`__ section below. The demonstrated model uses the `FakeQuantize operation quantization <#fakequantize-operation>`__ approach.

Low precision tools
+++++++++++++++++++

For more details on how to get a quantized model, refer to the :doc:`Model Optimization <openvino_docs_model_optimization_guide>` document.

Quantization approaches
#######################

LPT transformations support two quantization approaches:

1. ``FakeQuantize`` operation
2. Quantize and dequantization operations

Let's explore both approaches in detail using the ``Convolution`` operation.

FakeQuantize operation
++++++++++++++++++++++

In this case, the ``FakeQuantize`` operation is used on activations and a quantized constant is used on weights. Original input model:

.. image:: _static/images/model_fq_and_convolution.common.svg
   :alt: Original model with FakeQuantize

Quantize and dequantization operations
++++++++++++++++++++++++++++++++++++++

In this case, the ``FakeQuantize`` and ``Convert`` operations are used as the quantize operation and return a quantized low precision tensor. After the quantize operation on activations, there are ``Convert`` and dequantization operations to compensate for the decomposition. Original input model:

.. image:: _static/images/model_qdq_and_convolution.common.svg
   :alt: Original model with Q/DQ

In both cases, the result is the same. In the LPT result model, you can see that:

1. if necessary, ``FakeQuantize`` operations on activations were decomposed into two parts:

   * a new ``FakeQuantize`` operation with updated output intervals in the low precision range and low precision output,
   * dequantization operations on activations;

2. if necessary, an existing ``FakeQuantize`` decomposition can be reworked to get better precision;
3. dequantization operations were propagated through ``Convolution``.

LPT result model:

.. image:: _static/images/model_fq_and_convolution.transformed.svg
   :alt: Result model
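The last item above, propagation of dequantization through ``Convolution``, rests on the linearity of the operation. The Python sketch below uses a 1D convolution as a stand-in for ``Convolution``. It assumes per-tensor uint8 activations with a scale and zero-point, and symmetric per-tensor int8 weights. Under these assumptions, the integer result needs only one dequantization after the operation, plus a term compensating the activation zero-point. How a particular plugin handles that zero-point term is not covered here.

```python
def conv1d(x, w):
    """Valid 1D convolution (cross-correlation, as in DL frameworks), stride 1."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


# Low-precision activations (uint8) with per-tensor dequantization parameters.
q_act = [10, 200, 37, 128, 255, 0]
act_scale, act_zp = 0.02, 128
# Low-precision weights (int8), symmetric, so there is no weight zero-point.
q_w = [3, -7, 12]
w_scale = 0.005

# Reference: dequantize first, then convolve in floating point.
x = [act_scale * (v - act_zp) for v in q_act]
w = [w_scale * v for v in q_w]
reference = conv1d(x, w)

# Low precision: integer convolution, then a single dequantization after it.
acc = conv1d(q_act, q_w)       # integer accumulation
zp_term = act_zp * sum(q_w)    # compensates the activation zero-point
result = [act_scale * w_scale * (a - zp_term) for a in acc]

assert all(abs(r - ref) < 1e-9 for r, ref in zip(result, reference))
print(acc, result)
```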
Low precision transformations pipeline
++++++++++++++++++++++++++++++++++++++

The LPT transformation pipeline has several steps. Within one step, each transformation has its own unique pattern matcher, but each operation can be assigned to several transformations.

.. image:: _static/images/low_precision_transformation_pipeline.svg
   :alt: Low precision transformations pipeline

Inside each step, LPT transformations handle the input model operation by operation. For each operation, the matching pattern of every transformation in the step is applied, and the transformation is executed if its pattern matches. The decomposition transformation decomposes ``FakeQuantize`` into quantize and dequantization operations. The dequantization operations produced by the previous transformation are used by the current one, and so on, until the end of the model is reached.

As a result, usually all operations are inferred by the plugin in low precision. If the plugin does not support inference of an operation in low precision, the corresponding LPT transformation can be disabled, and the input tensor precisions for the operation are not changed. In this case, the operation is inferred in the original precision.
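The operation-by-operation behavior described above can be modeled with a toy Python sketch. It assumes a linear chain of precision preserved operations after a decomposed FakeQuantize, and a hypothetical per-operation ``enabled`` map. It is not the LPT API, where transformations are nGraph pattern-matcher passes.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Op:
    type: str
    input_precision: str = "f32"


def run_main_step(chain: List[Op], enabled: Dict[str, bool]) -> int:
    """Toy model of one pipeline step over a linear chain of operations.

    The chain is assumed to start right after a decomposed FakeQuantize, so uint8 data
    plus a pending dequantization reach the first operation. Each operation with an
    enabled transformation takes low-precision input and passes the dequantization on
    to its output, where the next transformation uses it. Returns the index of the
    first operation left in the original precision (len(chain) if none).
    """
    for i, op in enumerate(chain):
        if not enabled.get(op.type, False):
            return i  # dequantization stays in front of this operation
        op.input_precision = "u8"
    return len(chain)


chain = [Op("MaxPool"), Op("Reshape"), Op("Transpose"), Op("MVN")]
# Hypothetical plugin without a low-precision MVN kernel: its transformation is disabled.
enabled = {"MaxPool": True, "Reshape": True, "Transpose": True, "MVN": False}
stop = run_main_step(chain, enabled)
print([(op.type, op.input_precision) for op in chain], "dequantization before:", chain[stop].type)
```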
The low precision transformations pipeline includes four steps:

* :doc:`Step 1: Prerequisites <openvino_docs_OV_UG_lpt_step1_prerequisites>`
* :doc:`Step 2: Markup transformations <openvino_docs_OV_UG_lpt_step2_markup>`
* :doc:`Step 3: Main transformations <openvino_docs_OV_UG_lpt_step3_main>`
* :doc:`Step 4: Cleanup transformations <openvino_docs_OV_UG_lpt_step4_cleanup>`

Step 1. Prerequisites
---------------------

This step fuses and propagates some operations in the model to prepare for the next step. It is required for OpenVINO plugins. Transformations:

* :doc:`PullReshapeThroughDequantization <openvino_docs_OV_UG_lpt_PullReshapeThroughDequantization>`
* :doc:`PullTransposeThroughDequantization <openvino_docs_OV_UG_lpt_PullTransposeThroughDequantization>`
* :doc:`LinOpSequenceFusion <openvino_docs_OV_UG_lpt_LinOpSequenceFusion>`

The model is changed in this step. For more details, see the :doc:`Prerequisites transformations <openvino_docs_OV_UG_lpt_step1_prerequisites>` developer guide.
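The Step 1 transformations rely on simple algebraic identities. The Python sketch below assumes scalar constants and a per-tensor dequantization, and checks two of them. First, a chain of Multiply/Add operations collapses into one multiply and one add, which is the kind of rewrite the name ``LinOpSequenceFusion`` suggests. Second, a per-tensor dequantization commutes with ``Reshape``, which is what makes moving a ``Reshape`` through dequantization valid. The helper names are hypothetical.

```python
def fuse_linear_sequence(ops):
    """Collapse a chain of scalar ('mul', c) / ('add', c) ops into x * scale + shift.

    Relies on (x * a + b) * c + d == x * (a * c) + (b * c + d).
    """
    scale, shift = 1.0, 0.0
    for kind, c in ops:
        if kind == "mul":
            scale, shift = scale * c, shift * c
        elif kind == "add":
            shift += c
        else:
            raise ValueError(f"not a linear op: {kind}")
    return scale, shift


def run_sequence(ops, x):
    for kind, c in ops:
        x = x * c if kind == "mul" else x + c
    return x


def flatten(matrix):
    """A Reshape from [rows, cols] to [rows * cols]: it only reorders elements."""
    return [v for row in matrix for v in row]


def dequantize(matrix, scale, zero_point):
    return [[scale * (v - zero_point) for v in row] for row in matrix]


seq = [("mul", 0.5), ("add", 3.0), ("mul", -2.0), ("add", 1.0)]
scale, shift = fuse_linear_sequence(seq)
for x in (-4.0, 0.0, 2.5):
    assert abs(run_sequence(seq, x) - (x * scale + shift)) < 1e-12

# Per-tensor dequantization commutes with Reshape, so their order can be swapped.
q = [[1, 2, 3], [4, 5, 6]]
assert flatten(dequantize(q, 0.1, 2)) == [0.1 * (v - 2) for v in flatten(q)]
print(scale, shift)
```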
Step 2. Markup
--------------

This step creates runtime attributes for operations. These attributes are used in the next step. Transformations:

* :doc:`MarkupBias <openvino_docs_OV_UG_lpt_MarkupBias>`
* :doc:`MarkupCanBeQuantized <openvino_docs_OV_UG_lpt_MarkupCanBeQuantized>`
* :doc:`MarkupPrecisions <openvino_docs_OV_UG_lpt_MarkupPrecisions>`
* :doc:`MarkupPerTensorQuantization <openvino_docs_OV_UG_lpt_MarkupPerTensorQuantization>`
* :doc:`MarkupAvgPoolPrecisionPreserved <openvino_docs_OV_UG_lpt_MarkupAvgPoolPrecisionPreserved>`
* :doc:`PropagatePrecisions <openvino_docs_OV_UG_lpt_PropagatePrecisions>`
* :doc:`AlignQuantizationIntervals <openvino_docs_OV_UG_lpt_AlignQuantizationIntervals>`
* :doc:`AlignQuantizationParameters <openvino_docs_OV_UG_lpt_AlignQuantizationParameters>`

The model is changed in this step: only new attributes are added to some operations. For more details, see the :doc:`Markup transformations <openvino_docs_OV_UG_lpt_step2_markup>` developer guide.
### Step 3. Main transformations, FakeQuantize decomposition and dequantization operations handling
|
||||
This step has the most transformations. These transformations can be separated in two groups: decomposition transformation and dequantization operations handling. There are more details in developer guide [Main transformations](@ref openvino_docs_OV_UG_lpt_step3_main). Transformations:
|
||||
* [AddTransformation](@ref openvino_docs_OV_UG_lpt_AddTransformation)
|
||||
* [AvgPoolTransformation](@ref openvino_docs_OV_UG_lpt_AvgPoolTransformation)
|
||||
* [ClampTransformation](@ref openvino_docs_OV_UG_lpt_AvgPoolTransformation)
|
||||
* [ConcatTransformation](@ref openvino_docs_OV_UG_lpt_ConcatTransformation)
|
||||
* [ConvolutionTransformation](@ref openvino_docs_OV_UG_lpt_ConvolutionTransformation)
|
||||
* [ConvolutionBackpropDataTransformation](@ref openvino_docs_OV_UG_lpt_ConvolutionBackpropDataTransformation)
|
||||
* [DepthToSpaceTransformation](@ref openvino_docs_OV_UG_lpt_DepthToSpaceTransformation)
|
||||
* [FakeQuantizeDecompositionTransformation](@ref openvino_docs_OV_UG_lpt_FakeQuantizeDecompositionTransformation)
|
||||
* [FakeQuantizeTransformation](@ref openvino_docs_OV_UG_lpt_FakeQuantizeTransformation)
|
||||
* [InterpolateTransformation](@ref openvino_docs_OV_UG_lpt_InterpolateTransformation)
|
||||
* [GroupConvolutionTransformation](@ref openvino_docs_OV_UG_lpt_GroupConvolutionTransformation)
|
||||
* [GatherTransformation](@ref openvino_docs_OV_UG_lpt_GatherTransformation)
|
||||
* [MatMulTransformation](@ref openvino_docs_OV_UG_lpt_MatMulTransformation)
|
||||
* [MaxPoolTransformation](@ref openvino_docs_OV_UG_lpt_MaxPoolTransformation)
|
||||
* [MultiplyTransformation](@ref openvino_docs_OV_UG_lpt_MultiplyTransformation)
|
||||
* [MVNTransformation](@ref openvino_docs_OV_UG_lpt_MVNTransformation)
|
||||
* [NormalizeL2Transformation](@ref openvino_docs_OV_UG_lpt_NormalizeL2Transformation)
|
||||
* [PReluTransformation](@ref openvino_docs_OV_UG_lpt_PReluTransformation)
|
||||
* [ReduceMaxTransformation](@ref openvino_docs_OV_UG_lpt_ReduceMaxTransformation)
|
||||
* [ReduceMeanTransformation](@ref openvino_docs_OV_UG_lpt_ReduceMeanTransformation)
|
||||
* [ReduceMinTransformation](@ref openvino_docs_OV_UG_lpt_ReduceMinTransformation)
|
||||
* [ReduceSumTransformation](@ref openvino_docs_OV_UG_lpt_ReduceSumTransformation)
|
||||
* [ReluTransformation](@ref openvino_docs_OV_UG_lpt_ReluTransformation)
|
||||
* [ReshapeTransformation](@ref openvino_docs_OV_UG_lpt_ReshapeTransformation)
|
||||
* [SqueezeTransformation](@ref openvino_docs_OV_UG_lpt_SqueezeTransformation)
|
||||
* [ShuffleChannelsTransformation](@ref openvino_docs_OV_UG_lpt_ShuffleChannelsTransformation)
|
||||
* [SplitTransformation](@ref openvino_docs_OV_UG_lpt_SplitTransformation)
|
||||
* [StridedSliceTransformation](@ref openvino_docs_OV_UG_lpt_StridedSliceTransformation)
|
||||
* [TransposeTransformation](@ref openvino_docs_OV_UG_lpt_TransposeTransformation)
|
||||
* [UnsqueezeTransformation](@ref openvino_docs_OV_UG_lpt_UnsqueezeTransformation)
|
||||
* [VariadicSplitTransformation](@ref openvino_docs_OV_UG_lpt_VariadicSplitTransformation)
|
||||
The model on this step is changed: only new attributes are added to some operations. There are more details in developer guide :doc:`Markup transformations <openvino_docs_OV_UG_lpt_step2_markup>`.
|
||||
|
||||
#### Decomposition transformations
|
||||
Decomposition transformations decompose the `FakeQuantize` operation to: quantize (`FakeQuantize` with low precision output) and dequantization operations (opposite to quantize, with low precision input and the original precision output). For dequantization operations LPT uses three operations: `Convert`, `Subtract` and `Multiply`. Element-wise operations `Subtract` and `Multiply` have constants on the second branches. If dequantization operations are not handled at the end of LPT pipeline, then they will be fused back to the `FakeQuantize`.
|
||||
Step 3. Main transformations, FakeQuantize decomposition and dequantization operations handling
|
||||
-----------------------------------------------------------------------------------------------
|
||||
|
||||
This step has the most transformations. These transformations can be separated in two groups: decomposition transformation and dequantization operations handling. There are more details in developer guide :doc:`Main transformations <openvino_docs_OV_UG_lpt_step3_main>`.
|
||||
|
||||
Transformations:

* :doc:`AddTransformation <openvino_docs_OV_UG_lpt_AddTransformation>`
* :doc:`AvgPoolTransformation <openvino_docs_OV_UG_lpt_AvgPoolTransformation>`
* :doc:`ClampTransformation <openvino_docs_OV_UG_lpt_ClampTransformation>`
* :doc:`ConcatTransformation <openvino_docs_OV_UG_lpt_ConcatTransformation>`
* :doc:`ConvolutionTransformation <openvino_docs_OV_UG_lpt_ConvolutionTransformation>`
* :doc:`ConvolutionBackpropDataTransformation <openvino_docs_OV_UG_lpt_ConvolutionBackpropDataTransformation>`
* :doc:`DepthToSpaceTransformation <openvino_docs_OV_UG_lpt_DepthToSpaceTransformation>`
* :doc:`FakeQuantizeDecompositionTransformation <openvino_docs_OV_UG_lpt_FakeQuantizeDecompositionTransformation>`
* :doc:`FakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FakeQuantizeTransformation>`
* :doc:`InterpolateTransformation <openvino_docs_OV_UG_lpt_InterpolateTransformation>`
* :doc:`GroupConvolutionTransformation <openvino_docs_OV_UG_lpt_GroupConvolutionTransformation>`
* :doc:`GatherTransformation <openvino_docs_OV_UG_lpt_GatherTransformation>`
* :doc:`MatMulTransformation <openvino_docs_OV_UG_lpt_MatMulTransformation>`
* :doc:`MaxPoolTransformation <openvino_docs_OV_UG_lpt_MaxPoolTransformation>`
* :doc:`MultiplyTransformation <openvino_docs_OV_UG_lpt_MultiplyTransformation>`
* :doc:`MVNTransformation <openvino_docs_OV_UG_lpt_MVNTransformation>`
* :doc:`NormalizeL2Transformation <openvino_docs_OV_UG_lpt_NormalizeL2Transformation>`
* :doc:`PReluTransformation <openvino_docs_OV_UG_lpt_PReluTransformation>`
* :doc:`ReduceMaxTransformation <openvino_docs_OV_UG_lpt_ReduceMaxTransformation>`
* :doc:`ReduceMeanTransformation <openvino_docs_OV_UG_lpt_ReduceMeanTransformation>`
* :doc:`ReduceMinTransformation <openvino_docs_OV_UG_lpt_ReduceMinTransformation>`
* :doc:`ReduceSumTransformation <openvino_docs_OV_UG_lpt_ReduceSumTransformation>`
* :doc:`ReluTransformation <openvino_docs_OV_UG_lpt_ReluTransformation>`
* :doc:`ReshapeTransformation <openvino_docs_OV_UG_lpt_ReshapeTransformation>`
* :doc:`SqueezeTransformation <openvino_docs_OV_UG_lpt_SqueezeTransformation>`
* :doc:`ShuffleChannelsTransformation <openvino_docs_OV_UG_lpt_ShuffleChannelsTransformation>`
* :doc:`SplitTransformation <openvino_docs_OV_UG_lpt_SplitTransformation>`
* :doc:`StridedSliceTransformation <openvino_docs_OV_UG_lpt_StridedSliceTransformation>`
* :doc:`TransposeTransformation <openvino_docs_OV_UG_lpt_TransposeTransformation>`
* :doc:`UnsqueezeTransformation <openvino_docs_OV_UG_lpt_UnsqueezeTransformation>`
* :doc:`VariadicSplitTransformation <openvino_docs_OV_UG_lpt_VariadicSplitTransformation>`

Decomposition transformations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Decomposition transformations decompose the ``FakeQuantize`` operation into a quantize operation (``FakeQuantize`` with low precision output) and dequantization operations (the inverse of quantize, with low precision input and the original precision output). For dequantization, LPT uses three operations: ``Convert``, ``Subtract``, and ``Multiply``. The element-wise operations ``Subtract`` and ``Multiply`` have constants on their second inputs. If dequantization operations are not handled by the end of the LPT pipeline, they are fused back into the ``FakeQuantize``.

Original ``FakeQuantize``:

.. image:: _static/images/fq.common.svg
   :alt: FakeQuantize operation before LPT

``FakeQuantize`` after decomposition into quantization and dequantization operations:

.. image:: _static/images/fq.transformed.svg
   :alt: FakeQuantize operation after LPT

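The decomposition can be checked numerically. The following C++ sketch is a conceptual model, not LPT code. It assumes per-tensor ranges, ``levels <= 256`` and a ``u8`` quantized type. The names ``fake_quantize``, ``quantize_u8``, ``dequantization_params``, and ``dequantize`` are hypothetical. The scalar ``fake_quantize`` follows the element-wise formula of the FakeQuantize operation specification. The quantize part maps values onto ``[0, levels - 1]``. Then ``Convert``, ``Subtract`` (zero point), and ``Multiply`` (scale) recover the original ``FakeQuantize`` output up to floating-point rounding.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstdint>

   // Scalar FakeQuantize following the element-wise formula of the FakeQuantize
   // operation specification (per-tensor ranges, input_low < input_high assumed).
   float fake_quantize(float x, float in_low, float in_high,
                       float out_low, float out_high, int levels) {
       assert(levels >= 2 && in_low < in_high);
       if (x <= in_low) return out_low;
       if (x > in_high) return out_high;
       const float q = std::round((x - in_low) / (in_high - in_low) * (levels - 1));
       return q / (levels - 1) * (out_high - out_low) + out_low;
   }

   // Quantize part (hypothetical helper): FakeQuantize with the output range
   // [0, levels - 1], followed by a conversion to u8.
   std::uint8_t quantize_u8(float x, float in_low, float in_high, int levels) {
       assert(levels <= 256);
       const float q = fake_quantize(x, in_low, in_high,
                                     0.0f, static_cast<float>(levels - 1), levels);
       return static_cast<std::uint8_t>(std::lround(q));  // q is integral up to rounding error
   }

   // Constants of the dequantization Subtract and Multiply (hypothetical type).
   struct Dequantization {
       float zero_point;  // constant of the Subtract operation
       float scale;       // constant of the Multiply operation
   };

   Dequantization dequantization_params(float out_low, float out_high, int levels) {
       const float scale = (out_high - out_low) / static_cast<float>(levels - 1);
       return {-out_low / scale, scale};
   }

   // Dequantization part: Convert -> Subtract(zero_point) -> Multiply(scale).
   float dequantize(std::uint8_t q, const Dequantization& d) {
       return (static_cast<float>(q) - d.zero_point) * d.scale;
   }

Because ``(q - zero_point) * scale`` equals ``q * scale + output_low``, the decomposed form reproduces the ``FakeQuantize`` output. The ``Subtract`` and ``Multiply`` are ordinary element-wise operations, so later transformations can move or fuse them.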
Dequantization operations handling transformations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this step, LPT transformations fuse dequantization operations or move them through existing model operations as much as possible.

Original ``Convolution`` operation in FP32, preceded by dequantization operations:

.. image:: _static/images/model_fq_and_convolution.common.svg
   :alt: Convolution operation before LPT

``Convolution`` operation in INT8 after decomposition and dequantization operations handling:

.. image:: _static/images/model_fq_and_convolution.transformed.svg
   :alt: Convolution operation after LPT

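Moving dequantization operations through a linear operation relies on a simple fact: per-tensor scales factor out of a sum of products. The C++ sketch below illustrates this on a dot product, which is what a ``Convolution`` computes for one output element. It is a simplified model, not the LPT or CPU plugin implementation. It assumes ``u8`` activations with an integer zero point, symmetric ``i8`` weights, and per-tensor scales. The function names are hypothetical.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // FP32 reference: dequantize activations and weights first, then accumulate.
   float dot_dequantized_fp32(const std::vector<std::uint8_t>& qa, std::int32_t zero_point,
                              float scale_a, const std::vector<std::int8_t>& qw, float scale_w) {
       assert(qa.size() == qw.size());
       float acc = 0.0f;
       for (std::size_t i = 0; i < qa.size(); ++i) {
           // Convert, Subtract, Multiply on activations.
           const float a = (static_cast<float>(qa[i]) - static_cast<float>(zero_point)) * scale_a;
           const float w = static_cast<float>(qw[i]) * scale_w;  // symmetric weights: no Subtract
           acc += a * w;
       }
       return acc;
   }

   // Low precision variant: accumulate integer products in int32, then apply a
   // single Multiply by the combined per-tensor scale after the operation.
   float dot_low_precision(const std::vector<std::uint8_t>& qa, std::int32_t zero_point,
                           float scale_a, const std::vector<std::int8_t>& qw, float scale_w) {
       assert(qa.size() == qw.size());
       std::int32_t acc = 0;
       for (std::size_t i = 0; i < qa.size(); ++i) {
           acc += (static_cast<std::int32_t>(qa[i]) - zero_point) * static_cast<std::int32_t>(qw[i]);
       }
       return static_cast<float>(acc) * (scale_a * scale_w);
   }

Both functions return the same value up to floating-point rounding. In the second one, the expensive loop runs entirely on integers.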
Step 4: Cleanup of the result model
-----------------------------------

LPT cleanup transformations are the final stage of the LPT pipeline. In this step, LPT transformations clean up the resulting model so that no unhandled dequantization operations remain. Dequantization operations are fused into other model operations where possible. If that is not possible, at least the ``Convert`` operations are fused.

Transformations:

* :doc:`FoldConvertTransformation <openvino_docs_OV_UG_lpt_FoldConvertTransformation>`
* :doc:`FoldFakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FoldFakeQuantizeTransformation>`
* :doc:`FuseConvertTransformation <openvino_docs_OV_UG_lpt_FuseConvertTransformation>`
* :doc:`FuseMultiplyToFakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FuseMultiplyToFakeQuantizeTransformation>`
* :doc:`FuseSubtractToFakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FuseSubtractToFakeQuantizeTransformation>`
* :doc:`MultiplyToGroupConvolutionTransformation <openvino_docs_OV_UG_lpt_MultiplyToGroupConvolutionTransformation>`

For more details, see the :doc:`Cleanup transformations <openvino_docs_OV_UG_lpt_step4_cleanup>` developer guide.

``FakeQuantize`` operation with unhandled dequantization operations:

.. image:: _static/images/fq.transformed.svg
   :alt: FakeQuantize operation with dequantization operations before LPT

``FakeQuantize`` operation with fused dequantization operations:

.. image:: _static/images/fq.common.svg
   :alt: FakeQuantize operation with fused operations after LPT

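You can illustrate the idea behind fusing a dequantization ``Multiply`` or ``Subtract`` into a preceding ``FakeQuantize`` with one identity. For a scalar constant, an element-wise ``Multiply`` or ``Subtract`` after ``FakeQuantize`` only rescales or shifts its output range. The C++ sketch below checks this identity for per-tensor ranges. ``OutputRange``, ``fuse_multiply``, and ``fuse_subtract`` are hypothetical names. The real cleanup transformations also deal with cases not shown here, such as per-channel constants.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>

   // Scalar FakeQuantize following the element-wise formula of the operation
   // specification (per-tensor ranges, input_low < input_high assumed).
   float fake_quantize(float x, float in_low, float in_high,
                       float out_low, float out_high, int levels) {
       assert(levels >= 2 && in_low < in_high);
       if (x <= in_low) return out_low;
       if (x > in_high) return out_high;
       const float q = std::round((x - in_low) / (in_high - in_low) * (levels - 1));
       return q / (levels - 1) * (out_high - out_low) + out_low;
   }

   // Output interval of a FakeQuantize (hypothetical helper type).
   struct OutputRange {
       float low;
       float high;
   };

   // FakeQuantize(x) * c is equal to FakeQuantize with output range [low * c, high * c].
   OutputRange fuse_multiply(OutputRange r, float c) { return {r.low * c, r.high * c}; }

   // FakeQuantize(x) - c is equal to FakeQuantize with output range [low - c, high - c].
   OutputRange fuse_subtract(OutputRange r, float c) { return {r.low - c, r.high - c}; }

After such a fusion, the dequantization operation disappears from the graph and only the ``FakeQuantize`` constants change.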
Low precision transformations in plugin transformation pipeline
###############################################################

A typical transformation pipeline is described below.

Step 1. Common optimizations
++++++++++++++++++++++++++++

This step is optional for LPT but is typically present in OpenVINO™ plugins. The step does not use any LPT transformation. First, the step disables constant folding of dequantization operations in the constant subgraph on weights. This prevents the loss of dequantization information in the subsequent plugin transformations. After that, it optimizes the nGraph function and converts operations to operation set 1. Typically, this step is the simplest way to meet LPT requirements for the input quantized model. If the plugin can guarantee that LPT input requirements are met, this step can be skipped.

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [lpt_common]

Step 2. Low precision transformations execution
+++++++++++++++++++++++++++++++++++++++++++++++

This step is mandatory. It configures and runs LPT transformations.

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [lpt_execution]

Step 3. Plugin-specific transformations
+++++++++++++++++++++++++++++++++++++++

This step is optional. It modifies the nGraph function to a device-specific operation set.

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [lpt_device]

Result model overview
#####################

Let's explore the quantized `TensorFlow implementation of ResNet-50 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf>`__ model. Use the `Model Downloader <https://docs.openvino.ai/2022.3/omz_tools_downloader.html>`__ tool to download the ``fp16`` model from the `OpenVINO™ Toolkit - Open Model Zoo repository <https://github.com/openvinotoolkit/open_model_zoo>`__:

.. code-block:: sh

   omz_downloader --name resnet-50-tf --precisions FP16-INT8

After that, quantize the model with the `Model Quantizer <https://docs.openvino.ai/2022.3/omz_tools_downloader.html>`__ tool:

.. code-block:: sh

   omz_quantizer --model_dir public/resnet-50-tf --dataset_dir <DATASET_DIR> --precisions=FP16-INT8

Inference
+++++++++

The simplest way to infer the model and collect performance counters is the :doc:`Benchmark Application <openvino_inference_engine_samples_benchmark_app_README>`.

.. code-block:: sh

   ./benchmark_app -m resnet-50-tf.xml -d CPU -niter 1 -api sync -report_type average_counters -report_folder pc_report_dir

If you infer the model with the OpenVINO™ CPU plugin and collect performance counters, all operations (except the last, non-quantized ``SoftMax``) are executed in INT8 precision.

Results analysis
++++++++++++++++

The resulting model depends on several factors:

* The original model quantization possibility and quantization quality. For some models, some operations cannot be quantized by the POT and NNCF tools. In this case, ``FakeQuantize`` operations are absent before these operations, and they are inferred in the original precision.
* LPT customization and the operations supported by the plugin. If the plugin does not support INT8 inference for some operation, the corresponding LPT transformation should be disabled, and the operation is inferred in the original precision.

Information about layer precision is stored in the performance counters that are available from the OpenVINO Runtime API. For example, part of the performance counters table for the quantized `TensorFlow implementation of ResNet-50 <https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-tf>`__ model inference on the CPU plugin looks as follows:

.. list-table::
   :header-rows: 1

   * - layerName
     - execStatus
     - layerType
     - execType
     - realTime (ms)
     - cpuTime (ms)
   * - resnet_model/batch_normalization_15/FusedBatchNorm/Add
     - EXECUTED
     - Convolution
     - jit_avx512_1x1_I8
     - 0.377
     - 0.377
   * - resnet_model/conv2d_16/Conv2D/fq_input_0
     - NOT_RUN
     - FakeQuantize
     - undef
     - 0
     - 0
   * - resnet_model/batch_normalization_16/FusedBatchNorm/Add
     - EXECUTED
     - Convolution
     - jit_avx512_I8
     - 0.499
     - 0.499
   * - resnet_model/conv2d_17/Conv2D/fq_input_0
     - NOT_RUN
     - FakeQuantize
     - undef
     - 0
     - 0
   * - resnet_model/batch_normalization_17/FusedBatchNorm/Add
     - EXECUTED
     - Convolution
     - jit_avx512_1x1_I8
     - 0.399
     - 0.399
   * - resnet_model/add_4/fq_input_0
     - NOT_RUN
     - FakeQuantize
     - undef
     - 0
     - 0
   * - resnet_model/add_4
     - NOT_RUN
     - Eltwise
     - undef
     - 0
     - 0
   * - resnet_model/add_5/fq_input_1
     - NOT_RUN
     - FakeQuantize
     - undef
     - 0
     - 0

The ``execStatus`` column of the table includes the following possible values:

* ``EXECUTED`` - the layer was executed by a standalone primitive.
* ``NOT_RUN`` - the layer was not executed by a standalone primitive, or was fused with another operation and executed in another layer's primitive.

The ``execType`` column of the table includes inference primitives with specific suffixes. The layers have the following marks:

* Suffix ``I8`` for layers that had 8-bit data type input and were computed in 8-bit precision.
* Suffix ``FP32`` for layers computed in 32-bit precision.

As a result, all operations in the OpenVINO™ CPU plugin are inferred in low precision, except the non-quantized ``SoftMax`` at the end of the model. Note that the resulting model contains ``FakeQuantize`` operations in FP32, but the plugin is responsible for fusing these operations with the preceding operations. The OpenVINO™ CPU plugin achieves maximally optimized inference for all operations by fusing an INT8 ``Convolution`` with FP32 output with a ``FakeQuantize`` operation with FP32 input and INT8 output. In this case, the OpenVINO™ CPU plugin uses both INT8 and FP32 vectorized instructions but reports a single INT8 kernel for inference, which is the most optimized option for this case.

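When checking which layers actually ran in low precision, you can apply the ``execStatus`` and ``execType`` conventions described above mechanically. The C++ sketch below assumes the rows of the performance counters table have already been collected, for example copied from the report. ``PerfCounterRow`` and ``executed_layers_with_suffix`` are hypothetical names. Reading the Benchmark Application report files is not shown.

.. code-block:: cpp

   #include <cassert>
   #include <string>
   #include <vector>

   // One row of the performance counters table (hypothetical structure).
   struct PerfCounterRow {
       std::string layer_name;
       std::string exec_status;  // "EXECUTED" or "NOT_RUN"
       std::string exec_type;    // primitive name, for example "jit_avx512_1x1_I8"
   };

   bool ends_with(const std::string& text, const std::string& suffix) {
       return text.size() >= suffix.size() &&
              text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
   }

   // Names of layers executed by a standalone primitive whose execType ends with
   // the given precision suffix ("I8" or "FP32"). NOT_RUN rows are skipped because
   // they were either fused into another primitive or not executed at all.
   std::vector<std::string> executed_layers_with_suffix(const std::vector<PerfCounterRow>& rows,
                                                        const std::string& suffix) {
       std::vector<std::string> names;
       for (const PerfCounterRow& row : rows) {
           if (row.exec_status == "EXECUTED" && ends_with(row.exec_type, suffix)) {
               names.push_back(row.layer_name);
           }
       }
       return names;
   }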
Mixed precision
###############

If the output of an operation in the LPT input model has ``fp16`` precision, dequantization computations are still performed in ``fp32`` precision. This approach avoids accuracy loss in ``fp16`` arithmetic computations. The final output of the dequantization operation has ``fp16`` precision, as expected.

Customization
#############

Low Precision Transformations can be customized. Built-in customization options:

* operation precision restrictions,
* operation per tensor quantization restrictions,
* update precisions,
* dequantization precision.

Operation precision restrictions
++++++++++++++++++++++++++++++++

This option defines the precisions that are allowed for the operation input ports. The option value is passed as an input argument to the ``LowPrecision`` constructor. For example:

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [lpt_supported_precisions]

In the provided example, the ``Convolution`` operation inputs in the resulting model must have specific precisions: ``u8`` (unsigned int8) precision on input 0 (activations) and ``i8`` (signed int8) precision on input 1 (weights).

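To make the restriction semantics concrete, the C++ sketch below models per-port precision restrictions as a plain lookup table. It then checks the input precisions of a candidate operation against that table. This is a hypothetical, simplified model for illustration only. The actual restrictions are configured as in the snippet above, and ``PrecisionRestrictions`` and ``satisfies_restrictions`` are not part of the LPT API.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <map>
   #include <set>
   #include <string>
   #include <vector>

   // Allowed precisions per input port index (hypothetical representation).
   using PortRestrictions = std::map<std::size_t, std::set<std::string>>;
   // Restrictions per operation type name.
   using PrecisionRestrictions = std::map<std::string, PortRestrictions>;

   // Returns true if the input precisions of an operation of type `op_type`
   // satisfy the restrictions; operations without restrictions are accepted.
   bool satisfies_restrictions(const PrecisionRestrictions& restrictions,
                               const std::string& op_type,
                               const std::vector<std::string>& input_precisions) {
       const auto it = restrictions.find(op_type);
       if (it == restrictions.end()) return true;
       for (const auto& [port, allowed] : it->second) {
           if (port >= input_precisions.size() || allowed.count(input_precisions[port]) == 0) {
               return false;
           }
       }
       return true;
   }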
Operation per tensor quantization restrictions
++++++++++++++++++++++++++++++++++++++++++++++

This option defines whether an operation supports per-tensor quantization only. The option value is passed as an input argument to the ``LowPrecision`` constructor. For example:

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [per_tensor_quantization]

In the provided example, ``Convolution`` operations in the resulting model must have per-tensor quantization on input 0 (activations).

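You can decide whether a dequantization is per-tensor from its constants. A per-channel constant whose values are all equal is equivalent to a single scalar. The C++ sketch below shows such a check for an input port restricted to per-tensor quantization. It is a simplified illustration under that assumption, and ``is_per_tensor`` and ``meets_per_tensor_restriction`` are hypothetical names rather than LPT functions.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <functional>
   #include <vector>

   // True if a dequantization constant (one value per channel, or a single value)
   // is equivalent to a per-tensor constant, i.e. all values are equal.
   // An empty vector stands for a missing operation and is treated as per-tensor.
   bool is_per_tensor(const std::vector<float>& values) {
       return std::adjacent_find(values.begin(), values.end(),
                                 std::not_equal_to<float>()) == values.end();
   }

   // Hypothetical check for a port that supports per-tensor quantization only:
   // both the Subtract (zero point) and Multiply (scale) constants must be per-tensor.
   bool meets_per_tensor_restriction(const std::vector<float>& zero_points,
                                     const std::vector<float>& scales) {
       return is_per_tensor(zero_points) && is_per_tensor(scales);
   }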
Update precisions
++++++++++++++++++

This option defines whether each LPT transformation updates precisions. The option value is a boolean and is passed as the ``updatePrecisions`` member of ``LayerTransformation::Params``, which is an input argument to the ``LowPrecision`` constructor. All transformations are affected. If ``true``, low precision transformations update precisions to low precision; if ``false``, they do not. Typically, this option is used for plugin debugging.

Typical customization use cases
+++++++++++++++++++++++++++++++

Plugin-specific customization can be implemented via nGraph transformation callbacks. For example, asymmetric quantization support can be customized by calling the ``LayerTransformation::isAsymmetricQuantization`` and ``WeightableLayerTransformation::isAsymmetricOnWeights`` methods in callbacks:

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [asymmetric_quantization]

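In this sketch, quantization counts as asymmetric when the dequantization includes a ``Subtract`` with a non-zero zero point. The C++ code below mimics the decision such a callback makes. It assumes, as with OpenVINO transformation callbacks, that returning ``true`` skips the transformation for the node. ``DequantizationConstants``, ``is_asymmetric``, and ``skip_for_asymmetric`` are hypothetical names. In the plugin, the checks are done by the ``LayerTransformation::isAsymmetricQuantization`` and ``WeightableLayerTransformation::isAsymmetricOnWeights`` methods used in the snippet.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <vector>

   // Constants of the dequantization operations on one input (hypothetical type).
   // An empty zero_points vector means there is no Subtract operation.
   struct DequantizationConstants {
       std::vector<float> zero_points;
       std::vector<float> scales;
   };

   // Asymmetric quantization: at least one non-zero zero point.
   bool is_asymmetric(const DequantizationConstants& d) {
       return std::any_of(d.zero_points.begin(), d.zero_points.end(),
                          [](float zp) { return zp != 0.0f; });
   }

   // Mimics a callback that disables a transformation for asymmetric
   // activations or weights: returning true means "skip this node".
   bool skip_for_asymmetric(const DequantizationConstants& activations,
                            const DequantizationConstants& weights) {
       return is_asymmetric(activations) || is_asymmetric(weights);
   }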
@endsphinxdirective
@ -14,44 +14,89 @@
   QuantizationAlignment <openvino_docs_OV_UG_lpt_QuantizationAlignment>
   QuantizationGranularity <openvino_docs_OV_UG_lpt_QuantizationGranularity>

@endsphinxdirective

Introduction
############

.. list-table::
   :header-rows: 1

   * - Name
     - Target
     - Required
     - Mutable
   * - :doc:`AvgPoolPrecisionPreserved <openvino_docs_OV_UG_lpt_AvgPoolPrecisionPreserved>`
     - Precision
     - No
     - Yes
   * - :doc:`IntervalsAlignment <openvino_docs_OV_UG_lpt_IntervalsAlignment>`
     - Quantization interval
     - Yes
     - Yes
   * - :doc:`PrecisionPreserved <openvino_docs_OV_UG_lpt_PrecisionPreserved>`
     - Precision
     - Yes
     - Yes
   * - :doc:`Precisions <openvino_docs_OV_UG_lpt_Precisions>`
     - Precision
     - Yes
     - Yes
   * - :doc:`QuantizationAlignment <openvino_docs_OV_UG_lpt_QuantizationAlignment>`
     - Quantization granularity
     - Yes
     - Yes
   * - :doc:`QuantizationGranularity <openvino_docs_OV_UG_lpt_QuantizationGranularity>`
     - Quantization granularity
     - Yes
     - No

`FakeQuantize` decomposition is a mandatory part of low precision transformations. Attributes used during decomposition are mandatory. Optional attributes are required only for certain operations.
|
||||
The ``Target`` attribute group defines how an attribute is used during model transformation to get the best performance:

* ``Precision`` - the attribute defines the optimal output port precision.
* ``Quantization interval`` - the attribute defines the quantization interval.
* ``Quantization alignment`` - the attribute defines the runtime quantization granularity: per-channel or per-tensor quantization.
* ``Quantization granularity`` - the attribute is set by the plugin to define the quantization granularity: per-channel or per-tensor quantization.

The ``Required`` attribute group defines whether the attribute is required to get an optimal model during transformation:

* ``Yes`` - the attribute is used by all OpenVINO plugins for low-precision optimization.
* ``No`` - the attribute is used only in a specific OpenVINO plugin.

The ``Mutable`` attribute group defines whether a transformation can update an existing attribute:

* ``Yes`` - the attribute can be updated by subsequent transformations in the pipeline. However, the order of attribute updates still matters.
* ``No`` - the existing attribute cannot be updated by subsequent transformations, because a previous transformation has already optimized the model according to the current value.

``FakeQuantize`` decomposition is a mandatory part of low precision transformations. Attributes used during decomposition are mandatory. Optional attributes are required only for certain operations.

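The difference between per-channel and per-tensor quantization granularity can be illustrated with a small C++ sketch. It is not taken from the OpenVINO sources. ``Range``, ``u8Scale``, ``perChannelScales`` and ``perTensorScale`` are hypothetical names, and the sketch assumes 256 quantization levels mapped onto ``u8``. With per-channel quantization, each channel keeps its own scale. With per-tensor quantization, all channel ranges are merged into one range, so the whole tensor uses a single scale.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cmath>
   #include <vector>

   // Hypothetical helper type for illustration; not part of the OpenVINO API.
   struct Range {
       float low;
       float high;
   };

   // Assumes 256 levels mapped onto u8: one quantization step is (high - low) / 255.
   float u8Scale(const Range& r) {
       return (r.high - r.low) / 255.0f;
   }

   // Per-channel granularity: every channel keeps its own range and scale.
   std::vector<float> perChannelScales(const std::vector<Range>& channels) {
       std::vector<float> scales;
       scales.reserve(channels.size());
       for (const Range& r : channels) {
           scales.push_back(u8Scale(r));
       }
       return scales;
   }

   // Per-tensor granularity: one range covering all channels, hence one scale.
   float perTensorScale(const std::vector<Range>& channels) {
       assert(!channels.empty());
       Range merged = channels.front();
       for (const Range& r : channels) {
           merged.low = std::min(merged.low, r.low);
           merged.high = std::max(merged.high, r.high);
       }
       return u8Scale(merged);
   }

Consider two channels with ranges [0, 2.55] and [0, 5.1]. The per-channel scales are approximately 0.01 and 0.02, while the per-tensor scale is approximately 0.02 for both channels.
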

Attribute usage by transformations:

.. list-table::
   :header-rows: 1

   * - Attribute name
     - Created by transformations
     - Used by transformations
   * - PrecisionPreserved
     - MarkupPrecisions, MarkupAvgPoolPrecisionPreserved
     - AlignQuantizationIntervals, AlignQuantizationParameters, FakeQuantizeDecompositionTransformation, MarkupAvgPoolPrecisionPreserved
   * - AvgPoolPrecisionPreserved
     - MarkupAvgPoolPrecisionPreserved
     -
   * - Precisions
     - MarkupCanBeQuantized, MarkupPrecisions
     - FakeQuantizeDecompositionTransformation
   * - PerTensorQuantization
     - MarkupPerTensorQuantization
     -
   * - IntervalsAlignment
     - AlignQuantizationIntervals
     - FakeQuantizeDecompositionTransformation
   * - QuantizationAlignment
     - AlignQuantizationParameters
     - FakeQuantizeDecompositionTransformation

.. note::

   The same type of attribute instances can be created in different transformations. This approach follows the single-responsibility principle for transformations. For example, ``Precisions`` attribute instances are created in the ``MarkupCanBeQuantized`` and ``MarkupPrecisions`` transformations, but for different reasons.

@endsphinxdirective

# Step 1. Prerequisites Transformations {#openvino_docs_OV_UG_lpt_step1_prerequisites}

@sphinxdirective

Prerequisites transformations are optional. They prepare a model before other low precision transformations run. They do not operate on dequantization operations and do not update precisions. Prerequisites transformations include:

* :doc:`PullReshapeThroughDequantization <openvino_docs_OV_UG_lpt_PullReshapeThroughDequantization>`
* :doc:`PullTransposeThroughDequantization <openvino_docs_OV_UG_lpt_PullTransposeThroughDequantization>`
* :doc:`LinOpSequenceFusion <openvino_docs_OV_UG_lpt_LinOpSequenceFusion>`

@endsphinxdirective

# Step 2. Markup Transformations {#openvino_docs_OV_UG_lpt_step2_markup}

@sphinxdirective

This step defines the optimal ``FakeQuantize`` decomposition precisions for the best inference performance by marking up operations with runtime attribute instances. Attributes are created for input and output ports and for operations. The transformations do not change the output port precisions of operations. The low precision markup logic is decomposed into the following common markup transformations. The order of transformations is important:

1. :doc:`MarkupBias <openvino_docs_OV_UG_lpt_MarkupBias>`
2. :doc:`MarkupCanBeQuantized <openvino_docs_OV_UG_lpt_MarkupCanBeQuantized>`
3. :doc:`MarkupPrecisions <openvino_docs_OV_UG_lpt_MarkupPrecisions>`
4. :doc:`MarkupPerTensorQuantization <openvino_docs_OV_UG_lpt_MarkupPerTensorQuantization>`
5. :doc:`MarkupAvgPoolPrecisionPreserved <openvino_docs_OV_UG_lpt_MarkupAvgPoolPrecisionPreserved>`
6. :doc:`PropagatePrecisions <openvino_docs_OV_UG_lpt_PropagatePrecisions>`
7. :doc:`AlignQuantizationIntervals <openvino_docs_OV_UG_lpt_AlignQuantizationIntervals>`
8. :doc:`AlignQuantizationParameters <openvino_docs_OV_UG_lpt_AlignQuantizationParameters>`

The following table lists the transformations and the attributes they create and use:

.. list-table::
   :header-rows: 1

   * - Transformation name
     - Created attributes
     - Used attributes
   * - MarkupBias
     - Bias
     -
   * - MarkupCanBeQuantized
     - Precisions
     -
   * - MarkupPrecisions
     - Precisions, PrecisionPreserved
     -
   * - MarkupPerTensorQuantization
     - PerTensorQuantization
     -
   * - MarkupAvgPoolPrecisionPreserved
     - AvgPoolPrecisionPreserved
     - Precisions, PrecisionPreserved
   * - PropagatePrecisions
     - Precisions
     - Precisions, PrecisionPreserved
   * - AlignQuantizationIntervals
     - IntervalsAlignment
     - PrecisionPreserved
   * - AlignQuantizationParameters
     - QuantizationAlignment
     - PrecisionPreserved, PerTensorQuantization

.. note::

   The same type of attribute instances can be created in different transformations. This approach follows the single-responsibility principle for transformations. For example, ``Precisions`` attribute instances are created in the ``MarkupCanBeQuantized`` and ``MarkupPrecisions`` transformations, but for different reasons.

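Later markup transformations consume attributes created by earlier ones, so the order in the table can be checked mechanically. The following C++ sketch is a hypothetical illustration: ``MarkupStep``, ``firstOrderViolation`` and ``markupPipeline`` are not OpenVINO APIs. It encodes the table above and reports the first transformation that uses an attribute no earlier transformation has created.

.. code-block:: cpp

   #include <cassert>
   #include <set>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical description of one markup transformation; not an OpenVINO type.
   struct MarkupStep {
       std::string name;
       std::vector<std::string> creates;
       std::vector<std::string> uses;
   };

   // Returns the name of the first step that uses an attribute which no earlier
   // step created, or an empty string if the order is consistent.
   std::string firstOrderViolation(const std::vector<MarkupStep>& pipeline) {
       std::set<std::string> available;
       for (const MarkupStep& step : pipeline) {
           for (const std::string& attribute : step.uses) {
               if (available.count(attribute) == 0) {
                   return step.name;
               }
           }
           available.insert(step.creates.begin(), step.creates.end());
       }
       return "";
   }

   // Order and attributes copied from the table above.
   std::vector<MarkupStep> markupPipeline() {
       return {
           {"MarkupBias", {"Bias"}, {}},
           {"MarkupCanBeQuantized", {"Precisions"}, {}},
           {"MarkupPrecisions", {"Precisions", "PrecisionPreserved"}, {}},
           {"MarkupPerTensorQuantization", {"PerTensorQuantization"}, {}},
           {"MarkupAvgPoolPrecisionPreserved", {"AvgPoolPrecisionPreserved"}, {"Precisions", "PrecisionPreserved"}},
           {"PropagatePrecisions", {"Precisions"}, {"Precisions", "PrecisionPreserved"}},
           {"AlignQuantizationIntervals", {"IntervalsAlignment"}, {"PrecisionPreserved"}},
           {"AlignQuantizationParameters", {"QuantizationAlignment"}, {"PrecisionPreserved", "PerTensorQuantization"}},
       };
   }

With the order listed above, ``firstOrderViolation(markupPipeline())`` returns an empty string. Swapping ``MarkupPrecisions`` and ``MarkupAvgPoolPrecisionPreserved`` makes it return ``"MarkupAvgPoolPrecisionPreserved"``, because ``PrecisionPreserved`` does not exist yet at that point.
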

Common markup transformations can be decomposed into simpler utility markup transformations. The order of the utility markup transformations is not important:

* :doc:`CreateAttribute <openvino_docs_OV_UG_lpt_CreateAttribute>`
* :doc:`CreatePrecisionsDependentAttribute <openvino_docs_OV_UG_lpt_CreatePrecisionsDependentAttribute>`
* :doc:`PropagateThroughPrecisionPreserved <openvino_docs_OV_UG_lpt_PropagateThroughPrecisionPreserved>`
* :doc:`PropagateToInput <openvino_docs_OV_UG_lpt_PropagateToInput>`
* :doc:`UpdateSharedPrecisionPreserved <openvino_docs_OV_UG_lpt_UpdateSharedPrecisionPreserved>`

Let's explore all transformations and their relations in detail, using the same example model:

.. image:: _static/images/step2_markup_original.svg

Key features of the original model:

* The first ``concat1`` concatenation operation has a non-quantized ``convolution1`` consumer.
* The second ``concat2`` concatenation operation has a quantized ``convolution2`` consumer with the following requirements:

  * support for ``unsigned int8`` on activations,
  * per-tensor quantization.

* There is an ``AvgPool`` operation between the ``concat2`` concatenation operation and ``Convolution``, and mathematically it returns an ``f32`` tensor. However, the ``MarkupAvgPoolPrecisionPreserved`` transformation is active. This lets the low precision transformation that follows ``AvgPool`` propagate the low precision tensor to the next consumer.

Transformations are run with the following parameters:

.. doxygensnippet:: docs/snippets/lpt_intel_cpu_plugin.cpp
   :language: cpp
   :fragment: [lpt_markup_pipeline]

1. MarkupCanBeQuantized
#######################

The transformation marks operations that cannot be quantized. No attributes are required before the transformation.

Changes in the example model after ``MarkupCanBeQuantized`` transformation:

* The non-quantized ``convolution1`` operation is marked by the ``Precisions`` attribute with empty values. This attribute allows subsequent transformations to ignore the non-quantized operation.

Result model:

.. image:: _static/images/step2_markup1.svg
   :alt: MarkupCanBeQuantized

Model display features (here and below):

* The attributes added by the current transformation are marked in bold.
* If attributes do not fit on one line, each attribute is shown on a separate line.

2. MarkupPrecisions
###################

The transformation is required and includes two tasks:

1. Mark operation input ports (create a ``Precisions`` attribute instance) according to the provided restrictions: input port index and required precisions. Restrictions are provided as an input argument to the :ref:`ngraph::pass::low_precision::LowPrecision <doxid-classngraph_1_1pass_1_1low__precision_1_1_low_precision>` constructor.
2. Mark precision preserved operations.

No attributes are required before the transformation. Changes in the example model after ``MarkupPrecisions`` transformation:

* Both concatenation operations are marked as precision preserved operations. This allows precision to be propagated through these operations.
* The quantized ``convolution2`` operation is marked by the ``Precisions`` attribute with ``u8`` precision on activations and ``i8`` precision on weights, according to the provided restrictions. This attribute instance specifies which precisions are required for the quantized ``Convolution`` operation.

Result model:

.. image:: _static/images/step2_markup2.svg
   :alt: MarkupPrecisions result

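The restrictions consumed by ``MarkupPrecisions`` map an operation type and an input port index to the precisions required on that port. The C++ sketch below models only that lookup, using standard containers. ``Restrictions`` and ``precisionsFor`` are hypothetical names. The real restrictions are passed to the ``LowPrecision`` constructor in a different form.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <map>
   #include <optional>
   #include <set>
   #include <string>

   // Hypothetical restriction table: operation type -> input port index -> required precisions.
   using PortPrecisions = std::map<std::size_t, std::set<std::string>>;
   using Restrictions = std::map<std::string, PortPrecisions>;

   // Returns the precisions to attach to the given input port, or std::nullopt
   // when no restriction is provided for this operation type or port.
   std::optional<std::set<std::string>> precisionsFor(const Restrictions& restrictions,
                                                      const std::string& opType,
                                                      std::size_t port) {
       const auto op = restrictions.find(opType);
       if (op == restrictions.end()) {
           return std::nullopt;
       }
       const auto it = op->second.find(port);
       if (it == op->second.end()) {
           return std::nullopt;
       }
       return it->second;
   }

In the example model, ``Convolution`` would be restricted to ``u8`` on port 0 (activations) and ``i8`` on port 1 (weights). Operations without an entry, such as ``Concat``, receive no restriction.
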

3. MarkupPerTensorQuantization
##############################

The transformation is required. It marks operations that require per-tensor quantization, according to the provided restrictions, by creating a ``PerTensorQuantization`` attribute instance. No attributes are required before the transformation.

Changes in the example model after ``MarkupPerTensorQuantization`` transformation:

* Both ``Convolution`` operations are marked by ``PerTensorQuantization``.

Result model:

.. image:: _static/images/step2_markup3.svg
   :alt: MarkupPerTensorQuantization result

4. MarkupAvgPoolPrecisionPreserved
##################################

The transformation is optional. ``MarkupAvgPoolPrecisionPreserved`` marks each ``AvgPool`` operation as either precision preserved or not precision preserved. An ``AvgPool`` operation is precision preserved if the next non-precision-preserved operation can be inferred in low precision. In other words, ``AvgPool`` operations become precision preserved operations to speed up model inference. The transformation uses the ``PrecisionPreserved`` attributes created earlier. It is a combined transformation that uses:

* CreatePrecisionsDependentAttribute
* PropagateThroughPrecisionPreserved
* UpdateSharedPrecisionPreserved

Changes in the example model after ``MarkupAvgPoolPrecisionPreserved`` transformation:

* ``AvgPool`` operations are marked by ``PrecisionPreserved`` and ``AvgPoolPrecisionPreserved`` (not used below).

Result model:

.. image:: _static/images/step2_markup4.svg
   :alt: MarkupAvgPoolPrecisionPreserved

5. PropagatePrecisions
######################

The transformation is required. ``PropagatePrecisions`` is a key transformation in the markup pipeline that marks the precisions of ``FakeQuantize`` output ports. It uses the ``PrecisionPreserved`` attribute instances created earlier. It is a combined transformation that uses:

* CreateAttribute
* PropagateThroughPrecisionPreserved
* PropagateToInput

Changes in the example model after ``PropagatePrecisions`` transformation:

* All precision preserved operations are marked by the ``Precisions`` attribute instance, which defines the required precision for the operation.
* ``FakeQuantize`` operation output ports are marked by ``Precisions`` attribute instances, which define the target precision for decomposition. In the example model, ``FakeQuantize`` operations have signed intervals. However, the ``Precisions`` attributes are initialized with ``u8`` (``unsigned int8``) values because the ``Convolution`` operation restrictions are applied during the transformations.

Result model:

.. image:: _static/images/step2_markup5.svg
   :alt: PropagatePrecisions

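The core idea of ``PropagatePrecisions`` can be approximated as a graph walk. The C++ sketch below is hypothetical: ``Node``, ``Graph`` and ``propagate`` are illustrative names, and the topology is only loosely modeled on the example. The walk starts at a ``FakeQuantize`` and passes through precision preserved operations. It skips consumers whose ``Precisions`` set is empty (not quantized) and intersects the requirements of the remaining consumers. Treating the merge as a set intersection is an assumption of this sketch. Unlike the real transformation, the sketch only computes the result for the ``FakeQuantize`` output and does not mark the intermediate operations.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <iterator>
   #include <map>
   #include <optional>
   #include <set>
   #include <string>
   #include <vector>

   // Hypothetical toy graph node; not an OpenVINO type.
   struct Node {
       bool precisionPreserved = false;     // e.g. Concat or AvgPool after markup
       std::set<std::string> required;      // Precisions on the input port (empty = not quantized)
       std::vector<std::string> consumers;
   };

   using Graph = std::map<std::string, Node>;

   // Walks from `start` through precision preserved consumers and intersects the
   // requirements of the first non-precision-preserved consumers, skipping the
   // non-quantized ones. Returns std::nullopt if no consumer constrains the precision.
   std::optional<std::set<std::string>> propagate(const Graph& g, const std::string& start) {
       std::optional<std::set<std::string>> result;
       std::vector<std::string> stack = g.at(start).consumers;
       std::set<std::string> visited;
       while (!stack.empty()) {
           const std::string name = stack.back();
           stack.pop_back();
           if (!visited.insert(name).second) continue;
           const Node& node = g.at(name);
           if (node.precisionPreserved) {
               stack.insert(stack.end(), node.consumers.begin(), node.consumers.end());
               continue;
           }
           if (node.required.empty()) continue;  // not quantized: ignored
           if (!result) {
               result = node.required;
           } else {
               std::set<std::string> common;
               std::set_intersection(result->begin(), result->end(),
                                     node.required.begin(), node.required.end(),
                                     std::inserter(common, common.begin()));
               result = common;
           }
       }
       return result;
   }

In a graph where ``fq1`` reaches ``conv1`` (not quantized, so ignored) and ``conv2`` (requires ``u8``) only through precision preserved operations, the propagated precision set is ``{u8}``.
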

.. note::

   ``AlignQuantizationIntervals`` and ``AlignQuantizationParameters`` transformations are required if the model has quantized concatenation operations.

6. AlignQuantizationIntervals
#############################

The transformation is required for models with quantized operations. It marks ``FakeQuantize`` operations and their precision preserved consumers so that quantization information from different ``FakeQuantize`` operations can be combined for future quantization interval alignment. It is a combined transformation that uses:

* CreateAttribute
* PropagateThroughPrecisionPreserved

Changes in the example model after ``AlignQuantizationIntervals`` transformation:

* All ``FakeQuantize`` operations and their precision preserved consumers are marked by the ``IntervalsAlignment`` attribute instance.

Result model:

.. image:: _static/images/step2_markup6.svg
   :alt: AlignQuantizationIntervals

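``AlignQuantizationIntervals`` only records the information needed for alignment. The alignment itself happens later. The following C++ sketch shows what a combined interval means, under simplifying assumptions: hypothetical ``Interval``, ``combine`` and ``toSharedGrid`` names, 256 levels, and a ``u8`` target. The intervals of all ``FakeQuantize`` operations that meet in a concatenation are merged into one interval. Each original interval is then expressed on the single ``u8`` grid that the merged interval defines.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cmath>
   #include <vector>

   // Hypothetical type for illustration; not an OpenVINO class.
   struct Interval {
       float low;
       float high;
   };

   // Merges the output intervals of FakeQuantize operations that meet in one
   // precision preserved operation (for example, a Concat).
   Interval combine(const std::vector<Interval>& intervals) {
       assert(!intervals.empty());
       Interval shared = intervals.front();
       for (const Interval& i : intervals) {
           shared.low = std::min(shared.low, i.low);
           shared.high = std::max(shared.high, i.high);
       }
       return shared;
   }

   // Expresses every original interval on the shared u8 grid [0, 255]
   // (256 levels assumed) defined by the combined interval.
   std::vector<Interval> toSharedGrid(const std::vector<Interval>& intervals) {
       const Interval shared = combine(intervals);
       const float scale = (shared.high - shared.low) / 255.0f;
       assert(scale > 0.0f);
       std::vector<Interval> grid;
       grid.reserve(intervals.size());
       for (const Interval& i : intervals) {
           grid.push_back({(i.low - shared.low) / scale, (i.high - shared.low) / scale});
       }
       return grid;
   }

For intervals [0, 1] and [-1, 1], the combined interval is [-1, 1]. The first ``FakeQuantize`` then occupies approximately [127.5, 255] of the shared grid, and the second one occupies the full [0, 255] range.
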

7. AlignQuantizationParameters
##############################

The transformation is required for models with quantized concatenation operations. It marks ``FakeQuantize`` precision preserved consumers to align quantization intervals. It is a combined transformation that uses:

* CreateAttribute
* PropagateThroughPrecisionPreserved
* UpdateSharedPrecisionPreserved

Changes in the example model after ``AlignQuantizationParameters`` transformation:

* All ``FakeQuantize`` precision preserved consumers are marked by the ``QuantizationAlignment`` attribute instance. The ``convolution1`` input ports are marked by ``Precisions`` attribute instances with an empty precisions collection. As a result, the ``convolution1`` operation is detected as not quantized, and the ``QuantizationAlignment`` attribute keeps its default value ``false``. The ``convolution2`` input ports are marked by ``Precisions`` attribute instances with a non-empty precisions collection. The ``convolution2`` operation is detected as quantized with the ``PerTensorQuantization`` attribute, so the ``QuantizationAlignment`` attribute value changes from the default to ``true``.

Final model:

.. image:: _static/images/step2_markup7.svg
   :alt: AlignQuantizationParameters

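The decision described for ``convolution1`` and ``convolution2`` can be written as a small predicate. This is a simplified, hypothetical C++ sketch: ``ConsumerInfo`` and ``quantizationAlignment`` are not OpenVINO names. It looks at a single consumer, whereas the real transformation combines information propagated through precision preserved operations.

.. code-block:: cpp

   #include <cassert>
   #include <set>
   #include <string>

   // Hypothetical summary of one consumer reached through precision preserved operations.
   struct ConsumerInfo {
       std::set<std::string> precisions;    // Precisions on the input port; empty = not quantized
       bool perTensorQuantization = false;  // true if marked by PerTensorQuantization
   };

   // QuantizationAlignment defaults to false and becomes true only for a
   // quantized consumer that requires per-tensor quantization.
   bool quantizationAlignment(const ConsumerInfo& consumer) {
       return !consumer.precisions.empty() && consumer.perTensorQuantization;
   }

``quantizationAlignment({{}, true})`` returns ``false``, which is the ``convolution1`` case. ``quantizationAlignment({{"u8"}, true})`` returns ``true``, which is the ``convolution2`` case.
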
@endsphinxdirective

# Step 3. Main Transformations {#openvino_docs_OV_UG_lpt_step3_main}

@sphinxdirective

Main transformations make up the majority of low precision transformations. They operate on dequantization operations. Main transformations include:

* :doc:`AddTransformation <openvino_docs_OV_UG_lpt_AddTransformation>`
* :doc:`AvgPoolTransformation <openvino_docs_OV_UG_lpt_AvgPoolTransformation>`
* :doc:`ClampTransformation <openvino_docs_OV_UG_lpt_ClampTransformation>`
* :doc:`ConcatTransformation <openvino_docs_OV_UG_lpt_ConcatTransformation>`
* :doc:`ConvolutionTransformation <openvino_docs_OV_UG_lpt_ConvolutionTransformation>`
* :doc:`ConvolutionBackpropDataTransformation <openvino_docs_OV_UG_lpt_ConvolutionBackpropDataTransformation>`
* :doc:`DepthToSpaceTransformation <openvino_docs_OV_UG_lpt_DepthToSpaceTransformation>`
* :doc:`FakeQuantizeDecompositionTransformation <openvino_docs_OV_UG_lpt_FakeQuantizeDecompositionTransformation>`
* :doc:`FakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FakeQuantizeTransformation>`
* :doc:`InterpolateTransformation <openvino_docs_OV_UG_lpt_InterpolateTransformation>`
* :doc:`GroupConvolutionTransformation <openvino_docs_OV_UG_lpt_GroupConvolutionTransformation>`
* :doc:`GatherTransformation <openvino_docs_OV_UG_lpt_GatherTransformation>`
* :doc:`MatMulTransformation <openvino_docs_OV_UG_lpt_MatMulTransformation>`
* :doc:`MaxPoolTransformation <openvino_docs_OV_UG_lpt_MaxPoolTransformation>`
* :doc:`MultiplyTransformation <openvino_docs_OV_UG_lpt_MultiplyTransformation>`
* :doc:`MVNTransformation <openvino_docs_OV_UG_lpt_MVNTransformation>`
* :doc:`NormalizeL2Transformation <openvino_docs_OV_UG_lpt_NormalizeL2Transformation>`
* :doc:`PReluTransformation <openvino_docs_OV_UG_lpt_PReluTransformation>`
* :doc:`ReduceMaxTransformation <openvino_docs_OV_UG_lpt_ReduceMaxTransformation>`
* :doc:`ReduceMeanTransformation <openvino_docs_OV_UG_lpt_ReduceMeanTransformation>`
* :doc:`ReduceMinTransformation <openvino_docs_OV_UG_lpt_ReduceMinTransformation>`
* :doc:`ReduceSumTransformation <openvino_docs_OV_UG_lpt_ReduceSumTransformation>`
* :doc:`ReluTransformation <openvino_docs_OV_UG_lpt_ReluTransformation>`
* :doc:`ReshapeTransformation <openvino_docs_OV_UG_lpt_ReshapeTransformation>`
* :doc:`SqueezeTransformation <openvino_docs_OV_UG_lpt_SqueezeTransformation>`
* :doc:`ShuffleChannelsTransformation <openvino_docs_OV_UG_lpt_ShuffleChannelsTransformation>`
* :doc:`SplitTransformation <openvino_docs_OV_UG_lpt_SplitTransformation>`
* :doc:`StridedSliceTransformation <openvino_docs_OV_UG_lpt_StridedSliceTransformation>`
* :doc:`TransposeTransformation <openvino_docs_OV_UG_lpt_TransposeTransformation>`
* :doc:`UnsqueezeTransformation <openvino_docs_OV_UG_lpt_UnsqueezeTransformation>`
* :doc:`VariadicSplitTransformation <openvino_docs_OV_UG_lpt_VariadicSplitTransformation>`

Let's explore some main transformations on the example model. Original model:

.. image:: _static/images/step3_original.svg
   :alt: Original model

Result model after main transformations:

.. image:: _static/images/step3_transformed.svg
   :alt: Transformed model

Changes in the example model after main transformation:
|
||||
* All `FakeQuantize` operations (`fakeQuantize1`, `fakeQuantize2` and `fakeQuantize3`) were decomposed:
|
||||
- original `FakeQuantize` operations were replaced with new operations with other output intervals and output port precision,
|
||||
- dequantization operations.
|
||||
* Dequantization operations were moved via precision preserved (`concat1` and `concat2`) and quantized (`convolution2`) operations.
|
||||
|
||||
> **NOTE**: The left branch (branch #1) does not require per-tensor quantization. As a result, the `fakeQuantize1`output interval is [0, 255]. But quantized `convolution2` requires per-tensor quantization on the right branch (branch #2). Then all connected `FakeQuantize` interval operations (`fakeQuantize1` and `fakeQuantize2`) are aligned to have per-tensor quantization after the concatenation (`concat2`) operation.
|
||||
* All ``FakeQuantize`` operations (``fakeQuantize1``, ``fakeQuantize2`` and ``fakeQuantize3``) were decomposed:
|
||||
|
||||
* original ``FakeQuantize`` operations were replaced with new operations with other output intervals and output port precision,
|
||||
* dequantization operations.
|
||||
|
||||
* Dequantization operations were moved via precision preserved (``concat1`` and ``concat2``) and quantized (``convolution2``) operations.
|
||||
|
||||
.. note::
|
||||
|
||||
The left branch (branch #1) does not require per-tensor quantization. As a result, the ``fakeQuantize1``output interval is [0, 255]. But quantized ``convolution2`` requires per-tensor quantization on the right branch (branch #2). Then all connected ``FakeQuantize`` interval operations (``fakeQuantize1`` and ``fakeQuantize2``) are aligned to have per-tensor quantization after the concatenation (``concat2``) operation.
|
||||
|
||||
@endsphinxdirective
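The decomposition and the move of dequantization through `concat2` can be illustrated with a minimal Python sketch. It assumes per-tensor quantization, plain Python floats, and the reference FakeQuantize formula with half-up rounding. The helpers `fake_quantize`, `decompose`, `dequantize`, `dequantize_then_concat` and `concat_then_dequantize` are hypothetical names used only for illustration; they are not part of the LPT or OpenVINO API, and the way LPT actually chooses the aligned intervals is not modeled here.

```python
import math


def fake_quantize(x, in_low, in_high, out_low, out_high, levels):
    """Per-tensor FakeQuantize applied to a single value (reference formula)."""
    if x <= min(in_low, in_high):
        return out_low
    if x > max(in_low, in_high):
        return out_high
    # Level index in [0, levels - 1], rounded half up (the argument is non-negative).
    level = math.floor((x - in_low) / (in_high - in_low) * (levels - 1) + 0.5)
    return level / (levels - 1) * (out_high - out_low) + out_low


def decompose(in_low, in_high, out_low, out_high, levels):
    """Split FakeQuantize into a quantizing FakeQuantize with output interval
    [0, levels - 1] and dequantization parameters (zero point, scale)."""
    scale = (out_high - out_low) / (levels - 1)
    zero_point = -out_low / scale

    def quantize(x):
        # round() models the integer (for example, u8) output port precision.
        return round(fake_quantize(x, in_low, in_high, 0.0, levels - 1.0, levels))

    return quantize, zero_point, scale


def dequantize(q, zero_point, scale):
    # Dequantization operations: Subtract(zero point), then Multiply(scale).
    return (q - zero_point) * scale


def dequantize_then_concat(branches):
    """branches: list of (values, fq_params); each branch is dequantized separately."""
    result = []
    for values, params in branches:
        quantize, zero_point, scale = decompose(*params)
        result += [dequantize(quantize(x), zero_point, scale) for x in values]
    return result


def concat_then_dequantize(branches):
    """Concatenate quantized values, then apply one per-tensor dequantization
    (taken from the first branch) to the whole result."""
    quantized = []
    for values, params in branches:
        quantize, _, _ = decompose(*params)
        quantized += [quantize(x) for x in values]
    _, zero_point, scale = decompose(*branches[0][1])
    return [dequantize(q, zero_point, scale) for q in quantized]


# Decomposition reproduces the original FakeQuantize output.
fq = (-1.0, 1.0, -1.0, 1.0, 256)
quantize, zero_point, scale = decompose(*fq)
for x in [-2.0, -1.0, -0.3, 0.0, 0.42, 1.0, 3.0]:
    assert abs(dequantize(quantize(x), zero_point, scale) - fake_quantize(x, *fq)) < 1e-9

# Dequantization can move after Concat only if both branches share parameters.
aligned = (0.0, 2.55, 0.0, 2.55, 256)
narrow = (0.0, 1.275, 0.0, 1.275, 256)
same = [([0.1, 2.0], aligned), ([1.0], aligned)]
mixed = [([0.1, 2.0], aligned), ([1.0], narrow)]
assert dequantize_then_concat(same) == concat_then_dequantize(same)
assert dequantize_then_concat(mixed) != concat_then_dequantize(mixed)
```

The last assertion shows why the intervals are aligned: when the branches use different intervals, a single per-tensor dequantization after the concatenation no longer reproduces the per-branch results.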
@ -1,8 +1,12 @@
# Step 4. Cleanup Transformations {#openvino_docs_OV_UG_lpt_step4_cleanup}

* [FoldConvertTransformation](@ref openvino_docs_OV_UG_lpt_FoldConvertTransformation)
* [FoldFakeQuantizeTransformation](@ref openvino_docs_OV_UG_lpt_FoldFakeQuantizeTransformation)
* [FuseConvertTransformation](@ref openvino_docs_OV_UG_lpt_FuseConvertTransformation)
* [FuseMultiplyToFakeQuantizeTransformation](@ref openvino_docs_OV_UG_lpt_FuseMultiplyToFakeQuantizeTransformation)
* [FuseSubtractToFakeQuantizeTransformation](@ref openvino_docs_OV_UG_lpt_FuseSubtractToFakeQuantizeTransformation)
* [MultiplyToGroupConvolutionTransformation](@ref openvino_docs_OV_UG_lpt_MultiplyToGroupConvolutionTransformation)
@sphinxdirective

* :doc:`FoldConvertTransformation <openvino_docs_OV_UG_lpt_FoldConvertTransformation>`
* :doc:`FoldFakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FoldFakeQuantizeTransformation>`
* :doc:`FuseConvertTransformation <openvino_docs_OV_UG_lpt_FuseConvertTransformation>`
* :doc:`FuseMultiplyToFakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FuseMultiplyToFakeQuantizeTransformation>`
* :doc:`FuseSubtractToFakeQuantizeTransformation <openvino_docs_OV_UG_lpt_FuseSubtractToFakeQuantizeTransformation>`
* :doc:`MultiplyToGroupConvolutionTransformation <openvino_docs_OV_UG_lpt_MultiplyToGroupConvolutionTransformation>`

@endsphinxdirective
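The fusion steps in this list rely on the fact that the FakeQuantize output is an affine function of its output interval, so a following Multiply or Subtract by a constant can be absorbed into `output_low`/`output_high`. The minimal Python sketch below assumes a scalar (per-tensor) constant and the reference FakeQuantize formula with half-up rounding. `fuse_multiply` and `fuse_subtract` are hypothetical helpers that only illustrate the idea behind FuseMultiplyToFakeQuantizeTransformation and FuseSubtractToFakeQuantizeTransformation; they do not reproduce their implementation or the conditions under which LPT applies them.

```python
import math


def fake_quantize(x, in_low, in_high, out_low, out_high, levels):
    """Per-tensor FakeQuantize applied to a single value (reference formula)."""
    if x <= min(in_low, in_high):
        return out_low
    if x > max(in_low, in_high):
        return out_high
    # Level index in [0, levels - 1], rounded half up (the argument is non-negative).
    level = math.floor((x - in_low) / (in_high - in_low) * (levels - 1) + 0.5)
    return level / (levels - 1) * (out_high - out_low) + out_low


def fuse_multiply(fq_params, c):
    """FakeQuantize followed by Multiply(c) -> FakeQuantize with a scaled output interval."""
    in_low, in_high, out_low, out_high, levels = fq_params
    return (in_low, in_high, out_low * c, out_high * c, levels)


def fuse_subtract(fq_params, s):
    """FakeQuantize followed by Subtract(s) -> FakeQuantize with a shifted output interval."""
    in_low, in_high, out_low, out_high, levels = fq_params
    return (in_low, in_high, out_low - s, out_high - s, levels)


fq = (0.0, 2.55, 0.0, 255.0, 256)
for x in [-1.0, 0.0, 0.37, 1.0, 2.55, 4.0]:
    y = fake_quantize(x, *fq)
    # The fused FakeQuantize gives the same result as the separate operations.
    assert abs(fake_quantize(x, *fuse_multiply(fq, 0.01)) - y * 0.01) < 1e-9
    assert abs(fake_quantize(x, *fuse_subtract(fq, 128.0)) - (y - 128.0)) < 1e-9
```

Only the output interval changes; the input interval and the number of levels stay the same, which is why the quantization grid itself is unaffected by the fusion.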
docs/_static/images/fq.common.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d4daac1d270f60e4819683b467c20967f78cb736eef5ff760a9a15ad428ab48b
size 15681

docs/_static/images/fq.transformed.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c1d986eea3590b2c214551e4f76a323b1f3ff4f14d6237bd6faaca17c3a0fbb7
size 23275

docs/_static/images/low_precision_transformation_pipeline.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3b5ccafe14d5dae83894b520d8b6d65bc2cb08015b54cfa88c784db4eb009964
size 22741

docs/_static/images/model_fq_and_convolution.common.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:81e8cda60a44b726cd6c021c452029c4d815f1ab2625a16a3022b206367840f9
size 27133

docs/_static/images/model_fq_and_convolution.transformed.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:28a4d377a646d45905960e317b507e816ce60f66e9e015a91f06590ea1a884b8
size 29783

docs/_static/images/model_qdq_and_convolution.common.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:222f890cbcc7ca8e2498808a2d2d976a4c8f91e3152aaf4c69df8ae2464de7a4
size 39429

docs/_static/images/qdq_propagation.png
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e0bab657bf979494cb84459e29024e5b8b9cd320388c62c6a91b74b897b19718
size 18108

docs/_static/images/step2_markup1.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e336651c517c77c32126fd0a718b15b704340216d7e3fb155b2e06743c24d3a8
size 62139

docs/_static/images/step2_markup2.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:abba6671b011a2a7c4126364e0b5e7ae5ebc95d2ea5cc4269afdbddbda31278f
size 63263

docs/_static/images/step2_markup3.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8e6244537b1cf1f1e1c72af87c7e8fff5e2d1f06b19e262aaad43da65deb5edd
size 63943

docs/_static/images/step2_markup4.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fdb5721ca6b5ffe1941f7bf799c2e0179ea24970f04d63f642e412f56cc34fb8
size 65682

docs/_static/images/step2_markup5.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aa4e5a1055a3707c50936fab1266da11babad65c4857b5ecd8392617ebb5ea77
size 68218

docs/_static/images/step2_markup6.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:56d2c54e091568943481ccc7aecd9151091c00d84b2a8ebd6e381a384dee2e8b
size 80265

docs/_static/images/step2_markup7.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:aa3d45e1d1d0c335c7ab58973c243345e0f6639b151edd8d9d2566001299636d
size 79154

docs/_static/images/step2_markup_original.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cd609c7264c0f705cdd7ace07a3a4d734ce80cc2288c2b0b0bed44182794fe14
size 59093

docs/_static/images/step3_original.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:912906c7e75c96901d5a9dfebde489d509bce1232f2d93482af06108cb48ed44
size 79154

docs/_static/images/step3_transformed.svg
@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6ed59344d205908e316a35e49e725184a48587a6ca8127e5d186a734f7077e6
size 97610