Fix a paragraph in LPT docs w.r.t mixed precision (#9570)

This commit is contained in:
Vasily Shamporov
2022-01-13 15:58:14 +03:00
committed by GitHub
parent fdee2d5728
commit b7e8ef910d

@@ -285,7 +285,7 @@ available from the Inference Engine API. For example, the part of performance co
As a result, all operations (except the non-quantized `SoftMax` at the end of the model) are inferred in low precision by the OpenVINO™ CPU plugin. Note that the resulting model still contains `FakeQuantize` operations in FP32, but it is the plugin's responsibility to fuse these operations with the preceding ones. The OpenVINO™ CPU plugin achieves maximally optimized inference by fusing an INT8 `Convolution` with FP32 output together with a `FakeQuantize` operation with FP32 input and INT8 output. In this case the plugin uses both INT8 and FP32 vectorized instructions but reports a single INT8 kernel for inference, which is the most optimized variant for this case.
## Mixed precision
-If LPT input model operation output has `fp16` precision then dequantization computations still occurs in `fp32` precision. This approach is used to avoid accuracy loss in `fp16` arithmetic computations. Note, the latest dequantization operation output has `fp16` precision.
+If the output of an operation in the LPT input model has `fp16` precision, dequantization computations are still performed in `fp32` precision. This approach avoids accuracy loss in `fp16` arithmetic. The final output of the dequantization operations will have `fp16` precision, as expected.
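
The mixed-precision behavior described above can be sketched as follows. This is a hypothetical illustration in numpy, not the OpenVINO API: the function name, scale, and zero-point values are assumptions for the example; the point is that the dequantization arithmetic stays in `fp32` while only the final result is cast to `fp16`.

```python
import numpy as np

def dequantize_mixed_precision(int8_values, scale, zero_point):
    # Dequantization is computed in fp32 even though the surrounding
    # model runs in fp16, to avoid fp16 accuracy loss.
    fp32_result = (int8_values.astype(np.float32) - np.float32(zero_point)) * np.float32(scale)
    # Only the final dequantization output is cast to fp16,
    # matching the precision of the rest of the model.
    return fp32_result.astype(np.float16)

q = np.array([-128, 0, 127], dtype=np.int8)
print(dequantize_mixed_precision(q, scale=0.1, zero_point=0))
```

Casting only at the end keeps the intermediate subtract-and-multiply at full `fp32` accuracy; doing the same arithmetic directly in `fp16` could accumulate rounding error.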
## Customization
Low Precision Transformations can be customized. Built-in customization options: