fix wording (#6108)

Maxim Shevtsov
2021-06-09 22:22:47 +03:00
committed by GitHub
parent 1a6392eb53
commit d4f323f3dd

@@ -22,7 +22,8 @@ $ benchmark_app -m <model.xml> -enforcebf16=false
 Notice that for quantized (e.g. INT8) models the bfloat16 calculations (of the layers that remain in FP32) is disabled by default.
 Refer to the [CPU Plugin documentation](supported_plugins/CPU.md) for more details.
-Similarly, the GPU device has a dedicated config key to enable FP16 execution of the layers that remain in FP32 in the quantized models (as the quantization is typically performed on the FP32 models), refer to the ENABLE_FP16_FOR_QUANTIZED_MODELS key in the [GPU Plugin documentation](supported_plugins/GPU.md)
+Similarly, the GPU device automatically executes FP16 for the layers that remain in FP16 in the quantized models (assuming that the FP16 model was quantized).
+Refer to the ENABLE_FP16_FOR_QUANTIZED_MODELS key in the [GPU Plugin documentation](supported_plugins/GPU.md).
 ## Latency vs. Throughput
 One way to increase computational efficiency is batching, which combines many (potentially tens) of
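
For reference, below is a minimal sketch of how the GPU option discussed in this diff could be toggled from the 2021-era Inference Engine Python API. The runtime key string `CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS` and the model paths are assumptions for illustration; the GPU Plugin documentation referenced above is the authoritative source for the key's exact spelling.

```python
# Hedged sketch (assumptions noted): enable FP16 execution of the layers that
# remain non-quantized in an INT8 model when running on the GPU device.
from openvino.inference_engine import IECore  # OpenVINO 2021.x Python API

ie = IECore()
# Hypothetical paths to a quantized (e.g. INT8) IR model.
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(
    network=net,
    device_name="GPU",
    # Assumed runtime spelling of the ENABLE_FP16_FOR_QUANTIZED_MODELS key.
    config={"CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS": "YES"},
)
```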