From 2dadf50a68d8f0dbcb2ad925d92c7c9f09dd96a7 Mon Sep 17 00:00:00 2001
From: Nikita Malinin
Date: Tue, 2 Nov 2021 13:31:51 +0300
Subject: [PATCH] [POT] Update AccuracyAware doc (#8261)

* Update AA doc with GNA

* Apply comments

* Update doc with note
---
 .../pot/algorithms/quantization/accuracy_aware/README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/pot/openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md b/tools/pot/openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md
index d96eb69c539..f2497de5442 100644
--- a/tools/pot/openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md
+++ b/tools/pot/openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md
@@ -4,6 +4,11 @@
 AccuracyAware algorithm is designed to perform accurate 8-bit quantization and allows the model to stay in the pre-defined range of accuracy drop, for example 1%, defined by the user in the configuration file. This may cause a degradation in performance in comparison to [DefaultQuantization](../default/README.md) algorithm because some layers can be reverted back to the original precision.
+
+> **NOTE**:
+In the case of the GNA `target_device`, POT moves INT8 weights to INT16 to stay in the pre-defined range of the accuracy drop. Thus, the algorithm works for the `performance` (INT8) preset only.
+For the `accuracy` preset, this algorithm is not performed, but parameter tuning is available (if the `tune_hyperparams` option is enabled).
+
 Generally, the algorithm consists of the following steps:
 1. The model gets fully quantized using the DefaultQuantization algorithm.
 2. The quantized and full-precision models are compared on a subset of the validation set in order to find mismatches in the target accuracy metric. A ranking subset is extracted based on the mismatches.
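
For reference, below is a minimal sketch of a POT configuration that exercises the behavior this patch documents: the AccuracyAware algorithm on a GNA `target_device` with the `performance` preset and `tune_hyperparams` enabled. The `compression` section and the `AccuracyAwareQuantization` parameter names follow the POT JSON config schema; the model paths, `stat_subset_size`, and the 1% `maximal_drop` value are illustrative assumptions, not values taken from this patch.

```json
{
    "model": {
        "model_name": "model",
        "model": "<path_to_model>/model.xml",
        "weights": "<path_to_model>/model.bin"
    },
    "engine": {
        "config": "<path_to_accuracy_checker_config>/config.yml"
    },
    "compression": {
        "target_device": "GNA",
        "algorithms": [
            {
                "name": "AccuracyAwareQuantization",
                "params": {
                    "preset": "performance",
                    "stat_subset_size": 300,
                    "maximal_drop": 0.01,
                    "tune_hyperparams": true
                }
            }
        ]
    }
}
```

Here `maximal_drop` is the maximum allowed accuracy degradation (0.01 corresponds to the 1% example in the README text), and `tune_hyperparams` enables the parameters tuning that, per the added note, remains available even when the layer-reverting part of the algorithm is not performed.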