From 39abc5368ddcb8bef2965d2fc313c74026f71ead Mon Sep 17 00:00:00 2001
From: Maciej Smyk
Date: Wed, 12 Oct 2022 17:06:37 +0200
Subject: [PATCH] DOCS: Port to master - NNCF Fix (#13283)

Port from #13215
---
 tools/pot/docs/BestPractices.md            | 60 +++++++++++-----------
 tools/pot/docs/FrequentlyAskedQuestions.md |  8 +--
 2 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/tools/pot/docs/BestPractices.md b/tools/pot/docs/BestPractices.md
index e335265a3bb..8dcdb0544f9 100644
--- a/tools/pot/docs/BestPractices.md
+++ b/tools/pot/docs/BestPractices.md
@@ -42,46 +42,46 @@ There are two alternatives in case of substantial accuracy degradation after app
 ### Tuning Hyperparameters of the Default Quantization
 The Default Quantization algorithm provides multiple hyperparameters which can be used in order to improve accuracy results for the fully-quantized model.
 Below is a list of best practices that can be applied to improve accuracy without a substantial performance reduction with respect to default settings:
-1. The first recommended option is to change the `preset` from `performance` to `mixed`. This enables asymmetric quantization of
-activations and can be helpful for models with non-ReLU activation functions, for example, YOLO, EfficientNet, etc.
-2. The next option is `use_fast_bias`. Setting this option to `false` enables a different bias correction method which is more accurate, in general,
-and applied after model quantization as a part of the Default Quantization algorithm.
+
+1. The first recommended option is to change the `preset` from `performance` to `mixed`. This enables asymmetric quantization of activations and can be helpful for models with non-ReLU activation functions, for example, YOLO, EfficientNet, etc.
+
+2. The next option is `use_fast_bias`. Setting this option to `false` enables a different bias correction method which is, in general, more accurate and is applied after model quantization as a part of the Default Quantization algorithm.
 > **NOTE**: Changing this option can substantially increase quantization time in the POT tool.
+
 3. Some model architectures require a special approach when being quantized. For example, Transformer-based models need to keep some operations in the original precision to preserve accuracy. That is why POT provides a `model_type` option to specify the model architecture. Now, only `"transformer"` type is available. Use it to quantize Transformer-based models, e.g. BERT.
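+
+   A minimal configuration fragment enabling this option might look as follows (a sketch; the surrounding parameter values are simply the defaults discussed above):
+
+   ```python
+   {
+       "name": "DefaultQuantization",
+       "params": {
+           "preset": "performance",
+           "stat_subset_size": 300,
+           "model_type": "transformer"  # keep accuracy-sensitive Transformer patterns in the original precision
+       }
+   }
+   ```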
-4. Another important option is a `range_estimator`. It defines how to calculate the minimum and maximum of quantization range for weights and activations.
-For example, the following `range_estimator` for activations can improve the accuracy for Faster R-CNN based networks:
-```python
-{
-    "name": "DefaultQuantization",
-    "params": {
-        "preset": "performance",
-        "stat_subset_size": 300
-
-        "activations": {                 # defines activation
-            "range_estimator": {         # defines how to estimate statistics
-                "max": {                 # right border of the quantizating floating-point range
-                    "aggregator": "max", # use max(x) to aggregate statistics over calibration dataset
-                    "type": "abs_max"    # use abs(max(x)) to get per-sample statistics
-                }
-            }
-        }
-    }
-}
-```
+4. Another important option is a `range_estimator`. It defines how to calculate the minimum and maximum of the quantization range for weights and activations. For example, the following `range_estimator` for activations can improve the accuracy for Faster R-CNN based networks:
+
-5. The next option is `stat_subset_size`. It controls the size of the calibration dataset used by POT to collect statistics for quantization parameters initialization.
-It is assumed that this dataset should contain a sufficient number of representative samples. Thus, varying this parameter may affect accuracy (higher is better).
-However, we empirically found that 300 samples are sufficient to get representative statistics in most cases.
-6. The last option is `ignored_scope`. It allows excluding some layers from the quantization process, i.e. their inputs will not be quantized. It may be helpful for some patterns for which it is known in advance that they drop accuracy when executing in low-precision.
-For example, `DetectionOutput` layer of SSD model expressed as a subgraph should not be quantized to preserve the accuracy of Object Detection models.
-One of the sources for the ignored scope can be the Accuracy-aware algorithm which can revert layers back to the original precision (see details below).
+
+   ```python
+   {
+       "name": "DefaultQuantization",
+       "params": {
+           "preset": "performance",
+           "stat_subset_size": 300,
+
+           "activations": {                 # defines activation settings
+               "range_estimator": {         # defines how to estimate statistics
+                   "max": {                 # right border of the quantizing floating-point range
+                       "aggregator": "max", # use max(x) to aggregate statistics over calibration dataset
+                       "type": "abs_max"    # use abs(max(x)) to get per-sample statistics
+                   }
+               }
+           }
+       }
+   }
+   ```
+
+5. The next option is `stat_subset_size`. It controls the size of the calibration dataset used by POT to collect statistics for quantization parameters initialization. It is assumed that this dataset should contain a sufficient number of representative samples. Thus, varying this parameter may affect accuracy (higher is better). However, we empirically found that 300 samples are sufficient to get representative statistics in most cases.
+
+6. The last option is `ignored_scope`. It allows excluding some layers from the quantization process, i.e. their inputs will not be quantized. It may be helpful for some patterns for which it is known in advance that they drop accuracy when executed in low precision. For example, the `DetectionOutput` layer of the SSD model expressed as a subgraph should not be quantized to preserve the accuracy of Object Detection models. One of the sources for the ignored scope can be the Accuracy-aware algorithm, which can revert layers back to the original precision (see details below).
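+
+   For instance, a fragment like the following excludes a given node from quantization (a sketch; the node name is illustrative, and the exact layout of the `ignored` section is defined in the specification file referenced below):
+
+   ```python
+   {
+       "name": "DefaultQuantization",
+       "params": {
+           "preset": "performance",
+           "ignored": {
+               "scope": [
+                   "<name_of_node_to_exclude>"  # fully qualified node name from the model graph
+               ]
+           }
+       }
+   }
+   ```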
 Find all the possible options and their description in the configuration [specification file](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_spec.json) in the POT directory.

 ## Accuracy-aware Quantization
 When the steps above do not lead to an accurate quantized model, you may use the so-called [Accuracy-aware Quantization](@ref pot_accuracyaware_usage) algorithm which leads to mixed-precision models. A fragment of the Accuracy-aware Quantization configuration with default settings is shown below:
+
 ```python
 {
     "name": "AccuracyAwareQuantization",
diff --git a/tools/pot/docs/FrequentlyAskedQuestions.md b/tools/pot/docs/FrequentlyAskedQuestions.md
index cd98ffc4d50..cb2cfbb0a22 100644
--- a/tools/pot/docs/FrequentlyAskedQuestions.md
+++ b/tools/pot/docs/FrequentlyAskedQuestions.md
@@ -17,12 +17,12 @@ What else can I do?
 - When I execute POT CLI, I get "File "/workspace/venv/lib/python3.7/site-packages/nevergrad/optimization/base.py", line 35... SyntaxError: invalid syntax". What is wrong?
 - What does a message "ModuleNotFoundError: No module named 'some\_module\_name'" mean?
 - Is there a way to collect an intermediate IR when the AccuracyAware mechanism fails?
-- What do the messages "Output name: not found" or "Output node with is not found in graph" mean?
+- What do the messages "Output name: result_operation_name not found" or "Output node with result_operation_name is not found in graph" mean?

 ### Is the Post-training Optimization Tool (POT) open-sourced?

-Yes, POT is developed on GitHub as a part of [https://github.com/openvinotoolkit/openvino](https://github.com/openvinotoolkit/openvino) under Apache-2.0 License.
+Yes, POT is developed on GitHub as part of [openvinotoolkit/openvino](https://github.com/openvinotoolkit/openvino) under the Apache-2.0 License.

 ### Can I quantize my model without a dataset?

@@ -38,7 +38,7 @@ The POT accepts models in the OpenVINO™ Intermediate Representation (IR) f
 1. Try quantization using the Python API of the Post-training Optimization Tool. For more details, see [Default Quantization](@ref pot_default_quantization_usage).
 2. If you consider command-line usage only, refer to the [Accuracy Checker documentation](@ref omz_tools_accuracy_checker) to create the Accuracy Checker configuration file, and try to find the configuration file for your model among the ones available in the Accuracy Checker examples.
-3. An alternative way is to quantize the model in the [Simplified mode](#ref pot_docs_simplified_mode) but you will not be able to measure the accuracy.
+3. An alternative way is to quantize the model in the [Simplified mode](@ref pot_docs_simplified_mode), but you will not be able to measure the accuracy.

 ### What is a tradeoff when you go to low precision?

@@ -99,5 +99,5 @@ It means that some required python module is not installed in your environment.

 You can add `"dump_intermediate_model": true` to the POT configuration file and it will drop an intermediate IR into the `accuracy_aware_intermediate` folder.

-### What do the messages "Output name: not found" or "Output node with is not found in graph" mean?
+### What do the messages "Output name: result_operation_name not found" or "Output node with result_operation_name is not found in graph" mean?

 These errors are caused by output node names missing from the graph when using the POT tool for model quantization. The issue might appear only for some models, for IRs converted from ONNX models using the new frontend (the default conversion path starting from the 2022.1 release). To avoid such errors, use the legacy MO frontend to convert the model to IR by passing the `--use_legacy_frontend` option. Then, use the produced IR for quantization.
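+
+For example, a conversion command might look as follows (a sketch; the `mo` entry point is assumed to come from the `openvino-dev` package, and the model path is illustrative):
+
+```sh
+mo --input_model model.onnx --use_legacy_frontend
+```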