Fixed conflicts (#11332)

Alexander Kozlov 2022-03-30 16:10:03 +03:00 committed by GitHub
parent 9fa5150d71
commit 4b4bd7399c
48 changed files with 878 additions and 615 deletions

View File

@ -44,7 +44,7 @@ Please report questions, issues and suggestions using:
[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
[OpenVINO™ Runtime]:https://docs.openvino.ai/latest/openvino_docs_OV_Runtime_User_Guide.html
[Model Optimizer]:https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html
[Post-Training Optimization Tool]:https://docs.openvino.ai/latest/pot_README.html
[Post-Training Optimization Tool]:https://docs.openvino.ai/latest/pot_introduction.html
[Samples]:https://github.com/openvinotoolkit/openvino/tree/master/samples
[tag on StackOverflow]:https://stackoverflow.com/search?q=%23openvino

View File

@ -9,7 +9,7 @@ For more details about low-precision model representation please refer to this [
During model load, each plugin can interpret quantization rules expressed in *FakeQuantize* operations:
- Independently based on the definition of *FakeQuantize* operation.
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](../OV_Runtime_UG/Int8Inference.md).
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into models with low-precision operations.
Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
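To make this decomposition concrete, the NumPy sketch below mimics the reference FakeQuantize formula and marks the **Quantize** and **Dequantize** halves. It is only an illustration under the assumption of the standard FakeQuantize attributes (`input_low`, `input_high`, `output_low`, `output_high`, `levels`), not the actual plugin code.

```python
import numpy as np

def fake_quantize(x, input_low, input_high, output_low, output_high, levels=256):
    """Simplified reference behaviour of FakeQuantize, split into two steps."""
    x = np.clip(x, input_low, input_high)
    # Quantize: map the clipped value onto an integer grid of `levels` points
    q = np.round((x - input_low) / (input_high - input_low) * (levels - 1))
    # Dequantize: map the integer grid back to the floating-point output range
    return q / (levels - 1) * (output_high - output_low) + output_low
```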

View File

@ -72,11 +72,7 @@ For example, if you would like to infer a model with `Convolution` operation in
> There are several supported quantization approaches for activations and for weights. All supported approaches are described in the [Quantization approaches](#quantization-approaches) section below. The demonstrated model uses the [FakeQuantize operation quantization](#fakequantize-operation) approach.
### Low precision tools
There are two tools to quantize a model:
1. [Post-Training Optimization Toolkit](@ref pot_docs_LowPrecisionOptimizationGuide) (POT)
2. [Neural Network Compression Framework](https://github.com/openvinotoolkit/nncf) (NNCF)
Additionally, low precision transformations can handle ONNX quantized models.
For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.
## Quantization approaches
LPT transformations support two quantization approaches:

View File

@ -42,7 +42,7 @@ The IR is a pair of files describing the model:
* <code>.bin</code> - Contains the weights and biases binary data.
> **NOTE**: The generated IR can be additionally optimized for inference by [Post-training Optimization tool](../../tools/pot/README.md)
> **NOTE**: The generated IR can be additionally optimized for inference by [Post-training optimization](../../tools/pot/docs/Introduction.md)
> that applies post-training quantization methods.
> **TIP**: You also can work with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Introduction.html) (DL Workbench).

View File

@ -17,4 +17,4 @@ although for the majority of models accuracy degradation is negligible. For deta
compressed `FP16` models refer to [Working with devices](../../OV_Runtime_UG/supported_plugins/Device_Plugins.md) page.
> **NOTE**: `FP16` compression is sometimes used as initial step for `INT8` quantization, please refer to
> [Post-Training Optimization tool](../../../tools/pot/README.md) for more information about that.
> [Post-training optimization](../../../tools/pot/docs/Introduction.md) for more information about that.

View File

@ -3,11 +3,11 @@
## Introduction
OpenVINO Runtime CPU and GPU devices can infer models in low precision.
For details, refer to [Low Precision Inference on the CPU](../../../OV_Runtime_UG/Int8Inference.md).
For details, refer to [Model Optimization Guide](@ref openvino_docs_model_optimization_guide).
Intermediate Representation (IR) should be specifically formed to be suitable for low precision inference.
Such an IR is called a Low Precision IR and you can generate it in two ways:
- [Quantize regular IR with the Post-Training Optimization tool](@ref pot_README)
- [Quantize regular IR with the Post-Training Optimization tool](@ref pot_introduction)
- Use the Model Optimizer for a model pretrained for Low Precision inference: TensorFlow\* pre-TFLite models (`.pb` model file with `FakeQuantize*` operations) and ONNX\* quantized models.
Both TensorFlow and ONNX quantized models could be prepared by [Neural Network Compression Framework](https://github.com/openvinotoolkit/nncf/blob/develop/README.md).

View File

@ -1,4 +1,4 @@
# Low-Precision 8-bit Integer Inference {#openvino_docs_IE_DG_Int8Inference}
# Low-Precision 8-bit Integer Inference
## Disclaimer
@ -14,9 +14,7 @@ Low-precision 8-bit inference is optimized for:
## Introduction
For 8-bit integer computation, a model must be quantized. You can use a quantized model from [OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel) or quantize a model yourself. For quantization, you can use the following:
- [Post-Training Optimization Tool](@ref pot_docs_LowPrecisionOptimizationGuide) delivered with the Intel® Distribution of OpenVINO™ toolkit release package
- [Neural Network Compression Framework](https://www.intel.com/content/www/us/en/artificial-intelligence/posts/openvino-nncf.html) available on GitHub: https://github.com/openvinotoolkit/nncf
For 8-bit integer computation, a model must be quantized. You can use a quantized model from [OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel) or quantize a model yourself. For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.
The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
@ -46,10 +44,10 @@ If you infer the model with the OpenVINO™ CPU plugin and collect performance c
## Low-Precision 8-bit Integer Inference Workflow
For 8-bit integer computations, a model must be quantized. Quantized models can be downloaded from [Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel). If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_README) to quantize the model. The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
For 8-bit integer computations, a model must be quantized. Quantized models can be downloaded from [Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models](@ref omz_models_group_intel). If the model is not quantized, you can use the [Post-Training Optimization Tool](@ref pot_introduction) to quantize the model. The quantization process adds [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers on activations and weights for most layers. Read more about mathematical computations in the [Uniform Quantization with Fine-Tuning](https://github.com/openvinotoolkit/nncf/blob/develop/docs/compression_algorithms/Quantization.md).
The 8-bit inference pipeline includes two stages (also refer to the figure below):
1. *Offline stage*, or *model quantization*. During this stage, [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers are added before most layers to have quantized tensors before layers in a way that low-precision accuracy drop for 8-bit integer inference satisfies the specified threshold. The output of this stage is a quantized model. Quantized model precision is not changed, quantized tensors are in original precision range (`fp32`). `FakeQuantize` layer has `levels` attribute which defines quants count. Quants count defines precision which is used during inference. For `int8` range `levels` attribute value has to be 255 or 256. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_README) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
1. *Offline stage*, or *model quantization*. During this stage, [FakeQuantize](../ops/quantization/FakeQuantize_1.md) layers are added before most layers so that the tensors entering those layers are quantized in a way that keeps the accuracy drop of 8-bit integer inference within the specified threshold. The output of this stage is a quantized model. The precision of the quantized model is not changed, and the quantized tensors stay in the original precision range (`fp32`). The `FakeQuantize` layer has a `levels` attribute that defines the number of quantization levels, which in turn defines the precision used during inference. For the `int8` range, the `levels` value has to be 255 or 256. To quantize the model, you can use the [Post-Training Optimization Tool](@ref pot_introduction) delivered with the Intel® Distribution of OpenVINO™ toolkit release package.
When you pass the quantized IR to the OpenVINO™ plugin, the plugin automatically recognizes it as a quantized model and performs 8-bit inference. Note that if you pass a quantized model to another plugin that does not support 8-bit inference but supports all operations from the model, the model is inferred in a precision that this plugin supports.

View File

@ -47,7 +47,7 @@ CPU plugin supports the following data types as inference precision of internal
Selected precision of each primitive depends on the operation precision in IR, quantization primitives, and available hardware capabilities.
u1/u8/i8 data types are used for quantized operations only, i.e., they are not selected automatically for non-quantized operations.
See [low-precision optimization guide](@ref pot_docs_LowPrecisionOptimizationGuide) for more details on how to get quantized model.
See [low-precision optimization guide](@ref openvino_docs_model_optimization_guide) for more details on how to get a quantized model.
> **NOTE**: Platforms that do not support Intel® AVX512-VNNI have a known "saturation issue" which in some cases leads to reduced computational accuracy for u8/i8 precision calculations.
> See [saturation (overflow) issue section](@ref pot_saturation_issue) to get more information on how to detect such issues and possible workarounds.

View File

@ -90,7 +90,7 @@ can cause the user's request to be executed on CPU, thereby unnecessarily increa
Intel® GNA essentially operates in the low-precision mode which represents a mix of 8-bit (`i8`), 16-bit (`i16`), and 32-bit (`i32`) integer computations.
GNA plugin users are encouraged to use the [Post-Training Optimization Tool](@ref pot_README) to get a model with quantization hints based on statistics for the provided dataset.
GNA plugin users are encouraged to use the [Post-Training Optimization Tool](@ref pot_introduction) to get a model with quantization hints based on statistics for the provided dataset.
Unlike other plugins supporting low-precision execution, the GNA plugin can calculate quantization factors at model load time, so you can run a model without calibration. However, this mode may not provide satisfactory accuracy because the internal quantization algorithm is based on heuristics, whose efficiency depends on the model and on the dynamic range of input data; this mode is going to be deprecated soon.
@ -101,7 +101,7 @@ GNA plugin supports the following data types as inference precision of internal
[Hello Query Device C++ Sample](@ref openvino_inference_engine_samples_hello_query_device_README) can be used to print out supported data types for all detected devices.
[POT API Usage sample for GNA](@ref pot_sample_speech_README) demonstrates how a model can be quantized for GNA using POT API in 2 modes:
[POT API Usage sample for GNA](@ref pot_example_speech_README) demonstrates how a model can be quantized for GNA using POT API in 2 modes:
* Accuracy (i16 weights)
* Performance (i8 weights)

View File

@ -109,7 +109,7 @@ GPU plugin supports the following data types as inference precision of internal
Selected precision of each primitive depends on the operation precision in IR, quantization primitives, and available hardware capabilities.
u1/u8/i8 data types are used for quantized operations only, i.e., they are not selected automatically for non-quantized operations.
See [low-precision optimization guide](@ref pot_docs_LowPrecisionOptimizationGuide) for more details on how to get quantized model.
For more details on how to get a quantized model, refer to the [Model Optimization](@ref openvino_docs_model_optimization_guide) document.
Floating-point precision of a GPU primitive is selected based on operation precision in IR except [compressed f16 IR form](../../MO_DG/prepare_model/FP16_Compression.md) which is executed in f16 precision.
@ -298,7 +298,7 @@ The behavior depends on specific parameters of the operations and hardware confi
## GPU Performance Checklist: Summary <a name="gpu-checklist"></a>
Since OpenVINO relies on OpenCL&trade; kernels for the GPU implementation, many general OpenCL tips apply:
- Prefer `FP16` inference precision over `FP32`, as the Model Optimizer can generate both variants and the `FP32` is default. Also, consider [int8 inference](../Int8Inference.md)
- Prefer `FP16` inference precision over `FP32`, as the Model Optimizer can generate both variants and `FP32` is the default. Also, consider [int8 inference](@ref openvino_docs_model_optimization_guide).
- Try to group individual infer jobs by using [automatic batching](../automatic_batching.md)
- Consider [caching](../Model_caching_overview.md) to minimize model load time
- If your application is simultaneously using the inference on the CPU or otherwise loads the host heavily, make sure that the OpenCL driver threads do not starve. You can use [CPU configuration options](./CPU.md) to limit the number of inference threads for the CPU plugin.

View File

@ -96,7 +96,7 @@ With the [Model Downloader](@ref omz_tools_downloader) and [Model Optimizer](MO_
The [OpenVINO™ Runtime User Guide](./OV_Runtime_UG/openvino_intro.md) explains the process of creating your own application that runs inference with the OpenVINO™ toolkit. The [API Reference](./api_references.html) defines the OpenVINO Runtime API for Python, C++, and C. The OpenVINO Runtime API is what you'll use to create an OpenVINO™ inference application, use enhanced operations sets and other features. After writing your application, you can use the [Deployment with OpenVINO](./OV_Runtime_UG/deployment/deployment_intro.md) for deploying to target devices.
## Tuning for Performance
The toolkit provides a [Performance Optimization Guide](optimization_guide/dldt_optimization_guide.md) and utilities for squeezing the best performance out of your application, including [Accuracy Checker](@ref omz_tools_accuracy_checker), [Post-Training Optimization Tool](@ref pot_README), and other tools for measuring accuracy, benchmarking performance, and tuning your application.
The toolkit provides a [Performance Optimization Guide](optimization_guide/dldt_optimization_guide.md) and utilities for squeezing the best performance out of your application, including [Accuracy Checker](@ref omz_tools_accuracy_checker), [Post-Training Optimization Tool](@ref pot_introduction), and other tools for measuring accuracy, benchmarking performance, and tuning your application.
## Graphical Web Interface for OpenVINO™ Toolkit
You can choose to use the [OpenVINO™ Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction), a web-based tool that guides you through the process of converting, measuring, optimizing, and deploying models. This tool also serves as a low-effort introduction to the toolkit and provides a variety of useful interactive charts for understanding performance.

View File

@ -11,7 +11,7 @@ OpenVINO™ toolkit is a comprehensive toolkit for quickly developing applicatio
| [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) | `mo` |**Model Optimizer** imports, converts, and optimizes models that were trained in popular frameworks to a format usable by OpenVINO components. <br>Supported frameworks include Caffe\*, TensorFlow\*, MXNet\*, PaddlePaddle\*, and ONNX\*. |
| [Benchmark Tool](../../tools/benchmark_tool/README.md)| `benchmark_app` | **Benchmark Application** allows you to estimate deep learning inference performance on supported devices for synchronous and asynchronous modes. |
| [Accuracy Checker](@ref omz_tools_accuracy_checker) and <br> [Annotation Converter](@ref omz_tools_accuracy_checker_annotation_converters) | `accuracy_check` <br> `convert_annotation` |**Accuracy Checker** is a deep learning accuracy validation tool that allows you to collect accuracy metrics against popular datasets. The main advantages of the tool are the flexibility of configuration and a set of supported datasets, preprocessing, postprocessing, and metrics. <br> **Annotation Converter** is a utility that prepares datasets for evaluation with Accuracy Checker. |
| [Post-Training Optimization Tool](../../tools/pot/README.md)| `pot` |**Post-Training Optimization Tool** allows you to optimize trained models with advanced capabilities, such as quantization and low-precision optimizations, without the need to retrain or fine-tune models. Optimizations are also available through the [API](../../tools/pot/openvino/tools/pot/api/README.md). |
| [Post-Training Optimization Tool](../../tools/pot/docs/pot_introduction.md)| `pot` |**Post-Training Optimization Tool** allows you to optimize trained models with advanced capabilities, such as quantization and low-precision optimizations, without the need to retrain or fine-tune models. |
| [Model Downloader and other Open Model Zoo tools](@ref omz_tools_downloader)| `omz_downloader` <br> `omz_converter` <br> `omz_quantizer` <br> `omz_info_dumper`| **Model Downloader** is a tool for getting access to the collection of high-quality and extremely fast pre-trained deep learning [public](@ref omz_models_group_public) and [Intel](@ref omz_models_group_intel)-trained models. These free pre-trained models can be used to speed up the development and production deployment process without training your own models. The tool downloads model files from online sources and, if necessary, patches them to make them more usable with Model Optimizer. A number of additional tools are also provided to automate the process of working with downloaded models:<br> **Model Converter** is a tool for converting Open Model Zoo models that are stored in an original deep learning framework format into the OpenVINO Intermediate Representation (IR) using Model Optimizer. <br> **Model Quantizer** is a tool for automatic quantization of full-precision models in the IR format into low-precision versions using the Post-Training Optimization Tool. <br> **Model Information Dumper** is a helper utility for dumping information about the models to a stable, machine-readable format.
The developer package also installs the OpenVINO™ Runtime package as a dependency.

View File

@ -17,7 +17,7 @@
## Deployment Optimizations Overview {#openvino_docs_deployment_optimization_guide_overview}
Runtime or deployment optimizations are focused on tuning of the inference _parameters_ (e.g. optimal number of the requests executed simultaneously) and other means of how a model is _executed_.
As referenced in the parent [performance introduction topic](./dldt_optimization_guide.md), the [dedicated document](./model_optimization_guide.md) covers the **model-level optimizations** like quantization that unlocks the [int8 inference](../OV_Runtime_UG/Int8Inference.md). Model-optimizations are most general and help any scenario and any device (that accelerated the quantized models). The relevant _runtime_ configuration is `ov::hint::inference_precision` allowing the devices to trade the accuracy for the performance (e.g. by allowing the fp16/bf16 execution for the layers that remain in fp32 after quantization of the original fp32 model).
As referenced in the parent [performance introduction topic](./dldt_optimization_guide.md), the [dedicated document](./model_optimization_guide.md) covers the **model-level optimizations**, like quantization, that unlock 8-bit inference. Model optimizations are the most general and help any scenario and any device that, for example, accelerates quantized models. The relevant _runtime_ configuration is `ov::hint::inference_precision`, which allows the devices to trade accuracy for performance (e.g. by allowing fp16/bf16 execution for the layers that remain in fp32 after quantization of the original fp32 model).
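As an illustration, the Python sketch below sets this hint for the CPU device. It assumes the string-key configuration form of the Python API, where `"INFERENCE_PRECISION_HINT"` corresponds to `ov::hint::inference_precision`, and uses a hypothetical model path.

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical IR path

# Keep fp32 execution for non-quantized layers; "bf16"/"f16" would instead
# allow trading accuracy for performance on capable hardware (assumption: string-key config form).
compiled_model = core.compile_model(model, "CPU", {"INFERENCE_PRECISION_HINT": "f32"})
```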
Then, possible optimizations should start with defining the use case. For example, whether the target scenario emphasizes throughput over latency, like processing millions of samples by overnight jobs in data centers.
In contrast, real-time usages would likely trade off throughput to deliver results at minimal latency. Often this is a combined scenario that targets the highest possible throughput while maintaining a specific latency threshold.

View File

@ -8,7 +8,7 @@ This provides much better performance for the networks than batching especially
Compared with batching, the parallelism is somewhat transposed (i.e., performed over inputs, with much less synchronization within CNN ops):
![](../img/cpu_streams_explained.png)
Notice that [high-level performance hints](../OV_Runtime_UG/performance_hints.md) allows the implementation to select the optimal number of the streams, _depending on the model compute demands_ and CPU capabilities (including [int8 inference](../OV_Runtime_UG/Int8Inference.md) hardware acceleration, number of cores, etc).
Note that [high-level performance hints](../OV_Runtime_UG/performance_hints.md) allow the implementation to select the optimal number of streams, _depending on the model compute demands_ and CPU capabilities (including [int8 inference](@ref openvino_docs_model_optimization_guide) hardware acceleration, number of cores, etc).
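As a minimal sketch (assuming the string-key configuration form of the Python API and a hypothetical model path), letting the hint choose the number of streams could look like this:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical IR path

# With the THROUGHPUT hint, the plugin selects the number of streams itself,
# based on the model compute demands and the CPU capabilities.
compiled_model = core.compile_model(model, "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"})
```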
## Automatic Batching Internals
As explained in the section on the [automatic batching](../OV_Runtime_UG/automatic_batching.md), the feature performs on-the-fly grouping of the inference requests to improve device utilization.

View File

@ -6,19 +6,19 @@
:maxdepth: 1
:hidden:
pot_README
pot_introduction
docs_nncf_introduction
openvino_docs_IE_DG_Int8Inference
(Experimental) Protecting Model <pot_ranger_README>
@endsphinxdirective
Model optimization assumes applying transformations to the model and relevant data flow to improve the inference performance. These transformations are basically offline and can require the availability of training and validation data. It includes such methods as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
Model optimization is an optional offline step of improving final model performance by applying special optimization methods like quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
- **Post-training Optimization tool [(POT)](../../tools/pot/README.md)** is designed to optimize the inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, like post-training quantization.
- **Model Optimizer** implements optimization to a model, most of them added by default, but you can configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocess of a model ([Embedding Preprocessing Computation](../MO_DG/prepare_model/Additional_Optimizations.md)).
- **Neural Network Compression Framework [(NNCF)](./nncf_introduction.md)** provides a suite of advanced algorithms for Neural Networks inference optimization with minimal accuracy drop, for example, quantization, pruning algorithms.
- **Post-training Optimization tool** [(POT)](../../tools/pot/docs/Introduction.md) is designed to optimize the inference of deep learning models by applying post-training methods that do not require model retraining or fine-tuning, for example, post-training 8-bit quantization.
- **Model Optimizer** applies optimizations to a model, most of them added by default, but you can configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocessing of a model ([Embedding Preprocessing Computation](../MO_DG/prepare_model/Additional_Optimizations.md)).
- **Neural Network Compression Framework** [(NNCF)](./nncf_introduction.md) provides a suite of advanced methods for training-time model optimization within the DL framework, such as PyTorch and TensorFlow. It supports methods, like Quantization-aware Training and Filter Pruning. NNCF-optimized models can be inferred with OpenVINO using all the available workflows.
## Detailed workflow:
@ -27,9 +27,13 @@
To understand which development optimization tool you need, refer to the diagram:
POT is the easiest way to get optimized models, and usually takes several minutes depending on the model size and used HW. NNCF can be considered as an alternative or addition when the first one does not give accurate results.
Post-training methods are limited in terms of achievable accuracy and for challenging use cases accuracy might degrade. In this case, training-time optimization with NNCF is an option.
Once the model is optimized using the aforementioned tools, it can be used for inference with the regular OpenVINO workflow. No changes to the code are required.
![](../img/WHAT_TO_USE.svg)
If you are not familiar with model optimization methods, we recommend starting from [post-training methods](@ref pot_introduction).
## See also
- [Deployment optimization](./dldt_deployment_optimization_guide.md)

View File

@ -1,8 +1,8 @@
# Neural Network Compression Framework {#docs_nncf_introduction}
This document describes the Neural Network Compression Framework (NNCF) which is being developed as a separate project outside of OpenVINO&trade; but it is highly aligned with OpenVINO&trade; in terms of the supported optimization features and models. It is open-sourced and available on [GitHub](https://github.com/openvinotoolkit/nncf).
This document describes the Neural Network Compression Framework (NNCF) which is distributed as a separate tool but is highly aligned with OpenVINO&trade; in terms of the supported optimization features and models. It is open-sourced and available on [GitHub](https://github.com/openvinotoolkit/nncf).
## Introduction
Neural Network Compression Framework (NNCF) is aimed at optimizing Deep Neural Network (DNN) by applying optimization methods, such as quantization, pruning, etc., to the original framework model. It mostly provides in-training optimization capabilities which means that optimization methods require model fine-tuning during and after optimization. The diagram below shows the model optimization workflow using NNCF.
Neural Network Compression Framework (NNCF) is aimed at optimizing Deep Neural Networks (DNNs) by applying optimization methods, such as quantization, pruning, etc., to the original framework model. It provides in-training optimization capabilities, which means that the optimization methods require model fine-tuning or even re-training. The diagram below shows the model optimization workflow using NNCF.
![](../img/nncf_workflow.png)
### Features
@ -42,7 +42,6 @@ NNCF provides various examples and tutorials that demonstrate usage of optimizat
### Tutorials
- [Quantization-aware training of PyTorch model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/302-pytorch-quantization-aware-training)
- [Quantization-aware training of TensorFlow model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/305-tensorflow-quantization-aware-training)
- (Experimental) [Post-training quantization of PyTorch model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/112-pytorch-post-training-quantization-nncf)
### Samples
- PyTorch:
@ -58,6 +57,6 @@ NNCF provides various examples and tutorials that demonstrate usage of optimizat
## See also
- [Compressed Model Zoo](https://github.com/openvinotoolkit/nncf#nncf-compressed-model-zoo)
- [NNCF in HuggingFace Optimum](https://github.com/dkurt/optimum-openvino)
- [OpenVINO&trade; Post-training Optimization tool](../../tools/pot/README.md)
- [NNCF in HuggingFace Optimum](https://github.com/openvinotoolkit/openvino_contrib/tree/master/modules/optimum)
- [Post-training optimization](../../tools/pot/docs/Introduction.md)

View File

@ -1,22 +1,4 @@
# Post-Training Optimization Tool {#pot_README}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
pot_InstallationGuide
pot_docs_LowPrecisionOptimizationGuide
pot_compression_algorithms_quantization_README
Best Practices <pot_docs_BestPractices>
Command-line Interface <pot_compression_cli_README>
pot_compression_api_README
pot_configs_README
Deep Neural Network Protection <pot_ranger_README>
pot_docs_FrequentlyAskedQuestions
@endsphinxdirective
# Post-Training Optimization Tool
## Introduction
@ -25,48 +7,35 @@ special methods without model retraining or fine-tuning, for example, post-train
require a training dataset or a pipeline. To apply post-training algorithms from the POT, you need:
* A floating-point precision model, FP32 or FP16, converted into the OpenVINO&trade; Intermediate Representation (IR) format
and runnable on CPU with OpenVINO&trade;.
* A representative calibration dataset representing a use case scenario, for example, 300 images.
* A representative calibration dataset representing a use case scenario, for example, 300 samples.
The figure below shows the optimization workflow:
![](docs/images/workflow_simple.png)
### Features
To get started with the POT tool, refer to the corresponding OpenVINO&trade; [documentation](https://docs.openvino.ai/latest/openvino_docs_model_optimization_guide.html).
* Two post-training 8-bit quantization algorithms: fast [DefaultQuantization](openvino/tools/pot/algorithms/quantization/default/README.md) and precise [AccuracyAwareQuantization](openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md).
* Compression for different hardware targets such as CPU and GPU.
* Multiple domains: Computer Vision, Natural Language Processing, Recommendation Systems, Speech Recognition.
* [Command-line tool](docs/CLI.md) that provides a simple interface for basic use cases.
* [API](openvino/tools/pot/api/README.md) that helps to apply optimization methods within a custom inference script written with OpenVINO Python* API.
* (Experimental) [Ranger algorithm](@ref pot_ranger_README) for the model protection in safety-critical cases.
## Installation
### From PyPI
POT is distributed as a part of the OpenVINO&trade; Development Tools package. For installation instructions, refer to this [document](https://docs.openvino.ai/latest/openvino_docs_install_guides_install_dev_tools.html).
For benchmarking results collected for the models optimized with the POT tool, see [INT8 vs FP32 Comparison on Select Networks and Platforms](@ref openvino_docs_performance_int8_vs_fp32).
### From GitHub
As prerequisites, you should install [OpenVINO&trade; Runtime](https://docs.openvino.ai/latest/openvino_docs_install_guides_install_runtime.html) and other dependencies such as [Model Optimizer](https://docs.openvino.ai/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) and [Accuracy Checker](https://docs.openvino.ai/latest/omz_tools_accuracy_checker.html).
POT is open-sourced on GitHub as a part of OpenVINO and available at https://github.com/openvinotoolkit/openvino/tree/master/tools/pot.
To install POT from source:
- Clone OpenVINO repository
```sh
git clone --recursive https://github.com/openvinotoolkit/openvino.git
```
- Navigate to `openvino/tools/pot/` folder
- Install POT package:
```sh
python3 setup.py install
```
Further documentation presumes that you are familiar with basic Deep Learning concepts, such as model inference, dataset preparation, model optimization, as well as with the OpenVINO&trade; toolkit and its components, such as [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide) and [Accuracy Checker Tool](@ref omz_tools_accuracy_checker).
## Get started
### Installation
To install POT, follow the [Installation Guide](docs/InstallationGuide.md).
### Usage options
![](docs/images/use_cases.png)
The POT provides three basic usage options:
* **Command-line interface (CLI)**:
* [**Simplified mode**](@ref pot_docs_simplified_mode): use this option if the model belongs to the **Computer Vision** domain and you have an **unannotated dataset** for optimization. This optimization method does not allow measuring model accuracy and might cause its deviation.
* [**Model Zoo flow**](@ref pot_compression_cli_README): this option is recommended if the model is similar to the model from OpenVINO&trade; [Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) or there is a valid [Accuracy Checker Tool](@ref omz_tools_accuracy_checker)
configuration file for the model that allows validating model accuracy using [Accuracy Checker Tool](@ref omz_tools_accuracy_checker).
* [**Python\* API**](@ref pot_compression_api_README): this option allows integrating the optimization methods implemented in POT into
a Python* inference script that uses [OpenVINO Python* API](https://docs.openvino.ai/latest/openvino_inference_engine_ie_bridges_python_docs_api_overview.html).
After installation POT is available as a Python library under `openvino.tools.pot.*` and in the command line by the `pot` alias. To verify it, run `pot -h`.
POT is also integrated into [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench), a web-based graphical environment
that enables you to import, optimize, benchmark, visualize, and compare the performance of deep learning models.
### Examples
## Examples
OpenVINO provides several examples to demonstrate the POT optimization workflow:
@ -81,12 +50,10 @@ OpenVINO provides several examples to demonstrate the POT optimization workflow:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Quantization of Object Detection model with controllable accuracy](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/object_detection)
* [Speech example for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
* [Quantization of speech model for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
## See Also
* [Low Precision Optimization Guide](docs/LowPrecisionOptimizationGuide.md)
* [Post-Training Optimization Best Practices](docs/BestPractices.md)
* [POT Frequently Asked Questions](docs/FrequentlyAskedQuestions.md)
* [Performance Benchmarks](https://docs.openvino.ai/latest/openvino_docs_performance_benchmarks_openvino.html)
* [INT8 Quantization by Using Web-Based Interface of the DL Workbench](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Int_8_Quantization.html)

View File

@ -44,8 +44,7 @@ There are two options to define engine parameters in this mode:
## Compression Parameters
This section defines optimization algorithms and their parameters. For more details about parameters of the concrete optimization algorithm, please refer to the corresponding
[documentation](@ref pot_compression_algorithms_quantization_README).
For more details about the parameters of a concrete optimization algorithm, see the descriptions of the [Default Quantization](@ref pot_compression_algorithms_quantization_default_README) and [Accuracy-aware Quantization](@ref accuracy_aware_README) methods.
## Examples of the Configuration File
@ -57,4 +56,3 @@ For details on how to run the Post-Training Optimization Tool with a sample conf
## See Also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [POT API](@ref pot_compression_api_README)

View File

@ -0,0 +1,182 @@
# Quantizing Model with Accuracy Control {#pot_accuracyaware_usage}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
AccuracyAwareQuantization Method <accuracy_aware_README>
@endsphinxdirective
## Introduction
This document assumes that you have already tried [Default Quantization](@ref pot_default_quantization_usage) for the same model. If it introduces a significant accuracy degradation, the Accuracy-aware Quantization algorithm can be used to keep accuracy within a predefined range. This may cause some
degradation of performance in comparison to the [Default Quantization](@ref pot_default_quantization_usage) algorithm because some layers can be reverted back to the original precision.
> **NOTE**: In the case of the GNA `target_device`, the Accuracy-aware Quantization algorithm behaves differently. It searches for the best configuration, selecting between INT8 and INT16 precisions for the weights of each layer. The algorithm works for the `performance` preset only. For the `accuracy` preset, this algorithm is not helpful since the whole model is already in INT16 precision.
A script for Accuracy-aware Quantization includes four steps:
1. Prepare data and dataset interface
2. Define accuracy metric
3. Select quantization parameters
4. Define and run quantization process
## Prepare data and dataset interface
This step is the same as in the case of [Default Quantization](@ref pot_default_quantization_usage). The only difference is that the `__getitem__()` method should return `(data, annotation)` or `(data, annotation, metadata)`, where `annotation` is required and its format should correspond to the expectations of the `Metric` class. `metadata` is an optional field that can be used to store additional information required for post-processing.
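For illustration, a minimal `DataLoader` sketch is shown below. The class name and the in-memory `samples` list are hypothetical, and the annotation format (an integer class label here) is an assumption that must match what your `Metric` implementation expects.

```python
import numpy as np

from openvino.tools.pot import DataLoader

class ClassificationDataLoader(DataLoader):  # hypothetical class name
    def __init__(self, samples):
        # `samples` is assumed to be a list of (image: np.ndarray, label: int) pairs
        self._samples = samples

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, index):
        image, label = self._samples[index]
        # Returns (data, annotation); a third `metadata` element could be added if needed
        return image.astype(np.float32), label
```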
## Define accuracy metric
To control accuracy during optimization, the `openvino.tools.pot.Metric` interface should be implemented. Each implementation should override the following properties:
- `value` - returns the accuracy metric value for the last model output in a format of `Dict[str, numpy.array]`.
- `avg_value` - returns the average accuracy metric over collected model results in a format of `Dict[str, numpy.array]`.
- `higher_better` - returns `True` if a higher value of the metric corresponds to better performance, otherwise `False`. The default implementation returns `True`.
and methods:
- `update(output, annotation)` - calculates and updates the accuracy metric value using the last model output and annotation. The model output and annotation should be passed in this method. It should also contain the model-specific post-processing in case the model returns the raw output.
- `reset()` - resets collected accuracy metric.
- `get_attributes()` - returns a dictionary of metric attributes:
```
{metric_name: {attribute_name: value}}
```
Required attributes:
- `direction` - (`higher-better` or `higher-worse`) a string parameter defining whether metric value
should be increased in accuracy-aware algorithms.
- `type` - a string representation of metric type. For example, 'accuracy' or 'mean_iou'.
Below is an example of the accuracy top-1 metric implementation with POT API:
```python
import numpy as np

from openvino.tools.pot import Metric
class Accuracy(Metric):
# Required methods
def __init__(self, top_k=1):
super().__init__()
self._top_k = top_k
self._name = 'accuracy@top{}'.format(self._top_k)
self._matches = [] # container of the results
@property
def value(self):
""" Returns accuracy metric value for all model outputs. """
return {self._name: self._matches[-1]}
@property
def avg_value(self):
""" Returns accuracy metric value for all model outputs. """
return {self._name: np.ravel(self._matches).mean()}
def update(self, output, target):
""" Updates prediction matches.
:param output: model output
:param target: annotations
"""
if len(output) > 1:
raise Exception('The accuracy metric cannot be calculated '
'for a model with multiple outputs')
if isinstance(target, dict):
target = list(target.values())
predictions = np.argsort(output[0], axis=1)[:, -self._top_k:]
match = [float(t in predictions[i]) for i, t in enumerate(target)]
self._matches.append(match)
def reset(self):
""" Resets collected matches """
self._matches = []
def get_attributes(self):
"""
Returns a dictionary of metric attributes {metric_name: {attribute_name: value}}.
Required attributes: 'direction': 'higher-better' or 'higher-worse'
'type': metric type
"""
return {self._name: {'direction': 'higher-better',
'type': 'accuracy'}}
```
An instance of the `Metric` implementation should be passed to the `IEEngine` object responsible for model inference.
```python
metric = Accuracy()
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)
```
## Select quantization parameters
Accuracy-aware Quantization uses the Default Quantization algorithm at the initialization step, so all of its parameters are also valid and can be specified. Here, we describe only the parameters required by Accuracy-aware Quantization:
- `"maximal_drop"` - the maximum accuracy drop that is allowed after quantization. The default value is `0.01` (1%).
## Run quantization
The code example below shows a basic quantization workflow with accuracy control. `UserDataLoader()` is a placeholder for the implementation of `DataLoader`.
```python
from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline
from addict import Dict  # assumption: Dict comes from the addict package, installed as a POT dependency
# Model config specifies the model name and paths to model .xml and .bin file
model_config = Dict(
{
"model_name": "model",
"model": path_to_xml,
"weights": path_to_bin,
}
)
# Engine config
engine_config = Dict({"device": "CPU"})
algorithms = [
{
"name": "AccuracyAwareQuantization",
"params": {
"target_device": "ANY",
"stat_subset_size": 300,
"maximal_drop": 0.02
},
}
]
# Step 1: implement and create user's data loader
data_loader = UserDataLoader()
# Step 2: implement and create user's accuracy metric
metric = Accuracy()
# Step 3: load model
model = load_model(model_config=model_config)
# Step 4: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader, metric=metric)
# Step 5: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
compressed_model = pipeline.run(model=model)
# Step 6 (Optional): Compress model weights to quantized precision
# in order to reduce the size of the final .bin file.
compress_model_weights(compressed_model)
# Step 7: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved
compressed_model_paths = save_model(
model=compressed_model,
save_path="optimized_model",
model_name="optimized_model",
)
# Step 8 (Optional): Evaluate the compressed model. Print the results.
metric_results = pipeline.evaluate(compressed_model)
```
Note that the `evaluate` method, which can compute accuracy on demand, is also available in the `Pipeline` object.
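For example, assuming the dictionary form produced by the `Accuracy` metric above, the evaluation results can be printed as follows:

```python
# metric_results maps metric names to values, e.g. {"accuracy@top1": 0.76} (illustrative value)
if metric_results:
    for name, value in metric_results.items():
        print("{}: {:.4f}".format(name, value))
```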
If Accuracy-aware Quantization does not allow achieving the desired accuracy-performance trade-off, it is recommended to try Quantization-aware Training from [NNCF](@ref docs_nncf_introduction).
## Examples
* [Quantization of Object Detection model with control of accuracy](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/object_detection)

View File

@ -1,10 +1,18 @@
# Post-Training Optimization Best Practices {#pot_docs_BestPractices}
This document describes the most common insights about model optimization using the Post-training Optimization Tool (POT). The post-training optimization usually is
the fastest and easiest way to get a low-precision model because it does not require model fine-tuning and thus, there is no need in the training dataset, pipeline and availability of
the powerful training hardware. In some cases, it may lead to not satisfactory accuracy drop, especially when optimizing the whole model.
However, it can be still helpful for fast performance evaluation in order to understand the possible speed up
when applying one or another optimization method. Before going into details
we suggest reading the following [POT documentation](../README.md).
# Post-Training Quantization Best Practices {#pot_docs_BestPractices}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
Saturation Issue <pot_saturation_issue>
@endsphinxdirective
## Introduction
The [Default Quantization](@ref pot_default_quantization_usage) of the Post-training Optimization Tool (POT) is
the fastest and easiest way to get a quantized model because, in most cases, it requires only an unannotated representative dataset. Thus, it is recommended as a starting point when it comes to model optimization. However, it can lead to significant accuracy deviation in some cases. This document provides tips to address this issue.
> **NOTE**: POT uses inference on the CPU during model optimization. It means the ability to infer the original
> floating-point model is a prerequisite for model optimization.
@ -12,69 +20,53 @@ we suggest reading the following [POT documentation](../README.md).
> architecture when optimizing for CPU or VNNI-based CPU when quantizing for a non-CPU device, such as GPU, VPU, or GNA.
> It should help to avoid the impact of the [saturation issue](@ref pot_saturation_issue) that occurs on AVX and SSE based CPU devices.
## Get Started with Post-Training Quantization
Post-training quantization is a basic feature of the POT and it has lots of knobs that can be used to get an accurate
quantized model. However, as a starting point we suggest using the `DefaultQuantization` algorithm with default settings.
In many cases it leads to satisfied accuracy and performance speedup.
A fragment of the configuration file (`openvino/tools/pot/configs/templates/default_quantization_template.json` in the POT directory) with default settings is shown below:
```
"compression": {
"target_device": "ANY", // Target device, the specificity of which will be taken into account during optimization.
// The default value "ANY" stands for compatible quantization supported by any HW.
"algorithms": [
{
"name": "DefaultQuantization", // Optimization algorithm name
"params": {
"preset": "performance", // Preset [performance, mixed] which control the quantization
// mode (symmetric, mixed (weights symmetric and activations asymmetric)
// and fully asymmetric respectively)
"stat_subset_size": 300 // Size of subset to calculate activations statistics that can be used
// for quantization parameters calculation
}
## Improving accuracy after the Default Quantization
Parameters of the Default Quantization algorithm with basic settings are shown below:
```python
{
"name": "DefaultQuantization", # Optimization algorithm name
"params": {
"preset": "performance", # Preset [performance, mixed] which controls
# the quantization scheme. For the CPU:
# performance - symmetric quantization of weights and activations
# mixed - symmetric weights and asymmetric activations
"stat_subset_size": 300 # Size of subset to calculate activations statistics that can be used
# for quantization parameters calculation
}
]
}
```
In the case of substantial accuracy degradation after applying the `DefaultQuantization` algorithm there are two alternatives to use:
In the case of substantial accuracy degradation after applying this method there are two alternatives:
1. Hyperparameters tuning
2. AccuracyAwareQuantization algorithm
## Tuning Hyperparameters of the DefaultQuantization
The `DefaultQuantization` algorithm provides multiple hyperparameters which can be used in order to improve accuracy results for the fully-quantized model.
Below is a list of best practices which can be applied to improve accuracy without a substantial performance reduction with respect to default settings:
1. The first option that we recommend is to change is `preset` from `performance` to `mixed`. This enables asymmetric quantization of
activations and can be helpful for NNs with non-ReLU activation functions, e.g. YOLO, EfficientNet, etc.
### Tuning Hyperparameters of the Default Quantization
The Default Quantization algorithm provides multiple hyperparameters which can be used in order to improve accuracy results for the fully-quantized model.
Below is a list of best practices that can be applied to improve accuracy without a substantial performance reduction with respect to default settings:
1. The first recommended option is to change the `preset` from `performance` to `mixed`. This enables asymmetric quantization of
activations and can be helpful for models with non-ReLU activation functions, for example, YOLO, EfficientNet, etc.
2. The next option is `use_fast_bias`. Setting this option to `false` enables a different bias correction method which is more accurate, in general,
and applied after model quantization as a part of the `DefaultQuantization` algorithm.
and applied after model quantization as a part of the Default Quantization algorithm.
> **NOTE**: Changing this option can substantially increase quantization time in the POT tool.
3. Another important option is a `range_estimator`. It defines how to calculate the minimum and maximum of quantization range for weights and activations.
For example, the following `range_estimator` for activations can improve the accuracy for Faster R-CNN based networks:
```
"compression": {
"target_device": "ANY",
"algorithms": [
{
"name": "DefaultQuantization",
"params": {
"preset": "performance",
"stat_subset_size": 300
```python
{
"name": "DefaultQuantization",
"params": {
"preset": "performance",
"stat_subset_size": 300
"activations": {
"range_estimator": {
"max": {
"aggregator": "max",
"type": "abs_max"
}
}
"activations": { # defines activation
"range_estimator": { # defines how to estimate statistics
"max": { # right border of the quantizating floating-point range
"aggregator": "max", # use max(x) to aggregate statistics over calibration dataset
"type": "abs_max" # use abs(max(x)) to get per-sample statistics
}
}
}
]
}
}
```
@ -85,43 +77,30 @@ It is assumed that this dataset should contain a sufficient number of representa
However, we empirically found that 300 samples are sufficient to get representative statistics in most cases.
5. The last option is `ignored_scope`. It allows excluding some layers from the quantization process, i.e. their inputs will not be quantized. It may be helpful for some patterns for which it is known in advance that they drop accuracy when executing in low-precision.
For example, the `DetectionOutput` layer of an SSD model, expressed as a subgraph, should not be quantized to preserve the accuracy of Object Detection models (see the configuration sketch after this list).
One of the sources for the ignored scope can be the AccuracyAware algorithm which can revert layers back to the original precision (see details below).
One of the sources for the ignored scope can be the Accuracy-aware algorithm which can revert layers back to the original precision (see details below).
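The fragment below sketches how the bias correction and ignored-scope options from this list could be added to the Default Quantization parameters. The node name is hypothetical, and the `ignored`/`scope` key form follows the common POT configuration convention, so check the documentation of your POT version for the exact spelling.

```python
{
    "name": "DefaultQuantization",
    "params": {
        "preset": "performance",
        "stat_subset_size": 300,
        "use_fast_bias": False,          # slower but more accurate bias correction (option 2)
        "ignored": {
            "scope": [
                "DetectionOutput_1023"   # hypothetical node name excluded from quantization (option 5)
            ]
        }
    }
}
```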
## AccuracyAwareQuantization
In case when the steps above do not lead to the accurate quantized model you may use the so-called `AccuracyAwareQuantization` algorithm which leads to mixed-precision models.
The whole idea behind that is to revert quantized layers back to floating-point precision based on their contribution to the accuracy drop until the desired accuracy degradation with respect to
the full-precision model is satisfied.
## Accuracy-aware Quantization
If the steps above do not lead to an accurate quantized model, you may use the [Accuracy-aware Quantization](@ref pot_accuracyaware_usage) algorithm, which produces mixed-precision models.
A fragment of the Accuracy-aware Quantization configuration with default settings is shown below:
```python
{
"name": "AccuracyAwareQuantization",
"params": {
"preset": "performance",
"stat_subset_size": 300,
A fragment of the configuration file with default settings is shown below (`openvino/tools/pot/configs/templates/accuracy_aware_quantization_template.json`):
```
"compression": {
"target_device": "ANY", // Target device, the specificity of which will be taken into account during optimization.
// The default value "ANY" stands for compatible quantization supported by any HW.
"algorithms": [
{
"name": "AccuracyAwareQuantization", // Optimization algorithm name
"params": {
"preset": "performance", // Preset [performance, mixed, accuracy] which control the quantization
// mode (symmetric, mixed (weights symmetric and activations asymmetric)
// and fully asymmetric respectively)
"stat_subset_size": 300, // Size of subset to calculate activations statistics that can be used
// for quantization parameters calculation
"maximal_drop": 0.01 // Maximum accuracy drop which has to be achieved after the quantization
}
}
]
"maximal_drop": 0.01 # Maximum accuracy drop which has to be achieved after the quantization
}
}
```
Since the `AccuracyAwareQuantization` calls the `DefaultQuantization` at the first step it means that all the parameters of the latter one are also valid and can be applied to the
accuracy-aware scenario.
Since Accuracy-aware Quantization calls Default Quantization at the first step, all the parameters of the latter are also valid and can be applied in the accuracy-aware scenario.
> **NOTE**: In general case, possible speedup after applying the `AccuracyAwareQuantization` algorithm is less than after the `DefaultQuantization` when the model gets fully-quantized.
> **NOTE**: In the general case, the possible speedup after applying the Accuracy-aware Quantization algorithm is lower than after Default Quantization, when the model gets fully quantized.
### Reducing the performance gap of Accuracy-aware Quantization
To improve model performance after Accuracy-aware Quantization, you can try the `"tune_hyperparams"` setting and set it to `True`. It will enable searching for optimal quantization parameters before reverting layers to the "backup" precision. Note that this can increase the overall quantization time.
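A configuration fragment with this option could look like the following sketch; the other parameter values are taken from the Accuracy-aware Quantization example above.

```python
{
    "name": "AccuracyAwareQuantization",
    "params": {
        "preset": "performance",
        "stat_subset_size": 300,
        "maximal_drop": 0.01,
        "tune_hyperparams": True  # search for better quantization parameters before reverting layers
    }
}
```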
If you do not achieve the desired accuracy and performance after applying the
`AccuracyAwareQuantization` algorithm or you need an accurate fully-quantized model,
we recommend either using layer-wise hyperparameters tuning with TPE or using
Quantization-Aware training from [the supported frameworks](LowPrecisionOptimizationGuide.md).
Accuracy-aware Quantization algorithm, or you need an accurate fully-quantized model, we recommend using Quantization-Aware Training from [NNCF](@ref docs_nncf_introduction).

View File

@ -6,38 +6,29 @@
:maxdepth: 1
:hidden:
Simplified mode <pot_docs_simplified_mode>
End-to-end CLI example <pot_configs_examples_README>
Simplified Mode <pot_docs_simplified_mode>
pot_configs_README
@endsphinxdirective
POT command-line interface (CLI) is designed to optimize models that are supported by the [Accuracy Checker Tool](@ref omz_tools_accuracy_checker) used for accuracy measurement.
If your model is exactly from the OpenVINO&trade; [Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) or it is similar to one of
its models then you can employ POT CLI to optimize your model.
In other cases, you should consider using POT [API](@ref pot_compression_api_README). To start with POT CLI please refer to the
following [example](@ref pot_configs_examples_README).
## Introduction
Note: There is also the so-called [**Simplified mode**](@ref pot_docs_simplified_mode) aimed at INT8 quantization if the model is from the Computer Vision domain and has a simple dataset preprocessing, like image resize and crop. In this case, you can also use POT CLI for
optimization. However, the accuracy results are not guaranteed in this case. Moreover, you are also limited in the choice of optimization methods since the accuracy measurement is not available.
POT command-line interface (CLI) is aimed at optimizing models that are similar to the models from the OpenVINO&trade; [Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) or for which there is a valid [AccuracyChecker Tool](@ref omz_tools_accuracy_checker) configuration file. Examples of AccuracyChecker configuration files can be found on [GitHub](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public). Each model folder contains a YAML configuration file that can be used with POT as is.
> **NOTE**: There is also the so-called [Simplified mode](@ref pot_docs_simplified_mode) aimed at optimization of models from the Computer Vision domain that have simple dataset preprocessing, like image resize and crop. In this case, you can also use POT CLI for optimization. However, the accuracy results are not guaranteed in this case. Moreover, you are also limited in the choice of optimization methods since the accuracy measurement is not available.
## Prerequisites
1. Install POT following the [Installation Guide](@ref pot_InstallationGuide).
2. Convert your model from the framework representation into the OpenVINO&trade; IR format with the
[Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide).
3. Prepare the Accuracy Checker configuration file and make sure that the model can be successfully inferred and achieves accuracy similar to that of the reference model from the original framework.
4. Activate the Python environment in the command-line shell where the POT and the Accuracy Checker were installed.
## Run POT CLI
There are two ways how to run POT via command line:
- **Basic usage for DefaultQuantization**. In this case, you can run POT with basic settings, specifying all the options via the command line. `-q default` stands for the [DefaultQuantization](../openvino/tools/pot/algorithms/quantization/default/README.md) algorithm:
```sh
pot -q default -m <path_to_xml> -w <path_to_bin> --ac-config <path_to_AC_config_yml>
```
- **Basic usage for AccuracyAwareQuantization**. You can also run the [AccuracyAwareQuantization](../openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md) method with basic options. The `--max-drop 0.01` option limits the maximum accuracy deviation to 1 absolute percent from the original model:
```sh
pot -q accuracy_aware -m <path_to_xml> -w <path_to_bin> --ac-config <path_to_AC_config_yml> --max-drop 0.01
```
- **Advanced usage**. In this case, you should prepare a configuration file for POT where you can specify advanced options for the available optimization methods. See [POT configuration file description](@ref pot_configs_README) for more details.
To launch the command-line tool with the configuration file, run `pot -c <path_to_config_file>`.

View File

@ -0,0 +1,152 @@
# Quantizing Model {#pot_default_quantization_usage}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
DefaultQuantization Method <pot_compression_algorithms_quantization_default_README>
@endsphinxdirective
## Introduction
This document describes how to apply model quantization with the Default Quantization method without accuracy control, using an unannotated dataset. To use this method, you need to create a Python* script using the API of the Post-Training Optimization Tool (POT) and implement the data preparation logic and the quantization pipeline. If you are not familiar with Python*, you can try the [command-line interface](@ref pot_compression_cli_README) of POT, which is designed to quantize models from the OpenVINO&trade; [Model Zoo](https://github.com/openvinotoolkit/open_model_zoo). The figure below shows the common workflow of the quantization script implemented with POT API.
![](./images/default_quantization_flow.png)
The script should include three basic steps:
1. Prepare data and dataset interface
2. Select quantization parameters
3. Define and run quantization process
## Prepare data and dataset interface
In most cases, it is required to implement only the `openvino.tools.pot.DataLoader` interface, which allows acquiring data from a dataset and applying model-specific pre-processing while providing access by index. Any implementation should override the following methods:
- `__len__()`, returns the size of the dataset
- `__getitem__()`, provides access to the data by index in the range of 0 to `len(self)`. It can also encapsulate the logic of model-specific pre-processing. The method should return data in the following format:
- `(data, annotation)`
where `data` is the input that is passed to the model at inference, so it should be properly preprocessed. `data` can be either a `numpy.array` object or a dictionary where the key is the name of the model input and the value is the `numpy.array` that corresponds to this input. Since `annotation` is not used by the Default Quantization method, this object can be `None` in this case.
You can wrap framework data loading classes with the `openvino.tools.pot.DataLoader` interface, which is usually straightforward. For example, `torch.utils.data.Dataset` has an interface similar to `openvino.tools.pot.DataLoader`, so its TorchVision implementations can be easily wrapped by the POT API, as sketched below.
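A minimal sketch of such a wrapper might look as follows (the class name, the `ImageFolder` dataset, and the resize-only preprocessing are illustrative assumptions, not part of the POT API):
```python
import numpy as np
from torchvision import datasets, transforms   # pip install torch torchvision
from openvino.tools.pot import DataLoader

class TorchvisionImageLoader(DataLoader):
    """A sketch: wraps a torchvision ImageFolder dataset for POT."""
    def __init__(self, data_dir):
        self._dataset = datasets.ImageFolder(
            data_dir,
            transform=transforms.Compose([
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
            ]),
        )

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, index):
        image, _ = self._dataset[index]           # the label is not needed for Default Quantization
        data = np.expand_dims(image.numpy(), 0)   # add batch dimension, NCHW layout
        return data, None                         # annotation is set to None
```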
> **NOTE**: Model-specific preprocessing, for example, mean/scale normalization can be embedded into the model at the conversion step using Model Optimizer component. This should be considered during the implementation of the DataLoader interface to avoid "double" normalization which can lead to the loss of accuracy after optimization.
The code example below defines `DataLoader` for three popular use cases: images, text, and audio.
@sphinxtabset
@sphinxtab{Images}
@snippet tools/pot/docs/code/data_loaders.py image_loader
@endsphinxtab
@sphinxtab{Text}
@snippet tools/pot/docs/code/data_loaders.py text_loader
@endsphinxtab
@sphinxtab{Audio}
@snippet tools/pot/docs/code/data_loaders.py audio_loader
@endsphinxtab
@endsphinxtabset
## Select quantization parameters
Default Quantization algorithm has mandatory and optional parameters which are defined as a dictionary:
```python
{
"name": "DefaultQuantization",
"params": {
"target_device": "ANY",
"stat_subset_size": 300
},
}
```
- `"target_device"` - currently, only two options are available: `"ANY"` (or `"CPU"`) - to quantize model for CPU, GPU, or VPU, and `"GNA"` - for inference on GNA.
- `"stat_subset_size"` - size of data subset to calculate activations statistics used for quantization. The whole dataset is used if no parameter specified. We recommend using not less than 300 samples.
Full specification of the Default Quantization method is available in this [document](@ref pot_compression_algorithms_quantization_default_README).
## Run quantization
POT API provides its own methods to load and save model objects from OpenVINO Intermediate Representation: `load_model` and `save_model`. It also has a concept of a `Pipeline` that sequentially applies specified optimization methods to the model. The `create_pipeline` method is used to instantiate a `Pipeline` object.
A code example below shows a basic quantization workflow:
```python
from openvino.tools.pot import IEEngine
from openvino.tools.pot import load_model, save_model
from openvino.tools.pot import compress_model_weights
from openvino.tools.pot import create_pipeline
# Model config specifies the model name and paths to model .xml and .bin file
model_config = {
"model_name": "model",
"model": path_to_xml,
"weights": path_to_bin,
}
# Engine config
engine_config = {"device": "CPU"}
algorithms = [
{
"name": "DefaultQuantization",
"params": {
"target_device": "ANY",
"stat_subset_size": 300
},
}
]
# Step 1: Implement and create user's data loader
data_loader = ImageLoader("<path_to_images>")
# Step 2: Load model
model = load_model(model_config=model_config)
# Step 3: Initialize the engine for metric calculation and statistics collection.
engine = IEEngine(config=engine_config, data_loader=data_loader)
# Step 4: Create a pipeline of compression algorithms and run it.
pipeline = create_pipeline(algorithms, engine)
compressed_model = pipeline.run(model=model)
# Step 5 (Optional): Compress model weights to quantized precision
# to reduce the size of the final .bin file.
compress_model_weights(compressed_model)
# Step 6: Save the compressed model to the desired path.
# Set save_path to the directory where the model should be saved
compressed_model_paths = save_model(
model=compressed_model,
save_path="optimized_model",
model_name="optimized_model",
)
```
The output of the script is the quantized model that can be used for inference in the same way as the original full-precision model.
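For instance, a minimal sketch of running the optimized IR with the OpenVINO&trade; Runtime Python API might look like this (the paths follow the example above; the input shape is an assumption and should match your model):
```python
import numpy as np
from openvino.runtime import Core

core = Core()
model = core.read_model("optimized_model/optimized_model.xml")  # path produced by save_model above
compiled_model = core.compile_model(model, "CPU")

input_data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # replace with real preprocessed data
results = compiled_model.infer_new_request({0: input_data})
```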
If accuracy degradation after applying the Default Quantization method is high, it is recommended to try tips from the [Quantization Best Practices](@ref pot_docs_BestPractices) document or to use the [Accuracy-aware Quantization](@ref pot_accuracyaware_usage) method, as sketched below.
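A sketch of how the pipeline definition could change for the accuracy-aware case is shown below (note that this method additionally requires an annotated dataset and a metric implementation passed to the engine; see the method's documentation for details):
```python
algorithms = [
    {
        "name": "AccuracyAwareQuantization",
        "params": {
            "target_device": "ANY",
            "stat_subset_size": 300,
            "maximal_drop": 0.01,  # maximum allowed absolute accuracy drop
        },
    }
]
```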
## Quantizing cascaded models
In some cases, when the model being optimized is a cascaded one, i.e. it consists of several submodels, for example, MT-CNN, you will need to implement a complex inference pipeline that can properly handle different submodels and data flow between them. POT API provides an `Engine` interface for this purpose, which allows customization of the inference logic. However, we suggest inheriting from the `IEEngine` helper class that already contains all the logic required to do the inference based on the OpenVINO&trade; Python API. See the following [example](@ref pot_example_face_detection_README).
## Examples
* Tutorials:
* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino)
* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization)
* [Quantization of Segmentation model for medical data](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/110-ct-segmentation-quantize)
* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert)
* Samples:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Quantization of speech model for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)

View File

@ -1,11 +1,11 @@
# End-to-end Command-line Interface example {#pot_configs_examples_README}
# End-to-end Command-line Interface Example {#pot_configs_examples_README}
This tutorial describes an example of running post-training quantization for the **MobileNet v2 model from the PyTorch** framework, particularly with the DefaultQuantization algorithm.
The example covers the following steps:
- Environment setup
- Model preparation and converting it to the OpenVINO™ Intermediate Representation (IR) format
- Performance benchmarking of the original full-precision model and the converted one to the IR
- Performance benchmarking of the original full-precision model
- Dataset preparation
- Accuracy validation of the full-precision model in the IR format
- Model quantization by the DefaultQuantization algorithm and accuracy validation of the quantized model
@ -18,31 +18,19 @@ The example has been verified in Ubuntu 18.04 Operating System with Python 3.6 i
In case of issues while running the example, refer to [POT Frequently Asked Questions](@ref pot_docs_FrequentlyAskedQuestions) for help.
## Environment Setup
1. Install OpenVINO&trade; toolkit and Model Optimizer, Accuracy Checker and Post-training Optimization Tool components following the [Installation Guide](@ref pot_InstallationGuide).
2. Activate the Python* environment and OpenVINO environment as described in the [Installation Guide](@ref pot_InstallationGuide).
3. Create a separate working directory and navigate to it.
In the instructions below, `<POT_DIR>` refers to the Post-Training Optimization Tool directory:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, e.g. `3.6`.
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
## Model Preparation
1. Navigate to `<EXAMPLE_DIR>`.
2. Download the MobileNet v2 PyTorch model using [Model Downloader](@ref omz_tools_downloader) tool from the Open Model Zoo repository:
```sh
python3 ./downloader.py --name mobilenet-v2-pytorch
omz_downloader --name mobilenet-v2-pytorch
```
After that the original full-precision model is located in `<EXAMPLE_DIR>/public/mobilenet-v2-pytorch/`.
3. Convert the model to the OpenVINO™ Intermediate Representation (IR) format using [Model Converter](@ref omz_tools_downloader) tool:
```sh
python3 ./converter.py --name mobilenet-v2-pytorch
omz_converter --name mobilenet-v2-pytorch
```
After that the full-precision model in the IR format is located in `<EXAMPLE_DIR>/public/mobilenet-v2-pytorch/FP32/`.
@ -50,19 +38,9 @@ For more information about the Model Optimizer, refer to its [documentation](@re
## Performance Benchmarking of Full-Precision Models
1. Check the performance of the original model using [Deep Learning Benchmark](@ref openvino_inference_engine_tools_benchmark_tool_README) tool:
Check the performance of the full-precision model in the IR format using [Deep Learning Benchmark](@ref openvino_inference_engine_tools_benchmark_tool_README) tool:
```sh
python3 ./benchmark_app.py -m <EXAMPLE_DIR>/public/mobilenet-v2-pytorch/mobilenet-v2.onnx
```
Note that the results might differ depending on the characteristics of your machine. On a machine with an Intel&reg; Core&trade; i9-10920X CPU @ 3.50GHz, they look like:
```sh
Latency: 4.09 ms
Throughput: 1456.84 FPS
```
2. Check the performance of the full-precision model in the IR format using [Deep Learning Benchmark](@ref openvino_inference_engine_tools_benchmark_tool_README) tool:
```sh
python3 ./benchmark_app.py -m <EXAMPLE_DIR>/public/mobilenet-v2-pytorch/FP32/mobilenet-v2-pytorch.xml
benchmark_app -m <EXAMPLE_DIR>/public/mobilenet-v2-pytorch/FP32/mobilenet-v2-pytorch.xml
```
Note that the results might differ depending on the characteristics of your machine. On a machine with an Intel&reg; Core&trade; i9-10920X CPU @ 3.50GHz, they look like:
```sh
@ -190,7 +168,7 @@ specify the full-precision model in the IR format, `"config": "./mobilenet_v2_py
Check the performance of the quantized model using [Deep Learning Benchmark](@ref openvino_inference_engine_tools_benchmark_tool_README) tool:
```sh
python3 ./benchmark_app.py -m <INT8_MODEL>
benchmark_app -m <INT8_MODEL>
```
where `<INT8_MODEL>` is the path to the quantized model.
Note that the results might differ depending on the characteristics of your machine. On a machine with an Intel&reg; Core&trade; i9-10920X CPU @ 3.50GHz, they look like:

View File

@ -0,0 +1,17 @@
# Examples {#pot_examples_description}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
API Examples <pot_example_README>
Command-line Example <pot_configs_examples_README>
@endsphinxdirective
This section provides a set of examples that demonstrate how to apply the post-training optimization methods to optimize various models from different domains. It contains optimization recipes for concrete models that do not necessarily cover your case exactly, but which should be sufficient to reuse and adapt for optimizing custom models:
- [API Examples](@ref pot_example_README)
- [Command-line Example](@ref pot_configs_examples_README)

View File

@ -17,6 +17,7 @@ What else can I do?</a>
- <a href="#python">When I execute POT CLI, I get "File "/workspace/venv/lib/python3.6/site-packages/nevergrad/optimization/base.py", line 35... SyntaxError: invalid syntax". What is wrong?</a>
- <a href="#nomodule">What does a message "ModuleNotFoundError: No module named 'some\_module\_name'" mean?</a>
- <a href="#dump">Is there a way to collect an intermidiate IR when the AccuracyAware mechanism fails?</a>
- <a name="#outputs"> What do the messages "Output name: <result_operation_name> not found" or "Output node with <result_operation_name> is not found in graph" mean?</a>
### <a name="opensourced">Is the Post-training Optimization Tool (POT) opensourced?</a>
@ -26,9 +27,7 @@ Yes, POT is developed on GitHub as a part of [https://github.com/openvinotoolkit
### <a name="dataset">Can I quantize my model without a dataset?</a>
In general, you should have a dataset. The dataset should be annotated if you want to validate the accuracy.
If your dataset is not annotated, you can still quantize the model in the Simplified mode but you will not be able to measure the accuracy.
See [Post-Training Optimization Best Practices](BestPractices.md) for more details.
You can also use [POT API](../openvino/tools/pot/api/README.md) to integrate the post-training quantization into the custom inference pipeline.
If your dataset is not annotated, you can use [Default Quantization](@ref pot_default_quantization_usage) to quantize the model or command-line interface with [Simplified mode](@ref pot_docs_simplified_mode).
### <a name="framework">Can a model in any framework be quantized by the POT?</a>
@ -37,10 +36,11 @@ The POT accepts models in the OpenVINO&trade; Intermediate Representation (IR) f
### <a name="noac">I'd like to quantize a model and I've converted it to IR but I don't have the Accuracy Checker config. What can I do?</a>
To create the Accuracy Checker configuration file, refer to [Accuracy Checker documentation](@ref omz_tools_accuracy_checker) and
try to find the configuration file for your model among the ones available in the Accuracy Checker examples. An alternative way is to quantize the model
in the Simplified mode but you will not be able to measure the accuracy. See [Post-Training Optimization Best Practices](BestPractices.md) for more details.
Also, you can use [POT API](../openvino/tools/pot/api/README.md) to integrate the post-training quantization into your pipeline without the Accuracy Checker.
1. Try quantization using the Python* API of the Post-training Optimization Tool. For more details, see [Default Quantization](@ref pot_default_quantization_usage).
2. If you consider command-line usage only, refer to [Accuracy Checker documentation](@ref omz_tools_accuracy_checker) to create the Accuracy Checker configuration file, and try to find the configuration file for your model among the ones available in the Accuracy Checker examples.
3. An alternative way is to quantize the model in the [Simplified mode](@ref pot_docs_simplified_mode), but you will not be able to measure the accuracy.
### <a name="tradeoff">What is a tradeoff when you go to low precision?</a>
@ -53,10 +53,10 @@ The other benefit of having a model in low precision is its smaller size.
First of all, you should validate the POT compression pipeline you are running, which can be done with the following steps:
1. Make sure the accuracy of the original uncompressed model has the value you expect. Run your POT pipeline with an empty compression config and evaluate the resulting model metric. Compare this uncompressed model accuracy metric value with your reference.
2. Run your compression pipeline with a single compression algorithm ([Default Quantization](@ref pot_default_quantization_usage) or [Accuracy-aware Quantization](@ref pot_accuracyaware_usage)) without any parameter values specified in the config (except for `preset` and `stat_subset_size`). Make sure you get the desirable accuracy drop/performance gain in this case.
Finally, if you have done the steps above and the problem persists, you could try to compress your model using the [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf_pytorch).
Note that NNCF usage requires you to have a PyTorch or TensorFlow 2 based training pipeline of your model to perform Quantization-aware Training. See [Model Optimization Guide](@ref openvino_docs_model_optimization_guide) for more details.
### <a name="memory">I get “RuntimeError: Cannot get memory” and “RuntimeError: Output data was not allocated” when I quantize my model by the POT.</a>
@ -73,12 +73,13 @@ which is usually more accurate and takes more time but requires less memory. See
It can happen due to the following reasons:
- A wrong or not representative dataset was used during the quantization and accuracy validation. Please make sure that your data and labels are correct and they sufficiently reflect the use case.
- A wrong Accuracy Checker configuration file was used during the quantization. Refer to [Accuracy Checker documentation](@ref omz_tools_accuracy_checker) for more information.
- If the command-line interface was used for quantization, a wrong Accuracy Checker configuration file could lead to this problem. Refer to [Accuracy Checker documentation](@ref omz_tools_accuracy_checker) for more information.
- If [Default Quantization](@ref pot_default_quantization_usage) was used for quantization you can also try [Accuracy-aware Quantization](@ref pot_accuracyaware_usage) method that allows controlling maximum accuracy deviation.
### <a name="longtime">The quantization process of my model takes a lot of time. Can it be decreased somehow?</a>
Quantization time depends on multiple factors such as the size of the model and the dataset. It also depends on the algorithm:
the [DefaultQuantization](../openvino/tools/pot/algorithms/quantization/default/README.md) algorithm takes less time than the [AccuracyAwareQuantization](../openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md) algorithm.
the [Default Quantization](@ref pot_default_quantization_usage) algorithm takes less time than the [Accuracy-aware Quantization](@ref pot_accuracyaware_usage) algorithm.
The following configuration parameters also impact the quantization time
(see details in [Post-Training Optimization Best Practices](BestPractices.md)):
- `use_fast_bias`: when set to `false`, it increases the quantization time
@ -88,18 +89,9 @@ The following configuration parameters also impact the quantization time duratio
- `eval_requests_number`: the lower the number, the more time might be required for the quantization
Note that higher values of `stat_requests_number` and `eval_requests_number` increase memory consumption by POT.
### <a name="import">I get "Import Error:... No such file or directory". How can I avoid it?</a>
It happens when some needed library is not available in your environment. To avoid it, execute the following command:
```sh
source <INSTALL_DIR>/bin/setupvars.sh
```
where `<INSTALL_DIR>` is the directory where the OpenVINO&trade; toolkit is installed.
### <a name="python">When I execute POT CLI, I get "File "/workspace/venv/lib/python3.6/site-packages/nevergrad/optimization/base.py", line 35... SyntaxError: invalid syntax". What is wrong?</a>
This error is reported when you have a Python version older than 3.6 in your environment. Upgrade your Python version.
### <a name="nomodule">What does a message "ModuleNotFoundError: No module named 'some\_module\_name'" mean?</a>
@ -108,3 +100,6 @@ It means that some required python module is not installed in your environment.
### <a name="dump">Is there a way to collect an intermidiate IR when the AccuracyAware mechanism fails?</a>
You can add `"dump_intermediate_model": true` to the POT configuration file and it will drop an intermidiate IR to `accuracy_aware_intermediate` folder.
### <a name="outputs"> What do the messages "Output name: <result_operation_name> not found" or "Output node with <result_operation_name> is not found in graph" mean?</a>
These errors are caused by missing output node names in a graph when using the POT tool for model quantization. They might appear for some models only for IRs converted from ONNX models using the new frontend (which is the default conversion path starting from the 2022.1 release). To avoid such errors, use the legacy MO frontend to convert the model to IR by passing the `--use_legacy_frontend` option. Then, use the produced IR for quantization.

View File

@ -1,17 +1,15 @@
# Post-Training Optimization Tool Installation Guide {#pot_InstallationGuide}
## Prerequisites
* Python* 3.6 or higher
* [OpenVINO&trade;](https://docs.openvino.ai/latest/index.html)
The minimum and the recommended requirements to run the Post-training Optimization Tool (POT) are the same as in [OpenVINO&trade;](https://docs.openvino.ai/latest/index.html).
# Installation Guide
## Install POT from PyPI
The simplest way to get the Post-training Optimization Tool and OpenVINO&trade; installed is to use PyPI. Follow the steps below to do that:
1. Create a separate [Python* environment](https://docs.python.org/3/tutorial/venv.html) and activate it
2. To install OpenVINO&trade;, run `pip install openvino`.
3. To install POT and other OpenVINO&trade; developer tools, run `pip install openvino-dev`.
POT is distributed as a part of OpenVINO&trade; Development Tools package. For installation instruction, refer to this [document](@ref openvino_docs_install_guides_install_dev_tools).
Now the Post-training Optimization Tool is available in the command line by the `pot` alias. To verify it, run `pot -h`.
## Install POT from GitHub
The latest version of the Post-training Optimization Tool is available on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot) and can be installed from source. As prerequisites, you need to install [OpenVINO&trade; Runtime](@ref openvino_docs_install_guides_install_runtime) and other dependencies such as [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide) and [Accuracy Checker](@ref omz_tools_accuracy_checker).
To install POT from source:
- Clone the OpenVINO repository
```sh
git clone --recursive https://github.com/openvinotoolkit/openvino.git
```
After installation, POT is available as a Python library under `openvino.tools.pot.*` and in the command line by the `pot` alias. To verify it, run `pot -h`.

View File

@ -0,0 +1,51 @@
# Optimizing models post-training {#pot_introduction}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
Quantizing Model <pot_default_quantization_usage>
Quantizing Model with Accuracy Control <pot_accuracyaware_usage>
Quantization Best Practices <pot_docs_BestPractices>
API Reference <pot_compression_api_README>
Command-line Interface <pot_compression_cli_README>
Examples <pot_examples_description>
pot_docs_FrequentlyAskedQuestions
@endsphinxdirective
## Introduction
Post-training model optimization is the process of applying special methods without model retraining or fine-tuning, for example, post-training 8-bit quantization. Therefore, this process does not require a training dataset or a training pipeline in the source DL framework. To apply post-training methods in OpenVINO&trade;, you need:
* A floating-point precision model, FP32 or FP16, converted into the OpenVINO&trade; Intermediate Representation (IR) format
that can be run on CPU with OpenVINO&trade;.
* A representative calibration dataset that reflects a use case scenario, for example, 300 samples.
* In case of accuracy constraints, a validation dataset and accuracy metrics should be available.
For the needs of post-training optimization, OpenVINO&trade; provides a Post-training Optimization Tool (POT) which supports the uniform integer quantization method. This method allows substantially increasing inference performance and reducing the model size.
Figure below shows the optimization workflow with POT:
![](./images/workflow_simple.png)
## Quantizing models with POT
POT provides two main quantization methods that can be used depending on the user's needs and requirements:
* [Default Quantization](@ref pot_default_quantization_usage) is a recommended method that provides fast and accurate results in most cases. It requires only an unannotated dataset for quantization. For details, see the [Default Quantization algorithm](@ref pot_compression_algorithms_quantization_default_README) documentation.
* [Accuracy-aware Quantization](@ref pot_accuracyaware_usage) is an advanced method that allows keeping accuracy within a predefined range at the cost of performance improvement in cases when `Default Quantization` cannot guarantee it. The method requires an annotated representative dataset and may require more time for quantization. For details, see the [Accuracy-aware Quantization algorithm](@ref accuracy_aware_README) documentation.
HW platforms support different integer precisions and quantization parameters, for example, 8-bit for CPU, GPU, and VPU, and 16-bit for GNA. POT abstracts this complexity by introducing the concept of a "target device" that is used to set quantization settings specific to the device. The `target_device` parameter is used for this purpose.
> **NOTE**: There is a special `target_device: "ANY"` which leads to portable quantized models compatible with CPU, GPU, and VPU devices. GNA-quantized models are compatible only with CPU.
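For example, in the algorithm configuration this could look as follows (a sketch, using the Python dictionary form shown in the usage documents):
```python
algorithms = [
    {
        "name": "DefaultQuantization",
        "params": {
            "target_device": "GNA",   # or "ANY" for a portable model compatible with CPU, GPU, and VPU
            "stat_subset_size": 300,
        },
    }
]
```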
For benchmarking results collected for the models optimized with the POT tool, refer to [INT8 vs FP32 Comparison on Select Networks and Platforms](@ref openvino_docs_performance_int8_vs_fp32).
## See Also
* [Performance Benchmarks](https://docs.openvino.ai/latest/openvino_docs_performance_benchmarks_openvino.html)
* [INT8 Quantization by Using Web-Based Interface of the DL Workbench](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Int_8_Quantization.html)

View File

@ -1,4 +1,4 @@
# Low Precision Optimization Guide {#pot_docs_LowPrecisionOptimizationGuide}
# Low Precision Optimization Guide
## Introduction
This document provides the best-known methods on how to use low-precision capabilities of the OpenVINO™ toolkit to transform models

View File

@ -1,4 +1,4 @@
# Low-precision model representation {#pot_docs_model_representation}
# Low-precision model representation
## Introduction
The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance on interpretation rules for such models at runtime.

View File

@ -1,4 +1,4 @@
# Saturation (overflow) issue workaround {#pot_saturation_issue}
# Saturation (overflow) Issue Workaround {#pot_saturation_issue}
## Introduction
8-bit instructions of previous generations of Intel&reg; CPUs, namely those based on SSE, AVX-2, AVX-512 instruction sets, admit so-called saturation (overflow) of the intermediate buffer when calculating the dot product which is an essential part of Convolutional or MatMul operations. This saturation can lead to an accuracy drop on the mentioned architectures during the inference of 8-bit quantized models. However, it is not possible to predict such degradation since most of the computations are executed in parallel during DL model inference which makes this process non-deterministic. This problem is typical for models with non-ReLU activation functions and low level of redundancy, for example, optimized or efficient models. It can prevent deploying the model on legacy hardware or creating cross-platform applications. The problem does not occur on the CPUs with Intel Deep Learning Boost (VNNI) technology and further generations, as well as on GPUs.

View File

@ -1,4 +1,4 @@
# Optimization with Simplified mode {#pot_docs_simplified_mode}
# Optimization with Simplified Mode {#pot_docs_simplified_mode}
## Introduction

View File

@ -0,0 +1,122 @@
# Copyright (C) 2018-2022 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#! [image_loader]
import os
import numpy as np
import cv2 as cv
from openvino.tools.pot import DataLoader
class ImageLoader(DataLoader):
""" Loads images from a folder """
def __init__(self, dataset_path):
# Use OpenCV to gather image files
# Collect names of image files
self._files = []
all_files_in_dir = os.listdir(dataset_path)
for name in all_files_in_dir:
file = os.path.join(dataset_path, name)
if cv.haveImageReader(file):
self._files.append(file)
# Define shape of the model
self._shape = (224,224)
def __len__(self):
""" Returns the length of the dataset """
return len(self._files)
def __getitem__(self, index):
""" Returns image data by index in the NCHW layout
Note: model-specific preprocessing is omitted, consider adding it here
"""
if index >= len(self):
raise IndexError("Index out of dataset size")
image = cv.imread(self._files[index]) # read image with OpenCV
image = cv.resize(image, self._shape) # resize to a target input size
image = np.expand_dims(image, 0) # add batch dimension
image = image.transpose(0, 3, 1, 2) # convert to NCHW layout
return image, None # annotation is set to None
#! [image_loader]
#! [text_loader]
import os
from pathlib import Path
from datasets import load_dataset #pip install datasets
from transformers import AutoTokenizer #pip install transformers
from openvino.tools.pot import DataLoader
class TextLoader(DataLoader):
""" Loads content of .txt files from a folder """
def __init__(self, dataset_path):
# HuggingFace dataset API is used to process text files
# Collect names of text files
extension = ".txt"
files = sorted(str(p.stem) for p in
Path(dataset_path).glob("*" + extension))
files = [os.path.join(dataset_path, file + extension) for file in files]
self._dataset = load_dataset('text', data_files=files)
# replace with your tokenizer
self._tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
self._dataset = self._dataset.map(self._encode, batched=False)
# replace with names of model inputs
self._dataset.set_format(type='numpy',
columns=['input_ids', 'token_type_ids', 'attention_mask'])
def _encode(self, examples):
""" Tokenization of the input text """
return self._tokenizer(examples['text'], truncation=True, padding='max_length')
def __len__(self):
""" Returns the length of the dataset """
return len(self._dataset['train'])
def __getitem__(self, index):
""" Returns data by index as a (dict[str, np.array], None) """
if index >= len(self):
raise IndexError("Index out of dataset size")
data = self._dataset['train'][index]
return {'input_ids': data['input_ids'],
'token_type_ids': data['token_type_ids'],
'attention_mask': data['attention_mask']}, None # annotation is set to None
#! [text_loader]
#! [audio_loader]
import os
from pathlib import Path
import torchaudio # pip install torch torchaudio
from openvino.tools.pot import DataLoader
class AudioLoader(DataLoader):
""" Loads content of .wav files from a folder """
def __init__(self, dataset_path):
# Collect names of wav files
self._extension = ".wav"
self._dataset_path = dataset_path
self._files = sorted(str(p.stem) for p in
Path(self._dataset_path).glob("*" + self._extension))
def __len__(self):
""" Returns the length of the dataset """
return len(self._files)
def __getitem__(self, index):
""" Returns wav data by index
Note: model-specific preprocessing is omitted, consider adding it here
"""
if index >= len(self):
raise IndexError("Index out of dataset size")
file_name = self._files[index] + self._extension
file_path = os.path.join(self._dataset_path, file_name)
waveform, _ = torchaudio.load(file_path) # use a helper from torchaudio to load data
return waveform.numpy(), None # annotation is set to None
#! [audio_loader]

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8c9a0f1cccf731e0740cb91a3a48ff73b9f54cee0a39a03d421316a5599e7955
size 37302

View File

@ -6,7 +6,7 @@
<tab type="user" title="Low Precision Optimization Guide" url="@ref pot_docs_LowPrecisionOptimizationGuide"/>
<tab type="usergroup" title="Quantization" url="@ref pot_compression_algorithms_quantization_README">
<tab type="user" title="DefaultQuantization Algorithm" url="@ref pot_compression_algorithms_quantization_default_README"/>
<tab type="user" title="AccuracyAwareQuantization Algorithm" url="@ref pot_compression_algorithms_quantization_accuracy_aware_README"/>
<tab type="user" title="AccuracyAwareQuantization Algorithm" url="@ref accuracy_aware_README"/>
<tab type="user" title="Saturation issue workaround" url="@ref pot_saturation_issue"/>
<tab type="user" title="Low-precision model representation" url="@ref pot_docs_model_representation"/>
</tab>

View File

@ -1,23 +1,10 @@
# Quantization {#pot_compression_algorithms_quantization_README}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
DefaultQuantization Algorithm <pot_compression_algorithms_quantization_default_README>
AccuracyAwareQuantization Algorithm <pot_compression_algorithms_quantization_accuracy_aware_README>
TunableQuantization Algorithm <pot_compression_algorithms_quantization_tunable_quantization_README>
Saturation Issue Workaround <pot_saturation_issue>
Low-precision Model Representation <pot_docs_model_representation>
@endsphinxdirective
# Quantization
## Introduction
The primary optimization feature of the Post-training Optimization Tool (POT) is 8-bit uniform quantization which allows substantially increasing inference performance on all the platforms that have 8-bit instructions, for example, modern generations of CPU and GPU. Another benefit of quantization is a significant reduction of model footprint which in most cases achieves 4x.
The primary optimization feature of the Post-training Optimization Tool (POT) is the uniform integer quantization which allows substantially increasing inference performance and reducing the model size. Different HW platforms can support different integer precisions and POT is designed to support all of them, for example, 8-bit for CPU, GPU, VPU, 16-bit for GNA. Moreover, POT makes the specification of HW settings transparent for the user by introducing a concept of the `target_device` parameter.
> **NOTE**: There is a special `target_device: "ANY"` which leads to portable quantized models compatible with CPU, GPU, and VPU devices. GNA-quantized models are compatible only with CPU.
During the quantization process, POT runs inference of the model being optimized to estimate quantization parameters for input activations of the quantizable operations. It means that a calibration dataset is required to perform quantization. This dataset may or may not have annotations, depending on the quantization algorithm used.
@ -25,12 +12,11 @@ During the quantization process, the POT tool runs inference of the optimizing m
Currently, the POT provides two algorithms for 8-bit quantization, which are verified and guarantee stable results on a
wide range of DNN models:
* [**DefaultQuantization**](@ref pot_compression_algorithms_quantization_default_README) is a default method that provides fast and in most cases accurate results. It requires only an unannotated dataset for quantization. For details, see the [DefaultQuantization Algorithm](@ref pot_compression_algorithms_quantization_default_README) documentation.
* [**AccuracyAwareQuantization**](@ref accuracy_aware_README) enables remaining within a predefined range of accuracy drop after quantization at the cost of performance improvement. The method requires an annotated representative dataset and may require more time for quantization. For details, see the [AccuracyAwareQuantization Algorithm](@ref accuracy_aware_README) documentation.
For more details about the representation of the low-precision model please refer to this [document](@ref pot_docs_model_representation).

View File

@ -1,33 +1,22 @@
# AccuracyAwareQuantization Algorithm {#pot_compression_algorithms_quantization_accuracy_aware_README}
# AccuracyAwareQuantization Algorithm {#accuracy_aware_README}
## Overview
AccuracyAware algorithm is designed to perform accurate quantization and allows the model to stay in the
pre-defined range of accuracy drop, for example 1%, defined by the user in the configuration file. This may cause a
degradation in performance in comparison to [DefaultQuantization](../default/README.md) algorithm because some layers can be reverted back to the original precision. The algorithm requires annotated dataset and cannot be used with the [Simplified mode](@ref pot_docs_simplified_mode).
> **NOTE**: In case of GNA `target_device`, POT moves INT8 weights to INT16 to stay in the pre-defined range of the accuracy drop. Thus, the algorithm works for the `performance` (INT8) preset only. For the `accuracy` preset, this algorithm is not performed, but the parameters tuning is available (if `tune_hyperparams` option is enabled).
Generally, the algorithm consists of the following steps:
1. The model gets fully quantized using the DefaultQuantization algorithm.
2. The quantized and full-precision models are compared on a subset of the validation set in order to find mismatches in the target accuracy metric. A ranking subset is extracted based on the mismatches.
3. Optionally, if the accuracy criteria cannot be satisfied with fully symmetric quantization, the quantized model gets converted to mixed mode, and step 2 is repeated.
4. A layer-wise ranking is performed in order to get a contribution of each quantized layer into the accuracy drop. To
get this ranking we revert every layer (one-by-one) back to floating-point precision and measure how it affects accuracy.
5. Based on the ranking, the most "problematic" layer is reverted back to the original precision. This change is followed by the evaluation of the obtained model on the full validation set in order to get a new accuracy drop.
6. If the accuracy criteria are satisfied for all pre-defined accuracy metrics defined in the configuration file,
the algorithm finishes. Otherwise, it continues reverting the next "problematic" layer.
7. It may happen that regular reverting does not get any accuracy improvement or even worsen the accuracy. Then the
re-ranking is triggered as it is described in step 4. However, it is possible to specify the maximum number of reverting
layers using a special parameter. Moreover, the algorithm saves intermediate results (models) that can be used at any time
without a need to wait until it finishes.
The figure below shows the diagram of the algorithm.
![](../../../../../../docs/images/aa_quantization_pipeline.png)
## Introduction
AccuracyAwareQuantization algorithm is aimed at accurate quantization and allows the model's accuracy to stay within the
pre-defined range. This may cause a
degradation in performance in comparison to [DefaultQuantization](../default/README.md) algorithm because some layers can be reverted back to the original precision.
## Parameters
Since the [DefaultQuantization](../default/README.md) algorithm is used as an initialization, all its parameters are also valid and can be specified. Here is an example of the definition of the `AccuracyAwareQuantization` method and its parameters:
```json
{
"name": "AccuracyAwareQuantization", // the name of optimization algorithm
"params": {
...
}
}
```
Below is the description of AccuracyAwareQuantization-specific parameters:
- `"ranking_subset_size"` - size of a subset that is used to rank layers by their contribution to the accuracy drop.
The default value is `300`. The more samples it contains, the better the ranking is likely to be.
- `"max_iter_num"` - maximum number of iterations of the algorithm, in other words maximum number of layers that may
@ -57,16 +46,10 @@ quantization time. Default value is `False`.
## Examples
A template and full specification for AccuracyAwareQuantization algorithm:
* [Template](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/openvino/tools/pot/configs/templates/accuracy_aware_quantization_template.json)
* [Full specification](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/accuracy_aware_quantization_spec.json)
Example of using POT API with Accuracy-aware algorithm:
Example:
* [Quantization of Object Detection model with control of accuracy](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/object_detection)
## See also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [Use POT Command-line for Model Zoo models](@ref pot_compression_cli_README)
* [POT API](@ref pot_compression_api_README)
* [Post-Training Optimization Best Practices](@ref pot_docs_BestPractices)
A template and full specification for AccuracyAwareQuantization algorithm for POT command-line interface:
* [Template](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/accuracy_aware_quantization_template.json)
* [Full specification](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/accuracy_aware_quantization_spec.json)

View File

@ -1,35 +1,19 @@
# DefaultQuantization Algorithm {#pot_compression_algorithms_quantization_default_README}
## Overview
DefaultQuantization algorithm is designed to perform a fast and, in many cases, accurate 8-bit quantization of neural networks.
![](../../../../../../docs/images/default_quantization_pipeline.png)
The algorithm consists of three methods that are sequentially applied to a model:
* ActivationChannelAlignment - Used as a preliminary step before quantization and allows you to align ranges of output activations of Convolutional layers in order to reduce the quantization error.
* MinMaxQuantization - This is a vanilla quantization method that automatically inserts [FakeQuantize](@ref openvino_docs_ops_quantization_FakeQuantize_1) operations into the model graph based on the specified target hardware and initializes them
using statistics collected on the calibration dataset.
* FastBiasCorrection - Adjusts biases of Convolutional and Fully-Connected layers based on the quantization error of the layer in order to make the overall error unbiased.
This algorithm uses a two-stage statistic collection procedure, where the model is being inferred over the calibration
subset, so the wall-time of quantization basically depends on the size of the subset.
## Introduction
DefaultQuantization algorithm is designed to do a fast and, in many cases, accurate quantization. It does not offer control over the accuracy metric but provides a lot of knobs that can be used to improve it.
## Parameters
The algorithm accepts all the parameters introduced by three algorithms that it relies on. These parameters should be
described in the corresponding section in the configuration file (see example below):
```json
"compression": {
"algorithms": [
{
"name": "DefaultQuantization", // the name of optimization algorithm
"params": {
...
}
}
]
DefaultQuantization algorithm has mandatory and optional parameters. For more details on how to use these parameters, please refer to the [Best Practices](@ref pot_docs_BestPractices) document. Here is an example of the definition of the DefaultQuantization method and its parameters:
```python
{
"name": "DefaultQuantization", # the name of optimization algorithm
"params": {
...
}
}
```
The DefaultQuantization algorithm's parameters can be roughly divided into two groups: mandatory and optional.
### Mandatory parameters
- `"preset"` - preset which controls the quantization mode (symmetric and asymmetric). It can take two values:
- `"performance"` (default) - stands for symmetric quantization of weights and activations. This is the most
@ -114,28 +98,22 @@ mode on the existing HW.
Enabling this option may increase compressed model accuracy, but will result in increased execution time and memory consumption.
## Examples
A template and full specification for DefaultQuantization algorithm:
* [Template](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/openvino/tools/pot/configs/templates/default_quantization_template.json)
* [Full specification](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_spec.json)
Command-line example:
* [Quantization of Image Classification model](https://docs.openvino.ai/latest/pot_configs_examples_README.html)
API tutorials:
Tutorials:
* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino)
* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization)
* [Quantization of Segmentation model for medical data](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/110-ct-segmentation-quantize)
* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert)
API examples:
Examples:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Speech example for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
* [Quantization of speech model for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
Command-line example:
* [Quantization of Image Classification model](https://docs.openvino.ai/latest/pot_configs_examples_README.html)
A template and full specification for DefaultQuantization algorithm for the POT command-line interface:
* [Template](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_template.json)
* [Full specification](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_spec.json)
## See also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [Use POT Command-line for Model Zoo models](@ref pot_compression_cli_README)
* [POT API](@ref pot_compression_api_README)
* [Post-Training Optimization Best Practices](@ref pot_docs_BestPractices)

View File

@ -1,8 +1,7 @@
# Experimental: Deep neural network protection through range supervision ("Ranger") {#pot_ranger_README}
# Experimental: Protecting Deep Learning Model through Range Supervision ("Ranger") {#pot_ranger_README}
# Overview
## Introduction
## Method and workflow
Deep neural networks find applications in many scenarios where the prediction is a critical component for safety-relevant decisions. Such workloads can benefit from additional protection against underlying errors. For example, memory bit flips (**"soft errors"** originating, e.g., from external radiation or internal electrical disturbances within the circuitry) in the platform hosting the network inference can corrupt the learned network parameters and lead to incorrect predictions. Typically, errors resulting in very large parameter values have a more drastic impact on the network behavior. **The range supervision algorithm ("Ranger") described here establishes and inserts additional protection layers after already present activation layers**. Those layers truncate values that are found to be out of an expected activation range in order to mitigate the traces of potential platform errors. They do so during inference by applying a *clamp* operation to any activation *x* in the input to the Ranger layer,
```math
@ -18,18 +17,8 @@ The process flow follows the diagram [Fig 1](#Schematic). Starting from the inte
*Fig 1: Schematic of Ranger process flow.*
## Example
The following example shows a traffic camera image and predicted objects using a Yolov3 pretrained on the Coco dataset. A single weight fault was injected in a randomly chosen convolution layer of Yolo, flipping the most significant bit of the selected network parameter. If range supervision is applied, the original network performance is recovered despite the presence of the fault.
![](../../../../../../docs/ranger/images/img_combined_2.png)
*Fig 2: Example of fault mitigation via range supervision.*
# Usage
## Supported activation layers
### Supported activation layers
The following activation layers are currently supported for range supervision:
@ -43,42 +32,36 @@ The following activation layers are currently supported for range supervision:
This means that any activation layer of one of the above types that the model under consideration contains will be protected with an appropriate subsequent Ranger layer.
## Algorithm configuration
The range supervision algorithm is part of the post-training-optimizer tool and can be called by
## Usage
Ranger protection can be used in the same way as the [DefaultQuantization](@ref pot_default_quantization_usage) method.
pot -c config_file
Here, `config_file` is a separate file that has to be created to specify all the details of the process. The minimal structure is (here in JSON format):
### Algorithm configuration
The algorithm has a minimal configuration. Below is an example of such a configuration:
```json
{
"compression": {
"algorithms": [
{
"name": "Ranger",
"params": {
"stat_subset_size": 300
}
}
]
...
}
...
"name": "Ranger",
"params": {
"stat_subset_size": 300
}
}
```
The protected model will be saved in IR format in a new folder *./results/\<model_name\>_Ranger/...* .
Mandatory parameters:
- `"stat_subset_size"`: This parameter defines *how many images* of the specified dataset in "engine: config" are used to extract the bounds (images are randomly chosen if a subset is chosen). This value is set to **300** by default. The more images are selected for the bound generation, the more accurate the estimation of an out-of-bound event will be, at the cost of increasing extraction time.
- `"compression"` :
- `"algorithms"`:
- `"name"`: Algorithm name, here **"Ranger"**.
- `"params"`:
- `"stat_subset_size"`: This parameter defines *how many images* of the specified dataset in "engine: config" are used to extract the bounds (images are randomly chosen if a subset is chosen). This value is set to **300** by default. The more images are selected for the bound generation, the more accurate the estimation of an out-of-bound event will be, at the cost of increasing extraction time.
## Example of Ranger results
The following example shows a traffic camera image and predicted objects using a Yolov3 pretrained on the Coco dataset. A single weight fault was injected in a randomly chosen convolution layer of Yolo, flipping the most significant bit of the selected network parameter. If range supervision is applied, the original network performance is recovered despite the presence of the fault.
# Resources:
![](../../../../../../docs/ranger/images/img_combined_2.png)
*Fig 2: Example of fault mitigation via range supervision.*
## Resources:
- Z. Chen, G. Li, and K. Pittabiraman, "A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction", 2020. https://arxiv.org/abs/2003.13874
- F. Geissler, Q. Syed, S. Roychowdhury, A. Asgari, Y. Peng, A. Dhamasia, R. Graefe, K. Pattabiraman, and M. Paulitsch, "Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural Networks Using Activation Range Supervision", 2021. https://arxiv.org/abs/2108.07019

View File

@ -1,4 +1,4 @@
# TunableQuantization Algorithm {#pot_compression_algorithms_quantization_tunable_quantization_README}
# TunableQuantization Algorithm
## Overview

View File

@ -1,74 +1,6 @@
# Post-Training Optimization Tool API {#pot_compression_api_README}
# API Reference {#pot_compression_api_README}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
API Samples <pot_sample_README>
@endsphinxdirective
## Overview
The Post-Training Optimization Tool (POT) Python* API allows injecting optimization methods supported by POT into a
model inference script written with OpenVINO&trade; [Python* API](ie_python_api/api.html).
Thus, POT API helps to implement a custom
optimization pipeline for a single or cascaded/composite DL model (set of joint models). By the optimization pipeline,
we mean the consecutive application of optimization algorithms to the model. The input for the optimization pipeline is
a full-precision model, and the result is an optimized model. The optimization pipeline is configured to sequentially
apply optimization algorithms in the order they are specified. The key requirement for applying the optimization
algorithm is the availability of the calibration dataset for statistics collection and validation dataset for accuracy
validation which in practice can be the same. The Python* POT API provides simple interfaces for implementing:
- custom model inference pipeline with OpenVINO Inference Engine,
- data loading and pre-processing on an arbitrary dataset,
- custom accuracy metrics,
to make it possible to use optimization algorithms from the POT.
The Python* POT API provides `Pipeline` class for creating and configuring the optimization pipeline and applying it to
the model. The `Pipeline` class depends on the implementation of the following model specific interfaces which
should be implemented according to the custom DL model:
- `Engine` is responsible for model inference and provides statistical data and accuracy metrics for the model.
> **NOTE**: The POT has the implementation of the Engine class with the class name IEEngine located in
> `<POT_DIR>/engines/ie_engine.py`, where `<POT_DIR>` is a directory where the Post-Training Optimization Tool is installed.
> It is based on the [OpenVINO™ Inference Engine Python* API](ie_python_api/api.html)
> and can be used as a baseline engine in the customer pipeline instead of the abstract Engine class.
- `DataLoader` is responsible for the dataset loading, including the data pre-processing.
- `Metric` is responsible for calculating the accuracy metric for the model.
> **NOTE**: Metric is required if you want to use accuracy-aware optimization algorithms, such as `AccuracyAwareQuantization`
> algorithm.
We call a pipeline with implemented model-specific interfaces such as `Engine`, `DataLoader`, and `Metric` the custom optimization pipeline (see the picture below, which shows the relationships between the classes).
![](../../../../docs/images/api.png)
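For orientation, the sketch below outlines how these pieces are typically combined into a pipeline. It is a simplified example, not a complete script: the model paths are placeholders, and `data_loader` and `metric` stand for user implementations of the interfaces specified below. The helper functions are assumed to be those exported by the `openvino.tools.pot` package.
```python
# A minimal sketch of a custom optimization pipeline. Paths are placeholders;
# data_loader and metric are instances of the DataLoader and Metric
# interfaces described in the API specification below.
from openvino.tools.pot import IEEngine, load_model, save_model, create_pipeline

model_config = {
    "model_name": "my_model",          # placeholder name
    "model": "my_model.xml",           # IR produced by Model Optimizer
    "weights": "my_model.bin",
}

data_loader = ...   # a DataLoader subclass instance (see "DataLoader" below)
metric = ...        # a Metric subclass instance (see "Metric" below)

# 1. Load the full-precision model.
model = load_model(model_config)

# 2. Create the engine: inference backend + data loading + accuracy metric.
engine = IEEngine(config={"device": "CPU"}, data_loader=data_loader, metric=metric)

# 3. Configure and run the optimization pipeline.
algorithms = [{
    "name": "DefaultQuantization",
    "params": {"target_device": "CPU", "preset": "performance", "stat_subset_size": 300},
}]
pipeline = create_pipeline(algorithms, engine)
optimized_model = pipeline.run(model)

# 4. Save the optimized model back to IR.
save_model(optimized_model, "./optimized")
```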
## Use Cases
Before diving into the Python* POT API, it is highly recommended to read [Best Practices](@ref pot_docs_BestPractices) document where various
scenarios of using the Post-Training Optimization Tool are described.
The POT Python* API for model optimization can be used in the following cases:
- [Accuracy Checker](@ref omz_tools_accuracy_checker) tool does not support the model or dataset.
- POT does not support the model in the [Simplified Mode](@ref pot_docs_BestPractices) or produces the optimized model with low
accuracy in this mode.
- You already have the Python* script to validate the accuracy of the model using the [OpenVINO&trade; Runtime](@ref openvino_docs_OV_Runtime_User_Guide).
## Examples
* API tutorials:
* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino)
* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization)
* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert)
* API examples:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Speech example for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
## API Description
Below is a detailed explanation of POT Python* APIs which should be implemented in order to create a custom optimization
pipeline.
Post-training Optimization Tool API provides a full set of interfaces and helpers that allow users to implement a custom optimization pipeline for various types of DL models including cascaded or compound models. Below is a full specification of this API:
### DataLoader
@ -81,7 +13,15 @@ The base class for all DataLoaders.
by index.
All subclasses should override `__len__()` function, which should return the size of the dataset, and `__getitem__()`,
which supports integer indexing in range of 0 to `len(self)`
which supports integer indexing in the range of 0 to `len(self)`. The `__getitem__()` method can return data in one of the following formats:
```
(data, annotation)
```
or
```
(data, annotation, metadata)
```
`data` is the input that is passed to the model at inference, so it should be properly preprocessed. `data` can be either a `numpy.array` object or a dictionary where the key is the name of the model input and the value is the `numpy.array` that corresponds to this input. The format of `annotation` should correspond to the expectations of the `Metric` class. `metadata` is an optional field that can be used to store additional information required for post-processing.
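As an illustration, a possible `DataLoader` implementation is sketched below. The file layout (preprocessed `.npy` arrays plus an annotation file with one `"<file_name> <label>"` pair per line) and the config keys are assumptions made for this example only; the base class is assumed to be the one exported by the `openvino.tools.pot` package.
```python
# Sketch of a custom DataLoader that returns (data, annotation) pairs.
import os
import numpy as np
from openvino.tools.pot import DataLoader


class NpyClassificationLoader(DataLoader):
    def __init__(self, config):
        super().__init__(config)
        self._data_dir = config["data_source"]
        # Annotation file: one "<file_name> <label>" pair per line (assumed format).
        with open(config["annotation_file"]) as f:
            self._items = [line.split() for line in f if line.strip()]

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        file_name, label = self._items[index]
        # Data is assumed to be already preprocessed to the model input layout.
        data = np.load(os.path.join(self._data_dir, file_name))
        return data, int(label)
```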
### Metric
@ -90,10 +30,15 @@ class openvino.tools.pot.Metric()
```
An abstract class representing an accuracy metric.
All subclasses should override the following properties:
- `value` - returns the accuracy metric value for the last model output.
- `avg_value` - returns the average accuracy metric value for all model outputs.
- `attributes` - returns a dictionary of metric attributes:
All subclasses should override the following properties:
- `value` - returns the accuracy metric value for the last model output in the format of `Dict[str, numpy.array]`.
- `avg_value` - returns the average accuracy metric over collected model results in the format of `Dict[str, numpy.array]`.
- `higher_better` - should return `True` if a higher value of the metric corresponds to better performance, otherwise `False`. The default implementation returns `True`.
and methods:
- `update(output, annotation)` - calculates and updates the accuracy metric value using the last model output and annotation. The model output and annotation should be passed in this method. It should also contain the model-specific post-processing in case the model returns the raw output.
- `reset()` - resets collected accuracy metric.
- `get_attributes()` - returns a dictionary of metric attributes:
```
{metric_name: {attribute_name: value}}
```
@ -102,10 +47,6 @@ All subclasses should override the following properties:
should be increased in accuracy-aware algorithms.
- `type` - a string representation of metric type. For example, 'accuracy' or 'mean_iou'.
All subclasses should override the following methods:
- `update(output, annotation)` - calculates and updates the accuracy metric value using last model output and annotation.
- `reset()` - resets collected accuracy metric.
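A possible top-1 accuracy implementation of this interface is sketched below. The metric name and the post-processing (an `argmax` over a single model output) are assumptions for illustration, and the base class is assumed to be the one exported by the `openvino.tools.pot` package.
```python
# Sketch of a custom top-1 accuracy Metric.
import numpy as np
from openvino.tools.pot import Metric


class TopOneAccuracy(Metric):
    def __init__(self):
        super().__init__()
        self._name = "accuracy@top1"
        self._matches = []

    @property
    def value(self):
        """Accuracy for the last processed batch."""
        return {self._name: self._matches[-1]}

    @property
    def avg_value(self):
        """Average accuracy over all collected results."""
        return {self._name: np.mean(self._matches)}

    @property
    def higher_better(self):
        return True

    def update(self, output, target):
        """Compares the argmax of the (single) model output with the annotation."""
        predictions = np.argmax(output[0], axis=1)
        self._matches.append(float(np.mean(predictions == np.asarray(target))))

    def reset(self):
        self._matches = []

    def get_attributes(self):
        return {self._name: {"direction": "higher-better", "type": "accuracy"}}
```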
### Engine
```
@ -232,11 +173,11 @@ The following methods can be overridden in subclasses:
`IEEngine` supports data returned by `DataLoader` in the format:
```
(img_id, img_annotation), image)
(data, annotation)
```
or
```
((img_id, img_annotation), image, image_metadata)
(data, annotation, metadata)
```
Metric values returned by a `Metric` instance are expected to be in the format:

View File

@ -1,35 +1,29 @@
# API usage sample for segmentation task {#pot_sample_3d_segmentation_README}
# Quantizing 3D Segmentation Model {#pot_example_3d_segmentation_README}
This sample demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a 3D segmentation model.
This example demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a 3D segmentation model.
The [Brain Tumor Segmentation](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/brain-tumor-segmentation-0002/brain-tumor-segmentation-0002.md) model from PyTorch* is used for this purpose.
A custom `DataLoader` is created to load images in NIfTI format from [Medical Segmentation Decathlon BRATS 2017](http://medicaldecathlon.com/) dataset for 3D semantic segmentation task
and the implementation of Dice Index metric is used for the model evaluation. In addition, this sample demonstrates how one can use image metadata obtained during image reading and
preprocessing to post-process the model raw output.
and the implementation of Dice Index metric is used for the model evaluation. In addition, this example demonstrates how one can use image metadata obtained during image reading and
preprocessing to post-process the model raw output. The code of the example is available on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation).
## How to prepare the data
To run this sample, you will need to download the Brain Tumors 2017 part of the Medical Segmentation Decathlon image database http://medicaldecathlon.com/.
To run this example, you will need to download the Brain Tumors 2017 part of the Medical Segmentation Decathlon image database http://medicaldecathlon.com/.
3D MRI data in NIfTI format can be found in the `imagesTr` folder, and segmentation masks are in `labelsTr`.
## How to Run the Sample
In the instructions below, the Post-Training Optimization Tool directory `<POT_DIR>` is referred to:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
## How to Run the example
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
1. To get started, follow the [Installation Guide](@ref pot_InstallationGuide).
2. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `brain-tumor-segmentation-0002` model from the Open Model Zoo repository.
1. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `brain-tumor-segmentation-0002` model from the Open Model Zoo repository.
```sh
python3 ./downloader.py --name brain-tumor-segmentation-0002
omz_downloader --name brain-tumor-segmentation-0002
```
3. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
2. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
```sh
python3 ./converter.py --name brain-tumor-segmentation-0002 --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
omz_converter --name brain-tumor-segmentation-0002
```
4. Launch the sample script:
3. Launch the example script from the example directory:
```sh
python3 <POT_DIR>/api/samples/3d_segmentation/3d_segmentation_sample.py -m <PATH_TO_IR_XML> -d <BraTS_2017/imagesTr> --mask-dir <BraTS_2017/labelsTr>
python3 ./3d_segmentation_example.py -m <PATH_TO_IR_XML> -d <BraTS_2017/imagesTr> --mask-dir <BraTS_2017/labelsTr>
```
Optional: you can specify .bin file of IR directly using the `-w`, `--weights` options.

View File

@ -1,4 +1,4 @@
# Post-training Optimization Tool API samples {#pot_sample_README}
# Post-training Optimization Tool API Examples {#pot_example_README}
@sphinxdirective
@ -6,53 +6,46 @@
:maxdepth: 1
:hidden:
Image Classification Quantization Sample <pot_sample_classification_README>
Accuracy-Aware Quantization Sample <pot_sample_object_detection_README>
Cascaded Model Quantization Sample <pot_sample_face_detection_README>
Semantic segmentation quantization sample <pot_sample_segmentation_README>
3D Segmentation quantization sample <pot_sample_3d_segmentation_README>
GNA speech sample <pot_sample_speech_README>
Quantizing Image Classification Model <pot_example_classification_README>
Quantizing Object Detection Model with Accuracy Control <pot_example_object_detection_README>
Quantizing Cascaded Model <pot_example_face_detection_README>
Quantizing Semantic Segmentation Model <pot_example_segmentation_README>
Quantizing 3D Segmentation Model <pot_example_3d_segmentation_README>
Quantizing for GNA Device <pot_example_speech_README>
@endsphinxdirective
The Post-training Optimization Tool contains multiple samples that demonstrate how to use its [Software API](@ref pot_compression_api_README)
to optimize DL models which require special inference pipeline, data loading or metric calculation that
are not supported through the `AccuracyChecker` or `Simplified` engines (see [Best Practices](../../../../../docs/BestPractices.md) for more details).
The Post-training Optimization Tool contains multiple examples that demonstrate how to use its [API](@ref pot_compression_api_README)
to optimize DL models. All available examples can be found on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples).
All available samples can be found in `<POT_DIR>/api/samples` folder, where `<POT_DIR>` is a directory where the Post-Training Optimization Tool is installed.
> **NOTE**: - `<POT_DIR>` is referred to `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
> environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
> `<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
The following examples demonstrate the implementation of `Engine`, `Metric`, and `DataLoader` interfaces for various use cases:
There are currently the following samples that demonstrate the implementation of `Engine`, `Metric` and `DataLoader` interfaces
for classification, detection and segmentation tasks:
1. [Classification sample](./classification/README.md)
1. [Quantizing Image Classification model](./classification/README.md)
- Uses single `MobilenetV2` model from TensorFlow*
- Implements `DataLoader` to load .JPEG images and annotations of Imagenet database
- Implements `Metric` interface to calculate Accuracy at top-1 metric
- Uses the DefaultQuantization algorithm for model quantization
2. [Object Detection sample](./object_detection/README.md)
2. [Quantizing Object Detection Model with Accuracy Control](./object_detection/README.md)
- Uses single `MobileNetV1 FPN` model from TensorFlow*
- Implements `DataLoader` to load images of the COCO database
- Implements `Metric` interface to calculate mAP@[.5:.95] metric
- Uses the `AccuracyAwareQuantization` algorithm for model quantization
3. [Segmentation sample](./segmentation/README.md)
3. [Quantizing Semantic Segmentation Model](./segmentation/README.md)
- Uses single `DeepLabV3` model from TensorFlow*
- Implements `DataLoader` to load .JPEG images and annotations of Pascal VOC 2012 database
- Implements `Metric` interface to calculate Mean Intersection Over Union metric
- Uses the DefaultQuantization algorithm for model quantization
4. [3D Segmentation sample](./3d_segmentation/README.md)
4. [Quantizing 3D Segmentation Model](./3d_segmentation/README.md)
- Uses single `Brain Tumor Segmentation` model from PyTorch*
- Implements `DataLoader` to load images in NIfTI format from Medical Segmentation Decathlon BRATS 2017 database
- Implements `Metric` interface to calculate Dice Index metric
- Demonstrates how to use image metadata obtained during data loading to post-process the raw model output
- Uses the DefaultQuantization algorithm for model quantization
5. [Face Detection sample](./face_detection/README.md)
5. [Quantizing Cascaded Model](./face_detection/README.md)
- Uses cascaded (composite) `MTCNN` model from Caffe* that consists of three separate models in an OpenVINO&trade; Intermediate Representation (IR)
- Implements `DataLoader` to load .jpg images of the WIDER FACE database
- Implements `Metric` interface to calculate Recall metric
@ -61,9 +54,9 @@ for classification, detection and segmentation tasks:
OpenVINO&trade; Inference Engine and process raw model output for the correct statistics collection
- Uses the DefaultQuantization algorithm for model quantization
6. [GNA speech sample](./speech/README.md)
6. [Quantizing for GNA Device](./speech/README.md)
- Uses models from Kaldi*
- Implements `DataLoader` to load data in .ark format
- Uses the DefaultQuantization algorithm for model quantization
After execution of each sample above the quantized model is placed into the folder `optimized`. The accuracy validation of the quantized model is performed right after the quantization.
After execution of each example above the quantized model is placed into the folder `optimized`. The accuracy validation of the quantized model is performed right after the quantization.

View File

@ -1,32 +1,27 @@
# API usage sample for classification task {#pot_sample_classification_README}
# Quantizing Image Classification Model {#pot_example_classification_README}
This sample demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a classification model.
This example demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a classification model.
The [MobilenetV2](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/mobilenet-v2-1.0-224/mobilenet-v2-1.0-224.md) model from TensorFlow* is used for this purpose.
A custom `DataLoader` is created to load the [ImageNet](http://www.image-net.org/) classification dataset and the implementation of Accuracy at top-1 metric is used for the model evaluation.
A custom `DataLoader` is created to load the [ImageNet](http://www.image-net.org/) classification dataset and the implementation of Accuracy at top-1 metric is used for the model evaluation. The code of the example is available on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/classification).
## How to prepare the data
To run this sample, you need to [download](http://www.image-net.org/download-faq) the validation part of the ImageNet image database and place it in a separate folder,
To run this example, you need to [download](http://www.image-net.org/download-faq) the validation part of the ImageNet image database and place it in a separate folder,
which will be later referred as `<IMAGES_DIR>`. Annotations to images should be stored in a separate .txt file (`<IMAGENET_ANNOTATION_FILE>`) in the format `image_name label`.
## How to Run the Sample
In the instructions below, the Post-Training Optimization Tool directory `<POT_DIR>` is referred to:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
## How to Run the example
1. To get started, follow the [Installation Guide](@ref pot_InstallationGuide).
2. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `mobilenet-v2-1.0-224` model from the Open Model Zoo repository.
1. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `mobilenet-v2-1.0-224` model from the Open Model Zoo repository.
```sh
python3 ./downloader.py --name mobilenet-v2-1.0-224
omz_downloader --name mobilenet-v2-1.0-224
```
3. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
2. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
```sh
python3 ./converter.py --name mobilenet-v2-1.0-224 --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
omz_converter --name mobilenet-v2-1.0-224 --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
```
4. Launch the sample script:
3. Launch the example script from the example directory:
```sh
python3 <POT_DIR>/api/samples/classification/classification_sample.py -m <PATH_TO_IR_XML> -a <IMAGENET_ANNOTATION_FILE> -d <IMAGES_DIR>
python3 ./classification_example.py -m <PATH_TO_IR_XML> -a <IMAGENET_ANNOTATION_FILE> -d <IMAGES_DIR>
```
Optional: you can specify .bin file of IR directly using the `-w`, `--weights` options.

View File

@ -1,37 +1,32 @@
# API usage sample for face detection task {#pot_sample_face_detection_README}
# Quantizing Cascaded Face Detection Model {#pot_example_face_detection_README}
This sample demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a face detection model.
This example demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a face detection model.
The [MTCNN](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/mtcnn/mtcnn.md) model from Caffe* is used for this purpose.
A custom `DataLoader` is created to load [WIDER FACE](http://shuoyang1213.me/WIDERFACE/) dataset for a face detection task
and the implementation of Recall metric is used for the model evaluation. In addition, this sample demonstrates how one can implement
and the implementation of Recall metric is used for the model evaluation. In addition, this example demonstrates how one can implement
an engine to infer a cascaded (composite) model that is represented by multiple submodels in an OpenVINO&trade; Intermediate Representation (IR)
and has a complex staged inference pipeline.
and has a complex staged inference pipeline. The code of the example is available on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection).
## How to prepare the data
To run this sample, you need to download the validation part of the Wider Face dataset http://shuoyang1213.me/WIDERFACE/.
To run this example, you need to download the validation part of the Wider Face dataset http://shuoyang1213.me/WIDERFACE/.
Images with faces divided into categories are placed in the `WIDER_val/images` folder.
Annotations in .txt format containing the coordinates of the face bounding boxes of the validation part of the dataset
can be downloaded separately and are located in the `wider_face_split/wider_face_val_bbx_gt.txt` file.
## How to Run the Sample
In the instructions below, the Post-Training Optimization Tool directory `<POT_DIR>` is referred to:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
## How to Run the example
1. To get started, follow the [Installation Guide](@ref pot_InstallationGuide).
2. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `mtcnn` model from the Open Model Zoo repository.
1. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `mtcnn` model from the Open Model Zoo repository.
```sh
python3 ./downloader.py --name mtcnn*
omz_downloader --name mtcnn*
```
3. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
2. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
```sh
python3 ./converter.py --name mtcnn* --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
omz_converter --name mtcnn* --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
```
4. Launch the sample script:
3. Launch the example script from the example directory:
```sh
python3 <POT_DIR>/api/samples/face_detection/face_detection_sample.py -pm <PATH_TO_IR_XML_OF_PNET_MODEL>
python3 ./face_detection_example.py -pm <PATH_TO_IR_XML_OF_PNET_MODEL>
-rm <PATH_TO_IR_XML_OF_RNET_MODEL> -om <PATH_TO_IR_XML_OF_ONET_MODEL> -d <WIDER_val/images> -a <wider_face_split/wider_face_val_bbx_gt.txt>
```
Optional: you can specify .bin files of corresponding IRs directly using the `-pw/--pnet-weights`, `-rw/--rnet-weights` and `-ow/--onet-weights` options.

View File

@ -1,31 +1,26 @@
# API usage sample for object_detection {#pot_sample_object_detection_README}
# Quantizing Object Detection Model with Accuracy Control {#pot_example_object_detection_README}
This sample demonstrates the use of the [Post-training Optimization Toolkit API](@ref pot_compression_api_README) to
quantize an object detection model in the [accuracy-aware mode](@ref pot_compression_algorithms_quantization_accuracy_aware_README).
This example demonstrates the use of the [Post-training Optimization Toolkit API](@ref pot_compression_api_README) to
quantize an object detection model in the [accuracy-aware mode](@ref accuracy_aware_README).
The [MobileNetV1 FPN](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/ssd_mobilenet_v1_fpn_coco/ssd_mobilenet_v1_fpn_coco.md) model from TensorFlow* for object detection task is used for this purpose.
A custom `DataLoader` is created to load the [COCO](https://cocodataset.org/) dataset for object detection task
and the implementation of mAP COCO is used for the model evaluation.
and the implementation of mAP COCO is used for the model evaluation. The code of the example is available on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/object_detection).
## How to prepare the data
To run this sample, you will need to download the validation part of the [COCO](https://cocodataset.org/). The images should be placed in a separate folder, which will be later referred as `<IMAGES_DIR>` and annotation file `instances_val2017.json` later referred as `<ANNOTATION_FILE>`.
## How to Run the Sample
In the instructions below, the Post-Training Optimization Tool directory `<POT_DIR>` is referred to:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
To run this example, you will need to download the validation part of the [COCO](https://cocodataset.org/) dataset. The images should be placed in a separate folder, which will be later referred to as `<IMAGES_DIR>`, and the annotation file `instances_val2017.json` later referred to as `<ANNOTATION_FILE>`.
## How to Run the example
1. To get started, follow the [Installation Guide](@ref pot_InstallationGuide).
2. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `ssd_mobilenet_v1_fpn_coco` model from the Open Model Zoo repository.
1. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `ssd_mobilenet_v1_fpn_coco` model from the Open Model Zoo repository.
```sh
python3 ./downloader.py --name ssd_mobilenet_v1_fpn_coco
3. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
omz_downloader --name ssd_mobilenet_v1_fpn_coco
```
2. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
```sh
python3 ./converter.py --name ssd_mobilenet_v1_fpn_coco --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
omz_converter --name ssd_mobilenet_v1_fpn_coco --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
```
4. Launch the sample script:
3. Launch the example script from the example directory:
```sh
python <POT_DIR>/api/samples/object_detection/object_detection_sample.py -m <PATH_TO_IR_XML> -d <IMAGES_DIR> --annotation-path <ANNOTATION_FILE>
python ./object_detection_example.py -m <PATH_TO_IR_XML> -d <IMAGES_DIR> --annotation-path <ANNOTATION_FILE>
```
* Optional: you can specify .bin file of IR directly using the `-w`, `--weights` options.

View File

@ -1,34 +1,29 @@
# API usage sample for segmentation task {#pot_sample_segmentation_README}
# Quantizing Semantic Segmentation Model {#pot_example_segmentation_README}
This sample demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a segmentation model.
This example demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a segmentation model.
The [DeepLabV3](https://github.com/openvinotoolkit/open_model_zoo/blob/master/models/public/deeplabv3/deeplabv3.md) model from TensorFlow* is used for this purpose.
A custom `DataLoader` is created to load the [Pascal VOC 2012](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/) dataset for semantic segmentation task
and the implementation of Mean Intersection Over Union metric is used for the model evaluation.
and the implementation of Mean Intersection Over Union metric is used for the model evaluation. The code of the example is available on [GitHub](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/segmentation).
## How to prepare the data
To run this sample, you will need to download the validation part of the Pascal VOC 2012 image database http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#data.
To run this example, you will need to download the validation part of the Pascal VOC 2012 image database http://host.robots.ox.ac.uk/pascal/VOC/voc2012/#data.
Images are placed in the `JPEGImages` folder, ImageSet file with the list of image names for the segmentation task can be found at `ImageSets/Segmentation/val.txt`
and segmentation masks are kept in the `SegmentationClass` directory.
## How to Run the Sample
In the instructions below, the Post-Training Optimization Tool directory `<POT_DIR>` is referred to:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
## How to Run the example
1. To get started, follow the [Installation Guide](@ref pot_InstallationGuide).
2. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `deeplabv3` model from the Open Model Zoo repository.
1. Launch [Model Downloader](@ref omz_tools_downloader) tool to download `deeplabv3` model from the Open Model Zoo repository.
```sh
python3 ./downloader.py --name deeplabv3
omz_downloader --name deeplabv3
```
3. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
2. Launch [Model Converter](@ref omz_tools_downloader) tool to generate Intermediate Representation (IR) files for the model:
```sh
python3 ./converter.py --name deeplabv3 --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
omz_converter --name deeplabv3 --mo <PATH_TO_MODEL_OPTIMIZER>/mo.py
```
4. Launch the sample script:
3. Launch the example script from the example directory:
```sh
python3 <POT_DIR>/api/samples/segmentation/segmentation_sample.py -m <PATH_TO_IR_XML> -d <VOCdevkit/VOC2012/JPEGImages> --imageset-file <VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt> --mask-dir <VOCdevkit/VOC2012/SegmentationClass>
python3 ./segmentation_example.py -m <PATH_TO_IR_XML> -d <VOCdevkit/VOC2012/JPEGImages> --imageset-file <VOCdevkit/VOC2012/ImageSets/Segmentation/val.txt> --mask-dir <VOCdevkit/VOC2012/SegmentationClass>
```
Optional: you can specify .bin file of IR directly using the `-w`, `--weights` options.

View File

@ -1,30 +1,25 @@
# API usage sample for speech task on GNA {#pot_sample_speech_README}
# Quantizing for GNA Device {#pot_example_speech_README}
This sample demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a speech model for [GNA](@ref openvino_docs_OV_UG_supported_plugins_GNA) device.
This example demonstrates the use of the [Post-training Optimization Tool API](@ref pot_compression_api_README) for the task of quantizing a speech model for [GNA](@ref openvino_docs_OV_UG_supported_plugins_GNA) device.
Quantization for GNA differs from quantization for CPU due to device specifics: GNA supports quantized inputs in INT16 and INT32 precision (for activations) and quantized weights in INT8 and INT16 precision.
This sample contains pre-selected quantization options based on the DefaultQuantization algorithm and created for models from [Kaldi](http://kaldi-asr.org/doc/) framework, and its data format.
A custom `ArkDataLoader` is created to load the dataset from files with .ark extension for speech analysis task.
This example contains pre-selected quantization options based on the DefaultQuantization algorithm, created for models from the [Kaldi](http://kaldi-asr.org/doc/) framework and its data format.
A custom `ArkDataLoader` is created to load the dataset from files with the .ark extension for the speech analysis task.
## How to prepare the data
To run this sample, you will need to use the .ark files for each model input from your `<DATA_FOLDER>`.
To run this example, you will need to use the .ark files for each model input from your `<DATA_FOLDER>`.
For generating data from original formats to .ark, please follow the [Kaldi data preparation tutorial](https://kaldi-asr.org/doc/data_prep.html).
## How to Run the Sample
In the instructions below, the Post-Training Optimization Tool directory `<POT_DIR>` is referred to:
- `<ENV>/lib/python<version>/site-packages/` in the case of PyPI installation, where `<ENV>` is a Python*
environment where OpenVINO is installed and `<version>` is a Python* version, for example `3.6`.
`<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
## How to Run the example
1. To get started, follow the [Installation Guide](@ref pot_InstallationGuide).
2. Launch [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide) with the necessary options (for details follow the [instructions for Kaldi](@ref openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi) to generate Intermediate Representation (IR) files for the model:
1. Launch [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide) with the necessary options (for details follow the [instructions for Kaldi](@ref openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi) to generate Intermediate Representation (IR) files for the model:
```sh
python3 <PATH_TO_MODEL_OPTIMIZER>/mo.py --input_model <PATH_TO_KALDI_MODEL> [MODEL_OPTIMIZER_OPTIONS]
mo --input_model <PATH_TO_KALDI_MODEL> [MODEL_OPTIMIZER_OPTIONS]
```
3. Launch the sample script:
2. Launch the example script:
```sh
python3 <POT_DIR>/api/samples/speech/gna_sample.py -m <PATH_TO_IR_XML> -w <PATH_TO_IR_BIN> -d <DATA_FOLDER> --input_names [LIST_OF_MODEL_INPUTS] --files_for_input [LIST_OF_INPUT_FILES]
python3 <POT_DIR>/api/examples/speech/gna_example.py -m <PATH_TO_IR_XML> -w <PATH_TO_IR_BIN> -d <DATA_FOLDER> --input_names [LIST_OF_MODEL_INPUTS] --files_for_input [LIST_OF_INPUT_FILES]
```
Required parameters:
- `-i`, `--input_names` option. Defines list of model inputs;
@ -35,4 +30,4 @@ In the instructions below, the Post-Training Optimization Tool directory `<POT_D
- `-p`, `--preset` option. Defines preset for quantization: `performance` for INT8 weights, `accuracy` for INT16 weights;
- `-s`, `--subset_size` option. Defines subset size for calibration;
- `-o`, `--output` option. Defines output folder for quantized model.
4. Validate your INT8 model using `./speech_sample` from the Inference Engine samples. Follow the [speech sample description link](@ref openvino_inference_engine_samples_speech_sample_README) for details.
3. Validate your INT8 model using `./speech_sample` from the Inference Engine samples. Follow the [speech sample description link](@ref openvino_inference_engine_samples_speech_sample_README) for details.

View File

@ -16,5 +16,5 @@ def get_version():
version = f.readline().replace('\n', '')
return version
logger.warning('POT is not installed correctly. Please follow openvino/tools/pot/docs/InstallationGuide.md')
logger.warning('POT is not installed correctly. Please follow README.md')
return INVALID_VERSION