[POT] Documentation update (#10068)

* Updated main README

* Added saturation fix description

* Changed Low-precision model representation document

* Added Simplified mode description. Updated DefaultQuantization, AccuracyAware, API descriptions.

* Added Data-free model description. Adjusted other Readmes accordingly

* Revised Configuration file description

* Revised AA method description

* Changed Quantization readme

* Cross-links in quantization methods

* Fixed reference

* Fixed the structure

* Removed data-free

* Update tools/pot/docs/CLI.md

Co-authored-by: Nikita Malinin <nikita.malinin@intel.com>

* Update tools/pot/openvino/tools/pot/api/README.md

Co-authored-by: Nikita Malinin <nikita.malinin@intel.com>

* Applied comments

* Fixed comments

* Applied more comments

* Applied comments

* Fixed build errors

* Fixed build errors

* Small changes

* Fixed a typo

Co-authored-by: Nikita Malinin <nikita.malinin@intel.com>
This commit is contained in:
Alexander Kozlov 2022-02-20 09:43:14 +03:00 committed by GitHub
parent 5671ca2cf5
commit 5c7be85435
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
17 changed files with 292 additions and 420 deletions

View File

@ -13,6 +13,7 @@
Command-line Interface <pot_compression_cli_README>
pot_compression_api_README
pot_configs_README
Deep neural network protection <pot_ranger_README>
pot_docs_FrequentlyAskedQuestions
@endsphinxdirective
@ -20,50 +21,71 @@
## Introduction
Post-training Optimization Tool (POT) is designed to accelerate the inference of deep learning models by applying
special methods without model retraining or fine-tuning, like post-training quantization. Therefore, the tool does not
special methods without model retraining or fine-tuning, for example, post-training 8-bit quantization. Therefore, the tool does not
require a training dataset or a pipeline. To apply post-training algorithms from the POT, you need:
* A floating-point precision model, FP32 or FP16, converted into the OpenVINO&trade; Intermediate Representation (IR) format
that can be run on CPU with OpenVINO&trade;.
* A representative calibration dataset representing a use case scenario, for example, 300 images.
* A representative calibration dataset representing a use case scenario, for example, 300 images.
Post-training Optimization Tool provides the following key
features:
The figure below shows the optimization workflow:
![](docs/images/workflow_simple.png)
### Features
* Two post-training 8-bit quantization algorithms: fast [DefaultQuantization](openvino/tools/pot/algorithms/quantization/default/README.md) and precise [AccuracyAwareQuantization](openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md).
* Compression for different hardware targets such as CPU and GPU.
* Multiple domains: Computer Vision, Natural Language Processing, Recommendation Systems, Speech Recognition.
* [Command-line tool](docs/CLI.md) that provides a simple interface for basic use cases.
* [API](openvino/tools/pot/api/README.md) that helps to apply optimization methods within a custom inference script written with OpenVINO Python* API.
* Symmetric and asymmetric quantization schemes. For details, see the [Quantization](openvino/tools/pot/algorithms/quantization/README.md) section.
* Per-channel quantization for Convolutional and Fully-Connected layers.
The tool is aimed to fully automate the model transformation process without a need to change the model on the user's side. For details about
the low-precision flow in OpenVINO&trade;, see the [Low Precision Optimization Guide](docs/LowPrecisionOptimizationGuide.md).
* (Experimental) [Ranger algorithm](@ref pot_ranger_README) for model protection in safety-critical cases.
For benchmarking results collected for the models optimized with POT tool, see [INT8 vs FP32 Comparison on Select Networks and Platforms](@ref openvino_docs_performance_int8_vs_fp32).
POT is opensourced on GitHub as a part of [https://github.com/openvinotoolkit/openvino](https://github.com/openvinotoolkit/openvino).
POT is open-sourced on GitHub as a part of OpenVINO and is available at https://github.com/openvinotoolkit/openvino/tools/pot.
Further documentation presumes that you are familiar with the basic Deep Learning concepts, such as model inference,
dataset preparation, model optimization, as well as with the OpenVINO&trade; toolkit and its components such
as [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide)
Further documentation presumes that you are familiar with basic Deep Learning concepts, such as model inference,
dataset preparation, model optimization, as well as with the OpenVINO&trade; toolkit and its components, such as [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide)
and [Accuracy Checker Tool](@ref omz_tools_accuracy_checker).
## Use POT
![](docs/images/workflow.png)
## Get started
The POT provides three basic usage scenarios:
* **[Command-line interface](docs/CLI.md)**: this is the recommended path if the model is from the OpenVINO&trade;
[Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) or there is a valid [Accuracy Checker Tool](@ref omz_tools_accuracy_checker)
configuration file for the model that allows validating model accuracy using [Accuracy Checker Tool](@ref omz_tools_accuracy_checker).
* **[Python* API](openvino/tools/pot/api/README.md)**: it allows integrating optimization methods implemented in POT into
a Python* inference script written with [Python* API](ie_python_api/api.html).
This flow is recommended if it is not possible to use [Accuracy Checker Tool](@ref omz_tools_accuracy_checker)
for validation on the dedicated dataset.
* **[Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench)**: a web-based graphical environment that enables you to optimize, fine-tune, analyze, visualize, and compare performance of deep learning models.
### Installation
To install POT, follow the [Installation Guide](docs/InstallationGuide.md).
> **NOTE**: POT also supports optimization in the so-called *Simplified mode* (see [Configuration File Description](configs/README.md)), which is essentially a local implementation of the POT Python API aimed at quantizing Computer Vision models with a simple pre-processing and inference flow. However, using this mode can lead to an inaccurate model after optimization due to differences in the model pre-processing.
### Usage options
![](docs/images/use_cases.png)
The POT provides three basic usage options:
* **Command-line interface (CLI)**:
* [**Simplified mode**](@ref pot_docs_simplified_mode): use this option if the model belongs to the Computer Vision domain and you only have an unannotated dataset for optimization. Note that this optimization method can cause a deviation of model accuracy.
* [**Model Zoo flow**](@ref pot_compression_cli_README): this option is recommended if the model is imported from OpenVINO&trade;
[Model Zoo](https://github.com/openvinotoolkit/open_model_zoo) or there is a valid [Accuracy Checker Tool](@ref omz_tools_accuracy_checker_README)
configuration file for the model that allows validating model accuracy using [Accuracy Checker Tool](@ref omz_tools_accuracy_checker_README).
* [**Python\* API**](@ref pot_compression_api_README): this option allows integrating the optimization methods implemented in POT into
a Python* inference script that uses [OpenVINO Python* API](https://docs.openvino.ai/latest/openvino_inference_engine_ie_bridges_python_docs_api_overview.html).
POT is also integrated into [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench), a web-based graphical environment
that enables you to optimize, tune, analyze, visualize, and compare performance of deep learning models.
### Examples
OpenVINO provides several examples to demonstrate the POT optimization workflow:
* Command-line example:
* [Quantization of Image Classification model](https://docs.openvino.ai/latest/pot_configs_examples_README.html)
* API tutorials:
* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino)
* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization)
* [Quantization of Segmentation model for medical data](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/110-ct-segmentation-quantize)
* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert)
* API examples:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Quantization of Object Detection model with controllable accuracy](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/object_detection)
* [Speech example for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
To get started with POT, follow the [Installation Guide](docs/InstallationGuide.md).
## See Also

View File

@ -1,9 +1,4 @@
# Configuration File Description {#pot_configs_README}
In the instructions below, the Post-training Optimization Tool directory `<INSTALL_DIR>/deployment_tools/tools/post_training_optimization_toolkit` is referred to as `<POT_DIR>`. `<INSTALL_DIR>` is the directory where Intel&reg; Distribution of OpenVINO&trade; toolkit is installed.
> **NOTE**: Installation directory is different in the case of PyPI installation and does not contain examples of
> configuration files.
The tool is designed to work with the configuration file where all the parameters required for the optimization are specified. These parameters are organized as a dictionary and stored in
a JSON file. The JSON file allows comments, which are supported by the `jstyleson` Python* package.
Logically all parameters are divided into three groups:
@ -34,21 +29,18 @@ This section contains only three parameters:
"config": "./configs/examples/accuracy_checker/mobilenet_v2.yaml"
}
```
The main parameter is `"type"` which can take two possible options: `"accuracy_checher"` (default) and `"simplified"`,
which specify the engine that is used for model inference and validation (if supported):
- **Simplified mode** engine. This engine can be used only with `DefaultQuantization` algorithm to get fully quantized model
using a subset of images. It does not use the Accuracy Checker tool and annotation. To measure accuracy, you should implement
your own validation pipeline with OpenVINO API.
- To run the simplified mode, define engine section similar to the example `mobilenetV2_tf_int8_simple_mode.json` file from the `<POT_DIR>/configs/examples/quantization/classification/` directory.
- **Accuracy Checker** engine. It relies on the [Deep Learning Accuracy Validation Framework](@ref omz_tools_accuracy_checker) (Accuracy Checker) when inferencing DL models and working with datasets.
The benefit of this mode is you can compute accuracy in case you have annotations. It is possible to use accuracy aware
algorithms family when this mode is selected.
There are two options to define engine parameters in that mode:
The main parameter is `"type"` which can take two possible options: `"accuracy_checker"` (default) or `"simplified"`. It specifies the engine used for model inference and validation (if supported):
- **Simplified mode** engine. This engine can be used only with the `DefaultQuantization` algorithm to get a fully quantized model. It does not use the Accuracy Checker tool or annotations. In this mode, the following parameters are applicable:
- `"data_source"` - Specifies the path to the directory where the calibration data is stored.
- `"layout"` - (Optional) Layout of input data. Supported values: [`"NCHW"`, `"NHWC"`, `"CHW"`, `"CWH"`].
- **Accuracy Checker** engine. It relies on the [Deep Learning Accuracy Validation Framework](@ref omz_tools_accuracy_checker_README) (Accuracy Checker) when inferencing DL models and working with datasets.
The benefit of this mode is that you can compute accuracy if you have annotations. When this mode is selected, you can use the accuracy-aware algorithms family.
There are two options to define engine parameters in this mode:
- Refer to an existing Accuracy Checker configuration file, represented as a YAML file. It can be a file used for full-precision model validation. In this case, you should define only the `"config"` parameter containing a path to the Accuracy Checker configuration file.
- Define all the [required Accuracy Checker parameters](@ref omz_tools_accuracy_checker_dlsdk_launcher)
directly in the JSON file. In this case, POT just passes the corresponding dictionary of parameters to the Accuracy Checker when instantiating it.
For more details, refer to the corresponding Accuracy Checker information and examples of configuration files provided with the tool:
- For the SSD-MobileNet model:<br>`<POT_DIR>/configs/examples/quantization/object_detection/ssd_mobilenetv1_int8.json`
- 8-bit quantization of [SSD-MobileNet model](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/examples/quantization/object_detection/ssd_mobilenetv1_int8.json)
## Compression Parameters
@ -57,8 +49,11 @@ This section defines optimization algorithms and their parameters. For more deta
## Examples of the Configuration File
For a quick start, many examples of configuration files are provided and placed to the `<POT_DIR>/configs/examples`
folder. There you can find ready-to-use configurations for the models from various domains: Computer Vision (Image
For a quick start, many examples of configuration files are provided [here](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/examples). There you can find ready-to-use configurations for the models from various domains: Computer Vision (Image
Classification, Object Detection, Segmentation), Natural Language Processing, Recommendation Systems. These are mostly
configuration files for the models that require non-default configuration settings in order to get accurate results.
For details on how to run the Post-Training Optimization Tool with a sample configuration file, see the [instructions](@ref pot_configs_examples_README).
For details on how to run the Post-Training Optimization Tool with a sample configuration file, see the [example](@ref pot_configs_examples_README).
## See Also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [POT API](@ref pot_compression_api_README)

View File

@ -10,7 +10,7 @@ we suggest reading the following [POT documentation](../README.md).
> floating-point model is a prerequisite for model optimization.
> It is also worth mentioning that in the case of 8-bit quantization it is recommended to run POT on the same CPU
> architecture as the target when optimizing for CPU, or on a VNNI-based CPU when quantizing for a non-CPU device, such as GPU, VPU, or GNA.
> It should help to avoid the impact of the saturation issue that occurs on AVX and SSE based CPU devices.
> It should help to avoid the impact of the [saturation issue](@ref pot_saturation_issue) that occurs on AVX and SSE based CPU devices.
## Get Started with Post-Training Quantization

View File

@ -1,4 +1,4 @@
# Use Post-Training Optimization Tool Command-Line Interface {#pot_compression_cli_README}
# Use Post-Training Optimization Tool Command-Line Interface (Model Zoo flow) {#pot_compression_cli_README}
@sphinxdirective
@ -6,6 +6,7 @@
:maxdepth: 1
:hidden:
Simplified mode <pot_docs_simplified_mode>
End-to-end CLI example <pot_configs_examples_README>
@endsphinxdirective
@ -16,7 +17,7 @@ its models then you can employ POT CLI to optimize your model.
In other cases, you should consider using POT [API](@ref pot_compression_api_README). To start with POT CLI please refer to the
following [example](@ref pot_configs_examples_README).
Note: There is also the so-called [**Simplified mode**](@ref pot_configs_README) that is basically aimed at INT8 quantization if the model is from the Computer Vision domain and has a simple dataset preprocessing, like image resize and crop. In this case, you can also use POT CLI for
Note: There is also the so-called [**Simplified mode**](@ref pot_docs_simplified_mode) that is aimed at INT8 quantization of models from the Computer Vision domain with simple dataset preprocessing, such as image resize and crop. In this case, you can also use POT CLI for
optimization. However, the accuracy results are not guaranteed. Moreover, you are also limited in the
choice of optimization methods since accuracy measurement is not available.
@ -29,12 +30,6 @@ optimization methods choice since the accuracy measurement is not available.
3. Prepare the Accuracy Checker configuration file and make sure that the model can be successfully inferred and achieves
similar accuracy numbers as the reference model from the original framework.
4. Activate the Python environment in the command-line shell where the POT and the Accuracy Checker were installed.
5. (Optional). Set up the OpenVINO&trade; environment in the command-line shell with the following script if you
installed it from the distribution file:
```sh
source <INSTALL_DIR>/bin/setupvars.sh
```
> **NOTE**: This step is not required if you use PyPI distribution.
## Run POT CLI
There are two ways to run POT via the command line:
@ -68,7 +63,9 @@ The following command-line options are available to run the tool:
| `--preset` | Use `performance` for fully symmetric quantization or `mixed` preset for symmetric quantization of weights and asymmetric quantization of activations. Applicable only when `-q` option is used.|
| `-m`, `--model` | Path to the optimizing model file (.xml). Applicable only when `-q` option is used. |
| `-w`, `--weights` | Path to the weights file of the optimizing model (.bin). Applicable only when `-q` option is used. |
| `-n`, `--name` | Model name. Applicable only when `-q` option is used. |
| `-n`, `--name` | Optional. Model name. Applicable only when `-q` option is used. |
| `--engine {accuracy_checker, simplified}` | Engine type used to specify CLI mode. Default: `accuracy_checker`. |
| `--data-source DATA_DIR` | Required for Simplified mode only. Specifies the path to the calibration data. |
| `--ac-config` | Path to the Accuracy Checker configuration file. Applicable only when `-q` option is used. |
| `--max-drop` | Optional. Maximum accuracy drop. Valid only for accuracy-aware quantization. Applicable only when `-q` option is used and `accuracy_aware` method is selected. |
| `-c CONFIG`, `--config CONFIG` | Path to a config file with task- or model-specific parameters. |
@ -83,6 +80,5 @@ The following command-line options are available to run the tool:
## See Also
* [Installation Guide](@ref pot_InstallationGuide)
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [Post-Training Optimization Best Practices](@ref pot_docs_BestPractices)

View File

@ -1,29 +1,22 @@
# Representation of Low-Precision Models
# Low-precision model representation {#pot_docs_model_representation}
## Introduction
The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance on interpretation rules for such models at runtime.
Currently, there are two groups of optimization methods that can influence on the IR after applying them to the full-precision model:
Currently, there are two groups of optimization methods that can change the IR after applying them to the full-precision model:
- **Sparsity**. It is represented by zeros inside the weights, and it is up to the hardware plugin how to interpret these zeros (use weights as is or apply special compression algorithms and sparse arithmetic). No additional mask is provided with the model.
- **Quantization**. The rest of this document is dedicated to the representation of quantized models.
## Representation of quantized models
The OpenVINO Toolkit represents all the quantized models using the so-called [FakeQuantize](@ref openvino_docs_ops_quantization_FakeQuantize_1) operation. This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
In order to be able to execute a particular DL operation in low-precision all its inputs should be quantized i.e. should have FakeQuantize between operation and data blobs. The figure below shows an example of quantized Convolution which contains two FakeQuantize nodes: one for weights and one for activations (bias is quantized using the same parameters).
The OpenVINO Toolkit represents all the quantized models using the so-called [FakeQuantize](https://docs.openvino.ai/latest/openvino_docs_MO_DG_prepare_model_convert_model_Legacy_IR_Layers_Catalog_Spec.html#fakequantize-layer) operation. This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind it is quite simple: we project (discretize) the input values to the low-precision data type using an affine transformation (with clamp and rounding) and then reproject the discrete values back to the original range and data type. It can be considered as an emulation of the quantization/dequantization process which happens at runtime. The figure below shows a part of the DL model, namely the Convolutional layer, that undergoes various transformations on its way from being a floating-point model to an integer model executed in the OpenVINO runtime. Column 2 of this figure shows a model quantized with [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf).
![](images/model_flow.png)
![](./images/quantized_convolution.png)
<div align="center">Figure 1. Example of quantized Convolution operation.</div><br/>
To reduce the memory footprint, the weights of quantized models are transformed to a target data type, for example int8 in the case of 8-bit quantization. During this transformation, the floating-point weight tensor and the corresponding FakeQuantize operation are replaced with an 8-bit weight tensor and a sequence of Convert, Subtract, and Multiply operations that represent the typecast and dequantization parameters (scale and zero-point), as shown in column 3 of the figure.
Starting from OpenVINO 2020.2 release all the quantized models are represented in the compressed form. It means that the weights of low-precision operations are converted into the target precision (e.g. INT8). It helps to substantially reduce the model size. The rest of the parameters can be represented by FLOAT32 or FLOAT16 precision depending on the input full-precision model used in the quantization process. Fig. 2 below shows an example of the part of the compressed IR.
## Interpreting FakeQuantize at runtime
At inference time, the quantized model undergoes the second set of transformations that allows interpreting floating-point operations with quantization rules as integer operations. OpenVINO Deep Learning Deployment Toolkit has a special component which is called Low-Precision Transformations (LPT) for that purpose.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize** (column 4). The former is aimed to transform the input data into the target precision while the latter transforms the resulting values back to the original range. *Dequantize* operations can be propagated forward through the linear layers, such as *Convolution* or *Fully-Connected*, and in some cases fused with the following *Quantize* operation for the next layer into the so-called *Requantize* operation (column 5).
![](./images/quantized_model_example.png)
<div align="center">Figure 2. Example of compressed quantized model.</div>
### Interpreting FakeQuantize at runtime
One important question that arises at inference time is how to correctly interpret quantized models and specifically FakeQuantize operations. OpenVINO Deep Learning Deployment Toolkit has a special component which is called Low-Precision Transformations (LPT). It is responsible for the translation of "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](https://docs.openvino.ai/latest/_docs_IE_DG_Int8Inference.html). Here we provide only a high-level overview of the interpretation rules of FakeQuantize operation.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**. The former transforms the input data into the target precision while the latter transforms the resulting values back to the original range and precision. In practice, *Dequantize* operations can be propagated forward through the linear low-precision layers, such as *Convolution* or *Fully-Connected*, and in some cases fused with the following *Quantize* operation for the next layer into the so-called *Requantize* operation (see Fig. 3).
![](./images/qdq_propagation.png)
<div align="center">Figure 3. Quantization operations propagation at runtime. Q, DQ, RQ stand for Quantize, Dequantize, and Requantize correspondingly.</div><br/>
From the calculation standpoint, the FakeQuantize formula also is split into two parts accordingly:
From the computation standpoint, the FakeQuantize formula is also split into two parts accordingly:
`output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low`
The first part of this formula represents the *Quantize* operation:
`q = round((x - input_low) / (input_high - input_low) * (levels-1))`
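The same split can be sketched in a few lines of NumPy (an illustrative approximation only, not the actual LPT implementation; the helper names are made up for this example):
```python
import numpy as np

def quantize(x, input_low, input_high, levels):
    # Clamp, rescale, and round to the discrete grid of `levels` values.
    x = np.clip(x, input_low, input_high)
    return np.round((x - input_low) / (input_high - input_low) * (levels - 1))

def dequantize(q, output_low, output_high, levels):
    # Map the integer levels back to the original floating-point range.
    return q / (levels - 1) * (output_high - output_low) + output_low

def fake_quantize(x, input_low, input_high, output_low, output_high, levels=256):
    # Emulation of FakeQuantize: quantization immediately followed by dequantization.
    return dequantize(quantize(x, input_low, input_high, levels),
                      output_low, output_high, levels)

x = np.array([-1.3, -0.2, 0.4, 1.7])
print(fake_quantize(x, input_low=-1.0, input_high=1.0, output_low=-1.0, output_high=1.0))
```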

View File

@ -0,0 +1,38 @@
# Saturation (overflow) issue workaround {#pot_saturation_issue}
## Introduction
8-bit instructions of previous generations of Intel&reg; CPUs, namely those based on the SSE, AVX-2, and AVX-512 instruction sets, admit so-called saturation (overflow) of the intermediate buffer when calculating the dot product, which is an essential part of Convolutional or MatMul operations. This saturation can lead to an accuracy drop on the aforementioned architectures during the inference of 8-bit quantized models. However, it is not possible to predict such degradation since most of the computations are executed in parallel during DL model inference, which makes this process non-deterministic. This problem is typical for models with non-ReLU activation functions and a low level of redundancy, for example, optimized or efficient models. It can prevent deploying the model on legacy HW or creating cross-platform applications. The problem does not occur on CPUs with Intel Deep Learning Boost (VNNI) technology and later generations, as well as on GPUs.
## How to detect
The only way to detect the saturation issue is to run inference on a CPU that admits it and on HW that does not have this problem (for example, a VNNI-based CPU), and compare the results. If the accuracy difference is significant (for example, more than 1%), this is the main indicator of the saturation issue impact.
## Workaround
There is a workaround that helps fully address the saturation issue during the inference. The idea is to use only 7 bits to represent weights (of Convolutional or Fully-Connected layers) while quantizing activations using the full range of 8-bit data types. However, such a trick can lead to accuracy degradation itself due to the reduced representation of weights. On the other hand, using this trick for the first layer can help to mitigate the saturation issue for many models.
The POT tool provides three options to deal with the saturation issue, which can be selected in the POT configuration file using the `"saturation_fix"` parameter:
* (Default) Fix the saturation issue for the first layer: `"first_layer"` option
* Apply the fix to all layers in the model: `"all"` option
* Do not apply the saturation fix at all: `"no"` option
Below is an example of the section in POT configuration file with the `saturation_fix` option:
```json
"algorithms": [
{
"name": "DefaultQuantization",
"params": {
"preset": "performance",
"stat_subset_size": 300,
"saturation_fix": "all" // Apply the saturation fix to all the layers
}
}
]
```
## Recommendations
If you observe the saturation issue, we recommend trying the `"all"` option during model quantization. If it does not help to improve the accuracy, we recommend using [Quantization-aware training from NNCF](https://github.com/openvinotoolkit/nncf) and fine-tuning the model.
If you are not planning to use legacy CPU HW, you can use the `"no"` option, which can also lead to slightly better accuracy.
## See Also
* [Lower Numerical Precision Deep Learning Inference and Training blogpost](https://www.intel.com/content/www/us/en/developer/articles/technical/lower-numerical-precision-deep-learning-inference-and-training.html)
* [Configuration file description](@ref pot_configs_README)

View File

@ -0,0 +1,31 @@
# Optimization with Simplified mode {#pot_docs_simplified_mode}
## Introduction
Simplified mode is designed to simplify data preparation for the model optimization process. The mode is represented by an implementation of the Engine interface from the POT API that allows reading data from an arbitrary folder specified by the user. For more details about the POT API, please refer to the corresponding [description](@ref pot_compression_api_README). Currently, Simplified mode is available only for image data stored in a single folder in PNG or JPEG formats.
Note: This mode cannot be used with accuracy-aware methods, i.e. there is no way to control accuracy after optimization. Nevertheless, this mode can be helpful to estimate performance benefits when using model optimizations.
## Usage
To use Simplified mode, prepare the data and place it in a separate folder. No other files should be present in this folder. There are two options to run POT in Simplified mode:
* Using command-line options only. Here is an example for 8-bit quantization:
`pot -q default -m <path_to_xml> -w <path_to_bin> --engine simplified --data-source <path_to_data>`
* To provide more options you can use the corresponding `"engine"` section in the POT configuration file as follows:
```json
"engine": {
"type": "simplified",
"layout": "NCHW", // Layout of input data. Supported ["NCHW",
// "NHWC", "CHW", "CWH"] layout
"data_source": "PATH_TO_SOURCE" // You can specify path to directory with images
// Also you can specify template for file names to filter images to load.
// Templates are unix style (This option valid only in simplified mode)
}
```
A template of configuration file for 8-bit quantization using Simplified mode can be found [here](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/simplified_mode_template.json).
For more details about how to use POT via CLI please refer to this [document](@ref pot_compression_cli_README).
## See Also
* [Configuration File Description](@ref pot_configs_README)

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2a5bd3b61d61b7eecb51fa0e932bc8215659d8f5b92f96abba927d9d3f94f277
size 38993

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5183c57dc825af40051782818d9bf40236bd6be8fbee3ae4e7a982000e4d6af8
size 89875

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d5650775fe986b294278186c12b91fadbb758e06783f500b9fd399e474eafe2c
size 34217

View File

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:791f253493350d04c62e53f40b086fb73ceb1b96d346c9772e82de9892fee7a4
size 33789

View File

@ -6,15 +6,13 @@
<tab type="user" title="Low Precision Optimization Guide" url="@ref pot_docs_LowPrecisionOptimizationGuide"/>
<tab type="usergroup" title="Quantization" url="@ref pot_compression_algorithms_quantization_README">
<tab type="user" title="DefaultQuantization Algorithm" url="@ref pot_compression_algorithms_quantization_default_README"/>
<tab type="user" title="AccuracyAwareQuantization Algorithm" url="@ref pot_compression_algorithms_quantization_accuracy_aware_README"/>
<tab type="usergroup" title="TunableQuantization Algorithm" url="@ref pot_compression_algorithms_quantization_tunable_quantization_README">
<tab type="usergroup" title="Tree-Structured Parzen Estimator (TPE)" url="@ref pot_compression_optimization_tpe_README">
<tab type="user" title="TPE Multiple Node Configuration Based on MongoDB Database" url="@ref pot_compression_optimization_tpe_multinode"/>
</tab>
</tab>
<tab type="user" title="AccuracyAwareQuantization Algorithm" url="@ref pot_compression_algorithms_quantization_accuracy_aware_README"/>
<tab type="user" title="Saturation issue workaround" url="@ref pot_saturation_issue"/>
<tab type="user" title="Low-precision model representation" url="@ref pot_docs_model_representation"/>
</tab>
<tab type="user" title="Best Practices" url="@ref pot_docs_BestPractices"/>
<tab type="user" title="Command-line Interface" url="@ref pot_compression_cli_README">
<tab type="user" title="Simplified mode" url="@ref pot_docs_simplified_mode"/>
<tab type="user" title="End-to-end CLI example" url="@ref pot_configs_examples_README"/>
</tab>
<tab type="user" title="API" url="@ref pot_compression_api_README">
@ -28,6 +26,7 @@
</tab>
</tab>
<tab type="user" title="Configuration File Description" url="@ref pot_configs_README"/>
<tab type="user" title="Deep neural network protection through range supervision" url="@ref pot_ranger_README"/>
<tab type="user" title="Frequently Asked Questions" url="@ref pot_docs_FrequentlyAskedQuestions"/>
</tab>
</navindex>

View File

@ -8,152 +8,38 @@
DefaultQuantization Algorithm <pot_compression_algorithms_quantization_default_README>
AccuracyAwareQuantization Algorithm <pot_compression_algorithms_quantization_accuracy_aware_README>
TunableQuantization Algorithm <pot_compression_algorithms_quantization_tunable_quantization_README>
TunableQuantization algorithm <pot_compression_algorithms_quantization_tunable_quantization_README>
Saturation issue workaround <pot_saturation_issue>
Low-precision model representation <pot_docs_model_representation>
@endsphinxdirective
The primary optimization feature of the Post-training Optimization Tool (POT) is uniform quantization. In general,
this method supports an arbitrary number of bits, greater than or equal to two, used to represent weights and activations.
During the quantization process, the method inserts [FakeQuantize](@ref openvino_docs_ops_quantization_FakeQuantize_1)
operations into the model graph automatically based on a predefined hardware target in order to produce the most
hardware-friendly optimized model:
![](../../../../../docs/images/convolution_quantization.png)
## Introduction
After that, different quantization algorithms can tune the `FakeQuantize` parameters or remove some of them in order to
meet the accuracy criteria. The resulting *fakequantized* models are interpreted and transformed to real low-precision
models during inference at the OpenVINO™ Inference Engine runtime giving real performance improvement.
The primary optimization feature of the Post-training Optimization Tool (POT) is 8-bit uniform quantization, which allows substantially increasing inference performance on all the platforms that have 8-bit instructions, for example, modern generations of CPU and GPU. Another benefit of quantization is a significant reduction of the model footprint, which in most cases reaches 4x.
During the quantization process, the POT tool runs inference of the model being optimized to estimate quantization parameters for the input activations of the quantizable operations. It means that a calibration dataset is required to perform quantization. This dataset may or may not have annotations, depending on the quantization algorithm that is used.
## Quantization Algorithms
Currently, the POT provides two algorithms for 8-bit quantization, which are verified and provide stable results on a
Currently, the POT provides two algorithms for 8-bit quantization, which are verified and guarantee stable results on a
wide range of DNN models:
* **DefaultQuantization** is a default method that provides fast and in most cases accurate results for 8-bit
quantization. For details, see the [DefaultQuantization Algorithm](@ref pot_compression_algorithms_quantization_default_README) documentation.
* [**DefaultQuantization**](@ref pot_compression_algorithms_quantization_default_README) is a default method that provides fast and in most cases accurate results for 8-bit
quantization. It requires only a non-annotated dataset for quantization. For details, see the [DefaultQuantization Algorithm](@ref pot_compression_algorithms_quantization_default_README) documentation.
* **AccuracyAwareQuantization** enables remaining at a predefined range of accuracy drop after quantization at the cost
of performance improvement. It may require more time for quantization. For details, see the
* [**AccuracyAwareQuantization**](@ref pot_compression_algorithms_quantization_accuracy_aware_README) enables keeping the accuracy drop within a predefined range after quantization, at the cost
of a smaller performance improvement. The method requires an annotated representative dataset and may require more time for quantization. For details, see the
[AccuracyAwareQuantization Algorithm](@ref pot_compression_algorithms_quantization_accuracy_aware_README) documentation.
## Quantization Formula
For more details about the representation of the low-precision model please refer to this [document](@ref pot_docs_model_representation).
Quantization is parametrized by clamping the range and the number of quantization levels:
\f[
output = \frac{\left\lfloor (clamp(input; input\_low, input\_high)-input\_low) *s\right \rceil}{s} + input\_low\\
\f]
\f[
clamp(input; input\_low, input\_high) = min(max(input, input\_low), input\_high)))
\f]
\f[
s=\frac{levels-1}{input\_high - input\_low}
\f]
In the formulas:
* `input_low` and `input_high` represent the quantization range
* \f[\left\lfloor\cdot\right \rceil\f] denotes rounding to the nearest integer
The POT supports symmetric and asymmetric quantization of weights and activations, which are controlled by the `preset`.
The main difference between them is that in the symmetric mode the floating-point zero is mapped directly to the integer
zero, while in the asymmetric mode it can be an arbitrary integer number. In any mode, the floating-point zero is mapped
directly to a quant without a rounding error. See this [tutorial](@ref pot_docs_BestPractices) for details.
Below is the detailed description of quantization formulas for both modes. These formulas are used both in the POT to
quantize weights of the model and in the OpenVINO™ Inference Engine runtime when quantizing activations during the
inference.
#### Symmetric Quantization
The formula is parametrized by the `scale` parameter that is tuned during the quantization process:
\f[
input\_low=scale*\frac{level\_low}{level\_high}
\f]
\f[
input\_high=scale
\f]
## See also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [Use POT Command-line for Model Zoo models](@ref pot_compression_cli_README)
* [POT API](@ref pot_compression_api_README)
* [Post-Training Optimization Best Practices](@ref pot_docs_BestPractices)
Where `level_low` and `level_high` represent the range of the discrete signal.
* For weights:
\f[
level\_low=-2^{bits-1}+1
\f]
\f[
level\_high=2^{bits-1}-1
\f]
\f[
levels=255
\f]
* For unsigned activations:
\f[
level\_low=0
\f]
\f[
level\_high=2^{bits}-1
\f]
\f[
levels=256
\f]
* For signed activations:
\f[
level\_low=-2^{bits-1}
\f]
\f[
level\_high=2^{bits-1}-1
\f]
\f[
levels=256
\f]
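As an illustration, the symmetric ranges above can be evaluated with a short sketch (8-bit by default; the helper name `symmetric_range` is invented for this example):
```python
def symmetric_range(scale, bits=8, signed=True, weights=False):
    # Discrete signal range as defined above: weights use the symmetric
    # [-2^(bits-1)+1, 2^(bits-1)-1] range (255 levels for 8 bits), while
    # activations use the full unsigned or signed range (256 levels).
    if weights:
        level_low, level_high = -2 ** (bits - 1) + 1, 2 ** (bits - 1) - 1
    elif signed:
        level_low, level_high = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    else:
        level_low, level_high = 0, 2 ** bits - 1
    input_low = scale * level_low / level_high
    input_high = scale
    return input_low, input_high

print(symmetric_range(scale=2.0, weights=True))   # (-2.0, 2.0)
print(symmetric_range(scale=2.0, signed=False))   # (0.0, 2.0)
```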
#### Asymmetric Quantization
The quantization formula is parametrized by `input_low` and `input_range` that are tunable parameters:
\f[
input\_high=input\_low + input\_range
\f]
\f[
levels=256
\f]
For weights and activations the following quantization mode is applied:
\f[
{input\_low}' = min(input\_low, 0)
\f]
\f[
{input\_high}' = max(input\_high, 0)
\f]
\f[
ZP= \left\lfloor \frac{-{input\_low}'*(levels-1)}{{input\_high}'-{input\_low}'} \right \rceil
\f]
\f[
{input\_high}''=\frac{ZP-levels+1}{ZP}*{input\_low}'
\f]
\f[
{input\_low}''=\frac{ZP}{ZP-levels+1}*{input\_high}'
\f]
\f[
{input\_low,input\_high} = \begin{cases} {input\_low}',{input\_high}', & ZP \in \{0, levels-1\} \\ {input\_low}',{input\_high}'', & {input\_high}'' - {input\_low}' > {input\_high}' - {input\_low}'' \\ {input\_low}'',{input\_high}', & {input\_high}'' - {input\_low}' <= {input\_high}' - {input\_low}''\\ \end{cases}
\f]

View File

@ -1,9 +1,9 @@
# AccuracyAwareQuantization Algorithm {#pot_compression_algorithms_quantization_accuracy_aware_README}
## Overview
AccuracyAware algorithm is designed to perform accurate 8-bit quantization and allows the model to stay in the
AccuracyAware algorithm is designed to perform accurate quantization and allows the model to stay in the
pre-defined range of accuracy drop, for example 1%, defined by the user in the configuration file. This may cause a
degradation in performance in comparison to [DefaultQuantization](../default/README.md) algorithm because some layers can be reverted back to the original precision.
degradation in performance in comparison to the [DefaultQuantization](../default/README.md) algorithm because some layers can be reverted back to the original precision. The algorithm requires an annotated dataset and cannot be used with the [Simplified mode](@ref pot_docs_simplified_mode).
> **NOTE**: In case of GNA `target_device`, POT moves INT8 weights to INT16 to stay in the pre-defined range of the accuracy drop. Thus, the algorithm works for the `performance` (INT8) preset only. For the `accuracy` preset, this algorithm is not performed, but the parameters tuning is available (if `tune_hyperparams` option is enabled).
@ -55,27 +55,18 @@ Default value is `0.5`.
to the floating-point precision. It can bring additional performance and accuracy boost but increase overall
quantization time. Default value is `False`.
Below is a fragment of the configuration file that shows overall structure of parameters for this algorithm.
## Examples
A template and full specification for the AccuracyAwareQuantization algorithm can be found here:
* [Template](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/accuracy_aware_quantization_template.json)
* [Full specification](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/accuracy_aware_quantization_spec.json)
Example of using POT API with Accuracy-aware algorithm:
* [Quantization of Object Detection model with control of accuracy](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/object_detection)
## See also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [Use POT Command-line for Model Zoo models](@ref pot_compression_cli_README)
* [POT API](@ref pot_compression_api_README)
* [Post-Training Optimization Best Practices](@ref pot_docs_BestPractices)
```
"name": "AccuracyAwareQuantization", // compression algorithm name
"params": {
"ranking_subset_size": 300, // A size of a subset which is used to rank layers by their contribution to the accuracy drop
"max_iter_num": 30, // Maximum number of iterations of the algorithm (maximum of layers that may be reverted back to full-precision)
"maximal_drop": 0.005, // Maximum accuracy drop which has to be achieved after the quantization
"drop_type": "absolute", // Drop type of the accuracy metric: relative or absolute (default)
"use_prev_if_drop_increase": false, // Whether to use NN snapshot from the previous algorithm iteration in case if drop increases
"base_algorithm": "DefaultQuantization", // Base algorithm that is used to quantize model at the beginning
"convert_to_mixed_preset": false, // Whether to convert the model to mixed mode if the accuracy criteria
// of the symmetrically quantized model are not satisfied
"metrics": [ // An optional list of metrics that are taken into account during optimization
// If not specified, all metrics defined in engine config are used
{
"name": "accuracy", // Metric name to optimize
"baseline_value": 0.72 // Baseline metric value of the original model
}
],
"metric_subset_ratio": 0.5 // A part of the validation set that is used to compare element-wise full-precision and
// quantized models in case of predefined metric values of the original model
}
```

View File

@ -112,81 +112,30 @@ mode on the existing HW.
- `"outlier_prob"` - outlier probability used in the "quantile" estimator
- `"use_layerwise_tuning"` - enables layer-wise fine-tuning of model parameters (biases, Convolution/MatMul weights and FakeQuantize scales) by minimizing the mean squared error between original and quantized layer outputs.
Enabling this option may increase compressed model accuracy, but will result in increased execution time and memory consumption.
Below is a fragment of the configuration file that shows overall structure of parameters for this algorithm.
```
"compression": {
"model_type": "None", // An optional parameter, needed for additional patterns in the model,
default value is None (supported only "Transformer" now)
"inplace_statistic": true, // An optional parameter, needed for change method collect statistics,
reduces the amount of memory consumed, but increases the calibration time
"algorithms": [
"name": "DefaultQuantization", // optimization algorithm name
"params": {
/* A preset is a collection of optimization algorithm parameters that specifies which metric
the algorithm should focus on improving. Each optimization algorithm supports
[performance, mixed, accuracy] presets, which control the quantization mode: symmetric, mixed (symmetric weights and asymmetric activations), and fully asymmetric, respectively */
"preset": "mixed",
"stat_subset_size": 300, // Size of subset to calculate activations statistics that can be used
// For quantization parameters calculation.
"ignored": {
"scope": [
"<NODE_NAME>" // List of nodes that are excluded from optimization
],
"operations": [ // List of types that are excluded from optimization
{
"type": "<NODE_TYPE>", // Type of ignored operation
"attributes": { // If attributes are defined they will be considered during the ignorance
"<NAME>": "<VALUE>" // Lists of values to filter by
}
}
]
},
/* Manually specified quantization parameters */
/* Quantization parameters for weights */
"weights": { // Weights quantization parameters used by MinMaxAlgorithm
"bits": 8, // Bit-width, default is 8
"mode": "symmetric", // Quantization mode, default is "symmetric"
"level_low": 0, // Minimum level in the integer range in which we quantize to, default is 0 for unsigned range, -2^(bit-1) - for signed
"level_high": 255, // Maximum level in the integer range in which we quantize to, default is 2^bits-1 for unsigned range, 2^(bit-1)-1 - for signed
"granularity": "perchannel", // Quantization scale granularity: ["pertensor" (default), "perchannel"]
"range_estimator": { // Range estimator that is used to get the quantization ranges and filter outliers based on the statistics
"max": { // Parameters to estimate top quantization border
"type": "quantile", // Estimator type: ["max" (default), "quantile"]
"outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
},
"min": { // Parameters to estimate bottom quantization border (used only in asymmetric mode)
"type": "quantile", // Estimator type: ["max" (default), "quantile"]
"outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
}
}
},
/* Quantization parameters for activations */
"activations": {
"bits": 8, // Number of quantization bits
"mode": "symmetric", // Quantization mode
"granularity": "pertensor", // Granularity: one scale for output tensor
"range_estimator": { // Range estimator that is used to get the quantization ranges and filter outliers based on the statistics
"preset": "quantile",
/* OR */
/* minimum of quantization range */
/* maximum of quantization range */
"max": { // Parameters to estimate top quantization border
"aggregator": "mean", // Batch aggregation type: ["mean" (default), "max", "min", "median", "mean_no_outliers", "median_no_outliers", "hl_estimator"]
"type": "quantile", // Estimator type: ["max" (default), "quantile"]
"outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
},
"min": { // Parameters to estimate top quantization border
"aggregator": "mean", // Batch aggregation type: ["mean" (default), "max", "min", "median", "mean_no_outliers", "median_no_outliers", "hl_estimator"]
"type": "quantile", // Estimator type [min, max, abs_max, quantile, abs_quantile]
"outlier_prob": 0.0001 // Outlier probability used in the "quantile" estimator
}
}
}
"use_layerwise_tuning": false // An optional parameter, enables layer-wise fine-tuning, false by default
}
]
}
```
## Examples
A template and full specification for the DefaultQuantization algorithm can be found here:
* [Template](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_template.json)
* [Full specification](https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_spec.json)
Command-line example:
* [Quantization of Image Classification model](https://docs.openvino.ai/latest/pot_configs_examples_README.html)
API tutorials:
* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino)
* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization)
* [Quantization of Segmentation model for medical data](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/110-ct-segmentation-quantize)
* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert)
API examples:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Speech example for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
## See also
* [Optimization with Simplified mode](@ref pot_docs_simplified_mode)
* [Use POT Command-line for Model Zoo models](@ref pot_compression_cli_README)
* [POT API](@ref pot_compression_api_README)
* [Post-Training Optimization Best Practices](@ref pot_docs_BestPractices)

View File

@ -1,4 +1,4 @@
# Experimental: Deep neural network protection through range supervision ("Ranger")
# Experimental: Deep neural network protection through range supervision ("Ranger") {#pot_ranger_README}
# Overview
@ -14,7 +14,7 @@ where $`T_{low}`$ and $`T_{up}`$ are the lower and upper bounds for the particul
The process flow follows the diagram [Fig 1](#Schematic). Starting from the internal representation (IR) of an OpenVINO model, the POT Ranger algorithm is called to **add protection layers into the model graph**. This step requires **appropriate threshold values that are automatically extracted from a specified test dataset**. The result is an IR representation of the model with additional "Ranger" layers after each supported activation layer. The original and the modified model can be called in the same way through the OpenVINO inference engine to evaluate the impact on accuracy, performance, and dependability in the presence of potential soft errors (for example using the *benchmark_app* and *accuracy_checker* functions). **The algorithm is designed to provide efficient protection at negligible performance overhead or accuracy impact in the absence of faults.** Bound extraction is a one-time effort and the protected IR model returned by the Ranger algorithm can be used independently from there on. No changes in the learned parameters of the network are needed.
![Schematic](../../../../docs/ranger/images/scheme3.png)
![Schematic](../../../../../../docs/ranger/images/scheme3.png)
*Fig 1: Schematic of Ranger process flow.*
@ -22,7 +22,7 @@ The process flow follows the diagram [Fig 1](#Schematic). Starting from the inte
The following example shows a traffic camera image and predicted objects using a Yolov3 pretrained on the Coco dataset. A single weight fault was injected in a randomly chosen convolution layer of Yolo, flipping the most significant bit of the selected network parameter. If range supervision is applied, the original network performance is recovered despite the presence of the fault.
![](../../../../docs/ranger/images/img_combined_2.png)
![](../../../../../../docs/ranger/images/img_combined_2.png)
*Fig 2: Example of fault mitigation via range supervision.*

View File

@ -42,7 +42,7 @@ should be implemented according to the custom DL model:
We will call the pipeline with implemented model-specific interfaces, such as `Engine`, `DataLoader`, and `Metric`, the custom
optimization pipeline (see the picture below that shows the relationships between classes).
![](./custom_optimization_pipeline.png)
![](../../../../docs/images/api.png)
## Use Cases
Before diving into the Python* POT API, it is highly recommended to read [Best Practices](@ref pot_docs_BestPractices) document where various
@ -54,6 +54,17 @@ The POT Python* API for model optimization can be used in the following cases:
accuracy in this mode.
- You already have the Python* script to validate the accuracy of the model using the [OpenVINO&trade; Runtime](@ref openvino_docs_OV_Runtime_User_Guide).
## Examples
* API tutorials:
* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino)
* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization)
* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert)
* API examples:
* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation)
* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection)
* [Speech example for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech)
## API Description
Below is a detailed explanation of POT Python* APIs which should be implemented in order to create a custom optimization
@ -62,7 +73,7 @@ pipeline.
### DataLoader
```
class openvino.tools.pot.api.DataLoader(config)
class openvino.tools.pot.DataLoader(config)
```
The base class for all DataLoaders.
@ -75,7 +86,7 @@ which supports integer indexing in range of 0 to `len(self)`
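For illustration, a minimal custom loader might look as follows (a sketch only; it assumes images are read with OpenCV and follows the `(annotation, image)` item convention used in the POT samples — check your `Engine` for the exact format it expects):
```python
import os
import cv2  # assumption: images are read with OpenCV
from openvino.tools.pot import DataLoader

class ImageFolderLoader(DataLoader):
    """Illustrative loader that reads images from a flat, unannotated folder."""

    def __init__(self, config):
        super().__init__(config)
        self._dir = config['data_source']
        self._files = sorted(os.listdir(self._dir))

    def __len__(self):
        return len(self._files)

    def __getitem__(self, index):
        # Integer indexing in the range 0..len(self)-1, as required by the base class.
        image = cv2.imread(os.path.join(self._dir, self._files[index]))
        # The POT samples return an (annotation, image) pair, where the annotation
        # is (index, label); there are no labels here, so None is used.
        return (index, None), image
```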
### Metric
```
class openvino.tools.pot.api.Metric()
class openvino.tools.pot.Metric()
```
An abstract class representing an accuracy metric.
@ -98,7 +109,7 @@ All subclasses should override the following methods:
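A minimal accuracy metric sketch is shown below; it assumes the interface exposed in the POT API samples (`value`, `avg_value`, `update`, `reset`, `get_attributes`) and is for illustration only:
```python
from openvino.tools.pot import Metric

class Accuracy(Metric):
    """Top-1 accuracy sketch following the interface used in the POT samples."""

    def __init__(self):
        super().__init__()
        self._name = 'accuracy'
        self._matches = []

    @property
    def value(self):
        # Metric value for the last processed batch.
        return {self._name: [self._matches[-1]]}

    @property
    def avg_value(self):
        # Average metric value over the processed subset.
        return {self._name: sum(self._matches) / len(self._matches)}

    def update(self, output, target):
        # `output` and `target` formats depend on your Engine; here we assume
        # per-batch logits and integer labels as NumPy arrays.
        predicted = output[0].argmax(axis=-1)
        self._matches.append(float((predicted == target[0]).mean()))

    def reset(self):
        self._matches = []

    def get_attributes(self):
        return {self._name: {'direction': 'higher-better', 'type': 'accuracy'}}
```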
### Engine
```
class openvino.tools.pot.api.Engine(config, data_loader=None, metric=None)
class openvino.tools.pot.Engine(config, data_loader=None, metric=None)
```
Base class for all Engines.
@ -112,7 +123,7 @@ The engine provides model inference, statistics collection for activations and c
All subclasses should override the following methods:
- `set_model(model)` - sets/resets a model.<br><br>
*Parameters*
- `model` - `CompressedModel` instance for inference (see details below).
- `model` - `CompressedModel` instance for inference.
- `predict(stats_layout=None, sampler=None, metric_per_sample=False, print_progress=False)` - performs model inference
on the specified subset of data.<br><br>
@ -157,6 +168,46 @@ on the specified subset of data.<br><br>
}
```
### Pipeline
```
class openvino.tools.pot.Pipeline(engine)
```
Pipeline class represents the optimization pipeline.
*Parameters*
- `engine` - instance of `Engine` class for model inference.
The pipeline can be applied to the DL model by calling the `run(model)` method where `model` is a `CompressedModel` instance.
#### Create a pipeline
The POT Python* API provides the utility function to create and configure the pipeline:
```
openvino.tools.pot.create_pipeline(algo_config, engine)
```
*Parameters*
- `algo_config` - a list defining optimization algorithms and their parameters included in the optimization pipeline.
The order in which they are applied to the model in the optimization pipeline is determined by the order in the list.
Example of the algorithm configuration of the pipeline:
```
algo_config = [
{
'name': 'DefaultQuantization',
'params': {
'preset': 'performance',
'stat_subset_size': 500
}
},
...
]
```
- `engine` - instance of `Engine` class for model inference.
*Returns*
- instance of the `Pipeline` class.
## Helpers and Internal Model Representation
To simplify the implementation of optimization pipelines, we provide a set of ready-to-use helpers. Here we also
describe the internal representation of the DL model and how to work with it.
@ -164,7 +215,7 @@ describe internal representation of the DL model and how to work with it.
### IEEngine
```
class openvino.tools.pot.engines.ie_engine.IEEngine(config, data_loader=None, metric=None)
class openvino.tools.pot.IEEngine(config, data_loader=None, metric=None)
```
IEEngine is a helper which implements Engine class based on [OpenVINO&trade; Inference Engine Python* API](ie_python_api/api.html).
This class supports inference in synchronous and asynchronous modes and can be reused as-is in the custom pipeline or
@ -216,11 +267,11 @@ represented as an instance of this class. The cascaded model is stored as a list
- `models` - list of models of the cascaded model.
- `is_cascade` - returns True if the loaded model is cascaded model.
#### Loading model from IR
### Read model from OpenVINO IR
The Python* POT API provides the utility function to load model from the OpenVINO&trade; Intermediate Representation (IR):
```
openvino.tools.pot.graph.model_utils.load_model(model_config)
openvino.tools.pot.load_model(model_config)
```
*Parameters*
- `model_config` - dictionary describing a model that includes the following attributes:
@ -263,10 +314,10 @@ openvino.tools.pot.graph.model_utils.load_model(model_config)
*Returns*
- `CompressedModel` instance
#### Saving model to IR
#### Save model to IR
The Python* POT API provides the utility function to save model in the OpenVINO&trade; Intermediate Representation (IR):
```
openvino.tools.pot.graph.model_utils.save_model(model, save_path, model_name=None, for_stat_collection=False)
openvino.tools.pot.save_model(model, save_path, model_name=None, for_stat_collection=False)
```
*Parameters*
- `model` - `CompressedModel` instance.
@ -314,94 +365,3 @@ class openvino.tools.pot.samplers.batch_sampler.BatchSampler(data_loader, batch_
Sampler provides an iterable over the dataset subset if `subset_indices` is specified or over the whole dataset with
given `batch_size`. Returns a list of data items.
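For example, assuming `data_loader` and `engine` were created as described above and that the full signature is `BatchSampler(data_loader, batch_size=1, subset_indices=None)`, a sampler over a 300-item subset can be passed to `Engine.predict`:
```python
from openvino.tools.pot.samplers.batch_sampler import BatchSampler

# Iterate over the first 300 dataset items in batches of 32 and run inference
# on that subset only (a sketch; `data_loader` and `engine` come from the
# custom pipeline described above).
sampler = BatchSampler(data_loader, batch_size=32, subset_indices=list(range(300)))
result = engine.predict(sampler=sampler)
```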
## Pipeline
```
class openvino.tools.pot.pipeline.pipeline.Pipeline(engine)
```
Pipeline class represents the optimization pipeline.
*Parameters*
- `engine` - instance of `Engine` class for model inference.
The pipeline can be applied to the DL model by calling `run(model)` method where `model` is the `CompressedModel` instance.
#### Create a pipeline
The POT Python* API provides the utility function to create and configure the pipeline:
```
openvino.tools.pot.pipeline.initializer.create_pipeline(algo_config, engine)
```
*Parameters*
- `algo_config` - a list defining optimization algorithms and their parameters included in the optimization pipeline.
The order in which they are applied to the model in the optimization pipeline is determined by the order in the list.
Example of the algorithm configuration of the pipeline:
```
algo_config = [
{
'name': 'DefaultQuantization',
'params': {
'preset': 'performance',
'stat_subset_size': 500
}
},
...
]
```
- `engine` - instance of `Engine` class for model inference.
*Returns*
- instance of the `Pipeline` class.
## Usage Example
Before running the optimization tool it's highly recommended to make sure that
- The model was converted to the OpenVINO&trade; Intermediate Representation (IR) from the source framework using [Model Optimizer](@ref openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide).
- The model can be successfully inferred with OpenVINO&trade; Inference Engine in floating-point precision.
- The model achieves the same accuracy as in the original training framework.
As was described above, `DataLoader`, `Metric` and `Engine` interfaces should be implemented in order to create
the custom optimization pipeline for your model. There might be a case where you have a Python* validation script for your
model using the [OpenVINO&trade; Runtime](@ref openvino_docs_OV_Runtime_User_Guide),
which in practice includes loading a dataset, model inference, and calculating the accuracy metric.
So you just need to wrap the existing functions of your validation script in `DataLoader`, `Metric` and `Engine` interfaces.
Otherwise, you need to implement the interfaces from scratch.
To facilitate the use of the Python* POT API, we implemented the `IEEngine` class, which provides model inference for most models
from the Vision domain and can be reused for an arbitrary model.
After `YourDataLoader`, `YourMetric`, `YourEngine` interfaces are implemented, the custom optimization pipeline can be
created and applied to the model as follows:
```
# Step 1: Load the model.
model_config = {
'model_name': 'your_model',
'model': <PATH_TO_MODEL>/your_model.xml,
'weights': <PATH_TO_WEIGHTS/your_model.bin>
}
model = load_model(model_config)
# Step 2: Initialize the data loader.
dataset_config = {} # dictionary with the dataset parameters
data_loader = YourDataLoader(dataset_config)
# Step 3 (Optional. Required for AccuracyAwareQuantization): Initialize the metric.
metric = YourMetric()
# Step 4: Initialize the engine for metric calculation and statistics collection.
engine_config = {} # dictionary with the engine parameters
engine = YourEngine(engine_config, data_loader, metric)
# Step 5: Create a pipeline of compression algorithms.
pipeline = create_pipeline(algorithms, engine)
# Step 6: Execute the pipeline.
compressed_model = pipeline.run(model)
# Step 7: Save the compressed model.
save_model(compressed_model, "path_to_save_model")
```
For in-depth examples of using the Python* POT API, browse the samples included in the OpenVINO&trade; toolkit installation
and available in the `<POT_DIR>/api/samples` directory. There are currently five samples that demonstrate the implementation of `Engine`, `Metric` and `DataLoader` interfaces for classification, detection and segmentation tasks.