From 8fad140a0258ea00e28526851bc9d4ce01d21af6 Mon Sep 17 00:00:00 2001
From: Sebastian Golebiewski
Date: Fri, 31 Mar 2023 09:06:52 +0200
Subject: [PATCH] DOCS shift to rst - Quantization articles (#16596)

---
 tools/pot/docs/BestPractices.md               | 157 ++++++------
 .../algorithms/quantization/default/README.md | 232 ++++++++++--------
 2 files changed, 211 insertions(+), 178 deletions(-)

diff --git a/tools/pot/docs/BestPractices.md b/tools/pot/docs/BestPractices.md

.. toctree::
   :maxdepth: 1
   :hidden:

   Saturation Issue <pot_saturation_issue>

The :doc:`Default Quantization <pot_default_quantization_usage>` of the Post-training Optimization Tool (POT) is the fastest and easiest way to get a quantized model. It requires only an unannotated representative dataset to be provided in most cases. Therefore, it is recommended to use it as a starting point when it comes to model optimization. However, it can lead to significant accuracy deviation in some cases. The purpose of this article is to provide tips to address this issue.

.. note::

   POT uses inference on the CPU during model optimization, which means that the ability to infer the original floating-point model is essential. In the case of 8-bit quantization, it is also recommended to run POT on the same CPU architecture as the target when optimizing for CPU, or on a VNNI-based CPU when quantizing for a non-CPU device, such as GPU, VPU, or GNA. This helps to avoid the impact of the :doc:`saturation issue <pot_saturation_issue>` that occurs on AVX and SSE based CPU devices.


Improving accuracy after the Default Quantization
#################################################

Parameters of the Default Quantization algorithm with basic settings are presented below:

.. code-block:: python

   {
       "name": "DefaultQuantization",  # Optimization algorithm name
       "params": {
           "preset": "performance",  # Preset [performance, mixed] which controls
                                     # the quantization scheme. For the CPU:
                                     # performance - symmetric quantization of weights and activations.
                                     # mixed - symmetric weights and asymmetric activations.
                                     # accuracy - the same as "mixed" for CPU, GPU, and GNA devices;
                                     #            asymmetric weights and activations for the VPU device.
           "stat_subset_size": 300   # Size of the subset used to calculate activation statistics
                                     # for quantization parameter calculation.
       }
   }
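
For orientation, the following is a minimal sketch of how such an algorithm configuration plugs into the POT Python API. It is only a sketch: the IR paths are placeholders, the random data loader stands in for real preprocessed calibration samples, and API details may differ between POT versions.

.. code-block:: python

   import numpy as np

   from openvino.tools.pot import DataLoader, IEEngine, create_pipeline, load_model, save_model


   class CalibrationLoader(DataLoader):
       """Feeds unannotated samples to POT. Replace the random tensors with real preprocessed inputs."""

       def __init__(self, shape, subset_size=300):
           self._shape = shape
           self._subset_size = subset_size

       def __len__(self):
           return self._subset_size

       def __getitem__(self, index):
           # Default Quantization works without annotations, so returning only input data is enough.
           return np.random.rand(*self._shape).astype(np.float32)


   algorithms = [
       {
           "name": "DefaultQuantization",
           "params": {"preset": "performance", "stat_subset_size": 300},
       }
   ]

   model = load_model({"model_name": "model",   # placeholder IR paths
                       "model": "model.xml",
                       "weights": "model.bin"})
   engine = IEEngine(config={"device": "CPU"}, data_loader=CalibrationLoader((1, 3, 224, 224)))
   pipeline = create_pipeline(algorithms, engine)

   compressed_model = pipeline.run(model)
   save_model(compressed_model, save_path="optimized_model")
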
There are two alternatives in the case of substantial accuracy degradation after applying this method:

1. Hyperparameter tuning.
2. The AccuracyAwareQuantization algorithm.

Tuning Hyperparameters of the Default Quantization
++++++++++++++++++++++++++++++++++++++++++++++++++

The Default Quantization algorithm provides multiple hyperparameters which can be used to improve accuracy results for the fully-quantized model.
Below is a list of best practices that can be applied to improve accuracy without a substantial performance reduction with respect to default settings:

1. The first recommended option is to change the ``preset`` from ``performance`` to ``mixed``. This enables asymmetric quantization of activations and can be helpful for models with non-ReLU activation functions, for example, YOLO, EfficientNet, etc.
2. The next option is ``use_fast_bias``. Setting this option to ``false`` enables a different bias correction method which is, in general, more accurate and is applied after model quantization as a part of the Default Quantization algorithm.

   .. note:: Changing this option can substantially increase quantization time in the POT tool.

3. Some model architectures require a special approach when being quantized. For example, Transformer-based models need to keep some operations in the original precision to preserve accuracy. That is why POT provides a ``model_type`` option to specify the model architecture. Currently, only the ``"transformer"`` type is available. Use it to quantize Transformer-based models, e.g. BERT, as sketched below.
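
   As an illustrative sketch, the option sits next to the other algorithm parameters:

   .. code-block:: python

      {
          "name": "DefaultQuantization",
          "params": {
              "preset": "performance",
              "stat_subset_size": 300,
              "model_type": "transformer"  # keep accuracy-sensitive operations of
                                           # Transformer-based models in the original precision
          }
      }
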
4. Another important option is ``range_estimator``. It defines how to calculate the minimum and maximum of the quantization range for weights and activations. For example, the following ``range_estimator`` for activations can improve the accuracy for Faster R-CNN based networks:

   .. code-block:: python

      {
          "name": "DefaultQuantization",
          "params": {
              "preset": "performance",
              "stat_subset_size": 300,

              "activations": {                  # defines activation
                  "range_estimator": {          # defines how to estimate statistics
                      "max": {                  # right border of the quantizing floating-point range
                          "aggregator": "max",  # use max(x) to aggregate statistics over calibration dataset
                          "type": "abs_max"     # use abs(max(x)) to get per-sample statistics
                      }
                  }
              }
          }
      }

5. The next option is ``stat_subset_size``. It controls the size of the calibration dataset used by POT to collect statistics for quantization parameter initialization. It is assumed that this dataset should contain a sufficient number of representative samples. Thus, varying this parameter may affect accuracy (higher is better). However, we empirically found that 300 samples are sufficient to get representative statistics in most cases.
6. The last option is ``ignored_scope``. It allows excluding some layers from the quantization process, i.e. their inputs will not be quantized. It may be helpful for some patterns for which it is known in advance that they drop accuracy when executed in low precision. For example, the ``DetectionOutput`` layer of the SSD model expressed as a subgraph should not be quantized to preserve the accuracy of Object Detection models. One of the sources for the ignored scope can be the Accuracy-aware algorithm, which can revert layers back to the original precision (see details below). A sketch combining several of these options follows this list.
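
For reference, a single configuration combining several of the options above might look as follows. This is a sketch only: the node name under ``ignored`` is a hypothetical placeholder, and the ``ignored`` section follows the structure from the DefaultQuantization parameter reference below.

.. code-block:: python

   {
       "name": "DefaultQuantization",
       "params": {
           "preset": "mixed",            # asymmetric activations (option 1)
           "use_fast_bias": False,       # more accurate bias correction (option 2)
           "stat_subset_size": 300,      # size of the calibration subset (option 5)
           "ignored": {
               "scope": [
                   "DetectionOutput_1"   # hypothetical node kept in the original precision (option 6)
               ]
           }
       }
   }
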
Find all the possible options and their description in the configuration `specification file <https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_spec.json>`__ in the POT directory.

Accuracy-aware Quantization
###########################

When the steps above do not lead to an accurate quantized model, you may use the so-called :doc:`Accuracy-aware Quantization <pot_accuracyaware_usage>` algorithm, which leads to mixed-precision models. A fragment of Accuracy-aware Quantization configuration with default settings is shown below:

.. code-block:: python

   {
       "name": "AccuracyAwareQuantization",
       "params": {
           "preset": "performance",
           "stat_subset_size": 300,

           "maximal_drop": 0.01  # Maximum accuracy drop allowed after quantization
       }
   }

Since the Accuracy-aware Quantization calls the Default Quantization at the first step, all the parameters of the latter are also valid and can be applied to the accuracy-aware scenario.

.. note::

   In general, the potential increase in speed with the Accuracy-aware Quantization algorithm is not as high as with the Default Quantization, when the model gets fully quantized.


Reducing the performance gap of Accuracy-aware Quantization
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

To improve model performance after Accuracy-aware Quantization, try the ``"tune_hyperparams"`` setting and set it to ``True``. It will enable searching for optimal quantization parameters before reverting layers to the "backup" precision, as sketched below. Note that this may impact the overall quantization time, though.
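
A sketch of the fragment above with this setting enabled:

.. code-block:: python

   {
       "name": "AccuracyAwareQuantization",
       "params": {
           "preset": "performance",
           "stat_subset_size": 300,
           "maximal_drop": 0.01,
           "tune_hyperparams": True  # search for optimal quantization parameters before
                                     # reverting layers to the "backup" precision
       }
   }
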
If you do not achieve the desired accuracy and performance after applying the Accuracy-aware Quantization algorithm, or if you need an accurate fully-quantized model, we recommend using Quantization-Aware Training from :doc:`NNCF <tmo_introduction>`.

@endsphinxdirective

diff --git a/tools/pot/openvino/tools/pot/algorithms/quantization/default/README.md b/tools/pot/openvino/tools/pot/algorithms/quantization/default/README.md

# DefaultQuantization Parameters {#pot_compression_algorithms_quantization_default_README}

@sphinxdirective

The DefaultQuantization Algorithm is designed to perform fast and accurate quantization. It does not offer direct control over the accuracy metric itself but provides many options that can be used to improve it.

Parameters
####################

The Default Quantization algorithm has mandatory and optional parameters. For more details on how to use these parameters, refer to the :doc:`Best Practices <pot_docs_BestPractices>` document. Below is an example of the definition of the Default Quantization method and its parameters:

.. code-block:: python

   {
       "name": "DefaultQuantization",  # the name of the optimization algorithm
       "params": {
           ...
       }
   }


Mandatory parameters
++++++++++++++++++++

- ``"preset"`` - a preset which controls the quantization mode (symmetric and asymmetric). It can take two values:

  - ``"performance"`` (default) - stands for symmetric quantization of weights and activations. This is the most efficient scheme across all hardware.
  - ``"mixed"`` - symmetric quantization of weights and asymmetric quantization of activations. This mode can be useful for the quantization of neural networks that have both negative and positive input values in quantizing operations, for example, non-ReLU based CNN.

- ``"stat_subset_size"`` - size of a subset to calculate activation statistics used for quantization. The whole dataset is used if no parameter is specified. It is recommended to use not less than 300 samples.
- ``"stat_batch_size"`` - size of a batch to calculate activation statistics used for quantization. It has a value of 1 if no parameter is specified.
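
Put together, a configuration that sets only the mandatory parameters could look like this sketch (the values shown are the documented defaults and recommendations):

.. code-block:: python

   {
       "name": "DefaultQuantization",
       "params": {
           "preset": "performance",  # symmetric weights and activations
           "stat_subset_size": 300,  # recommended minimum number of samples
           "stat_batch_size": 1      # default batch size for statistics collection
       }
   }
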
Optional parameters
+++++++++++++++++++

All other options should be considered as an advanced mode and require deep knowledge of the quantization process. Below is an overall description of all possible parameters:

- ``"model_type"`` - required for accurate optimization of some model architectures. Currently, only the ``"transformer"`` type is supported for Transformer-based models (BERT, etc.). The default value is ``None``.
- ``"inplace_statistics"`` - used to change the method of statistics collection from in-place (in-graph operations) to external collectors that require more memory but can increase optimization time. The default value is ``True``.
- ``"ignored"`` - NN subgraphs which should be excluded from the optimization process:

  - ``"scope"`` - a list of particular nodes to exclude.
  - ``"operations"`` - a list of operation types to exclude (expressed in OpenVINO IR notation). This list consists of the following tuples:

    - ``"type"`` - a type of the ignored operation.
    - ``"attributes"`` - if attributes are defined, they will be taken into account when matching operations to exclude. They are defined by a dictionary of ``"<name>": "<value>"`` pairs.
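
  An illustrative sketch of this section (all node names and attribute values are hypothetical):

  .. code-block:: python

     "ignored": {
         "scope": [
             "Conv_1234"              # a particular node excluded by name
         ],
         "operations": [
             {
                 "type": "Multiply"   # exclude all Multiply operations
             },
             {
                 "type": "MatMul",
                 "attributes": {
                     "transpose_a": True   # exclude only MatMul nodes with this attribute
                 }
             }
         ]
     }
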
- ``"weights"`` - this section describes the quantization scheme for weights and the way to estimate the quantization range for it. It is worth noting that changing the quantization scheme may lead to the inability to infer such a model on the existing HW:

  - ``"bits"`` - bit-width, the default value is "8".
  - ``"mode"`` - a quantization mode (symmetric or asymmetric).
  - ``"level_low"`` - the minimum level in the integer range to quantize. The default is "0" for an unsigned range, and "-2^(bit-1)" for a signed one.
  - ``"level_high"`` - the maximum level in the integer range to quantize. The default is "2^bits-1" for an unsigned range, and "2^(bit-1)-1" for a signed one.
  - ``"granularity"`` - quantization scale granularity. It can take the following values:

    - ``"pertensor"`` (default) - per-tensor quantization with one scale factor and zero-point.
    - ``"perchannel"`` - per-channel quantization with a per-channel scale factor and zero-point.

  - ``"range_estimator"`` - this section describes the parameters of the range estimator that is used in the MinMaxQuantization method to get the quantization ranges and filter outliers based on the collected statistics. Below are the parameters that can be modified to get better accuracy results:

    - ``"max"`` - parameters to estimate the top border of the quantizing floating-point range:

      - ``"type"`` - a type of the estimator:

        - ``"max"`` (default) - estimates the maximum in the quantizing set of values.
        - ``"quantile"`` - estimates the quantile in the quantizing set of values.

      - ``"outlier_prob"`` - outlier probability used in the "quantile" estimator.

    - ``"min"`` - parameters to estimate the bottom border of the quantizing floating-point range:

      - ``"type"`` - a type of the estimator:

        - ``"min"`` (default) - estimates the minimum in the quantizing set of values.
        - ``"quantile"`` - estimates the quantile in the quantizing set of values.

      - ``"outlier_prob"`` - outlier probability used in the "quantile" estimator.
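
  For example, a sketch of a per-channel symmetric weight scheme with a quantile-based upper border (the ``outlier_prob`` value is illustrative):

  .. code-block:: python

     "weights": {
         "bits": 8,
         "mode": "symmetric",
         "granularity": "perchannel",
         "range_estimator": {
             "max": {
                 "type": "quantile",      # ignore extreme weight values
                 "outlier_prob": 0.0001
             }
         }
     }
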
- ``"activations"`` - this section describes the quantization scheme for activations and the way to estimate the quantization range for it. As before, changing the quantization scheme may lead to the inability to infer such a model on the existing HW:

  - ``"bits"`` - bit-width, the default value is "8".
  - ``"mode"`` - a quantization mode (symmetric or asymmetric).
  - ``"level_low"`` - the minimum level in the integer range to quantize. The default is "0" for an unsigned range, and "-2^(bit-1)" for a signed one.
  - ``"level_high"`` - the maximum level in the integer range to quantize. The default is "2^bits-1" for an unsigned range, and "2^(bit-1)-1" for a signed one.
  - ``"granularity"`` - quantization scale granularity. It can take the following values:

    - ``"pertensor"`` (default) - per-tensor quantization with one scale factor and zero-point.
    - ``"perchannel"`` - per-channel quantization with a per-channel scale factor and zero-point.

  - ``"range_estimator"`` - this section describes the parameters of the range estimator that is used in the MinMaxQuantization method to get the quantization ranges and filter outliers based on the collected statistics. These are the parameters that can be modified to get better accuracy results:

    - ``"preset"`` - a preset that defines the same estimator for both the top and bottom borders of the quantizing floating-point range. The possible value is ``"quantile"``.
    - ``"max"`` - parameters to estimate the top border of the quantizing floating-point range:

      - ``"aggregator"`` - a type of the function used to aggregate statistics obtained with the estimator over the calibration dataset to get a value of the top border:

        - ``"mean"`` (default) - aggregates the mean value.
        - ``"max"`` - aggregates the max value.
        - ``"min"`` - aggregates the min value.
        - ``"median"`` - aggregates the median value.
        - ``"mean_no_outliers"`` - aggregates the mean value after removal of extreme quantiles.
        - ``"median_no_outliers"`` - aggregates the median value after removal of extreme quantiles.
        - ``"hl_estimator"`` - a Hodges-Lehmann filter based aggregator.

      - ``"type"`` - a type of the estimator:

        - ``"max"`` (default) - estimates the maximum in the quantizing set of values.
        - ``"quantile"`` - estimates the quantile in the quantizing set of values.

      - ``"outlier_prob"`` - outlier probability used in the "quantile" estimator.

    - ``"min"`` - parameters to estimate the bottom border of the quantizing floating-point range:

      - ``"type"`` - a type of the estimator:

        - ``"min"`` (default) - estimates the minimum in the quantizing set of values.
        - ``"quantile"`` - estimates the quantile in the quantizing set of values.

      - ``"outlier_prob"`` - outlier probability used in the "quantile" estimator.
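
  For example, a sketch of an activation scheme whose top border aggregates per-sample maxima with an outlier-robust function (the values are illustrative):

  .. code-block:: python

     "activations": {
         "bits": 8,
         "mode": "asymmetric",
         "granularity": "pertensor",
         "range_estimator": {
             "max": {
                 "aggregator": "mean_no_outliers",  # drop extreme quantiles before averaging
                 "type": "max"
             }
         }
     }
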
+ - ``"max"`` - parameters to estimate top border of quantizing floating-point range: + + - ``"aggregator"`` - a type of the function used to aggregate statistics obtained with the estimator over the calibration dataset to get a value of the top border: + + - ``"mean"`` (default) - aggregates mean value. + - ``"max"`` - aggregates max value. + - ``"min"`` - aggregates min value. + - ``"median"`` - aggregates median value. + - ``"mean_no_outliers"`` - aggregates mean value after removal of extreme quantiles. + - ``"median_no_outliers"`` - aggregates median value after removal of extreme quantiles. + - ``"hl_estimator"`` - Hodges-Lehmann filter based aggregator. + + - ``"type"`` - a type of the estimator: + + - ``"max"`` (default) - estimates the maximum in the quantizing set of value. + - ``"quantile"`` - estimates the quantile in the quantizing set of value. + + - ``"outlier_prob"`` - outlier probability used in the "quantile" estimator. + + - ``"min"`` - parameters to estimate bottom border of quantizing floating-point range: + + - ``"type"`` - a type of the estimator: + + - ``"max"`` (default) - estimates the maximum in the quantizing set of value. + - ``"quantile"`` - estimates the quantile in the quantizing set of value. + + - ``"outlier_prob"`` - outlier probability used in the "quantile" estimator. + +- ``"use_layerwise_tuning"`` - enables layer-wise fine-tuning of model parameters (biases, Convolution/MatMul weights and FakeQuantize scales) by minimizing the mean squared error between original and quantized layer outputs. Enabling this option may increase compressed model accuracy, but will result in increased execution time and memory consumption. + +Additional Resources +#################### + Tutorials: -* [Quantization of Image Classification model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/301-tensorflow-training-openvino) -* [Quantization of Object Detection model from Model Zoo](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-detection-quantization) -* [Quantization of Segmentation model for medical data](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/110-ct-segmentation-quantize) -* [Quantization of BERT for Text Classification](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/105-language-quantize-bert) + +* `Quantization of Image Classification model `__ +* `Quantization of Object Detection model from Model Zoo `__ +* `Quantization of Segmentation model for medical data `__ +* `Quantization of BERT for Text Classification `__ Examples: -* [Quantization of 3D segmentation model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/3d_segmentation) -* [Quantization of Face Detection model](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/face_detection) -* [Quantizatin of speech model for GNA device](https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/speech) + +* :doc:`Quantization of 3D segmentation model ` +* :doc:`Quantization of Face Detection model ` +* :doc:`Quantization of speech model for GNA device ` Command-line example: -* [Quantization of Image Classification model](https://docs.openvino.ai/latest/pot_configs_examples_README.html) + +* :doc:`Quantization of Image Classification model ` A template and full specification for DefaultQuantization algorithm for POT command-line interface: -* 
* `Template <https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/openvino/tools/pot/configs/templates/default_quantization_template.json>`__
* `Full specification <https://github.com/openvinotoolkit/openvino/blob/master/tools/pot/configs/default_quantization_spec.json>`__

.. dropdown:: Template