
Quantizing Models Post-training

@sphinxdirective

.. toctree::
   :maxdepth: 1
   :hidden:

   basic_quantization_flow
   quantization_w_accuracy_control
   pot_introduction

Post-training model optimization is the process of applying special methods that transform a model into a more hardware-friendly representation, without retraining or fine-tuning it. The most popular and widespread method is 8-bit post-training quantization because it:

* is easy to use,
* does not significantly hurt accuracy,
* provides a significant performance improvement,
* suits many off-the-shelf hardware platforms, since most of them support 8-bit computation natively.

8-bit integer quantization lowers the precision of weights and activations to 8 bits, which leads to an almost 4x reduction in the model footprint and a significant improvement in inference speed, mostly due to the reduced amount of data that has to be moved during inference. The lowering step is done offline, before the actual inference, so that the model is transformed into its quantized representation. The process requires neither a training dataset nor a training pipeline in the source DL framework. A minimal sketch of the underlying arithmetic follows the figure below.

.. image:: _static/images/quantization_picture.svg
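To make the footprint numbers concrete, here is a minimal NumPy sketch of symmetric 8-bit weight quantization. It is illustrative only: real tools such as NNCF pick scales per channel and calibrate activation ranges on representative data, and the tensor here is a hypothetical stand-in for one layer of a model.

.. code-block:: python

   import numpy as np

   # Hypothetical FP32 weight tensor standing in for one layer of a model.
   weights = np.random.randn(64, 128).astype(np.float32)

   # Symmetric quantization: map the observed absolute range onto int8.
   scale = np.abs(weights).max() / 127.0
   q_weights = np.clip(np.round(weights / scale), -128, 127).astype(np.int8)

   # At inference time, int8 values are rescaled back to real values.
   dequantized = q_weights.astype(np.float32) * scale

   print(weights.nbytes / q_weights.nbytes)    # 4.0 - the ~4x footprint reduction
   print(np.abs(weights - dequantized).max())  # small per-weight rounding error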

`Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__ provides a post-training quantization API, available in Python, that aims to reuse the model training or validation code that usually accompanies the model in its source framework, for example, PyTorch or TensorFlow. The NNCF API is cross-framework and currently supports models in the following frameworks: OpenVINO, PyTorch, TensorFlow 2.x, and ONNX. Post-training quantization for models in OpenVINO Intermediate Representation is currently the most mature in terms of supported methods and model coverage.

The NNCF API offers two main flows for applying 8-bit post-training quantization:

* :doc:`Basic quantization <basic_quantization_flow>` - the simplest quantization flow, which applies 8-bit integer quantization to the model. Only a representative calibration dataset is needed in this case (see the sketch after this list).
* :doc:`Quantization with accuracy control <quantization_w_accuracy_control>` - the most advanced quantization flow, which applies 8-bit quantization while keeping the accuracy drop under control. It requires calibration and validation datasets, as well as a validation function that calculates the accuracy metric.
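For orientation, the snippet below is a minimal sketch of the basic flow built on the public ``nncf.Dataset`` and ``nncf.quantize`` APIs. The model path, the random calibration data, and ``transform_fn`` are placeholder assumptions; the :doc:`Basic quantization <basic_quantization_flow>` page gives the complete, framework-specific recipes.

.. code-block:: python

   import numpy as np
   import nncf
   import openvino.runtime as ov

   # Assumption: an OpenVINO IR model at "model.xml" that expects
   # NCHW 224x224 inputs.
   model = ov.Core().read_model("model.xml")

   # Assumption: a few hundred representative samples; random data
   # stands in for a real calibration set here.
   data_source = [np.random.rand(1, 3, 224, 224).astype(np.float32)
                  for _ in range(300)]

   def transform_fn(data_item):
       # NNCF calls this to turn each dataset item into model input(s);
       # no labels or annotations are needed for basic quantization.
       return data_item

   calibration_dataset = nncf.Dataset(data_source, transform_fn)

   # Returns a quantized copy of the model, ready to be saved or compiled.
   quantized_model = nncf.quantize(model, calibration_dataset)

For the accuracy-aware flow, NNCF additionally provides ``nncf.quantize_with_accuracy_control``, which takes a validation dataset and a validation function on top of the calibration dataset.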

Additional Resources
####################

* :doc:`Optimizing Models at Training Time <tmo_introduction>`
* `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
* `Tutorial: Migrate quantization from POT API to NNCF API <https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/111-yolov5-quantization-migration>`__

@endsphinxdirective