DOCS shift to rst - Model Optimization Guide articles (#16598)

Sebastian Golebiewski
2023-03-31 11:26:04 +02:00
committed by GitHub
parent bb93bfd90f
commit f9ff518d16
8 changed files with 113 additions and 62 deletions


@@ -1,21 +1,17 @@
# Model Optimization Guide {#openvino_docs_model_optimization_guide}
@sphinxdirective
.. toctree::
:maxdepth: 1
:hidden:
ptq_introduction
tmo_introduction
(Experimental) Protecting Model <pot_ranger_README>
@endsphinxdirective
@sphinxdirective
Model optimization is an optional offline step of improving final model performance by applying special optimization methods, such as quantization, pruning, preprocessing optimization, etc. OpenVINO provides several tools to optimize models at different steps of model development:
- :doc:`Model Optimizer <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>` applies most of the optimization parameters to a model by default. Yet, you are free to configure mean/scale values, batch size, RGB vs BGR input channels, and other parameters to speed up preprocessing of a model (:doc:`Embedding Preprocessing Computation <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`); an example conversion call follows this list.
@@ -23,25 +19,29 @@
- :doc:`Training-time Optimization <nncf_ptq_introduction>`, a suite of advanced methods for training-time model optimization within DL frameworks, such as PyTorch and TensorFlow 2.x. It supports methods like Quantization-aware Training and Filter Pruning. NNCF-optimized models can be inferred with OpenVINO using all the available workflows.
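As an illustration of the preprocessing parameters mentioned above, here is a minimal sketch of a conversion call. It assumes the ``openvino.tools.mo.convert_model`` Python API is available in your release; the model path and normalization values are placeholders, not recommendations:

.. code-block:: py

   from openvino.runtime import serialize
   from openvino.tools.mo import convert_model

   # Embed preprocessing into the model at conversion time:
   # subtract mean, divide by scale, and reverse RGB/BGR channels.
   ov_model = convert_model(
       "model.onnx",                           # placeholder input model
       mean_values=[123.675, 116.28, 103.53],  # example ImageNet means
       scale_values=[58.395, 57.12, 57.375],   # example ImageNet scales
       reverse_input_channels=True,
   )

   # Save the converted model as OpenVINO IR.
   serialize(ov_model, "model.xml")
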
Detailed workflow:
##################
To understand which development optimization tool you need, refer to the diagram:
.. image:: _static/images/DEVELOPMENT_FLOW_V3_crunch.svg

Post-training methods are limited in terms of achievable accuracy, which may degrade for certain scenarios. In such cases, training-time optimization with NNCF may give better results.

Once the model has been optimized using the aforementioned tools, it can be used for inference using the regular OpenVINO inference workflow. No changes to the code are required.

.. image:: _static/images/WHAT_TO_USE.svg
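To illustrate that no code changes are needed, here is a minimal sketch of running an optimized model with the OpenVINO Runtime Python API; the model path and input shape are placeholders:

.. code-block:: py

   import numpy as np
   import openvino.runtime as ov

   core = ov.Core()
   # A quantized or pruned IR model is loaded like any other model.
   compiled_model = core.compile_model("quantized_model.xml", "CPU")

   # Dummy input, standing in for real preprocessed data.
   input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)
   results = compiled_model([input_data])
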
If you are not familiar with model optimization methods, refer to :doc:`post-training methods <pot_introduction>`.
Additional Resources
####################
- :doc:`Deployment optimization <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>`
@endsphinxdirective


@@ -9,23 +9,27 @@
pot_introduction
nncf_ptq_introduction
Post-training model optimization is the process of applying special methods that transform the model into a more hardware-friendly representation without retraining or fine-tuning. The most popular and widespread method is 8-bit post-training quantization because it:

* is easy to use,
* does not hurt accuracy significantly,
* provides significant performance improvement,
* suits most hardware available in stock, since the majority of devices support 8-bit computation natively.
8-bit integer quantization lowers the precision of weights and activations to 8 bits, which leads to an almost 4x reduction in the model footprint (an 8-bit value takes a quarter of the space of a 32-bit one) and significant improvements in inference speed, mostly due to the lower memory bandwidth required for inference. This lowering step is done offline, before the actual inference, so that the model gets transformed into the quantized representation. The process does not require a training dataset or a training pipeline in the source DL framework.
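As a rough sketch of the idea (the exact scheme depends on the chosen algorithm and target hardware), uniform 8-bit quantization maps a floating-point value :math:`x` to an integer using a scale :math:`s` and zero point :math:`z`:

.. math::

   x_{int8} = \mathrm{clamp}\left(\mathrm{round}\left(\frac{x}{s}\right) + z,\; -128,\; 127\right)
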
.. image:: _static/images/quantization_picture.svg
To apply post-training methods in OpenVINO, you need:
* A floating-point precision model, FP32 or FP16, converted into the OpenVINO Intermediate Representation (IR) format that can be run on CPU.
* A representative calibration dataset that reflects a use case scenario, for example, 300 samples.
* In case of accuracy constraints, a validation dataset and accuracy metrics should be available.
Currently, OpenVINO provides two workflows with post-training quantization capabilities:
* :doc:`Post-training Quantization with POT <pot_introduction>` - works with models in OpenVINO Intermediate Representation (IR) only.
* :doc:`Post-training Quantization with NNCF <nncf_ptq_introduction>` - cross-framework solution for model optimization that provides a new simple API for post-training quantization (a minimal sketch follows this list).
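The NNCF workflow can be sketched as follows, assuming the ``nncf.quantize`` API; the model path is a placeholder and random data stands in for a real calibration set:

.. code-block:: py

   import numpy as np
   import nncf
   import openvino.runtime as ov

   core = ov.Core()
   model = core.read_model("model.xml")  # FP32/FP16 IR model

   # A representative calibration dataset of ~300 samples; random data
   # is used here only to keep the sketch self-contained.
   calibration_data = [
       np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(300)
   ]
   calibration_dataset = nncf.Dataset(calibration_data)

   # 8-bit post-training quantization with default settings.
   quantized_model = nncf.quantize(model, calibration_dataset)

   # Save the quantized model back to OpenVINO IR.
   ov.serialize(quantized_model, "quantized_model.xml")
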
@endsphinxdirective