[DOCS] 'Quantization-aware Training' article update (#15617)
This commit is contained in:
parent
28cfb988e7
commit
bd3a392d84
@@ -1,172 +1,201 @@

# Quantization-aware Training (QAT) {#qat_introduction}

@sphinxdirective

Introduction
####################

Quantization-aware Training is a popular method that allows quantizing a model and applying fine-tuning to restore accuracy
degradation caused by quantization. In fact, this is the most accurate quantization method. This document describes how to
apply QAT from the Neural Network Compression Framework (NNCF) to get 8-bit quantized models. It assumes that you are
knowledgeable in Python programming and familiar with the training code for the model in the source DL framework.

Using NNCF QAT
####################

Here, we provide the steps that are required to integrate QAT from NNCF into a training script written with
PyTorch or TensorFlow 2:

.. note::
   Currently, NNCF for TensorFlow 2 supports optimization of the models created using Keras
   `Sequential API <https://www.tensorflow.org/guide/keras/sequential_model>`__ or
   `Functional API <https://www.tensorflow.org/guide/keras/functional>`__.

1. Import NNCF API
++++++++++++++++++++

In this step, you add NNCF-related imports at the beginning of the training script:

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [imports]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [imports]
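
For reference, a minimal sketch of what the PyTorch-side imports may look like (the exact set depends on your NNCF version):

.. code-block:: python

   import torch
   import nncf  # NNCF recommends importing it right after torch

   from nncf import NNCFConfig
   from nncf.torch import create_compressed_model, register_default_init_args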

2. Create NNCF configuration
++++++++++++++++++++++++++++

Here, you should define the NNCF configuration, which consists of model-related parameters (the ``"input_info"`` section) and parameters
of optimization methods (the ``"compression"`` section). For faster convergence, it is also recommended to register a dataset object
specific to the DL framework. It will be used at the model creation step to initialize quantization parameters.

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [nncf_congig]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [nncf_congig]
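
For illustration, a minimal PyTorch-side sketch of such a configuration, assuming ``train_loader`` is your training
``DataLoader`` and the input shape below is just an example:

.. code-block:: python

   from nncf import NNCFConfig
   from nncf.torch import register_default_init_args

   nncf_config = NNCFConfig.from_dict({
       "input_info": {"sample_size": [1, 3, 224, 224]},  # shape of a single model input
       "compression": {"algorithm": "quantization"},     # 8-bit quantization
   })
   # Register the training data so NNCF can initialize quantization ranges from it.
   nncf_config = register_default_init_args(nncf_config, train_loader)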

3. Apply optimization methods
+++++++++++++++++++++++++++++

In the next step, you need to wrap the original model object with the ``create_compressed_model()`` API using the configuration
defined in the previous step. This method returns a so-called compression controller and a wrapped model that can be used the
same way as the original model. Note that the optimization methods are applied at this step, so the model
undergoes a set of corresponding transformations and can contain additional operations required for the optimization. In
the case of QAT, the compression controller object is used for model export and, optionally, for distributed training, as
shown below.

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [wrap_model]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [wrap_model]
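
In PyTorch, this boils down to a single call; a sketch, assuming ``model`` and ``nncf_config`` from the previous steps:

.. code-block:: python

   from nncf.torch import create_compressed_model

   # Returns the compression controller and the wrapped (quantization-aware) model.
   compression_ctrl, model = create_compressed_model(model, nncf_config)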

4. Fine-tune the model
++++++++++++++++++++++

This step assumes that you will apply fine-tuning to the model the same way as it is done for the baseline model. In the
case of QAT, it is required to train the model for a few epochs with a small learning rate, for example, 10e-5. In principle,
you can skip this step, which means that only post-training optimization will be applied to the model.

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [tune_model]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [tune_model]
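
A fine-tuning loop for the wrapped model looks the same as for the original one; a PyTorch sketch, assuming
``train_loader`` and a ``criterion`` loss function:

.. code-block:: python

   import torch

   optimizer = torch.optim.Adam(model.parameters(), lr=10e-5)  # small learning rate for QAT
   model.train()
   for epoch in range(3):  # a few epochs are usually enough
       for inputs, targets in train_loader:
           optimizer.zero_grad()
           loss = criterion(model(inputs), targets)
           loss.backward()
           optimizer.step()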

5. Multi-GPU distributed training
+++++++++++++++++++++++++++++++++

In the case of distributed multi-GPU training (not DataParallel), you should call ``compression_ctrl.distributed()`` before
fine-tuning. This informs the optimization methods to make the adjustments required to function in distributed mode.

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [distributed]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [distributed]
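
A PyTorch sketch of one possible placement, assuming the process group is already initialized:

.. code-block:: python

   import torch

   # Wrap the model for distributed training as usual, then let NNCF adjust
   # its internal state for the distributed mode before fine-tuning starts.
   model = torch.nn.parallel.DistributedDataParallel(model)
   compression_ctrl.distributed()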

6. Export quantized model
+++++++++++++++++++++++++

When fine-tuning finishes, the quantized model can be exported to the corresponding format for further inference: ONNX in
the case of PyTorch and a frozen graph in the case of TensorFlow 2.

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [export]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [export]
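
The export goes through the compression controller; a PyTorch sketch (the file name is an example):

.. code-block:: python

   # Exports the quantized model to ONNX for further conversion to OpenVINO IR.
   compression_ctrl.export_model("quantized_model.onnx")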

.. note::
   The precision of the weights becomes INT8 only after the model is converted to OpenVINO Intermediate Representation.
   You can expect the model footprint to be reduced only for that format.

These are the basic steps for applying the QAT method from NNCF. However, in some cases it is required to save/load model
checkpoints during training. Since NNCF wraps the original model with its own object, it provides an API for these needs.

7. (Optional) Save checkpoint
+++++++++++++++++++++++++++++

To save a model checkpoint, use the following API:

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [save_checkpoint]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [save_checkpoint]
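
A PyTorch sketch of what such a checkpoint may contain; the key names below are a convention, not a requirement, and
``checkpoint_path`` is an assumption:

.. code-block:: python

   import torch

   checkpoint = {
       "state_dict": model.state_dict(),  # weights of the wrapped model
       "compression_state": compression_ctrl.get_compression_state(),  # NNCF-specific state
   }
   torch.save(checkpoint, checkpoint_path)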

8. (Optional) Restore from checkpoint
+++++++++++++++++++++++++++++++++++++

To restore the model from a checkpoint, use the following API:

.. tab:: PyTorch

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_torch.py
      :language: python
      :fragment: [load_checkpoint]

.. tab:: TensorFlow 2

   .. doxygensnippet:: docs/optimization_guide/nncf/code/qat_tf.py
      :language: python
      :fragment: [load_checkpoint]
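
A matching PyTorch sketch for restoring, assuming the checkpoint layout from the previous step:

.. code-block:: python

   import torch
   from nncf.torch import create_compressed_model

   checkpoint = torch.load(checkpoint_path)
   # Recreate the wrapped model with the saved NNCF state, then load the weights.
   compression_ctrl, model = create_compressed_model(
       model, nncf_config, compression_state=checkpoint["compression_state"])
   model.load_state_dict(checkpoint["state_dict"])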

For more details on saving/loading checkpoints in NNCF, see the following
`documentation <https://github.com/openvinotoolkit/nncf/blob/develop/docs/Usage.md#saving-and-loading-compressed-models>`__.

Deploying quantized model
#########################

The quantized model can be deployed with OpenVINO in the same way as the baseline model. No extra steps or options are
required in this case. For more details, see the corresponding :doc:`documentation <openvino_docs_OV_UG_OV_Runtime_User_Guide>`.
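
For example, after converting the exported model to OpenVINO IR, inference is set up exactly as for any other model;
a sketch, where the model path and device are assumptions:

.. code-block:: python

   from openvino.runtime import Core

   core = Core()
   compiled_model = core.compile_model("quantized_model.xml", "CPU")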

Examples
####################

* `Quantizing PyTorch model with NNCF <https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/302-pytorch-quantization-aware-training>`__
* `Quantizing TensorFlow model with NNCF <https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/305-tensorflow-quantization-aware-training>`__

@endsphinxdirective