Updated model compression README (#19967)

This commit is contained in:
Alexander Kozlov
2023-09-20 13:04:20 +04:00
committed by GitHub
parent b2217fdafd
commit 59338fa758


@@ -10,7 +10,7 @@ Weight compression aims to reduce the memory footprint of a model. It can also l
- enabling the inference of exceptionally large models that cannot be accommodated in the memory of the device;
- improving the inference performance of the models by reducing the latency of the memory access when computing the operations with weights, for example, Linear layers.
-Currently, NNCF provides 8-bit weight quantization as a compression method primarily designed to optimize LLMs. The main difference between weights compression and full model quantization (post-training quantization) is that activations remain floating-point in the case of weights compression which leads to a better accuracy. Weight compression for LLMs provides a solid inference performance improvement which is on par with the performance of the full model quantization. In addition, weight compression is data-free and does not require a calibration dataset, making it easy to use.
+Currently, `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__ provides 8-bit weight quantization as a compression method primarily designed to optimize LLMs. The main difference between weights compression and full model quantization (post-training quantization) is that activations remain floating-point in the case of weights compression which leads to a better accuracy. Weight compression for LLMs provides a solid inference performance improvement which is on par with the performance of the full model quantization. In addition, weight compression is data-free and does not require a calibration dataset, making it easy to use.
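To make the idea concrete, here is a minimal, illustrative sketch of what "8-bit weight quantization with floating-point activations" means. This is not NNCF's implementation (NNCF exposes this through its `nncf.compress_weights` API and operates on real model graphs); it is a plain-Python, symmetric per-tensor int8 scheme shown only to clarify the concept:

```python
def quantize_weights_int8(weights):
    """Symmetric per-tensor 8-bit quantization: floats -> int8 values + one float scale.

    Illustrative sketch only; production frameworks like NNCF use per-channel
    scales and operate on whole model graphs rather than flat lists.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Restore approximate float weights; activations stay floating-point throughout."""
    return [v * scale for v in q]

# Weights are stored as int8 (4x smaller than float32) and dequantized for compute.
w = [0.5, -1.27, 0.02, 1.27]
q, s = quantize_weights_int8(w)
w_hat = dequantize(q, s)
```

Because only the weights are quantized, the reconstruction error is small and the activations never lose precision, which is why this approach tends to preserve accuracy better than full post-training quantization.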
Compress Model Weights
######################
@@ -33,5 +33,6 @@ Additional Resources
- :doc:`Post-training Quantization <ptq_introduction>`
- :doc:`Training-time Optimization <tmo_introduction>`
- `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
@endsphinxdirective