Updated model compression README (#19967)
@@ -10,7 +10,7 @@ Weight compression aims to reduce the memory footprint of a model. It can also l
- enabling the inference of exceptionally large models that cannot be accommodated in the memory of the device;
- improving the inference performance of the models by reducing the latency of the memory access when computing the operations with weights, for example, Linear layers.

Currently, NNCF provides 8-bit weight quantization as a compression method primarily designed to optimize LLMs. The main difference between weights compression and full model quantization (post-training quantization) is that activations remain floating-point in the case of weights compression which leads to a better accuracy. Weight compression for LLMs provides a solid inference performance improvement which is on par with the performance of the full model quantization. In addition, weight compression is data-free and does not require a calibration dataset, making it easy to use.

Currently, `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__ provides 8-bit weight quantization as a compression method primarily designed to optimize LLMs. The main difference between weights compression and full model quantization (post-training quantization) is that activations remain floating-point in the case of weights compression which leads to a better accuracy. Weight compression for LLMs provides a solid inference performance improvement which is on par with the performance of the full model quantization. In addition, weight compression is data-free and does not require a calibration dataset, making it easy to use.

Compress Model Weights
######################
@@ -33,5 +33,6 @@ Additional Resources

- :doc:`Post-training Quantization <ptq_introduction>`
- :doc:`Training-time Optimization <tmo_introduction>`
- `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__

@endsphinxdirective
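
The changed paragraph describes 8-bit weight quantization in which activations stay floating-point and no calibration data is needed. The sketch below illustrates that idea in plain Python. It is not NNCF's implementation: the per-row min/max scheme and the helper names `compress_row`, `decompress_row`, and `linear` are assumptions made purely for illustration.

```python
def compress_row(row):
    """Map one row of float weights to 8-bit codes plus a float scale and offset."""
    lo, hi = min(row), max(row)
    # Fall back to a scale of 1.0 for constant rows to avoid dividing by zero.
    scale = (hi - lo) / 255.0 or 1.0
    codes = bytes(min(255, max(0, round((w - lo) / scale))) for w in row)
    return codes, scale, lo


def decompress_row(codes, scale, lo):
    """Restore approximate float weights from the 8-bit codes."""
    return [c * scale + lo for c in codes]


def linear(x, compressed_weights):
    """Linear layer: float activations, weights decompressed on the fly."""
    out = []
    for codes, scale, lo in compressed_weights:
        w = decompress_row(codes, scale, lo)
        out.append(sum(xi * wi for xi, wi in zip(x, w)))
    return out


# Hypothetical toy weights and activations, used only to exercise the sketch.
weights = [[0.5, -1.25, 2.0, 0.0], [0.1, 0.2, -0.3, 0.4]]
x = [1.0, 2.0, -1.0, 0.5]

# Compression uses only the weights themselves (no calibration dataset).
compressed = [compress_row(row) for row in weights]

reference = [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]
approx = linear(x, compressed)
print(reference, approx)
```

Each weight is stored in one byte instead of a 32-bit float, and the rounding error per weight is at most half of that row's scale, which is why the output of `linear` stays close to the uncompressed reference.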