diff --git a/docs/optimization_guide/nncf/weight_compression.md b/docs/optimization_guide/nncf/weight_compression.md
index efec4839d47..fb29a6d49b7 100644
--- a/docs/optimization_guide/nncf/weight_compression.md
+++ b/docs/optimization_guide/nncf/weight_compression.md
@@ -10,7 +10,7 @@ Weight compression aims to reduce the memory footprint of a model. It can also l
 - enabling the inference of exceptionally large models that cannot be accommodated in the memory of the device;
 - improving the inference performance of the models by reducing the latency of the memory access when computing the operations with weights, for example, Linear layers.
 
-Currently, NNCF provides 8-bit weight quantization as a compression method primarily designed to optimize LLMs. The main difference between weights compression and full model quantization (post-training quantization) is that activations remain floating-point in the case of weights compression which leads to a better accuracy. Weight compression for LLMs provides a solid inference performance improvement which is on par with the performance of the full model quantization. In addition, weight compression is data-free and does not require a calibration dataset, making it easy to use.
+Currently, `Neural Network Compression Framework (NNCF) <https://github.com/openvinotoolkit/nncf>`__ provides 8-bit weight quantization as a compression method primarily designed to optimize LLMs. The main difference between weight compression and full model quantization (post-training quantization) is that activations remain floating-point in the case of weight compression, which leads to better accuracy. Weight compression for LLMs provides a solid inference performance improvement that is on par with full model quantization. In addition, weight compression is data-free and does not require a calibration dataset, making it easy to use.
 
 Compress Model Weights
 ######################
@@ -33,5 +33,6 @@ Additional Resources
 
 - :doc:`Post-training Quantization `
 - :doc:`Training-time Optimization `
+- `NNCF GitHub <https://github.com/openvinotoolkit/nncf>`__
 
 @endsphinxdirective
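The added paragraph describes 8-bit weight quantization where activations stay floating-point. As a minimal sketch of that idea (not NNCF's exact algorithm, and outside the patch above), symmetric per-channel INT8 quantization stores each weight row as `int8` values plus one float scale, and restores approximate float weights at inference time:

```python
import numpy as np

def quantize_weights_int8(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-output-channel INT8 quantization of a 2-D weight matrix.

    Illustrative sketch only; NNCF's internals may differ.
    """
    # One scale per output channel (row), mapping the row's max magnitude to 127.
    scale = np.max(np.abs(w), axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Weights come back to float for the matmul; activations were never quantized.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)
q, scale = quantize_weights_int8(w)
w_hat = dequantize(q, scale)
print(q.dtype, float(np.max(np.abs(w - w_hat))))  # int8 storage, small reconstruction error
```

Storing `int8` values plus one scale per row is roughly a 4x memory reduction versus `float32`, which is the footprint benefit the paragraph refers to. With NNCF itself, this is applied via `nncf.compress_weights(model)` on a supported model.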