openvino/docs/MO_DG/prepare_model/FP16_Compression.md
Pavel Esir 15973fd2da enable --compress_to_fp16=True by default in MO (#15488)
2023-02-21 13:07:43 +01:00


Compressing a Model to FP16

By default, Model Optimizer converts all floating-point weights to the FP16 data type. The resulting IR is called a compressed FP16 model. Such a model occupies about half the file-system space of its FP32 counterpart, but it may show some accuracy drop. For most models the drop is negligible; if it is significant, you can disable compression explicitly.
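The storage/accuracy trade-off can be illustrated outside Model Optimizer. The sketch below is not the Model Optimizer implementation, only a NumPy demonstration of what narrowing FP32 weights to FP16 does: storage halves, and each weight picks up a small rounding error (FP16 keeps roughly 11 bits of mantissa, i.e. about 3 decimal digits).

```python
import numpy as np

# Simulated layer weights in FP32, roughly unit-scaled
# (typical for trained networks).
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal(10_000).astype(np.float32)

# FP16 "compression": each value is rounded to the nearest
# representable half-precision number.
weights_fp16 = weights_fp32.astype(np.float16)

# Storage is exactly halved: 4 bytes -> 2 bytes per weight.
print(weights_fp32.nbytes, weights_fp16.nbytes)

# The accuracy cost is a per-weight rounding error; for
# unit-scaled values it stays on the order of 1e-3.
max_abs_err = np.abs(
    weights_fp32 - weights_fp16.astype(np.float32)
).max()
print(max_abs_err)
```

This is also why the accuracy drop is usually negligible: well-conditioned models tolerate per-weight perturbations of this magnitude, while models with very large or very small weight magnitudes (FP16 overflows above ~65504 and loses precision near zero) are the ones that may need compression disabled.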

By default, models are compressed to FP16, but you can disable compression by specifying --compress_to_fp16=False:

mo --input_model INPUT_MODEL --compress_to_fp16=False

For details on how plugins handle compressed FP16 models, see Working with devices.

Note: FP16 compression is sometimes used as the initial step for INT8 quantization. Refer to the Post-training optimization guide for more information.