Compressing a Model to FP16
By default, Model Optimizer converts all floating-point weights to the FP16 data type. The resulting IR is called a compressed FP16 model. A compressed model occupies about half the file-system space of its FP32 counterpart, but it may show some accuracy drop. For most models, the drop is negligible. If the accuracy drop is significant, you can disable compression explicitly by specifying --compress_to_fp16=False:
mo --input_model INPUT_MODEL --compress_to_fp16=False
For details on how plugins handle compressed FP16 models, see Working with devices.
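The same switch is exposed through the Model Optimizer Python API. Below is a minimal sketch, assuming a placeholder model file named model.onnx; the conversion call is skipped gracefully if OpenVINO is not installed in the environment:

```python
# Sketch: controlling FP16 compression from Python.
# "model.onnx" is a placeholder path, not a file shipped with this guide.
try:
    from openvino.tools.mo import convert_model

    # compress_to_fp16 mirrors the --compress_to_fp16 CLI flag (True by default).
    ov_model = convert_model("model.onnx", compress_to_fp16=False)
except Exception:
    # OpenVINO is not installed, or the placeholder model is absent;
    # the call above still illustrates the intended usage.
    ov_model = None
```

As with the CLI, omitting compress_to_fp16 keeps the default behavior of compressing weights to FP16.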
Note: FP16 compression is sometimes used as the initial step for INT8 quantization. Refer to the Post-training optimization guide for more information about that.