[DOC][CPU] Documentation update (#17784)
This commit is contained in:
parent
047d2d1d7f
commit
263e51a1be
@ -26,6 +26,20 @@ Execution Mode

If the model has been quantized using :doc:`OpenVINO optimization tools <ptq_introduction>` or any other method, the quantized operators are executed with the target integer precision if the device has hardware acceleration for that type. For example, quantized ``int8`` primitives are executed with ``int8`` precision in both **ACCURACY** and **PERFORMANCE** modes if the device provides higher compute bandwidth for 8-bit data types than for any available floating-point type. On the other hand, devices without hardware acceleration for the ``int8`` data type can keep such operators in floating-point precision, and the exact floating-point type is then affected by the ``execution_mode`` and ``inference_precision`` properties.

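The accuracy trade-off of running quantized operators in ``int8`` can be illustrated with a minimal affine-quantization sketch in plain Python (an illustration of the numerics only, not OpenVINO code; the helper names are hypothetical):

```python
# Plain-Python sketch of affine int8 quantization: real values are mapped
# onto the int8 range with a scale derived from the observed min/max, so
# 8-bit execution trades a bounded rounding error for higher compute
# bandwidth on hardware with int8 acceleration.

def quantize(values, lo, hi):
    scale = (hi - lo) / 255.0                  # one int8 step in real units
    zero_point = round(-lo / scale) - 128      # int8 code that represents 0.0
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [(code - zero_point) * scale for code in q]

vals = [0.0, 0.5, 1.0, 2.5]
q, scale, zp = quantize(vals, 0.0, 2.5)
restored = dequantize(q, scale, zp)
# the round-trip error is bounded by half a quantization step
assert all(abs(a - b) <= scale / 2 for a, b in zip(vals, restored))
```

The bounded half-step error is why quantized execution is acceptable for most models but can still shift accuracy for outlier-sensitive ones.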
Code examples:

.. tab:: C++

   .. doxygensnippet:: docs/snippets/cpu/ov_execution_mode.cpp
      :language: cpp
      :fragment: [ov:execution_mode:part0]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/cpu/ov_execution_mode.py
      :language: python
      :fragment: [ov:execution_mode:part0]

Inference Precision
###################

@ -87,13 +87,15 @@ CPU plugin supports the following floating-point data types as inference precisi

The default floating-point precision of a CPU primitive is ``f32``. To support the ``f16`` OpenVINO IR, the plugin internally converts
all ``f16`` values to ``f32``, and all calculations are performed using the native ``f32`` precision.
On platforms that natively support ``bfloat16`` calculations (have the ``AVX512_BF16`` or ``AMX`` extension), the ``bf16`` type is automatically used instead
of ``f32`` to achieve better performance (see the `Execution Mode Hint <#execution-mode-hint>`__).
Thus, no special steps are required to run a ``bf16`` model. For more details about the ``bfloat16`` format, see
the `BFLOAT16 – Hardware Numerics Definition white paper <https://software.intel.com/content/dam/develop/external/us/en/documents/bf16-hardware-numerics-definition-white-paper.pdf>`__.

Using the ``bf16`` precision provides the following performance benefits:

- Faster multiplication of two ``bfloat16`` numbers because of the shorter mantissa of the ``bfloat16`` data.
- The ``bfloat16`` data type allows using Intel® Advanced Matrix Extensions (AMX), which provides dramatically faster computations on corresponding hardware
  compared with AVX512 or AVX2 instructions in many DL operation implementations.
- Reduced memory consumption, since ``bfloat16`` data is half the size of 32-bit float.

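The bf16 layout can be sketched in plain Python (an illustration of the format, not OpenVINO code): ``bf16`` keeps the sign bit, the full 8-bit ``f32`` exponent, and only the top 7 mantissa bits, which is why it halves memory traffic at the cost of precision.

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    # reinterpret the IEEE-754 float32 encoding and drop the low 16 mantissa bits
    f32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return f32_bits >> 16

def bf16_to_f32(bits: int) -> float:
    # place the 16 bf16 bits back in the high half of a float32 encoding
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

pi = 3.141592653589793
pi_bf16 = bf16_to_f32(f32_to_bf16_bits(pi))
print(pi_bf16)  # 3.140625 -- only ~2-3 decimal digits survive the 7-bit mantissa
```

Note this sketch truncates the extra mantissa bits for clarity; hardware conversions typically round to nearest, but the precision limit is the same.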
To check whether the CPU device supports the ``bfloat16`` data type, use the :doc:`query device properties interface <openvino_docs_OV_UG_query_api>`
@ -117,6 +119,9 @@ to query ``ov::device::capabilities`` property, which should contain ``BF16`` in
      :fragment: [part0]


Inference Precision Hint
-----------------------------------------------------------

If the model has been converted to ``bf16``, the ``ov::hint::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
the ``ov::CompiledModel::get_property`` call. The code below demonstrates how to get the element type:

@ -156,7 +161,18 @@ To enable the simulation, the ``ov::hint::inference_precision`` has to be explic
Due to the reduced mantissa size of the ``bfloat16`` data type, the resulting ``bf16`` inference accuracy may differ from the ``f32`` inference,
especially for models that were not trained using the ``bfloat16`` data type. If the ``bf16`` inference accuracy is not acceptable,
it is recommended to switch to the ``f32`` precision. The performance/accuracy balance can also be managed using the ``ov::hint::execution_mode`` hint,
see the `Execution Mode Hint <#execution-mode-hint>`__.

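Why the reduced mantissa matters can be sketched in plain Python (not OpenVINO code): with only 7 mantissa bits, adding a small increment to a much larger running sum is rounded away entirely, so a long accumulation stalls far below the exact result. The emulation below truncates rather than rounds to nearest, a simplification, but the stagnation effect is the same in kind.

```python
import struct

def to_bf16(x: float) -> float:
    # emulate bf16 by zeroing the low 16 bits of the float32 encoding
    bits = struct.unpack("<I", struct.pack("<f", x))[0] & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

step = to_bf16(0.001)
total = 0.0
for _ in range(10_000):
    total = to_bf16(total + step)
print(total)  # stalls at 0.25 instead of reaching ~10.0
```

This is the kind of accumulated error that makes ``ACCURACY`` mode (plain ``f32``) the right choice for models sensitive to reduced precision.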
Execution Mode Hint
-----------------------------------------------------------

If ``ov::hint::inference_precision`` is not explicitly set, the ``ov::hint::execution_mode`` hint can be used to direct the run-time optimizations toward either better accuracy or better performance.
If ``ov::hint::execution_mode`` is set to ``ov::hint::ExecutionMode::PERFORMANCE`` (the default) and the platform natively supports ``bfloat16``
calculations (has the ``AVX512_BF16`` or ``AMX`` extension), then the ``bf16`` type is automatically used instead of ``f32`` to achieve better performance.
If the accuracy in this mode is not good enough, set ``ov::hint::execution_mode`` to ``ov::hint::ExecutionMode::ACCURACY`` to enforce
the use of the ``f32`` precision in floating point calculations.

For more details and code examples, see the :doc:`Precision Control <openvino_docs_OV_UG_Precision_Control>`.

Supported Features
###########################################################

@ -285,11 +301,6 @@ That means that :doc:`OpenVINO™ Extensibility Mechanism <openvino_docs_Extensi

Enabling fallback on a custom operation implementation is possible by overriding the ``ov::Op::evaluate`` method in the derived operation
class (see :doc:`custom OpenVINO™ operations <openvino_docs_Extensibility_UG_add_openvino_ops>` for details).

.. note::

   At the moment, custom operations with internal dynamism (when the output tensor shape can only be determined
   as a result of performing the operation) are not supported by the plugin.

Stateful Models
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

@ -310,6 +321,7 @@ All parameters must be set before calling ``ov::Core::compile_model()`` in order

- ``ov::enable_profiling``
- ``ov::hint::inference_precision``
- ``ov::hint::performance_mode``
- ``ov::hint::execution_mode``
- ``ov::hint::num_requests``
- ``ov::num_streams``
- ``ov::affinity``

@ -19,8 +19,9 @@ The OpenVINO Runtime provides unique capabilities to infer deep learning models

| :doc:`GPU <openvino_docs_OV_UG_supported_plugins_GPU>`         | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics             |
+----------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
| :doc:`CPU (x86) <openvino_docs_OV_UG_supported_plugins_CPU>`   | Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector   |
|                                                                | Extensions 512 (Intel® AVX-512), Intel® Advanced Matrix Extensions (Intel® AMX),              |
|                                                                | Intel® Core™ Processors with Intel® AVX2,                                                     |
|                                                                | Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE)                    |
+----------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
| :doc:`CPU (Arm®) <openvino_docs_OV_UG_supported_plugins_CPU>`  | Raspberry Pi™ 4 Model B, Apple® Mac with Apple silicon                                        |
|                                                                |                                                                                               |

17
docs/snippets/cpu/ov_execution_mode.cpp
Normal file
@ -0,0 +1,17 @@

// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <openvino/runtime/core.hpp>

int main() {
    //! [ov:execution_mode:part0]
    ov::Core core;
    // in case of Accuracy
    core.set_property("CPU", ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
    // in case of Performance
    core.set_property("CPU", ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
    //! [ov:execution_mode:part0]

    return 0;
}
12
docs/snippets/cpu/ov_execution_mode.py
Normal file
@ -0,0 +1,12 @@

# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

from openvino.runtime import Core

#! [ov:execution_mode:part0]
core = Core()
# in case of Accuracy
core.set_property("CPU", {"EXECUTION_MODE_HINT": "ACCURACY"})
# in case of Performance
core.set_property("CPU", {"EXECUTION_MODE_HINT": "PERFORMANCE"})
#! [ov:execution_mode:part0]