[DOC][CPU] Documentation update (#17784)

This commit is contained in:
Anton Voronov 2023-05-31 10:37:32 +04:00 committed by GitHub
parent 047d2d1d7f
commit 263e51a1be
5 changed files with 67 additions and 11 deletions


@@ -26,6 +26,20 @@ Execution Mode
If the model has been quantized using :doc:`OpenVINO optimization tools <ptq_introduction>` or any other method, the quantized operators are executed with the target integer precision if the device has hardware acceleration for that type. For example, quantized ``int8`` primitives are executed with ``int8`` precision in both the **ACCURACY** and **PERFORMANCE** modes if the device provides higher compute bandwidth for 8-bit data types than for any available floating-point type. On the other hand, devices without hardware acceleration for the ``int8`` data type can keep such operators in floating-point precision, and the exact floating-point type is then determined by the ``execution_mode`` and ``inference_precision`` properties.
Code examples:

.. tab:: C++

   .. doxygensnippet:: docs/snippets/cpu/ov_execution_mode.cpp
      :language: cpp
      :fragment: [ov:execution_mode:part0]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/cpu/ov_execution_mode.py
      :language: python
      :fragment: [ov:execution_mode:part0]
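The arithmetic behind the quantized ``int8`` path described above can be illustrated with a minimal affine quantize/dequantize sketch. This is plain stdlib Python, not an OpenVINO API; the scale and the sample values are made up for illustration:

```python
def quantize(values, scale, zero_point=0):
    """Affine quantization: q = clamp(round(x / scale) + zero_point, -128, 127)."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point=0):
    """Map int8 codes back to approximate floating-point values."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical weights with an observed range of [-2, 2] mapped onto int8.
weights = [0.5, -1.25, 2.0, 0.1]
scale = 2.0 / 127
q = quantize(weights, scale)
restored = dequantize(q, scale)
# Each restored value differs from the original by at most one quantization step.
```

The rounding error introduced here (at most half a quantization step per value) is the accuracy cost that the **ACCURACY**/**PERFORMANCE** trade-off discussion refers to.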
Inference Precision
###################


@@ -87,13 +87,15 @@ CPU plugin supports the following floating-point data types as inference precisi
The default floating-point precision of a CPU primitive is ``f32``. To support ``f16`` OpenVINO IR, the plugin internally converts
all ``f16`` values to ``f32``, and all calculations are performed using the native ``f32`` precision.
On platforms that natively support ``bfloat16`` calculations (have the ``AVX512_BF16`` or ``AMX`` extension), the ``bf16`` type is automatically used instead
of ``f32`` to achieve better performance (see the `Execution Mode Hint <#execution-mode-hint>`__).
Thus, no special steps are required to run a ``bf16`` model. For more details about the ``bfloat16`` format, see
the `BFLOAT16 Hardware Numerics Definition white paper <https://software.intel.com/content/dam/develop/external/us/en/documents/bf16-hardware-numerics-definition-white-paper.pdf>`__.
Using the ``bf16`` precision provides the following performance benefits:

- Faster multiplication of two ``bfloat16`` numbers because of the shorter mantissa of the ``bfloat16`` data type.
- The ``bfloat16`` data type allows using Intel® Advanced Matrix Extensions (Intel® AMX), which provide dramatically faster computations on supporting hardware
  than AVX-512 or AVX2 instructions for many DL operation implementations.
- Reduced memory consumption, since ``bfloat16`` data is half the size of 32-bit float.
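As a rough illustration of the format itself, the stdlib-only sketch below truncates a float32 to ``bfloat16`` by keeping its upper 16 bits, showing both the halved storage and the reduced mantissa precision. This is not an OpenVINO API, and it simplifies real hardware behavior, which uses round-to-nearest rather than plain truncation:

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to bfloat16: keep the sign bit,
    all 8 exponent bits, and only the top 7 of the 23 mantissa bits."""
    f32_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return f32_bits >> 16  # bf16 is simply the upper half of a float32


def bf16_to_f32(bits: int) -> float:
    """Expand bfloat16 bits back to float32 by zero-padding the mantissa."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]


x = 3.141592653589793
bf = bf16_to_f32(f32_to_bf16_bits(x))
print(bf)  # 3.140625 -- only ~2-3 decimal digits survive the 7-bit mantissa
```

The two-byte code also demonstrates the memory point above: a ``bfloat16`` tensor occupies exactly half the bytes of its ``f32`` counterpart.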
To check if the CPU device can support the ``bfloat16`` data type, use the :doc:`query device properties interface <openvino_docs_OV_UG_query_api>`
@@ -117,6 +119,9 @@ to query ``ov::device::capabilities`` property, which should contain ``BF16`` in
:fragment: [part0]
Inference Precision Hint
-----------------------------------------------------------
If the model has been converted to ``bf16``, the ``ov::hint::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
the ``ov::CompiledModel::get_property`` call. The code below demonstrates how to get the element type:
@@ -156,7 +161,18 @@ To enable the simulation, the ``ov::hint::inference_precision`` has to be explic
Due to the reduced mantissa size of the ``bfloat16`` data type, the resulting ``bf16`` inference accuracy may differ from the ``f32`` inference,
especially for models that were not trained using the ``bfloat16`` data type. If the ``bf16`` inference accuracy is not acceptable,
it is recommended to switch to the ``f32`` precision. The performance/accuracy balance can also be managed using the ``ov::hint::execution_mode`` hint,
see the `Execution Mode Hint <#execution-mode-hint>`__.
Execution Mode Hint
-----------------------------------------------------------
If ``ov::hint::inference_precision`` is not explicitly set, the ``ov::hint::execution_mode`` hint can be used to direct the run-time optimizations toward either better accuracy or better performance.
If ``ov::hint::execution_mode`` is set to ``ov::hint::ExecutionMode::PERFORMANCE`` (the default behavior) and the platform natively supports ``bfloat16``
calculations (has the ``AVX512_BF16`` or ``AMX`` extension), then the ``bf16`` type is automatically used instead of ``f32`` to achieve better performance.
If the accuracy in this mode is not sufficient, set ``ov::hint::execution_mode`` to ``ov::hint::ExecutionMode::ACCURACY`` to force the plugin to
use the ``f32`` precision for floating-point calculations.
For more details and code examples, see the :doc:`Precision Control <openvino_docs_OV_UG_Precision_Control>`.
Supported Features
###########################################################
@@ -285,11 +301,6 @@ That means that :doc:`OpenVINO™ Extensibility Mechanism <openvino_docs_Extensi
Enabling fallback on a custom operation implementation is possible by overriding the ``ov::Op::evaluate`` method in the derived operation
class (see :doc:`custom OpenVINO™ operations <openvino_docs_Extensibility_UG_add_openvino_ops>` for details).
.. note::

   At the moment, custom operations with internal dynamism (when the output tensor shape can only be determined
   as a result of performing the operation) are not supported by the plugin.
Stateful Models
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
@@ -310,6 +321,7 @@ All parameters must be set before calling ``ov::Core::compile_model()`` in order
- ``ov::enable_profiling``
- ``ov::hint::inference_precision``
- ``ov::hint::performance_mode``
- ``ov::hint::execution_mode``
- ``ov::hint::num_requests``
- ``ov::num_streams``
- ``ov::affinity``


@@ -19,8 +19,9 @@ The OpenVINO Runtime provides unique capabilities to infer deep learning models
|| :doc:`GPU <openvino_docs_OV_UG_supported_plugins_GPU>` | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics |
+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
|| :doc:`CPU (x86) <openvino_docs_OV_UG_supported_plugins_CPU>` | Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector |
|| | Extensions 512 (Intel® AVX-512), Intel® Advanced Matrix Extensions (Intel® AMX), |
|| | Intel® Core™ Processors with Intel® AVX2, |
|| | Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE) |
+--------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------+
|| :doc:`CPU (Arm®) <openvino_docs_OV_UG_supported_plugins_CPU>` | Raspberry Pi™ 4 Model B, Apple® Mac with Apple silicon |
|| | |


@@ -0,0 +1,17 @@
// Copyright (C) 2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <openvino/runtime/core.hpp>

int main() {
    //! [ov:execution_mode:part0]
    ov::Core core;
    // in case of Accuracy
    core.set_property("CPU", ov::hint::execution_mode(ov::hint::ExecutionMode::ACCURACY));
    // in case of Performance
    core.set_property("CPU", ov::hint::execution_mode(ov::hint::ExecutionMode::PERFORMANCE));
    //! [ov:execution_mode:part0]
    return 0;
}


@@ -0,0 +1,12 @@
# Copyright (C) 2023 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
from openvino.runtime import Core
#! [ov:execution_mode:part0]
core = Core()
# in case of Accuracy
core.set_property("CPU", {"EXECUTION_MODE_HINT": "ACCURACY"})
# in case of Performance
core.set_property("CPU", {"EXECUTION_MODE_HINT": "PERFORMANCE"})
#! [ov:execution_mode:part0]