DOCS shift to rst - cpu n gna (#16252)

This commit is contained in:
Karol Blaszczak
2023-03-15 09:39:09 +01:00
committed by GitHub
parent 36c18e29a8
commit d774cc65a9
3 changed files with 790 additions and 496 deletions

# CPU Device {#openvino_docs_OV_UG_supported_plugins_CPU}
@sphinxdirective
The CPU plugin is a part of the Intel® Distribution of OpenVINO™ toolkit. It is developed to achieve high performance inference of neural networks on Intel® x86-64 CPUs.
For an in-depth description of CPU plugin, see:
- `CPU plugin developers documentation <https://github.com/openvinotoolkit/openvino/blob/master/docs/dev/cmake_options_for_custom_comiplation.md>`__.
- `OpenVINO Runtime CPU plugin source files <https://github.com/openvinotoolkit/openvino/tree/master/src/plugins/intel_cpu/>`__.
Device Name
###########################################################
The ``CPU`` device name is used for the CPU plugin. Even though there can be more than one physical socket on a platform, only one device of this kind is listed by OpenVINO.
On multi-socket platforms, load balancing and memory usage distribution between NUMA nodes are handled automatically.

In order to use CPU for inference, the device name should be passed to the ``ov::Core::compile_model()`` method:
.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/cpu/compile_model.cpp
         :language: cpp
         :fragment: [compile_model_default]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/cpu/compile_model.py
         :language: py
         :fragment: [compile_model_default]
Supported Inference Data Types
###########################################################
CPU plugin supports the following data types as inference precision of internal primitives:
- Floating-point data types:

  - ``f32``
  - ``bf16``

- Integer data types:

  - ``i32``

- Quantized data types:

  - ``u8``
  - ``i8``
  - ``u1``
:doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` can be used to print out supported data types for all detected devices.
Quantized Data Types Specifics
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Selected precision of each primitive depends on the operation precision in IR, quantization primitives, and available hardware capabilities.
The ``u1/u8/i8`` data types are used for quantized operations only, i.e., those are not selected automatically for non-quantized operations.
For more details on how to get a quantized model see the :doc:`low-precision optimization guide <openvino_docs_model_optimization_guide>`.
.. note::

   Platforms that do not support Intel® AVX512-VNNI have a known "saturation issue" that may lead to reduced computational accuracy for ``u8/i8`` precision calculations.
   To get more information on how to detect such issues and possible workarounds, see the :doc:`saturation (overflow) issue section <pot_saturation_issue>`.
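For intuition, the saturation effect can be sketched in plain Python (an illustrative model only, not OpenVINO code): without VNNI, pairs of ``u8*i8`` products pass through a saturating 16-bit intermediate, while VNNI accumulates directly into 32 bits:

```python
def saturate_i16(x):
    """Clamp a value to the int16 range, as the pre-VNNI pairwise
    multiply-add path does with its 16-bit intermediate."""
    return max(-32768, min(32767, x))

def madd_pre_vnni(a_u8, b_i8):
    """Emulate a u8*i8 dot product where each pair of products is
    accumulated through a saturating 16-bit intermediate."""
    total = 0
    for i in range(0, len(a_u8), 2):
        pair = a_u8[i] * b_i8[i] + a_u8[i + 1] * b_i8[i + 1]
        total += saturate_i16(pair)
    return total

def madd_vnni(a_u8, b_i8):
    """VNNI-style accumulation goes straight into 32 bits - no saturation."""
    return sum(x * y for x, y in zip(a_u8, b_i8))

a = [255, 255]  # u8 activations at the top of their range
b = [127, 127]  # i8 weights at the top of their range
print(madd_vnni(a, b))      # 64770 - exact result
print(madd_pre_vnni(a, b))  # 32767 - saturated, accuracy is lost
```

Large-magnitude ``u8/i8`` values are exactly the case the saturation issue workarounds address.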
Floating Point Data Types Specifics
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The default floating-point precision of a CPU primitive is ``f32``. To support the ``f16`` OpenVINO IR, the plugin internally converts
all the ``f16`` values to ``f32``, and all the calculations are performed using the native precision of ``f32``.
On platforms that natively support ``bfloat16`` calculations (have the ``AVX512_BF16`` extension), the ``bf16`` type is automatically used instead
of ``f32`` to achieve better performance. Thus, no special steps are required to run a ``bf16`` model. For more details about the ``bfloat16`` format, see
the `BFLOAT16 Hardware Numerics Definition white paper <https://software.intel.com/content/dam/develop/external/us/en/documents/bf16-hardware-numerics-definition-white-paper.pdf>`__.

Using the ``bf16`` precision provides the following performance benefits:

- Faster multiplication of two ``bfloat16`` numbers because of the shorter mantissa of the ``bfloat16`` data.
- Reduced memory consumption, since ``bfloat16`` data is half the size of 32-bit float.

To check if the CPU device can support the ``bfloat16`` data type, use the :doc:`query device properties interface <openvino_docs_OV_UG_query_api>`
to query the ``ov::device::capabilities`` property, which should contain ``BF16`` in the list of CPU capabilities:

.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/cpu/Bfloat16Inference0.cpp
         :language: cpp
         :fragment: [part0]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/cpu/Bfloat16Inference.py
         :language: py
         :fragment: [part0]
If the model has been converted to ``bf16``, the ``ov::inference_precision`` is set to ``ov::element::bf16`` and can be checked via
the ``ov::CompiledModel::get_property`` call. The code below demonstrates how to get the element type:

.. doxygensnippet:: snippets/cpu/Bfloat16Inference1.cpp
   :language: cpp
   :fragment: [part1]
To infer the model in ``f32`` precision instead of ``bf16`` on targets with native ``bf16`` support, set the ``ov::inference_precision`` to ``ov::element::f32``.
.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/cpu/Bfloat16Inference2.cpp
         :language: cpp
         :fragment: [part2]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/cpu/Bfloat16Inference.py
         :language: py
         :fragment: [part2]
The ``Bfloat16`` software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the
native ``avx512_bf16`` instruction. This mode is used for development purposes and it does not guarantee good performance.
To enable the simulation, the ``ov::inference_precision`` has to be explicitly set to ``ov::element::bf16``.
.. note::

   If ``ov::inference_precision`` is set to ``ov::element::bf16`` on a CPU without native bfloat16 support or bfloat16 simulation mode, an exception is thrown.

.. note::

   Due to the reduced mantissa size of the ``bfloat16`` data type, the resulting ``bf16`` inference accuracy may differ from the ``f32`` inference,
   especially for models that were not trained using the ``bfloat16`` data type. If the ``bf16`` inference accuracy is not acceptable,
   it is recommended to switch to the ``f32`` precision.
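The mantissa loss described above can be reproduced with a small, self-contained sketch (standard-library Python, no OpenVINO involved) that truncates a float32 bit pattern to the 16 bits ``bf16`` keeps (sign, 8-bit exponent, 7-bit mantissa):

```python
import struct

def f32_to_bf16_bits(x):
    """Convert a Python float to bfloat16 by truncating its float32
    bit pattern to the upper 16 bits (sign + 8-bit exponent + 7-bit mantissa)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_to_float(bits16):
    """Expand 16 bf16 bits back into a float32 value by zero-filling
    the dropped mantissa bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits16 << 16))
    return x

value = 3.141592653589793
as_bf16 = bf16_to_float(f32_to_bf16_bits(value))
print(as_bf16)  # 3.140625 - only ~2-3 decimal digits survive the 7-bit mantissa
```

The same dynamic range as ``f32`` is preserved (the exponent field is identical), which is why only precision, not range, differs between the two inference modes.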
Supported Features
###########################################################
Multi-device Execution
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If a system includes OpenVINO-supported devices other than the CPU (e.g. an integrated GPU), then any supported model can be executed on all the devices simultaneously.
This can be achieved by specifying ``MULTI:CPU,GPU.0`` as a target device in case of simultaneous usage of CPU and GPU.

.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/cpu/compile_model.cpp
         :language: cpp
         :fragment: [compile_model_multi]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/cpu/compile_model.py
         :language: py
         :fragment: [compile_model_multi]
For more details, see the :doc:`Multi-device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` article.
Multi-stream Execution
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
If either ``ov::num_streams(n_streams)`` with ``n_streams > 1`` or the ``ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)``
property is set for the CPU plugin, multiple streams are created for the model. In the case of the CPU plugin, each stream has its own
host thread, which means that incoming infer requests can be processed simultaneously. Each stream is pinned to its own group of
physical cores, with respect to NUMA node physical memory usage, to minimize overhead on data transfer between NUMA nodes.
For more details, see the :doc:`optimization guide <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>`.
.. note::

   When it comes to latency, be aware that running only one stream on a multi-socket platform may introduce additional overheads
   on data transfer between NUMA nodes. In that case, it is better to use the ``ov::hint::PerformanceMode::LATENCY`` performance hint.
   For more details, see the :doc:`performance hints <openvino_docs_OV_UG_Performance_Hints>` overview.
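As a loose analogy only (not the plugin's actual implementation; all names below are invented for illustration), the streams model behaves like a thread pool whose size equals the number of streams: each worker plays the role of one stream's host thread, so up to that many infer requests are in flight at once:

```python
from concurrent.futures import ThreadPoolExecutor

N_STREAMS = 4  # analogous to setting ov::num_streams(4)

def infer(request_id):
    """Stand-in for one infer request handled by a stream's host thread."""
    return f"request {request_id} done"

# Each worker thread plays the role of one CPU stream: up to
# N_STREAMS infer requests are processed simultaneously, and the
# remaining requests wait in the queue, which favors throughput.
with ThreadPoolExecutor(max_workers=N_STREAMS) as pool:
    results = list(pool.map(infer, range(8)))

print(len(results))  # 8
```

The analogy also shows the latency trade-off: more concurrent workers raise aggregate throughput, while a single request completes fastest when it does not share resources.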
Dynamic Shapes
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPU provides full functional support for models with dynamic shapes in terms of the opset coverage.
.. note::

   The CPU plugin does not support tensors with dynamically changing rank. In case of an attempt to infer a model with such tensors, an exception will be thrown.
Some runtime optimizations work better if the model shapes are known in advance. Therefore, if the input data shape is
not changed between inference calls, it is recommended to use a model with static shapes or reshape the existing model
with the static input shape to get the best performance.
.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/cpu/dynamic_shape.cpp
         :language: cpp
         :fragment: [static_shape]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/cpu/dynamic_shape.py
         :language: py
         :fragment: [static_shape]
For more details, see the :doc:`dynamic shapes guide <openvino_docs_OV_UG_DynamicShapes>`.
Preprocessing Acceleration
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPU plugin supports a full set of the preprocessing operations, providing high performance implementations for them.
For more details, see :doc:`preprocessing API guide <openvino_docs_OV_UG_Preprocessing_Overview>`.
.. dropdown:: The CPU plugin support for handling tensor precision conversion is limited to the following ov::element types:

   * bf16
   * f16
   * f32
   * f64
   * i8
   * i16
   * i32
   * i64
   * u8
   * u16
   * u32
   * u64
   * boolean
Model Caching
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CPU supports Import/Export network capability. If model caching is enabled via the common OpenVINO™ ``ov::cache_dir`` property,
the plugin automatically creates a cached blob inside the specified directory during model compilation. This cached blob contains
partial representation of the network, having performed common runtime optimizations and low precision transformations.
The next time the model is compiled, the cached representation will be loaded to the plugin instead of the initial OpenVINO IR,
so the aforementioned transformation steps will be skipped. These transformations take a significant amount of time during
model compilation, so caching this representation reduces time spent for subsequent compilations of the model, thereby reducing
first inference latency (FIL).
For more details, see the :doc:`model caching <openvino_docs_OV_UG_Model_caching_overview>` overview.
Extensibility
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The CPU plugin supports fallback on the ``ov::Op`` reference implementation if the plugin does not have its own implementation for such an operation.
That means that the :doc:`OpenVINO™ Extensibility Mechanism <openvino_docs_Extensibility_UG_Intro>` can be used for plugin extension as well.
Enabling fallback on a custom operation implementation is possible by overriding the ``ov::Op::evaluate`` method in the derived operation
class (see :doc:`custom OpenVINO™ operations <openvino_docs_Extensibility_UG_add_openvino_ops>` for details).
.. note::

   At the moment, custom operations with internal dynamism (when the output tensor shape can only be determined
   as a result of performing the operation) are not supported by the plugin.
Stateful Models
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The CPU plugin supports stateful models without any limitations.
For details, see :doc:`stateful models guide <openvino_docs_OV_UG_network_state_intro>`.
Supported Properties
###########################################################
The plugin supports the following properties:
Read-write Properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
All parameters must be set before calling ``ov::Core::compile_model()`` in order to take effect, or passed as an additional argument to ``ov::Core::compile_model()``:
- ``ov::enable_profiling``
- ``ov::inference_precision``
- ``ov::hint::performance_mode``
- ``ov::hint::num_request``
- ``ov::num_streams``
- ``ov::affinity``
- ``ov::inference_num_threads``
- ``ov::cache_dir``
- ``ov::intel_cpu::denormals_optimization``
- ``ov::intel_cpu::sparse_weights_decompression_rate``
Read-only properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- ``ov::supported_properties``
- ``ov::available_devices``
- ``ov::range_for_async_infer_requests``
- ``ov::range_for_streams``
- ``ov::device::full_name``
- ``ov::device::capabilities``
External Dependencies
###########################################################
For some performance-critical DL operations, the CPU plugin uses optimized implementations from the oneAPI Deep Neural Network Library
(`oneDNN <https://github.com/oneapi-src/oneDNN>`__).
.. dropdown:: The following operations are implemented using primitives from the oneDNN library:

   * AvgPool
   * Concat
   * Convolution
   * ConvolutionBackpropData
   * GroupConvolution
   * GroupConvolutionBackpropData
   * GRUCell
   * GRUSequence
   * LRN
   * LSTMCell
   * LSTMSequence
   * MatMul
   * MaxPool
   * RNNCell
   * RNNSequence
   * SoftMax
Optimization guide
###########################################################
Denormals Optimization
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Denormal numbers (denormals) are non-zero, finite float numbers that are very close to zero, i.e. the numbers
in (0, 1.17549e-38) and (-1.17549e-38, 0). In such cases, the normalized-number encoding format does not have the capability
to encode the number, and underflow will happen. Computation involving such numbers is extremely slow on many hardware platforms.
As a denormal number is extremely close to zero, treating a denormal directly as zero is a straightforward
and simple method to optimize computation of denormals. This optimization does not comply with the IEEE 754 standard.
If it causes unacceptable accuracy degradation, the ``denormals_optimization`` property is introduced to control this behavior.
If there are denormal numbers in use cases, and no or an acceptable accuracy drop is seen, set the property to ``True``
to improve performance, otherwise set it to ``False``. If it is not set explicitly by the property and the application
does not perform any denormals optimization as well, the optimization is disabled by default. After enabling
the ``denormals_optimization`` property, OpenVINO will provide a cross operating-system/compiler and safe optimization
on all platforms when applicable.
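The effect can be sketched with a minimal, hypothetical Python helper (the names are invented for illustration) that mimics what flushing denormals to zero does for single-precision values; ``FLT_MIN`` here is the smallest normal float32, the lower bound of the intervals given above:

```python
FLT_MIN = 2.0 ** -126  # smallest normal float32, ~1.17549435e-38

def is_denormal_f32(x):
    """True for non-zero values that fall below the float32 normal range,
    i.e. values representable only as denormals."""
    return x != 0.0 and abs(x) < FLT_MIN

def flush_to_zero(x):
    """Emulate the effect of the FTZ/DAZ flags: denormals become zero,
    normal numbers pass through unchanged."""
    return 0.0 if is_denormal_f32(x) else x

tiny = 1e-39  # inside (0, 1.17549e-38), a float32 denormal
print(is_denormal_f32(tiny))  # True
print(flush_to_zero(tiny))    # 0.0
print(flush_to_zero(1e-30))   # 1e-30 - normal numbers are unaffected
```

This is why the optimization is usually safe: only values already too small to encode normally are replaced, at the cost of IEEE 754 conformance.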
There are cases when the application in which OpenVINO is used also performs this low-level denormals optimization.
If it is optimized by setting the FTZ (Flush-To-Zero) and DAZ (Denormals-As-Zero) flags in the MXCSR register at the beginning
of the thread where OpenVINO is called, OpenVINO will inherit this setting in the same thread and sub-threads,
so there is no need to set the ``denormals_optimization`` property. In such cases, you are responsible for the
effectiveness and safety of the settings.
.. note::

   The ``denormals_optimization`` property must be set before calling ``compile_model()``.
To enable denormals optimization in the application, the ``denormals_optimization`` property must be set to ``True``:
.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/ov_denormals.cpp
         :language: cpp
         :fragment: [ov:intel_cpu:denormals_optimization:part0]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/ov_denormals.py
         :language: python
         :fragment: [ov:intel_cpu:denormals_optimization:part0]
Sparse weights decompression
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
``Sparse weights`` are weights where most of the elements are zero. The ratio of the number of zero elements
to the number of all elements is called ``sparse rate``. Thus, we assume that ``sparse weights`` are weights
with a high sparse rate. In case of ``sparse weights``, we can store only non-zero values in memory using
special storage structures, which allows us to use memory more efficiently. In turn, this can give us better
performance in the high memory bound workloads (e.g., throughput scenario).
The ``sparse weights decompression feature`` allows packing weights for Matrix Multiplication operations directly
in the CPU plugin at the model compilation stage and storing non-zero values in a special packed format. Then,
during the execution of the model, the weights are unpacked and used in the computational kernel. Since the
weights are loaded from DDR/L3 cache in the packed format, this significantly decreases memory consumption
and, as a consequence, improves inference performance.
To use this feature, the user is provided with the property ``sparse_weights_decompression_rate``, which can take
values from the interval [0.5, 1] (values from [0, 0.5] are not supported in the current implementation,
see limitations below). ``sparse_weights_decompression_rate`` defines the sparse rate threshold: only operations
with a higher sparse rate will be executed using the ``sparse weights decompression feature``. The default value is ``1``,
which means the option is disabled.
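The threshold semantics can be illustrated with a short, hypothetical Python helper (the function names are invented here and are not part of the OpenVINO API):

```python
def sparse_rate(weights):
    """Ratio of zero elements to all elements in a weight matrix,
    given as a list of rows."""
    total = sum(len(row) for row in weights)
    zeros = sum(row.count(0) for row in weights)
    return zeros / total

def should_decompress(weights, threshold=1.0):
    """Mirror the semantics of sparse_weights_decompression_rate:
    only weights whose sparse rate reaches the threshold take the
    sparse path; the default threshold of 1.0 disables the feature."""
    assert 0.5 <= threshold <= 1.0, "values below 0.5 are not supported"
    return sparse_rate(weights) >= threshold

w = [[0, 0, 0, 5], [0, 7, 0, 0]]  # 6 zeros out of 8 elements
print(sparse_rate(w))             # 0.75
print(should_decompress(w, 0.7))  # True - 0.75 exceeds the 0.7 threshold
print(should_decompress(w))       # False - default 1.0 keeps it disabled
```

Lowering the threshold toward 0.5 makes more Matrix Multiplication operations eligible, at the risk of taking the sparse path where it does not pay off.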
.. note::

   The ``sparse weights decompression feature`` is disabled by default, since overall speed-up highly depends on
   particular workload, and for some cases the feature may introduce performance degradations.
Code examples of how to use ``sparse_weights_decompression_rate``:
.. tab-set::

   .. tab-item:: C++
      :sync: cpp

      .. doxygensnippet:: docs/snippets/cpu/ov_sparse_weights_decompression.cpp
         :language: cpp
         :fragment: [ov:intel_cpu:sparse_weights_decompression:part0]

   .. tab-item:: Python
      :sync: py

      .. doxygensnippet:: docs/snippets/cpu/ov_sparse_weights_decompression.py
         :language: python
         :fragment: [ov:intel_cpu:sparse_weights_decompression:part0]
.. note::

   The ``sparse_weights_decompression_rate`` property must be set before calling ``compile_model()``.
Information about the layers in which the ``sparse weights decompression feature`` was applied can be obtained
from the performance counters log. The "execType" field will contain the implementation type with the "sparse" marker
("brgemm_avx512_amx_sparse_I8" in the example below):
.. code-block:: sh
MatMul_1800 EXECUTED layerType: FullyConnected execType: brgemm_avx512_amx_sparse_I8 realTime (ms): 0.050000 cpuTime (ms): 0.050000
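When postprocessing such logs, the check for the "sparse" marker can be scripted; a minimal sketch in plain Python (the helper names are illustrative, not OpenVINO API):

```python
# Hypothetical helpers that scan one line of the performance counters log
# (format as in the example above) for the "sparse" marker in its execType field.
def exec_type(perf_line):
    # The log line is a sequence of whitespace-separated "key: value" fields.
    parts = perf_line.split()
    for i, tok in enumerate(parts):
        if tok == "execType:" and i + 1 < len(parts):
            return parts[i + 1]
    return None

def uses_sparse_decompression(perf_line):
    et = exec_type(perf_line)
    return et is not None and "sparse" in et
```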
Limitations
-----------------------------------------------------------
Currently, the ``sparse weights decompression feature`` is supported with the following limitations:
1. The model should be quantized to int8 precision.
2. The feature is only supported for Matrix Multiplication operations.
3. The HW target must support the Intel AMX extension (e.g., Intel® 4th Generation Xeon® processors, code named Sapphire Rapids).
4. The number of input and output channels of the weights must be a multiple of 64.
5. The current feature implementation supports only a sparse rate higher than 0.5.
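As an illustration of the rules above, the sketch below (plain Python with hypothetical helper names, not OpenVINO API) computes a weight matrix's sparse rate and applies the threshold and channel-multiple checks:

```python
# Illustrative decision rule: decompression applies only when the operation's
# sparse rate exceeds the configured threshold and both channel counts are
# multiples of 64. This mirrors the documented limitations, not plugin code.
def sparse_rate(weights):
    # Fraction of zero elements in a weight matrix given as a list of rows.
    total = sum(len(row) for row in weights)
    zeros = sum(row.count(0) for row in weights)
    return zeros / total

def decompression_applies(weights, in_channels, out_channels, threshold=1.0):
    # threshold=1.0 mirrors the default, which disables the feature.
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("supported threshold interval is [0.5, 1]")
    return (sparse_rate(weights) > threshold
            and in_channels % 64 == 0
            and out_channels % 64 == 0)
```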
Additional Resources
###########################################################
* :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`
* :doc:`Optimization guide <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>`
* `CPU plugin developers documentation <https://github.com/openvinotoolkit/openvino/blob/master/src/plugins/intel_cpu/README.md>`__
@endsphinxdirective

# GNA Device {#openvino_docs_OV_UG_supported_plugins_GNA}
@sphinxdirective
The Intel® Gaussian & Neural Accelerator (GNA) is a low-power neural coprocessor for continuous inference at the edge.
Intel® GNA is not intended to replace typical inference devices such as the CPU and GPU. It is designed for offloading
continuous inference workloads, including but not limited to noise reduction and speech recognition,
to save power and free CPU resources.
The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU.
For more details on how to configure a machine to use GNA, see the :doc:`GNA configuration page <openvino_docs_install_guides_configurations_for_intel_gna>`.
Intel® GNA Generational Differences
###########################################################
The first (1.0) and second (2.0) versions of Intel® GNA found in 10th and 11th generation Intel® Core™ Processors may be considered
functionally equivalent. Intel® GNA 2.0 provided performance improvement with respect to Intel® GNA 1.0. Starting with 12th Generation
Intel® Core™ Processors (formerly codenamed Alder Lake), support for Intel® GNA 3.0 features is being added.
In this documentation, "GNA 2.0" refers to Intel® GNA hardware delivered on 10th and 11th generation Intel® Core™ processors,
and the term "GNA 3.0" refers to GNA hardware delivered on 12th generation Intel® Core™ processors.
Intel® GNA Forward and Backward Compatibility
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
When a model is run using the GNA plugin, it is compiled internally for the specific hardware target. It is possible to export a compiled model,
using `Import/Export <#import-export>`__ functionality, to use it later. In general, there is no guarantee that a model compiled and
exported for GNA 2.0 runs on GNA 3.0, or vice versa.
.. csv-table:: Interoperability of compile target and hardware target
   :header: "Hardware", "Compile target 2.0", "Compile target 3.0"

   "GNA 2.0", "Supported", "Not supported (incompatible layers emulated on CPU)"
   "GNA 3.0", "Partially supported", "Supported"

.. note::

   In most cases, a network compiled for GNA 2.0 runs as expected on GNA 3.0. However, performance may be worse
   compared to when a network is compiled specifically for the latter. The exception is a network with convolutions
   with the number of filters greater than 8192 (see the `Model and Operation Limitations <#model-and-operation-limitations>`__ section).
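Read as a lookup from (hardware, compile target) to support status, the compatibility matrix can be sketched as follows (the strings are illustrative labels, not plugin constants):

```python
# The interoperability table above, rewritten as a dictionary lookup.
INTEROP = {
    ("GNA 2.0", "2.0"): "Supported",
    ("GNA 2.0", "3.0"): "Not supported (incompatible layers emulated on CPU)",
    ("GNA 3.0", "2.0"): "Partially supported",
    ("GNA 3.0", "3.0"): "Supported",
}

def interoperability(hardware, compile_target):
    # Returns the documented support status for a hardware/compile-target pair.
    return INTEROP[(hardware, compile_target)]
```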
For optimal work with POT quantized models, which include 2D convolutions on GNA 3.0 hardware, the following requirements should be satisfied:
* Choose a compile target with priority on: cross-platform execution, performance, memory, or power optimization.
* To check interoperability in your application use: ``ov::intel_gna::execution_target`` and ``ov::intel_gna::compile_target``.
:doc:`Speech C++ Sample <openvino_inference_engine_samples_speech_sample_README>` can be used for experiments (see the ``-exec_target`` and ``-compile_target`` command line options).
Software Emulation Mode
###########################################################
Software emulation mode is used by default on platforms without GNA hardware support. Therefore, the model runs even if there is no GNA HW on your platform.
GNA plugin enables switching the execution between software emulation mode and hardware execution mode once the model has been loaded.
For details, see a description of the ``ov::intel_gna::execution_mode`` property.
Recovery from Interruption by High-Priority Windows Audio Processes
############################################################################
GNA is designed for real-time workloads, such as noise reduction. For such workloads, processing should be time constrained.
Otherwise, extra delays may cause undesired effects such as *audio glitches*. The GNA driver provides a Quality of Service (QoS)
mechanism to ensure that processing can satisfy real-time requirements. The mechanism interrupts requests that might cause
high-priority Windows audio processes to miss their schedule. As a result, long-running GNA tasks terminate early.
To prepare the applications correctly, use Automatic QoS Feature described below.
Automatic QoS Feature on Windows
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Starting with the 2021.4.1 release of OpenVINO™ and the 03.00.00.1363 version of Windows GNA driver, the execution mode of
``ov::intel_gna::ExecutionMode::HW_WITH_SW_FBACK`` has been available to ensure that workloads satisfy real-time execution.
In this mode, the GNA driver automatically falls back on CPU for a particular infer request if the HW queue is not empty.
Therefore, there is no need for explicitly switching between GNA and CPU.
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/gna/configure.cpp
:language: cpp
:fragment: [include]
.. doxygensnippet:: docs/snippets/gna/configure.cpp
:language: cpp
:fragment: [ov_gna_exec_mode_hw_with_sw_fback]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/gna/configure.py
:language: py
:fragment: [import]
.. doxygensnippet:: docs/snippets/gna/configure.py
:language: py
:fragment: [ov_gna_exec_mode_hw_with_sw_fback]
.. note::
Due to the "first come - first served" nature of GNA driver and the QoS feature, this mode may lead to increased
CPU consumption if there are several clients using GNA simultaneously. Even a lightweight competing infer request,
not cleared at the time when the user's GNA client process makes its request, can cause the user's request to be
executed on CPU, unnecessarily increasing CPU utilization and power.
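A toy model of this fallback behaviour, assuming a single shared HW queue (illustrative only, not driver logic):

```python
from collections import deque

# Sketch of HW_WITH_SW_FBACK dispatch as described above: an infer request is
# submitted to GNA hardware only when the HW queue is empty; otherwise it
# falls back to the CPU.
def dispatch(request, hw_queue):
    if hw_queue:
        return "CPU"           # queue busy: software fallback
    hw_queue.append(request)   # queue empty: submit to GNA hardware
    return "GNA_HW"
```

This is why a lightweight competing request left in the queue is enough to push the next request onto the CPU.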
Supported Inference Data Types
###########################################################
Intel® GNA essentially operates in the low-precision mode which represents a mix of 8-bit (``i8``), 16-bit (``i16``), and 32-bit (``i32``)
integer computations. Unlike other OpenVINO devices supporting low-precision execution, it can calculate quantization factors at the
model loading time. Therefore, a model can be run without calibration. However, this mode may not provide satisfactory accuracy
because the internal quantization algorithm is based on heuristics, the efficiency of which depends on the model and dynamic range of input data.
This mode is going to be deprecated soon. GNA supports the ``i16`` and ``i8`` quantized data types as inference precision of internal primitives.
GNA users are encouraged to use the :doc:`Post-Training Optimization Tool <pot_introduction>` to get a model with
quantization hints based on statistics for the provided dataset.
:doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` can be used to print out supported data types for all detected devices.
:doc:`POT API Usage sample for GNA <pot_example_speech_README>` demonstrates how a model can be quantized for GNA, using POT API in two modes:
* Accuracy (i16 weights)
* Performance (i8 weights)
For POT quantized models, the ``ov::inference_precision`` property has no effect except in cases described in the
`Model and Operation Limitations <#model-and-operation-limitations>`__ section.
Supported Features
###########################################################
Model Caching
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Due to import/export functionality support (see below), caching for the GNA plugin may be enabled via the common ``ov::cache_dir`` property of OpenVINO™.
For more details, see the :doc:`Model caching overview <openvino_docs_OV_UG_Model_caching_overview>`.
Import/Export
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The GNA plugin supports import/export capability, which helps decrease first inference time significantly.
The model compile target is the same as the execution target by default. If there is no GNA HW in the system,
the default value for the execution target corresponds to the available hardware or the latest hardware version
supported by the plugin (i.e., GNA 3.0).
To export a model for a specific version of GNA HW, use the ``ov::intel_gna::compile_target`` property and then export the model:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/gna/import_export.cpp
:language: cpp
:fragment: [ov_gna_export]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/gna/import_export.py
:language: py
:fragment: [ov_gna_export]
Import model:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/gna/import_export.cpp
:language: cpp
:fragment: [ov_gna_import]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/gna/import_export.py
:language: py
:fragment: [ov_gna_import]
To compile a model, use either :doc:`compile Tool <openvino_inference_engine_tools_compile_tool_README>` or
:doc:`Speech C++ Sample <openvino_inference_engine_samples_speech_sample_README>`.
Stateful Models
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GNA plugin natively supports stateful models. For more details on such models, refer to the :doc:`Stateful models <openvino_docs_OV_UG_network_state_intro>`.
.. note::
The GNA is typically used in streaming scenarios when minimizing latency is important. Taking into account that POT does not
support the ``TensorIterator`` operation, the recommendation is to use the ``--transform`` option of the Model Optimizer
to apply ``LowLatency2`` transformation when converting an original model.
Profiling
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The GNA plugin allows turning on profiling, using the ``ov::enable_profiling`` property.
With the following methods, you can collect profiling information with various performance data about execution on GNA:
.. tab:: C++

   ``ov::InferRequest::get_profiling_info``

.. tab:: Python

   ``openvino.runtime.InferRequest.get_profiling_info``
The current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information.
The API enables you to retrieve counter units in cycles. You can convert cycles to seconds as follows:
.. code-block:: sh
seconds = cycles / frequency
Refer to the table below for the frequency of Intel® GNA inside particular processors:

.. csv-table:: Frequency of Intel® GNA inside a particular processor
   :header: "Processor", "Frequency of Intel® GNA, MHz"

   "Intel® Core™ processors", 400
   "Intel® processors formerly codenamed Elkhart Lake", 200
   "Intel® processors formerly codenamed Gemini Lake", 200
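For example, combining the conversion formula with the documented frequencies (the dictionary keys are illustrative labels, not API constants):

```python
# Worked example of seconds = cycles / frequency, using the documented
# GNA clock frequencies in MHz.
GNA_FREQUENCY_MHZ = {
    "Intel Core": 400,
    "Elkhart Lake": 200,
    "Gemini Lake": 200,
}

def cycles_to_seconds(cycles, processor):
    # Convert a raw cycle counter value into seconds for a given processor.
    frequency_hz = GNA_FREQUENCY_MHZ[processor] * 1_000_000
    return cycles / frequency_hz
```

On an Intel® Core™ processor, 400 000 cycles therefore correspond to 1 ms of GNA time.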
Inference request performance counters currently provided:

* The number of total cycles spent on scoring in hardware, including compute and memory stall cycles
* The number of stall cycles spent in hardware
Supported Properties
###########################################################
Read-write Properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
In order to take effect, the following parameters must be set before model compilation or passed as additional arguments to ``ov::Core::compile_model()``:
- ov::cache_dir
- ov::enable_profiling
- ov::intel_gna::pwl_max_error_percent
- ov::intel_gna::scale_factors_per_input
These parameters can be changed after model compilation using ``ov::CompiledModel::set_property``:
- ov::hint::performance_mode
- ov::intel_gna::execution_mode
- ov::log::level
Read-only Properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
- ov::available_devices
- ov::device::capabilities
- ov::device::full_name
- ov::range_for_async_infer_requests
- ov::supported_properties
Limitations
###########################################################
Model and Operation Limitations
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Due to the specifics of the hardware architecture, Intel® GNA supports a limited set of operations (including their kinds and combinations).
For example, GNA Plugin should not be expected to run computer vision models because the plugin does not fully support 2D convolutions.
The exceptions are models specifically adapted for the GNA Plugin.
Limitations include:
- Splits and concatenations are supported for continuous portions of memory (e.g., split of 1,2,3,4 to 1,1,3,4 and 1,1,3,4, or concat of 1,2,3,4 and 1,2,3,4 to 2,2,3,4).
- For *Multiply*, *Add* and *Subtract* layers, auto broadcasting is only supported for constant inputs.
Support for 2D Convolutions
-----------------------------------------------------------
The Intel® GNA 1.0 and 2.0 hardware natively supports only 1D convolutions. However, 2D convolutions can be mapped to 1D when
a convolution kernel moves in a single direction. Initially, a limited subset of Intel® GNA 3.0 features is added to the
previous feature set, including:
* **2D VALID Convolution With Small 2D Kernels:** Two-dimensional convolutions with the following kernel dimensions
[``H``,``W``] are supported: [1,1], [2,2], [3,3], [2,1], [3,1], [4,1], [5,1], [6,1], [7,1], [1,2], or [1,3].
Input tensor dimensions are limited to [1,8,16,16] <= [``N``,``C``,``H``,``W``] <= [1,120,384,240]. Up to 384 ``C``
channels may be used with a subset of kernel sizes (see the table below). Up to 256 kernels (output channels)
are supported. Pooling is limited to pool shapes of [1,1], [2,2], or [3,3]. Not all combinations of kernel
shape and input tensor shape are supported (see the tables below for exact limitations).
The tables below show that the exact limitation on the input tensor width W depends on the number of input channels
*C* (indicated as *Ci* below) and the kernel shape. There is much more freedom to choose the input tensor height and number of output channels.
The following tables provide a more explicit representation of the Intel® GNA 3.0 2D convolution operations
initially supported. The limits depend strongly on the number of input tensor channels (*Ci*) and the input tensor width (*W*).
Other factors are kernel height (*KH*), kernel width (*KW*), pool height (*PH*), pool width (*PW*), horizontal pool step (*SH*),
and vertical pool step (*SW*). For example, the first table shows that for a 3x3 kernel with max pooling, only square pools are supported,
and *W* is limited to 87 when there are 64 input channels.
:download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i16) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i16.csv>`
:download:`Table of Maximum Input Tensor Widths (W) vs. Rest of Parameters (Input and Kernel Precision: i8) <../../../docs/OV_Runtime_UG/supported_plugins/files/GNA_Maximum_Input_Tensor_Widths_i8.csv>`
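A rough sketch of these shape checks (the kernel whitelist plus the [``N``,``C``,``H``,``W``] input bounds; the per-*Ci* width limits from the downloadable tables are not modeled, so a ``True`` result here is necessary but not sufficient):

```python
# Kernel shapes [H, W] listed above as natively supported by GNA 3.0 2D
# VALID convolution; bounds are [1,8,16,16] <= [N,C,H,W] <= [1,120,384,240].
SUPPORTED_KERNELS = {(1, 1), (2, 2), (3, 3), (2, 1), (3, 1), (4, 1),
                     (5, 1), (6, 1), (7, 1), (1, 2), (1, 3)}

def conv2d_may_be_supported(n, c, h, w, kh, kw):
    # Coarse pre-check only; the exact W limit also depends on Ci and the
    # kernel shape (see the tables above).
    in_bounds = (n == 1 and 8 <= c <= 120 and 16 <= h <= 384 and 16 <= w <= 240)
    return in_bounds and (kh, kw) in SUPPORTED_KERNELS
```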
.. note::
The above limitations only apply to the new hardware 2D convolution operation. When possible, the Intel® GNA
plugin graph compiler flattens 2D convolutions so that the second generation Intel® GNA 1D convolution operations
(without these limitations) may be used. The plugin will also flatten 2D convolutions regardless of the sizes if GNA 2.0
compilation target is selected (see below).
Support for 2D Convolutions using POT
-----------------------------------------------------------
For POT to successfully work with models that include GNA 3.0 2D convolutions, the following requirements must be met:
* All convolution parameters are natively supported by HW (see tables above).
* The runtime precision is explicitly set by the ``ov::inference_precision`` property as ``i8`` for the models produced by
the ``performance mode`` of POT, and as ``i16`` for the models produced by the ``accuracy mode`` of POT.
Batch Size Limitation
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Intel® GNA plugin supports processing of context-windowed speech frames in batches of 1-8 frames.
Refer to the :doc:`Layout API overview <openvino_docs_OV_UG_Layout_Overview>` to determine batch dimension.
To set the layout of model inputs in runtime, use the :doc:`Optimize Preprocessing <openvino_docs_OV_UG_Preprocessing_Overview>` guide:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/gna/set_batch.cpp
:language: cpp
:fragment: [include]
.. doxygensnippet:: docs/snippets/gna/set_batch.cpp
:language: cpp
:fragment: [ov_gna_set_nc_layout]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/gna/set_batch.py
:language: py
:fragment: [import]
.. doxygensnippet:: docs/snippets/gna/set_batch.py
:language: py
:fragment: [ov_gna_set_nc_layout]
Then, set the batch size:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/gna/set_batch.cpp
:language: cpp
:fragment: [ov_gna_set_batch_size]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/gna/set_batch.py
:language: py
:fragment: [ov_gna_set_batch_size]
Increasing the batch size only improves the efficiency of ``MatMul`` layers.
.. note::

   For models with ``Convolution``, ``LSTMCell``, ``GRUCell``, or ``ReadValue`` / ``Assign`` operations, the only supported batch size is 1.
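As a plain-Python illustration (not the OpenVINO API; the frame count and feature size below are arbitrary assumptions), splitting a stream of context-windowed frames into batches of at most 8 could look like this:

```python
def make_batches(frames, max_batch=8):
    """Group a sequence of feature frames into batches of at most max_batch frames."""
    return [frames[i:i + max_batch] for i in range(0, len(frames), max_batch)]

# 20 hypothetical frames, each a 64-element feature vector
frames = [[0.0] * 64 for _ in range(20)]
batches = make_batches(frames)
print([len(b) for b in batches])  # [8, 8, 4]
```

Each inner list of up to 8 frames corresponds to one batched inference call; a trailing partial batch (4 frames here) is still valid, since any batch size from 1 to 8 is supported.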
Compatibility with Heterogeneous mode
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
:doc:`Heterogeneous execution <openvino_docs_OV_UG_Hetero_execution>` is currently not supported by the GNA plugin.
See Also
###########################################################
* :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`
* :doc:`Converting Model <openvino_docs_MO_DG_prepare_model_convert_model_Converting_Model>`
* :doc:`Convert model from Kaldi <openvino_docs_MO_DG_prepare_model_convert_model_Convert_Model_From_Kaldi>`
@endsphinxdirective


@@ -1,220 +1,277 @@
# Query Device Properties - Configuration {#openvino_docs_OV_UG_query_api}
@sphinxdirective
The OpenVINO™ toolkit supports inference with several types of devices (processors or accelerators).
This section provides a high-level overview of querying different device properties and configuration values at runtime.
OpenVINO runtime has two types of properties:
- Read-only properties, which provide information about the devices (such as device name, execution capabilities, etc.)
  and about configuration values used to compile the model (``ov::CompiledModel``).
- Mutable properties, which are primarily used to configure the ``ov::Core::compile_model`` process and affect final
  inference on a specific set of devices. Such properties can be set globally per device via ``ov::Core::set_property``
  or locally for a particular model in the ``ov::Core::compile_model`` and ``ov::Core::query_model`` calls.
An OpenVINO property is represented as a named constexpr variable with a given string name and a type.
The following example represents a read-only property with a C++ name of ``ov::available_devices``,
a string name of ``AVAILABLE_DEVICES`` and a type of ``std::vector<std::string>``:
.. code-block:: cpp
static constexpr Property<std::vector<std::string>, PropertyMutability::RO> available_devices{"AVAILABLE_DEVICES"};
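Conceptually, each property couples a string name with a value type and a mutability flag. A minimal Python sketch of that idea (a mock for illustration, not the actual OpenVINO implementation) could look like:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Property:
    """A named, typed property; mutability 'RO' means the user cannot set it."""
    name: str
    value_type: type
    mutability: str = "RW"

# a rough analogue of ov::available_devices: read-only, typed as a list of strings
available_devices = Property("AVAILABLE_DEVICES", list, mutability="RO")
print(available_devices.name, available_devices.mutability)  # AVAILABLE_DEVICES RO
```

The frozen dataclass mirrors the C++ ``constexpr``: the property descriptor itself is immutable, while its mutability flag describes whether the *value* it refers to can be changed.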
Refer to the :doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` sources and
the :doc:`Multi-Device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` documentation for examples of setting
and getting properties in user applications.
Get a Set of Available Devices
###########################################################
Based on the ``ov::available_devices`` read-only property, OpenVINO Core collects information about currently available
devices enabled by OpenVINO plugins and returns information, using the ``ov::Core::get_available_devices`` method:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [get_available_devices]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [get_available_devices]
The function returns a list of available devices, for example:
.. code-block::
CPU
GPU.0
GPU.1
If there are multiple instances of a specific device, the devices are enumerated with a suffix comprising a full stop and
a unique string identifier, such as ``.suffix``. Each device name can then be passed to:
* ``ov::Core::compile_model`` to load the model to a specific device with specific configuration properties.
* ``ov::Core::get_property`` to get common or device-specific properties.
* All other methods of the ``ov::Core`` class that accept ``deviceName``.
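For instance, an enumerated name such as ``GPU.1`` can be split into a device type and an instance identifier. The helper below is a small sketch for illustration, not part of the OpenVINO API:

```python
def parse_device_name(name):
    """Split an enumerated device name: 'GPU.1' -> ('GPU', '1').

    Names without a suffix (e.g. 'CPU') yield an empty identifier.
    """
    device_type, _, suffix = name.partition(".")
    return device_type, suffix

for device in ["CPU", "GPU.0", "GPU.1"]:
    print(parse_device_name(device))
# ('CPU', ''), ('GPU', '0'), ('GPU', '1')
```

Keeping the suffix as a string (rather than converting it to an integer) matches the description above: it is a unique string identifier, not necessarily a number.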
Working with Properties in Your Code
###########################################################
The ``ov::Core`` class provides the following methods to query device information and to set or get device configuration properties:
* ``ov::Core::get_property`` - Gets the current value of a specific property.
* ``ov::Core::set_property`` - Sets a new value for the property globally for the specified ``device_name``.
The ``ov::CompiledModel`` class is also extended to support the properties:
* ``ov::CompiledModel::get_property``
* ``ov::CompiledModel::set_property``
For documentation about OpenVINO common device-independent properties, refer to the ``openvino/runtime/properties.hpp``.
Device-specific configuration keys can be found in corresponding device folders (for example, ``openvino/runtime/intel_gpu/properties.hpp``).
Working with Properties via Core
###########################################################
Getting Device Properties
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The code below demonstrates how to query ``HETERO`` device priority of devices which will be used to infer the model:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [hetero_priorities]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [hetero_priorities]
.. note::

   All properties have a type, which is specified during property declaration. Based on this, the actual type under ``auto``
   is automatically deduced by the C++ compiler.
To extract device properties such as available devices (``ov::available_devices``), device name (``ov::device::full_name``),
supported properties (``ov::supported_properties``), and others, use the ``ov::Core::get_property`` method:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [cpu_device_name]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [cpu_device_name]
A returned value appears as follows: ``Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz``.
.. note::

   To see the list of properties supported at the ``ov::Core`` or ``ov::CompiledModel`` level, use ``ov::supported_properties``,
   which contains a vector of supported property names. For properties which can be changed, ``ov::PropertyName::is_mutable``
   returns ``true``. Most of the properties which are changeable at the ``ov::Core`` level cannot be changed once the model
   is compiled, so they become immutable read-only properties.
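Splitting supported properties into mutable and read-only groups can be sketched as follows. The property names are real OpenVINO property strings, but the objects here are plain mocks mirroring ``ov::PropertyName::is_mutable``, and the mutability flags are illustrative assumptions:

```python
from collections import namedtuple

# mock of ov::PropertyName: a string name plus an is_mutable flag
PropertyName = namedtuple("PropertyName", ["name", "is_mutable"])

supported = [
    PropertyName("AVAILABLE_DEVICES", False),
    PropertyName("FULL_DEVICE_NAME", False),
    PropertyName("INFERENCE_NUM_THREADS", True),
    PropertyName("PERFORMANCE_HINT", True),
]

mutable = [p.name for p in supported if p.is_mutable]
read_only = [p.name for p in supported if not p.is_mutable]
print(mutable)    # ['INFERENCE_NUM_THREADS', 'PERFORMANCE_HINT']
print(read_only)  # ['AVAILABLE_DEVICES', 'FULL_DEVICE_NAME']
```

In real code, the list returned for ``ov::supported_properties`` by a compiled model would typically show more properties as read-only than the same query on ``ov::Core``, reflecting the rule described in the note above.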
Configure Work with a Model
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The ``ov::Core`` methods like:
* ``ov::Core::compile_model``
* ``ov::Core::import_model``
* ``ov::Core::query_model``
accept a selection of properties as their last arguments. Each property is passed as a function call, which binds a value of the specified property type to the property name.
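The "property as a function call" pattern can be sketched in plain Python: calling a property object with a value yields a name-value pair that a ``compile_model``-like function collects from its trailing arguments. This is a conceptual mock of the C++ API shape, not the OpenVINO Python API:

```python
class Prop:
    """Mock of a named property; calling it binds a value to the name."""
    def __init__(self, name):
        self.name = name

    def __call__(self, value):
        return (self.name, value)

# mocks of ov::hint-style properties (names are real property strings)
performance_hint = Prop("PERFORMANCE_HINT")
inference_precision = Prop("INFERENCE_PRECISION_HINT")

def compile_model(model, device, *props):
    """Collect trailing property arguments into a configuration map."""
    return dict(props)

cfg = compile_model("model.xml", "CPU",
                    performance_hint("THROUGHPUT"),
                    inference_precision("f32"))
print(cfg)  # {'PERFORMANCE_HINT': 'THROUGHPUT', 'INFERENCE_PRECISION_HINT': 'f32'}
```

The design keeps the call site readable and type-safe in C++: the property object knows its own string key and value type, so the caller never spells raw configuration strings.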
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [compile_model_with_property]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [compile_model_with_property]
The example below specifies hints that a model should be compiled to be inferred with multiple inference requests in parallel
to achieve best throughput, while inference should be performed without accuracy loss with FP32 precision.
Setting Properties Globally
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
``ov::Core::set_property`` with a given device name should be used to set global configuration properties,
which are the same across multiple ``ov::Core::compile_model``, ``ov::Core::query_model``, and other calls.
However, setting properties on a specific ``ov::Core::compile_model`` call applies properties only for the current call:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [core_set_property_then_compile]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [core_set_property_then_compile]
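The precedence rule described above can be pictured as a merge of two maps: per-call properties override global ones for that call only, and the global configuration is left untouched. This is an illustrative mock, not the actual implementation:

```python
global_config = {}

def set_property(device, **props):
    """Mock of ov::Core::set_property: store properties globally."""
    global_config.update(props)

def compile_model(device, **local_props):
    """Mock of ov::Core::compile_model: per-call properties win for this call."""
    return {**global_config, **local_props}

set_property("CPU", PERFORMANCE_HINT="THROUGHPUT")
print(compile_model("CPU"))                              # global hint applies
print(compile_model("CPU", PERFORMANCE_HINT="LATENCY"))  # overridden per call
print(compile_model("CPU"))                              # global value still intact
```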
Properties on CompiledModel Level
###########################################################
Getting Property
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The ``ov::CompiledModel::get_property`` method is used to get the property values the compiled model has been created with
or a compiled-model-level property, such as ``ov::optimal_number_of_infer_requests``:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [optimal_number_of_infer_requests]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [optimal_number_of_infer_requests]
Or the number of threads that would be used for inference on the ``CPU`` device:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [inference_num_threads]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [inference_num_threads]
Setting Properties for Compiled Model
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The only mode that supports this method is :doc:`Multi-Device execution <openvino_docs_OV_UG_Running_on_multiple_devices>`:
.. tab-set::
.. tab-item:: C++
:sync: cpp
.. doxygensnippet:: docs/snippets/ov_properties_api.cpp
:language: cpp
:fragment: [multi_device]
.. tab-item:: Python
:sync: py
.. doxygensnippet:: docs/snippets/ov_properties_api.py
:language: py
:fragment: [multi_device]
@endsphinxdirective