DOCS-cpu_language_review-port (#11537)

* DOCS-cpu_language_review

Co-Authored-By: Yuan Xu <yuan1.xu@intel.com>

* Update docs/OV_Runtime_UG/supported_plugins/CPU.md

* Update docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md

Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
Karol Blaszczak 2022-04-27 13:07:38 +02:00 committed by GitHub
parent 54e5af95da
commit f5781b1255
2 changed files with 47 additions and 51 deletions

docs/OV_Runtime_UG/supported_plugins/CPU.md

@@ -1,19 +1,17 @@
# CPU device {#openvino_docs_OV_UG_supported_plugins_CPU}
The CPU plugin is developed to achieve high performance inference of neural networks on Intel® x86-64 CPUs.
For an in-depth description of CPU plugin, see
The CPU plugin is a part of the Intel® Distribution of OpenVINO™ toolkit and is developed to achieve high-performance inference of neural networks on Intel® x86-64 CPUs.
For an in-depth description of the plugin, see:
- [CPU plugin developers documentation](https://github.com/openvinotoolkit/openvino/wiki/CPUPluginDevelopersDocs)
- [OpenVINO Runtime CPU plugin source files](https://github.com/openvinotoolkit/openvino/tree/master/src/plugins/intel_cpu/)
The CPU plugin is a part of the Intel® Distribution of OpenVINO™ toolkit.
## Device name
For the CPU plugin `"CPU"` device name is used, and even though there can be more than one socket on a platform, from the plugin's point of view, there is only one `"CPU"` device.
The CPU device plugin uses the label of `"CPU"` and is the only device of this kind, even if multiple sockets are present on the platform.
On multi-socket platforms, load balancing and memory usage distribution between NUMA nodes are handled automatically.
In order to use CPU for inference the device name should be passed to `ov::Core::compile_model()` method:
In order to use CPU for inference, the device name should be passed to the `ov::Core::compile_model()` method:
@sphinxtabset
@@ -28,7 +26,7 @@ In order to use CPU for inference the device name should be passed to `ov::Core:
@endsphinxtabset
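For illustration, a minimal standalone C++ sketch of this call, assuming a model in OpenVINO IR format at a placeholder path `model.xml`:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "model.xml" is a placeholder path to a model in OpenVINO IR format.
    auto model = core.read_model("model.xml");
    // A single "CPU" device name covers all sockets of the platform.
    auto compiled_model = core.compile_model(model, "CPU");
    auto infer_request = compiled_model.create_infer_request();
    return 0;
}
```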
## Supported inference data types
CPU plugin supports the following data types as inference precision of internal primitives:
The CPU device plugin supports the following data types as inference precision of internal primitives:
- Floating-point data types:
- f32
@@ -40,28 +38,28 @@ CPU plugin supports the following data types as inference precision of internal
- i8
- u1
[Hello Query Device C++ Sample](../../../samples/cpp/hello_query_device/README.md) can be used to print out supported data types for all detected devices.
[Hello Query Device C++ Sample](../../../samples/cpp/hello_query_device/README.md) can be used to print out the supported data types for all detected devices.
### Quantized data types specifics
### Quantized data type specifics
The selected precision of each primitive depends on the operation precision in IR, quantization primitives, and available hardware capabilities.
The u1/u8/i8 data types are used for quantized operations only, i.e., they are not selected automatically for non-quantized operations.
See [low-precision optimization guide](@ref openvino_docs_model_optimization_guide) for more details on how to get a quantized model.
See the [low-precision optimization guide](@ref openvino_docs_model_optimization_guide) for more details on how to get a quantized model.
> **NOTE**: Platforms that do not support Intel® AVX512-VNNI have a known "saturation issue" which in some cases leads to reduced computational accuracy for u8/i8 precision calculations.
> See [saturation (overflow) issue section](@ref pot_saturation_issue) to get more information on how to detect such issues and possible workarounds.
> See the [saturation (overflow) issue section](@ref pot_saturation_issue) to get more information on how to detect such issues and find possible workarounds.
### Floating point data types specifics
### Floating-point data type specifics
Default floating-point precision of a CPU primitive is f32. To support f16 IRs the plugin internally converts all the f16 values to f32 and all the calculations are performed using native f32 precision.
On platforms that natively support bfloat16 calculations (have AVX512_BF16 extension) bf16 type is automatically used instead of f32 to achieve better performance, thus no special steps are required to run a model with bf16 precision.
See the [BFLOAT16 Hardware Numerics Definition white paper](https://software.intel.com/content/dam/develop/external/us/en/documents/bf16-hardware-numerics-definition-white-paper.pdf) for more details about bfloat16 format.
The default floating-point precision of a CPU primitive is f32. To support f16 IRs, the plugin internally converts all the f16 values to f32 and all the calculations are performed using the native f32 precision.
On platforms that natively support bfloat16 calculations (have AVX512_BF16 extension), the bf16 type is automatically used instead of f32 to achieve better performance, thus no special steps are required to run a model with bf16 precision.
See the [BFLOAT16 Hardware Numerics Definition white paper](https://software.intel.com/content/dam/develop/external/us/en/documents/bf16-hardware-numerics-definition-white-paper.pdf) for more details about bfloat16.
Using bf16 precision provides the following performance benefits:
Using bf16 provides the following performance benefits:
- Faster multiplication of two bfloat16 numbers because of shorter mantissa of the bfloat16 data.
- Reduced memory consumption since bfloat16 data size is two times less than 32-bit float.
- Faster multiplication of two bfloat16 numbers because of the shorter mantissa of the bfloat16 data.
- Reduced memory consumption since bfloat16 data is half the size of 32-bit float.
To check if the CPU device can support the bfloat16 data type, use the [query device properties interface](./config_properties.md) to query the ov::device::capabilities property, which should contain `BF16` in the list of CPU capabilities:
@@ -77,11 +75,11 @@ To check if the CPU device can support the bfloat16 data type use the [query dev
@endsphinxtabset
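A minimal sketch of such a capability check might look as follows; `BF16` is the capability string reported by CPUs with native bfloat16 support:

```cpp
#include <openvino/openvino.hpp>

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

int main() {
    ov::Core core;
    // Read the list of optimization capabilities reported by the CPU device.
    std::vector<std::string> caps = core.get_property("CPU", ov::device::capabilities);
    const bool bf16_supported = std::find(caps.begin(), caps.end(), "BF16") != caps.end();
    std::cout << "Native BF16 support: " << std::boolalpha << bf16_supported << std::endl;
    return 0;
}
```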
In case if the model was converted to bf16, ov::hint::inference_precision is set to ov::element::bf16 and can be checked via ov::CompiledModel::get_property call. The code below demonstrates how to get the element type:
If the model has been converted to bf16, ov::hint::inference_precision is set to ov::element::bf16 and can be checked via the ov::CompiledModel::get_property call. The code below demonstrates how to get the element type:
@snippet snippets/cpu/Bfloat16Inference1.cpp part1
To infer the model in f32 precision instead of bf16 on targets with native bf16 support, set the ov::hint::inference_precision to ov::element::f32.
To infer the model in f32 instead of bf16 on targets with native bf16 support, set the ov::hint::inference_precision to ov::element::f32.
@sphinxtabset
@@ -95,17 +93,17 @@ To infer the model in f32 precision instead of bf16 on targets with native bf16
@endsphinxtabset
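Putting both steps together, a standalone sketch (with `model.xml` as a placeholder IR path) could check the effective precision and then force f32 execution:

```cpp
#include <openvino/openvino.hpp>

#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder IR path

    // Compile with the default precision selection and inspect what the plugin picked
    // (bf16 on platforms with AVX512_BF16, f32 otherwise).
    auto compiled_default = core.compile_model(model, "CPU");
    auto precision = compiled_default.get_property(ov::hint::inference_precision);
    std::cout << "Effective inference precision: " << precision << std::endl;

    // Explicitly request f32 execution even on bf16-capable hardware.
    auto compiled_f32 = core.compile_model(
        model, "CPU", ov::hint::inference_precision(ov::element::f32));
    return 0;
}
```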
Bfloat16 software simulation mode is available on CPUs with Intel® AVX-512 instruction set that do not support the native `avx512_bf16` instruction. This mode is used for development purposes and it does not guarantee good performance.
To enable the simulation, one have to explicitly set ov::hint::inference_precision to ov::element::bf16.
Bfloat16 software simulation mode is available on CPUs with the Intel® AVX-512 instruction set that do not support the native `avx512_bf16` instruction. This mode is used for development purposes and does not guarantee good performance.
To enable the simulation, you have to explicitly set ov::hint::inference_precision to ov::element::bf16.
> **NOTE**: An exception is thrown in case of setting ov::hint::inference_precision to ov::element::bf16 on CPU without native bfloat16 support or bfloat16 simulation mode.
> **NOTE**: An exception is thrown if ov::hint::inference_precision is set to ov::element::bf16 on a CPU without native bfloat16 support or bfloat16 simulation mode.
> **NOTE**: Due to the reduced mantissa size of the bfloat16 data type, the resulting bf16 inference accuracy may differ from the f32 inference, especially for models that were not trained using the bfloat16 data type. If the bf16 inference accuracy is not acceptable, it is recommended to switch to the f32 precision.
## Supported features
### Multi-device execution
If a machine has OpenVINO supported devices other than CPU (for example integrated GPU), then any supported model can be executed on CPU and all the other devices simultaneously.
If a machine has OpenVINO-supported devices other than the CPU (for example, an integrated GPU), then any supported model can be executed on the CPU and all the other devices simultaneously.
This can be achieved by specifying `"MULTI:CPU,GPU.0"` as a target device in case of simultaneous usage of CPU and GPU.
@sphinxtabset
@@ -123,25 +121,24 @@ This can be achieved by specifying `"MULTI:CPU,GPU.0"` as a target device in cas
See [Multi-device execution page](../multi_device.md) for more details.
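For example, a sketch of simultaneous CPU and GPU usage, assuming both devices are present and `model.xml` is a placeholder IR path:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder IR path
    // Execute the same model on the CPU and the first GPU device in parallel.
    auto compiled_model = core.compile_model(model, "MULTI:CPU,GPU.0");
    return 0;
}
```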
### Multi-stream execution
If either `ov::num_streams(n_streams)` with `n_streams > 1` or `ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)` property is set for CPU plugin,
then multiple streams are created for the model. In case of CPU plugin each stream has its own host thread which means that incoming infer requests can be processed simultaneously.
If either `ov::num_streams(n_streams)` with `n_streams > 1` or the `ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)` property is set for the CPU plugin, multiple streams are created for the model. In the case of the CPU plugin, each stream has its own host thread, which means that incoming infer requests can be processed simultaneously.
Each stream is pinned to its own group of physical cores, taking the physical memory usage of NUMA nodes into account, to minimize the overhead of data transfer between NUMA nodes.
See [optimization guide](@ref openvino_docs_deployment_optimization_guide_dldt_optimization_guide) for more details.
> **NOTE**: When it comes to latency, one needs to keep in mind that running only one stream on multi-socket platform may introduce additional overheads on data transfer between NUMA nodes.
> **NOTE**: When it comes to latency, keep in mind that running only one stream on a multi-socket platform may introduce additional overheads on data transfer between NUMA nodes.
> In that case, it is better to use the ov::hint::PerformanceMode::LATENCY performance hint (see the [performance hints overview](@ref openvino_docs_OV_UG_Performance_Hints) for details).
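For reference, the two ways of enabling multiple streams described above can be sketched as follows; the stream count of 4 is an arbitrary example value and `model.xml` is a placeholder IR path:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder IR path

    // Option 1: let the plugin choose the number of streams via the throughput hint.
    auto compiled_tput = core.compile_model(
        model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // Option 2: request an explicit number of streams.
    auto compiled_streams = core.compile_model(model, "CPU", ov::num_streams(4));
    return 0;
}
```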
### Dynamic shapes
CPU plugin provides full functional support for models with dynamic shapes in terms of the opset coverage.
The CPU device plugin provides full functional support for models with dynamic shapes in terms of the opset coverage.
> **NOTE**: CPU plugin does not support tensors with dynamically changing rank. In case of an attempt to infer a model with such tensors, an exception will be thrown.
> **NOTE**: CPU does not support tensors with a dynamically changing rank. If you try to infer a model with such tensors, an exception will be thrown.
Dynamic shapes support introduce some additional overheads on memory management and may limit internal runtime optimizations.
The more degrees of freedom we have, the more difficult it is to achieve the best performance.
The most flexible configuration is the fully undefined shape, when we do not apply any constraints to the shape dimensions, which is the most convenient approach.
But reducing the level of uncertainty will bring performance gains.
We can reduce memory consumption through memory reuse, and as a result achieve better cache locality, which in its turn leads to better inference performance, if we explicitly set dynamic shapes with defined upper bounds.
Dynamic shapes support introduces additional overhead on memory management and may limit internal runtime optimizations.
The more degrees of freedom are used, the more difficult it is to achieve the best performance.
The most flexible configuration and the most convenient approach is the fully undefined shape, where no constraints to the shape dimensions are applied.
However, reducing the level of uncertainty brings performance gains.
If you explicitly set dynamic shapes with defined upper bounds, you can reduce memory consumption through memory reuse and achieve better cache locality, which in turn leads to better inference performance.
@sphinxtabset
@@ -156,9 +153,9 @@ We can reduce memory consumption through memory reuse, and as a result achieve b
@endsphinxtabset
> **NOTE**: Using fully undefined shapes may result in significantly higher memory consumption compared to inferring the same model with static shapes.
> If the memory consumption is unacceptable but dynamic shapes are still required, one can reshape the model using shapes with defined upper bound to reduce memory footprint.
> If the level of memory consumption is unacceptable but dynamic shapes are still required, you can reshape the model using shapes with defined upper bounds to reduce memory footprint.
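As an illustration, a sketch of reshaping to dynamic dimensions with defined upper bounds, assuming a single-input model with a 2-D input and `model.xml` as a placeholder IR path:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder IR path

    // Bounded dynamic dimensions (batch 1..8, sequence length 1..384) let the plugin
    // reuse memory, unlike fully undefined dimensions such as ov::Dimension().
    ov::PartialShape bounded_shape{ov::Dimension(1, 8), ov::Dimension(1, 384)};
    model->reshape(bounded_shape);

    auto compiled_model = core.compile_model(model, "CPU");
    return 0;
}
```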
Some runtime optimizations works better if the model shapes are known in advance.
Some runtime optimizations work better if the model shapes are known in advance.
Therefore, if the input data shape does not change between inference calls, it is recommended to use a model with static shapes or to reshape the existing model to a static input shape to get the best performance.
@sphinxtabset
@@ -198,29 +195,28 @@ See [preprocessing API guide](../preprocessing_overview.md) for more details.
* boolean
@endsphinxdirective
### Models caching
CPU plugin supports Import/Export network capability. If the model caching is enabled via common OpenVINO™ `ov::cache_dir` property, the plugin will automatically create a cached blob inside the specified directory during model compilation.
This cached blob contains some intermediate representation of the network that it has after common runtime optimizations and low precision transformations.
The next time the model is compiled, the cached representation will be loaded to the plugin instead of the initial IR, so the aforementioned transformation steps will be skipped.
These transformations take a significant amount of time during model compilation, so caching this representation reduces time spent for subsequent compilations of the model,
thereby reducing first inference latency (FIL).
### Model caching
The CPU device plugin supports the Import/Export network capability. If model caching is enabled via the common OpenVINO™ `ov::cache_dir` property, the plugin will automatically create a cached blob inside the specified directory during model compilation.
This cached blob contains a partial representation of the network, obtained after common runtime optimizations and low-precision transformations have been applied.
At the next attempt to compile the model, the cached representation will be loaded to the plugin instead of the initial IR, so the aforementioned steps will be skipped.
These operations take a significant amount of time during model compilation, so caching their results makes subsequent compilations of the model much faster, thus reducing first inference latency (FIL).
See [model caching overview](@ref openvino_docs_OV_UG_Model_caching_overview) for more details.
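A sketch of enabling the cache; the `model_cache` directory name and `model.xml` path are placeholders:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Enable model caching; the cached blob is written to the "model_cache" directory.
    core.set_property(ov::cache_dir("model_cache"));

    auto model = core.read_model("model.xml");  // placeholder IR path
    // The first compilation creates the cached blob; subsequent compilations of the
    // same model load it and skip the expensive transformation steps.
    auto compiled_model = core.compile_model(model, "CPU");
    return 0;
}
```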
### Extensibility
CPU plugin supports fallback on `ov::Op` reference implementation if the plugin do not have its own implementation for such operation.
That means that [OpenVINO™ Extensibility Mechanism](@ref openvino_docs_Extensibility_UG_Intro) can be used for the plugin extension as well.
To enable fallback on a custom operation implementation, one have to override `ov::Op::evaluate` method in the derived operation class (see [custom OpenVINO™ operations](@ref openvino_docs_Extensibility_UG_add_openvino_ops) for details).
The CPU device plugin supports fallback on the `ov::Op` reference implementation if the plugin lacks its own implementation of a given operation.
This means that [OpenVINO™ Extensibility Mechanism](@ref openvino_docs_Extensibility_UG_Intro) can be used for the plugin extension as well.
To enable fallback on a custom operation implementation, override the `ov::Op::evaluate` method in the derived operation class (see [custom OpenVINO™ operations](@ref openvino_docs_Extensibility_UG_add_openvino_ops) for details).
> **NOTE**: At the moment, custom operations with internal dynamism (when the output tensor shape can only be determined as a result of performing the operation) are not supported by the plugin.
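As an illustration only, a hypothetical pass-through operation (the `CustomIdentity` name and its copy semantics are made up for this sketch) that provides such a fallback might be structured like this:

```cpp
#include <openvino/op/op.hpp>

#include <cstring>
#include <memory>

// Hypothetical identity-like operation; a real custom operation would implement
// its own computation in evaluate().
class CustomIdentity : public ov::op::Op {
public:
    OPENVINO_OP("CustomIdentity");

    CustomIdentity() = default;
    explicit CustomIdentity(const ov::Output<ov::Node>& arg) : Op({arg}) {
        constructor_validate_and_infer_types();
    }

    void validate_and_infer_types() override {
        set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
    }

    std::shared_ptr<ov::Node> clone_with_new_inputs(const ov::OutputVector& new_args) const override {
        return std::make_shared<CustomIdentity>(new_args.at(0));
    }

    // The CPU plugin falls back on this reference implementation when it has no
    // native implementation of the operation.
    bool evaluate(ov::TensorVector& outputs, const ov::TensorVector& inputs) const override {
        outputs[0].set_shape(inputs[0].get_shape());
        std::memcpy(outputs[0].data(), inputs[0].data(), inputs[0].get_byte_size());
        return true;
    }

    bool has_evaluate() const override {
        return true;
    }
};
```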
### Stateful models
CPU plugin supports stateful models without any limitations.
The CPU device plugin supports stateful models without any limitations.
See [stateful models guide](@ref openvino_docs_OV_UG_network_state_intro) for details.
## Supported properties
The plugin supports the properties listed below.
The plugin supports the following properties:
### Read-write properties
All parameters must be set before calling `ov::Core::compile_model()` in order to take effect, or passed as an additional argument to `ov::Core::compile_model()`.
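Both options can be sketched as follows; the chosen properties are arbitrary examples and `model.xml` is a placeholder IR path:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder IR path

    // Set a property on the device before compilation...
    core.set_property("CPU", ov::enable_profiling(true));

    // ...or pass properties directly as extra arguments to compile_model().
    auto compiled_model = core.compile_model(
        model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
    return 0;
}
```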

docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md

@@ -25,16 +25,16 @@ The OpenVINO Runtime provides capabilities to infer deep learning models on the
|[GNA](GNA.md) |[Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html); [Amazon Alexa\* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice); [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx (formerly codenamed Gemini Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html): [Intel® Pentium® Silver J5005 Processor](https://ark.intel.com/content/www/us/en/ark/products/128984/intel-pentium-silver-j5005-processor-4m-cache-up-to-2-80-ghz.html), [Intel® Pentium® Silver N5000 Processor](https://ark.intel.com/content/www/us/en/ark/products/128990/intel-pentium-silver-n5000-processor-4m-cache-up-to-2-70-ghz.html), [Intel® Celeron® J4005 Processor](https://ark.intel.com/content/www/us/en/ark/products/128992/intel-celeron-j4005-processor-4m-cache-up-to-2-70-ghz.html), [Intel® Celeron® J4105 Processor](https://ark.intel.com/content/www/us/en/ark/products/128989/intel-celeron-j4105-processor-4m-cache-up-to-2-50-ghz.html), [Intel® Celeron® J4125 Processor](https://ark.intel.com/content/www/us/en/ark/products/197305/intel-celeron-processor-j4125-4m-cache-up-to-2-70-ghz.html), [Intel® Celeron® Processor N4100](https://ark.intel.com/content/www/us/en/ark/products/128983/intel-celeron-processor-n4100-4m-cache-up-to-2-40-ghz.html), [Intel® Celeron® Processor N4000](https://ark.intel.com/content/www/us/en/ark/products/128988/intel-celeron-processor-n4000-4m-cache-up-to-2-60-ghz.html); [Intel® Pentium® Processors N6xxx, J6xxx, Intel® Celeron® Processors N6xxx, J6xxx and Intel Atom® x6xxxxx (formerly codenamed Elkhart Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/128825/products-formerly-elkhart-lake.html); [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html); [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html): [Intel® Core™ i7-1065G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/196597/intel-core-i71065g7-processor-8m-cache-up-to-3-90-ghz.html), [Intel® Core™ i7-1060G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/197120/intel-core-i71060g7-processor-8m-cache-up-to-3-80-ghz.html), [Intel® Core™ i5-1035G4 Processor](https://ark.intel.com/content/www/us/en/ark/products/196591/intel-core-i51035g4-processor-6m-cache-up-to-3-70-ghz.html), [Intel® Core™ i5-1035G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/196592/intel-core-i51035g7-processor-6m-cache-up-to-3-70-ghz.html), [Intel® Core™ i5-1035G1 Processor](https://ark.intel.com/content/www/us/en/ark/products/196603/intel-core-i51035g1-processor-6m-cache-up-to-3-60-ghz.html), [Intel® Core™ i5-1030G7 Processor](https://ark.intel.com/content/www/us/en/ark/products/197119/intel-core-i51030g7-processor-6m-cache-up-to-3-50-ghz.html), [Intel® Core™ i5-1030G4 Processor](https://ark.intel.com/content/www/us/en/ark/products/197121/intel-core-i51030g4-processor-6m-cache-up-to-3-50-ghz.html), [Intel® Core™ i3-1005G1 Processor](https://ark.intel.com/content/www/us/en/ark/products/196588/intel-core-i31005g1-processor-4m-cache-up-to-3-40-ghz.html), [Intel® Core™ i3-1000G1 Processor](https://ark.intel.com/content/www/us/en/ark/products/197122/intel-core-i31000g1-processor-4m-cache-up-to-3-20-ghz.html), [Intel® Core™ i3-1000G4 Processor](https://ark.intel.com/content/www/us/en/ark/products/197123/intel-core-i31000g4-processor-4m-cache-up-to-3-20-ghz.html); [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html); [12th Generation Intel® Core™ Processors (formerly codenamed Alder Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/147470/products-formerly-alder-lake.html)|
|[Arm® CPU](ARM_CPU.md) |Raspberry Pi™ 4 Model B, Apple® Mac mini with M1 chip, NVIDIA® Jetson Nano™, Android™ devices |
OpenVINO runtime also has several execution capabilities which work on top of other devices:
OpenVINO Runtime also offers several execution modes which work on top of other devices:
| Capability | Description |
|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
|[Multi-Device execution](../multi_device.md) |Multi-Device enables simultaneous inference of the same model on several devices in parallel |
|[Auto-Device selection](../auto_device_selection.md) |Auto-Device selection enables selecting Intel&reg; device for inference automatically |
|[Heterogeneous execution](../hetero_execution.md) |Heterogeneous execution enables automatic inference splitting between several devices (for example if a device doesn't [support certain operation](#supported-layers))|
|[Automatic Batching](../automatic_batching.md) | Auto-Batching plugin enables the batching (on top of the specified device) that is completely transparent to the application |
|[Automatic Batching](../automatic_batching.md) | The Auto-Batching plugin enables batching (on top of the specified device) that is completely transparent to the application |
Devices similar to the ones we have used for benchmarking can be accessed using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/).
Devices similar to the ones we use for benchmarking can be accessed using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/).
@anchor features_support_matrix
## Feature Support Matrix
@@ -53,4 +53,4 @@ The table below demonstrates support of key features by OpenVINO device plugins.
| [Stateful models](../network_state_intro.md) | Yes | No | Yes | No |
| [Extensibility](@ref openvino_docs_Extensibility_UG_Intro) | Yes | Yes | No | No |
For more details on plugin specific feature limitation, see corresponding plugin pages.
For more details on plugin-specific feature limitations, refer to the corresponding plugin pages.