[DOCS] NPU articles (#21430)

Merging with the reservation that additional changes will be done in a follow-up PR.
This commit is contained in:
Karol Blaszczak 2023-12-14 17:31:44 +01:00 committed by GitHub
parent f4b2f950f2
commit 6367206ea8
6 changed files with 237 additions and 116 deletions

View File

@@ -3,70 +3,43 @@
Configurations for Intel® NPU with OpenVINO™
===============================================

.. meta::
   :description: Learn how to provide additional configuration for Intel®
                 NPU to work with the OpenVINO™ toolkit on your system.


The Intel® NPU device requires a proper driver to be installed on the system.
Make sure you use the most recent supported driver for your hardware setup.

.. tab-set::

   .. tab-item:: Linux

      The driver is maintained as open source and may be found in the following
      repository, together with comprehensive information on installation and
      system requirements:
      `github.com/intel/linux-npu-driver <https://github.com/intel/linux-npu-driver>`__.
      It is recommended to check for the latest version of the driver.

      Make sure you use a supported OS version and have make, gcc, and the
      Linux kernel headers installed. To check the NPU state, use the ``dmesg``
      command in the console. A successful boot-up of the NPU should give you
      a message like this one:

      ``[ 797.193201] [drm] Initialized intel_vpu 0.<version number> for 0000:00:0b.0 on minor 0``

      The current requirement for inference on NPU is Ubuntu 22.04 with kernel
      version 6.6 or higher.

   .. tab-item:: Windows

      The Intel® NPU driver for Windows is available through Windows Update, but
      it may also be installed manually by downloading the
      `NPU driver package <https://www.intel.com/content/www/us/en/download-center/home.html>`__
      and following the
      `Windows driver installation guide <https://support.microsoft.com/en-us/windows/update-drivers-manually-in-windows-ec62f46c-ff14-c91d-eead-d7126dc1f7b6>`__.

      If a driver has already been installed, you should be able to find
      "Intel(R) NPU Accelerator" in Windows Device Manager. If you cannot find
      such a device, the NPU is most likely listed under "Other devices" as
      "Multimedia Video Controller".

View File

@@ -14,17 +14,18 @@ Inference Device Support
   :maxdepth: 1
   :hidden:

   openvino_docs_OV_UG_query_api
   openvino_docs_OV_UG_supported_plugins_CPU
   openvino_docs_OV_UG_supported_plugins_GPU
   openvino_docs_OV_UG_supported_plugins_NPU
   openvino_docs_OV_UG_supported_plugins_GNA

OpenVINO™ Runtime can infer deep learning models using the following device types:

* :doc:`CPU <openvino_docs_OV_UG_supported_plugins_CPU>`
* :doc:`GPU <openvino_docs_OV_UG_supported_plugins_GPU>`
* :doc:`NPU <openvino_docs_OV_UG_supported_plugins_NPU>`
* :doc:`GNA <openvino_docs_OV_UG_supported_plugins_GNA>`
* :doc:`Arm® CPU <openvino_docs_OV_UG_supported_plugins_CPU>`

@@ -33,15 +34,14 @@ For a more detailed list of hardware, see :doc:`Supported Devices <openvino_docs

.. _devicesupport-feature-support-matrix:

Feature Support Matrix
#######################################

The table below demonstrates support of key features by OpenVINO device plugins.

================================================================================== ===== ===== ===== =========
 Capability                                                                         CPU   GPU   NPU   GNA
================================================================================== ===== ===== ===== =========
 :doc:`Heterogeneous execution <openvino_docs_OV_UG_Hetero_execution>`              Yes   Yes         No
 :doc:`Multi-device execution <openvino_docs_OV_UG_Running_on_multiple_devices>`    Yes   Yes         Partial
 :doc:`Automatic batching <openvino_docs_OV_UG_Automatic_Batching>`                 No    Yes         No

@@ -52,7 +52,7 @@ The table below demonstrates support of key features by OpenVINO device plugins.

 :doc:`Preprocessing acceleration <openvino_docs_OV_UG_Preprocessing_Overview>`     Yes   Yes         No
 :doc:`Stateful models <openvino_docs_OV_UG_model_state_intro>`                     Yes   No          Yes
 :doc:`Extensibility <openvino_docs_Extensibility_UG_Intro>`                        Yes   Yes         No
================================================================================== ===== ===== ===== =========

For more details on plugin-specific feature limitations, see the corresponding plugin pages.
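
For a programmatic view of what each installed plugin reports, device capabilities can be
queried through the property API. The following is a small illustrative sketch, assuming
the ``openvino`` Python package is installed:

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   for device in core.available_devices:
       # ov::device::capabilities is a read-only property listing features such as
       # the supported precisions (FP32, FP16, INT8, ...) reported by each plugin.
       print(device, core.get_property(device, "DEVICE_CAPABILITIES"))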

View File

@@ -3,27 +3,158 @@
NPU Device
==========

.. meta::
   :description: OpenVINO™ supports the Neural Processing Unit,
                 a low-power processing device dedicated to running AI inference.


The Neural Processing Unit is a low-power hardware solution, introduced with the
Intel® Core™ Ultra generation of CPUs (formerly known as Meteor Lake). It enables
you to offload certain neural network computation tasks from other devices,
for more streamlined resource management.

.. For an in-depth description of the NPU plugin, see:

   `NPU plugin developer documentation <https://github.com/openvinotoolkit/npu_plugin/blob/develop/docs/VPUX_DG/index.md>`__

   `OpenVINO Runtime NPU plugin source files <https://github.com/openvinotoolkit/npu_plugin>`__

| **Supported Platforms:**
| Host: Intel® Core™ Ultra (former Meteor Lake)
| NPU device: NPU 3720
| OS: Ubuntu* 20, MS Windows* 11 (both 64-bit)

| **Supported Inference Data Types**
| The NPU plugin supports the following data types as inference precision of internal primitives:
| Floating-point data types: f32, f16
| Quantized data types: u8 (quantized models may be int8 or mixed fp16-int8)
| Computation precision for the HW is fp16.
|
| For more details on how to get a quantized model, refer to the
  :doc:`Model Optimization guide <openvino_docs_model_optimization_guide>` and the
  :doc:`NNCF tool quantization guide <basic_quantization_flow>`.
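
As a rough illustration of the quantization path mentioned above, the sketch below runs
NNCF post-training quantization on an IR model so it can run on the NPU with u8 precision.
The names are assumptions: ``model.xml``, the ``(1, 3, 224, 224)`` input shape, and the
random calibration tensors are all hypothetical stand-ins for real inputs:

.. code-block:: python

   import numpy as np
   import nncf
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")   # hypothetical FP32/FP16 IR model

   # Random samples stand in for a real calibration set; use representative data in practice.
   calibration_data = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)]
   calibration_dataset = nncf.Dataset(calibration_data)

   # Post-training quantization; the resulting model can be compiled for "NPU".
   quantized_model = nncf.quantize(model, calibration_dataset)
   ov.save_model(quantized_model, "model_int8.xml")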
Model Caching
#############################
Model Caching helps reduce application startup delays by exporting and reusing the compiled
model automatically. The following two compilation-related metrics are crucial in this area:
| **First Ever Inference Latency (FEIL)**
| Measures all steps required to compile and execute a model on the device for the
first time. It includes model compilation time, the time required to load and
initialize the model on the device and the first inference execution.
| **First Inference Latency (FIL)**
| Measures the time required to load and initialize the pre-compiled model on the
device and the first inference execution.
UMD Dynamic Model Caching
+++++++++++++++++++++++++++++
UMD model caching is a solution enabled by default in the current NPU driver.
It improves time to first inference (FIL) by storing the model in the cache
after the compilation (included in FEIL), based on a hash key. The process
may be summarized in three stages:
1. UMD generates the key from the input IR model and build arguments
2. UMD requests the DirectX Shader cache session to store the model
with the computed key.
3. All subsequent requests to compile the same IR model with the same arguments
use the pre-compiled model, reading it from the cache instead of recompiling.
OpenVINO Model Caching
+++++++++++++++++++++++++++++
OpenVINO Model Caching is a common mechanism for all OpenVINO device plugins and
can be enabled by setting the ``ov::cache_dir`` property. This way, the UMD model
caching is automatically bypassed by the NPU plugin, which means the model
will only be stored in the OpenVINO cache after compilation. When a cache hit
occurs for subsequent compilation requests, the plugin will import the model
instead of recompiling it.
For more details about OpenVINO model caching, see the
:doc:`Model Caching Overview <openvino_docs_OV_UG_Model_caching_overview>`.
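
In practice, enabling OpenVINO model caching for the NPU comes down to setting the cache
directory before compilation. A minimal sketch, with a hypothetical ``model.xml`` path and
cache folder:

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # With ov::cache_dir set, the compiled blob is stored in (and later imported from)
   # this directory, and the NPU plugin bypasses UMD caching.
   core.set_property({"CACHE_DIR": "npu_cache"})

   compiled_model = core.compile_model("model.xml", "NPU")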
Supported Features and Properties
#######################################
The NPU device is currently supported by AUTO and MULTI inference modes.
The NPU support in OpenVINO is still under active development and may
offer a limited set of supported OpenVINO features.
**Supported Properties:**
.. tab-set::
.. tab-item:: Read-write properties
.. code-block::
ov::caching_properties
ov::enable_profiling
ov::hint::performance_mode
ov::hint::num_requests
ov::hint::model_priority
ov::hint::enable_cpu_pinning
ov::log::level
ov::device::id
ov::cache_dir
ov::internal::exclusive_async_requests
ov::intel_vpux::dpu_groups
ov::intel_vpux::dma_engines
ov::intel_vpux::compilation_mode
ov::intel_vpux::compilation_mode_params
ov::intel_vpux::print_profiling
ov::intel_vpux::profiling_output_file
ov::intel_vpux::vpux_platform
ov::intel_vpux::use_elf_compiler_backend
.. tab-item:: Read-only properties
.. code-block::
ov::supported_properties
ov::streams::num
ov::optimal_number_of_infer_requests
ov::range_for_async_infer_requests
ov::range_for_streams
ov::available_devices
ov::device::uuid
ov::device::architecture
ov::device::full_name
ov::intel_vpux::device_total_mem_size
ov::intel_vpux::driver_version
.. note::
The optimum number of inference requests returned by the plugin
based on the performance mode is **4 for THROUGHPUT** and **1 for LATENCY**.
The default mode for the NPU device is LATENCY.
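
The interplay between the performance mode and the reported number of requests can be
checked directly on a compiled model. A brief sketch, with ``model.xml`` as a hypothetical
model path:

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # Compile with a throughput-oriented hint, then read back the value the plugin
   # recommends for ov::optimal_number_of_infer_requests.
   compiled_model = core.compile_model(
       "model.xml", "NPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
   )
   print(compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"))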
Limitations
#############################
* Currently, only models with static shapes are supported on NPU.
* If the path to the model file includes non-Unicode symbols, such as Chinese characters,
  the model cannot be used for inference on NPU. An error will be returned instead.
* Running the Alexnet model with NPU may result in a drop in accuracy.
  At this moment, the googlenet-v4 model is recommended for classification tasks.
Additional Resources
#############################
* `Vision colorization Notebook <notebooks/222-vision-image-colorization-with-output.html>`__
* `Classification Benchmark C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/classification_benchmark_demo/cpp>`__
* `3D Human Pose Estimation Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/3d_segmentation_demo/python>`__
* `Object Detection C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/cpp>`__
* `Object Detection Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/python>`__
* `POT-specific sample with sparse resnet-50 generation <https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/prune_and_quantize>`__

View File

@@ -3,35 +3,37 @@
Query Device Properties - Configuration
=======================================

.. meta::
   :description: Learn the details on the process of querying different device
                 properties and configuration values at runtime.


This article provides an overview of how to query different device properties
and configuration values at runtime.

OpenVINO runtime has two types of properties:

- **Read-only properties**, which provide information about devices, such as device
  name and execution capabilities, and information about configuration values
  used to compile the model - ``ov::CompiledModel``.
- **Mutable properties**, primarily used to configure the ``ov::Core::compile_model``
  process and affect final inference on a specific set of devices. Such properties
  can be set globally per device via ``ov::Core::set_property`` or locally for a
  particular model in the ``ov::Core::compile_model`` and ``ov::Core::query_model``
  calls.

An OpenVINO property is represented as a named constexpr variable with a given string
name and a type. The following example represents a read-only property with the C++ name
of ``ov::available_devices``, the string name of ``AVAILABLE_DEVICES`` and the type of
``std::vector<std::string>``:

.. code-block:: cpp

   static constexpr Property<std::vector<std::string>, PropertyMutability::RO> available_devices{"AVAILABLE_DEVICES"};

Refer to the :doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` sources and
the :doc:`Multi-Device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` documentation for examples of
setting and getting properties in user applications.
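
The same properties can also be queried from the Python API, where the string names map
directly. A short illustrative sketch; the ``"CPU"`` device name is just an example target:

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # The AVAILABLE_DEVICES read-only property is exposed in Python as a Core attribute.
   print(core.available_devices)

   # Read a device-specific, read-only property.
   print(core.get_property("CPU", "FULL_DEVICE_NAME"))

   # Set a mutable property globally for a device before compiling models on it.
   core.set_property("CPU", {"PERFORMANCE_HINT": "LATENCY"})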

View File

@@ -3,6 +3,11 @@
Automatic Device Selection
==========================

.. meta::
   :description: The Automatic Device Selection mode in OpenVINO™ Runtime
                 detects available devices and selects the optimal processing
                 unit for inference automatically.

.. toctree::
   :maxdepth: 1

@@ -10,34 +15,25 @@ Automatic Device Selection
   Debugging Auto-Device Plugin <openvino_docs_OV_UG_supported_plugins_AUTO_debugging>

This article introduces how Automatic Device Selection works and how to use it for inference.

.. _how-auto-works:

The Automatic Device Selection mode, or AUTO for short, uses a "virtual" or a "proxy" device,
which does not bind to a specific type of hardware, but rather selects the processing unit
for inference automatically. It detects available devices, picks the one best-suited for the
task, and configures its optimization settings. This way, you can write the application once
and deploy it anywhere.

The selection also depends on your performance requirements, defined by the "hints"
configuration API, as well as device priority list limitations, if you choose to exclude
some hardware from the process.

The logic behind the choice is as follows:

1. Check what supported devices are available.
2. Check the precisions of the input model (for detailed information on precisions, read more on ``ov::device::capabilities``).
3. Select the highest-priority device capable of supporting the given model, as listed in the table below.
4. If the model's precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.
+----------+-----------------------------------------------------+------------------------------------+

@@ -53,7 +49,18 @@ The logic behind the choice is as follows:

| 3        | Intel® CPU                                          | FP32, FP16, INT8, BIN              |
|          | (e.g. Intel® Core™ i7-1165G7)                       |                                    |
+----------+-----------------------------------------------------+------------------------------------+
| 4        | Intel® NPU                                          |                                    |
|          | (e.g. Intel® Core™ Ultra)                           |                                    |
+----------+-----------------------------------------------------+------------------------------------+

.. note::

   NPU is currently excluded from the default priority list. To use it for inference,
   you need to specify it explicitly, as in the example below.
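
A minimal sketch of naming the NPU explicitly in the AUTO candidate list; ``model.xml`` is
a hypothetical model path:

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")

   # NPU is not in AUTO's default priority list, so it must be listed explicitly;
   # AUTO will then choose between the named candidates.
   compiled_model = core.compile_model(model, "AUTO:NPU,CPU")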
How AUTO Works
##############
To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.

What is important, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.

@@ -61,12 +68,19 @@ While the CPU is performing inference, AUTO continues to load the model to the d

This way, the devices which are much slower in compiling models, GPU being the best example, do not impact inference at its initial stages.
For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.

Note that if you choose to exclude CPU from the priority list or disable the initial
CPU acceleration feature via ``ov::intel_auto::enable_startup_fallback``, it will be
unable to support the initial model compilation stage. Models with dynamic
input/output or :doc:`stateful <openvino_docs_OV_UG_model_state_intro>`
operations will be loaded to the CPU if it is in the candidate list. Otherwise,
these models will follow the normal flow and be loaded to the device based on priority.

.. image:: _static/images/autoplugin_accelerate.svg

This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app sample <using-auto-with-openvino-samples-and-benchmark-app>`
section, showing how the first-inference latency (the time it takes to compile the
model and perform the first inference) is reduced when using AUTO. For example:

.. code-block:: sh
@@ -88,8 +102,9 @@ This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app

Using AUTO
##########

Following the OpenVINO™ naming convention, the Automatic Device Selection mode is assigned the label of "AUTO".
It may be defined with no additional parameters, resulting in defaults being used, or configured further with
the following setup options:

+----------------------------------------------+--------------------------------------------------------------------+
| Property (C++ version)                       | Values and Description                                              |
@@ -205,7 +220,6 @@ The code samples on this page assume following import(Python)/using (C++) are in

Device Candidates and Priority
++++++++++++++++++++++++++++++

The device candidate list enables you to customize the priority and limit the choice of devices available to AUTO.

* If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used.

@@ -496,3 +510,4 @@ Additional Resources

- :doc:`Running on Multiple Devices Simultaneously <openvino_docs_OV_UG_Running_on_multiple_devices>`
- :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`

View File

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27bff5eb0b93754e6f8cff0ae294d0221cc9184a517d1991da06bea9cc272eb7
size 84550