[DOCS] NPU articles (#21430)

Merging with the reservation that additional changes will be made in a follow-up PR.
Karol Blaszczak 2023-12-14 17:31:44 +01:00 committed by GitHub
parent f4b2f950f2
commit 6367206ea8
6 changed files with 237 additions and 116 deletions


@ -3,70 +3,43 @@
Configurations for Intel® NPU with OpenVINO™
===============================================
.. meta::
:description: Learn how to provide additional configuration for Intel®
NPU to work with the OpenVINO™ toolkit on your system.
Drivers and Dependencies
########################
The Intel® NPU device requires a proper driver to be installed on the system.
Make sure you use the most recent supported driver for your hardware setup.
.. tab-set::
.. tab-item:: Linux
The driver is maintained as open source and may be found in the following repository,
together with comprehensive information on installation and system requirements:
`github.com/intel/linux-npu-driver <https://github.com/intel/linux-npu-driver>`__
Linux
####################
It is recommended to check for the latest version of the driver.
Prerequisites
++++++++++++++++++++
Make sure you use a supported OS version and that make, gcc, and Linux kernel
headers are installed. Use the following command to install the required software:

.. code-block:: sh

   sudo apt-get install gcc make linux-headers-generic

To check the NPU state, use the ``dmesg`` command in the console. A successful
boot-up of the NPU should give you a message like this one:

``[ 797.193201] [drm] Initialized intel_vpu 0.<version number> for 0000:00:0b.0 on minor 0``

The current requirement for inference on NPU is Ubuntu 22.04 with a kernel
version of 6.6 or higher.
Configuration steps
++++++++++++++++++++
Windows
####################
Intel® NPU driver for Windows is available through Windows Update.
What's Next?
####################
Now you are ready to try out OpenVINO™. You can use the following tutorials to write your applications using Python and C/C++.
* Developing in Python:
* `Start with TensorFlow models with OpenVINO™ <notebooks/101-tensorflow-to-openvino-with-output.html>`__
* `Start with ONNX and PyTorch models with OpenVINO™ <notebooks/102-pytorch-onnx-to-openvino-with-output.html>`__
* `Start with PaddlePaddle models with OpenVINO™ <notebooks/103-paddle-to-openvino-classification-with-output.html>`__
* Developing in C/C++:
* :doc:`Image Classification Async C++ Sample <openvino_inference_engine_samples_classification_sample_async_README>`
* :doc:`Hello Classification C++ Sample <openvino_inference_engine_samples_hello_classification_README>`
* :doc:`Hello Reshape SSD C++ Sample <openvino_inference_engine_samples_hello_reshape_ssd_README>`
.. tab-item:: Windows
The Intel® NPU driver for Windows is available through Windows Update, but
it may also be installed manually by downloading the
`NPU driver package <https://www.intel.com/content/www/us/en/download-center/home.html>`__ and following the
`Windows driver installation guide <https://support.microsoft.com/en-us/windows/update-drivers-manually-in-windows-ec62f46c-ff14-c91d-eead-d7126dc1f7b6>`__.
If a driver has already been installed, you should be able to find
'Intel(R) NPU Accelerator' in Windows Device Manager. If you
cannot find such a device, the NPU is most likely listed in "Other devices"
as "Multimedia Video Controller."


@ -14,17 +14,18 @@ Inference Device Support
:maxdepth: 1
:hidden:
openvino_docs_OV_UG_query_api
openvino_docs_OV_UG_supported_plugins_CPU
openvino_docs_OV_UG_supported_plugins_GPU
openvino_docs_OV_UG_supported_plugins_NPU
openvino_docs_OV_UG_supported_plugins_GNA
openvino_docs_OV_UG_query_api
OpenVINO™ Runtime can infer deep learning models using the following device types:
* :doc:`CPU <openvino_docs_OV_UG_supported_plugins_CPU>`
* :doc:`GPU <openvino_docs_OV_UG_supported_plugins_GPU>`
* :doc:`NPU <openvino_docs_OV_UG_supported_plugins_NPU>`
* :doc:`GNA <openvino_docs_OV_UG_supported_plugins_GNA>`
* :doc:`Arm® CPU <openvino_docs_OV_UG_supported_plugins_CPU>`
@ -33,15 +34,14 @@ For a more detailed list of hardware, see :doc:`Supported Devices <openvino_docs
.. _devicesupport-feature-support-matrix:
Feature Support Matrix
#######################################
The table below demonstrates support of key features by OpenVINO device plugins.
========================================================================================= ============================ =============== ===============
Capability CPU GPU GNA
========================================================================================= ============================ =============== ===============
========================================================================================= ============================ ========== =========== ===========
Capability CPU GPU NPU GNA
========================================================================================= ============================ ========== =========== ===========
:doc:`Heterogeneous execution <openvino_docs_OV_UG_Hetero_execution>` Yes Yes No
:doc:`Multi-device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` Yes Yes Partial
:doc:`Automatic batching <openvino_docs_OV_UG_Automatic_Batching>` No Yes No
@ -52,7 +52,7 @@ The table below demonstrates support of key features by OpenVINO device plugins.
:doc:`Preprocessing acceleration <openvino_docs_OV_UG_Preprocessing_Overview>` Yes Yes No
:doc:`Stateful models <openvino_docs_OV_UG_model_state_intro>` Yes No Yes
:doc:`Extensibility <openvino_docs_Extensibility_UG_Intro>` Yes Yes No
========================================================================================= ============================ =============== ===============
========================================================================================= ============================ ========== =========== ===========
For more details on plugin-specific feature limitations, see the corresponding plugin pages.


@ -3,27 +3,158 @@
NPU Device
==========
.. meta::
:description: OpenVINO™ supports the Neural Processing Unit,
              a low-power processing device dedicated to running AI inference.
The Neural Processing Unit is a low-power hardware solution, introduced with the
Intel® Core™ Ultra generation of CPUs (formerly known as Meteor Lake). It enables
you to offload certain neural network computation tasks from other devices,
for more streamlined resource management. The NPU plugin is a core part of the
OpenVINO™ toolkit.
For an in-depth description of the NPU plugin, see:
`NPU plugin developer documentation <https://github.com/openvinotoolkit/npu_plugin/blob/develop/docs/VPUX_DG/index.md>`__
`OpenVINO Runtime NPU plugin source files <https://github.com/openvinotoolkit/npu_plugin>`__
| **Supported Platforms:**
| Host: Intel® Core™ Ultra (formerly known as Meteor Lake)
| NPU device: NPU 3720
| OS: Ubuntu* 20, MS Windows* 11 (both 64-bit)
| **Supported Inference Data Types**
| The NPU plugin supports the following data types as inference precision of internal primitives:
| Floating-point data types: f32, f16
| Quantized data types: u8 (quantized models may be int8 or mixed fp16-int8)
| Computation precision for the HW is fp16.
|
| For more details on how to get a quantized model, refer to the
:doc:`Model Optimization guide <openvino_docs_model_optimization_guide>`, and
:doc:`NNCF tool quantization guide <basic_quantization_flow>`.
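
As an illustration, the sketch below shows how a quantized model could be obtained with
NNCF post-training quantization. It is only a sketch: the model path, the input shape,
and the random calibration data are placeholders you would replace with your own.

.. code-block:: python

   # Sketch of post-training quantization with NNCF. "model.xml", the input
   # shape, and the random calibration data are placeholders.
   import numpy as np
   import nncf
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")

   # A handful of representative inputs; real calibration data should come
   # from your dataset.
   calibration_data = [
       np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)
   ]
   calibration_dataset = nncf.Dataset(calibration_data)

   quantized_model = nncf.quantize(model, calibration_dataset)
   ov.save_model(quantized_model, "model_int8.xml")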
Model Caching
#############################
Model Caching helps reduce application startup delays by exporting and reusing the compiled
model automatically. The following two compilation-related metrics are crucial in this area:
| **First Ever Inference Latency (FEIL)**
| Measures all steps required to compile and execute a model on the device for the
first time. It includes the model compilation time, the time required to load and
initialize the model on the device, and the first inference execution.
| **First Inference Latency (FIL)**
| Measures the time required to load and initialize the pre-compiled model on the
device and the first inference execution.
UMD Dynamic Model Caching
+++++++++++++++++++++++++++++
UMD model caching is a solution enabled by default in the current NPU driver.
It improves the first inference latency (FIL) by storing the model in the cache
after compilation (a step included in FEIL), based on a hash key. The process
may be summarized in three stages:
1. UMD generates the key from the input IR model and build arguments.
2. UMD requests the DirectX Shader cache session to store the model
with the computed key.
3. All subsequent requests to compile the same IR model with the same arguments
use the pre-compiled model, reading it from the cache instead of recompiling.
OpenVINO Model Caching
+++++++++++++++++++++++++++++
OpenVINO Model Caching is a common mechanism for all OpenVINO device plugins and
can be enabled by setting the ``ov::cache_dir`` property. This way, the UMD model
caching is automatically bypassed by the NPU plugin, which means the model
will only be stored in the OpenVINO cache after compilation. When a cache hit
occurs for subsequent compilation requests, the plugin will import the model
instead of recompiling it.
For more details about OpenVINO model caching, see the
:doc:`Model Caching Overview <openvino_docs_OV_UG_Model_caching_overview>`.
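
For illustration, enabling the cache from the Python API could look like the sketch
below; the cache directory and model path are arbitrary examples:

.. code-block:: python

   # Sketch: enable OpenVINO model caching for a model compiled on NPU.
   # "model_cache" and "model.xml" are example paths, not required names.
   import openvino as ov

   core = ov.Core()
   core.set_property({"CACHE_DIR": "model_cache"})  # corresponds to ov::cache_dir

   # The first call compiles the model and stores the result in the cache;
   # subsequent calls with the same model and configuration import it from
   # the cache instead of recompiling.
   compiled_model = core.compile_model("model.xml", "NPU")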
Supported Features and Properties
#######################################
The NPU device is currently supported by AUTO and MULTI inference modes.
The NPU support in OpenVINO is still under active development and may
offer a limited set of supported OpenVINO features.
**Supported Properties:**
.. tab-set::
.. tab-item:: Read-write properties
.. code-block::
ov::caching_properties
ov::enable_profiling
ov::hint::performance_mode
ov::hint::num_requests
ov::hint::model_priority
ov::hint::enable_cpu_pinning
ov::log::level
ov::device::id
ov::cache_dir
ov::internal::exclusive_async_requests
ov::intel_vpux::dpu_groups
ov::intel_vpux::dma_engines
ov::intel_vpux::compilation_mode
ov::intel_vpux::compilation_mode_params
ov::intel_vpux::print_profiling
ov::intel_vpux::profiling_output_file
ov::intel_vpux::vpux_platform
ov::intel_vpux::use_elf_compiler_backend
.. tab-item:: Read-only properties
.. code-block::
ov::supported_properties
ov::streams::num
ov::optimal_number_of_infer_requests
ov::range_for_async_infer_requests
ov::range_for_streams
ov::available_devices
ov::device::uuid
ov::device::architecture
ov::device::full_name
ov::intel_vpux::device_total_mem_size
ov::intel_vpux::driver_version
.. note::
The optimal number of inference requests returned by the plugin
based on the performance mode is **4 for THROUGHPUT** and **1 for LATENCY**.
The default mode for the NPU device is LATENCY.
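
For example, the value suggested by the plugin can be read back from the compiled
model, as in the sketch below (the model path is a placeholder):

.. code-block:: python

   # Sketch: compile for NPU with a performance hint and query the number of
   # infer requests suggested by the plugin. "model.xml" is a placeholder.
   import openvino as ov

   core = ov.Core()
   compiled_model = core.compile_model(
       "model.xml", "NPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
   )

   # Expected to be 4 for THROUGHPUT and 1 for LATENCY, as noted above.
   print(compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"))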
Limitations
#############################
* Currently, only models with static shapes are supported on NPU.
* If the path to the model file includes non-Unicode symbols, such as Chinese characters,
  the model cannot be used for inference on NPU. An error will be returned.
* Running the AlexNet model with NPU may result in a drop in accuracy.
  At the moment, the googlenet-v4 model is recommended for classification tasks instead.
Additional Resources
#############################
* `Vision colorization Notebook <notebooks/222-vision-image-colorization-with-output.html>`__
* `Classification Benchmark C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/classification_benchmark_demo/cpp>`__
* `3D Human Pose Estimation Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/3d_segmentation_demo/python>`__
* `Object Detection C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/cpp>`__
* `Object Detection Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/python>`__
* `POT-specific sample with sparse resnet-50 generation <https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/prune_and_quantize>`__


@ -3,35 +3,37 @@
Query Device Properties - Configuration
=======================================
.. meta::
:description: Learn the details on the process of querying different device
properties and configuration values at runtime.
The OpenVINO™ toolkit supports inference with several types of devices (processors or accelerators).
This article provides an overview of how to query different device properties
and configuration values at runtime.
OpenVINO runtime has two types of properties:
- **Read only properties** which provide information about devices, such as device
  name and execution capabilities, and information about configuration values
  used to compile the model - ``ov::CompiledModel``.
- **Mutable properties**, primarily used to configure the ``ov::Core::compile_model``
  process and affect final inference on a specific set of devices. Such properties
  can be set globally per device via ``ov::Core::set_property`` or locally for a
  particular model in the ``ov::Core::compile_model`` and ``ov::Core::query_model``
  calls.
An OpenVINO property is represented as a named constexpr variable with a given string
name and a type. The following example represents a read-only property with the C++ name
of ``ov::available_devices``, the string name of ``AVAILABLE_DEVICES`` and the type of
``std::vector<std::string>``:
.. code-block:: cpp
static constexpr Property<std::vector<std::string>, PropertyMutability::RO> available_devices{"AVAILABLE_DEVICES"};
Refer to the :doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` sources and
the :doc:`Multi-Device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` documentation for examples
of setting and getting properties in user applications.
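
As a quick illustration of both property types, consider the Python sketch below;
the devices present and the model path depend on your system:

.. code-block:: python

   # Sketch: read some read-only properties and set a mutable one.
   import openvino as ov

   core = ov.Core()

   # Read-only properties: the device list and a per-device property.
   for device in core.available_devices:
       print(device, core.get_property(device, "FULL_DEVICE_NAME"))

   # Mutable property set globally for a device...
   core.set_property("CPU", {"PERFORMANCE_HINT": "LATENCY"})
   # ...or locally, for a single compiled model ("model.xml" is a placeholder).
   compiled_model = core.compile_model(
       "model.xml", "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
   )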


@ -3,6 +3,11 @@
Automatic Device Selection
==========================
.. meta::
:description: The Automatic Device Selection mode in OpenVINO™ Runtime
detects available devices and selects the optimal processing
unit for inference automatically.
.. toctree::
:maxdepth: 1
@ -10,34 +15,25 @@ Automatic Device Selection
Debugging Auto-Device Plugin <openvino_docs_OV_UG_supported_plugins_AUTO_debugging>
This article introduces how Automatic Device Selection works and how to use it for inference.
.. _how-auto-works:
How AUTO Works
##############
The Automatic Device Selection mode, or AUTO for short, uses a "virtual" or a "proxy" device,
which does not bind to a specific type of hardware, but rather selects the processing unit
for inference automatically. It detects available devices, picks the one best-suited for the
task, and configures its optimization settings. This way, you can write the application once
and deploy it anywhere.
The selection also depends on your performance requirements, defined by the “hints”
configuration API, as well as device priority list limitations, if you choose to exclude
some hardware from the process.
The logic behind the choice is as follows:
1. Check what supported devices are available.
2. Check precisions of the input model (for detailed information on precisions, see ``ov::device::capabilities``).
3. Select the highest-priority device capable of supporting the given model, as listed in the table below.
4. If model's precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.
+----------+-----------------------------------------------------+------------------------------------+
@ -53,7 +49,18 @@ The logic behind the choice is as follows:
| 3 | Intel® CPU | FP32, FP16, INT8, BIN |
| | (e.g. Intel® Core™ i7-1165G7) | |
+----------+-----------------------------------------------------+------------------------------------+
| 4 | Intel® NPU | |
| | (e.g. Intel® Core™ Ultra) | |
+----------+-----------------------------------------------------+------------------------------------+
.. note::
NPU is currently excluded from the default priority list. To use it for inference,
you need to specify it explicitly.
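
For instance, NPU can be named explicitly in the device candidate list, as in the
Python sketch below ("model.xml" stands in for your model):

.. code-block:: python

   # Sketch: include NPU explicitly in the AUTO candidate list, with CPU as a
   # fallback device. "model.xml" is a placeholder path.
   import openvino as ov

   core = ov.Core()
   compiled_model = core.compile_model("model.xml", "AUTO:NPU,CPU")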
How AUTO Works
##############
To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.
Importantly, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.
@ -61,12 +68,19 @@ While the CPU is performing inference, AUTO continues to load the model to the d
This way, the devices which are much slower in compiling models, GPU being the best example, do not impact inference at its initial stages.
For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.
Note that if you choose to exclude CPU from the priority list or disable the initial
CPU acceleration feature via ``ov::intel_auto::enable_startup_fallback``, it will be
unable to support the initial model compilation stage. Models with dynamic
input/output or :doc:`stateful <openvino_docs_OV_UG_model_state_intro>`
operations will be loaded to the CPU if it is in the candidate list. Otherwise,
these models will follow the normal flow and be loaded to the device based on priority.
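
For reference, the sketch below shows how this fallback could be disabled when
compiling a model. Note that the string key ``ENABLE_STARTUP_FALLBACK`` is assumed
here to correspond to ``ov::intel_auto::enable_startup_fallback``, and the model
path is a placeholder.

.. code-block:: python

   # Sketch: disable the initial CPU acceleration stage of AUTO.
   # The "ENABLE_STARTUP_FALLBACK" key is assumed to map to
   # ov::intel_auto::enable_startup_fallback; "model.xml" is a placeholder.
   import openvino as ov

   core = ov.Core()
   compiled_model = core.compile_model(
       "model.xml", "AUTO", {"ENABLE_STARTUP_FALLBACK": False}
   )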
.. image:: _static/images/autoplugin_accelerate.svg
This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app sample <using-auto-with-openvino-samples-and-benchmark-app>`
section, showing how the first-inference latency (the time it takes to compile the
model and perform the first inference) is reduced when using AUTO. For example:
.. code-block:: sh
@ -88,8 +102,9 @@ This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app
Using AUTO
##########
Following the OpenVINO™ naming convention, the Automatic Device Selection mode is assigned the label of "AUTO".
It may be defined with no additional parameters, resulting in defaults being used, or configured further with
the following setup options:
+----------------------------------------------+--------------------------------------------------------------------+
| Property(C++ version) | Values and Description |
@ -205,7 +220,6 @@ The code samples on this page assume following import(Python)/using (C++) are in
Device Candidates and Priority
++++++++++++++++++++++++++++++
The device candidate list enables you to customize the priority and limit the choice of devices available to AUTO.
* If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used.
@ -496,3 +510,4 @@ Additional Resources
- :doc:`Running on Multiple Devices Simultaneously <openvino_docs_OV_UG_Running_on_multiple_devices>`
- :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`


@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e0791abad48ec62d3ebcd111cf42139abe4bfb809c84882c0e8aa88ff7b430b7
size 85563
oid sha256:27bff5eb0b93754e6f8cff0ae294d0221cc9184a517d1991da06bea9cc272eb7
size 84550