[DOCS] NPU articles (#21430)

Merging with the reservation that additional changes will be made in a follow-up PR.
Karol Blaszczak 2023-12-14 17:31:44 +01:00 committed by GitHub
parent f4b2f950f2
commit 6367206ea8
6 changed files with 237 additions and 116 deletions


@ -3,70 +3,43 @@
Configurations for Intel® NPU with OpenVINO™
===============================================
.. meta::
:description: Learn how to provide additional configuration for Intel®
NPU to work with the OpenVINO™ toolkit on your system.
Drivers and Dependencies
########################
The Intel® NPU device requires a proper driver to be installed on the system.
Make sure you use the most recent supported driver for your hardware setup.
.. tab-set::
.. tab-item:: Linux
The driver is maintained as open source and may be found in the following repository,
together with comprehensive information on installation and system requirements:
`github.com/intel/linux-npu-driver <https://github.com/intel/linux-npu-driver>`__
Linux
####################
It is recommended to check for the latest version of the driver.
Prerequisites
++++++++++++++++++++
Make sure you use a supported OS version and that make, gcc, and Linux kernel
headers are installed. Use the following command to install the required software:

.. code-block:: sh

   sudo apt-get install gcc make linux-headers-generic

To check the NPU state, use the ``dmesg`` command in the console. A successful
boot-up of the NPU should give you a message like this one:

``[ 797.193201] [drm] Initialized intel_vpu 0.<version number> for 0000:00:0b.0 on minor 0``

The current requirement for inference on NPU is Ubuntu 22.04 with a kernel
version of 6.6 or higher.
Configuration steps
++++++++++++++++++++
Windows
####################
Intel® NPU driver for Windows is available through Windows Update.
What's Next?
####################
Now you are ready to try out OpenVINO™. You can use the following tutorials to write your applications using Python and C/C++.
* Developing in Python:
* `Start with TensorFlow models with OpenVINO™ <notebooks/101-tensorflow-to-openvino-with-output.html>`__
* `Start with ONNX and PyTorch models with OpenVINO™ <notebooks/102-pytorch-onnx-to-openvino-with-output.html>`__
* `Start with PaddlePaddle models with OpenVINO™ <notebooks/103-paddle-to-openvino-classification-with-output.html>`__
* Developing in C/C++:
* :doc:`Image Classification Async C++ Sample <openvino_inference_engine_samples_classification_sample_async_README>`
* :doc:`Hello Classification C++ Sample <openvino_inference_engine_samples_hello_classification_README>`
* :doc:`Hello Reshape SSD C++ Sample <openvino_inference_engine_samples_hello_reshape_ssd_README>`
.. tab-item:: Windows
The Intel® NPU driver for Windows is available through Windows Update, but
it may also be installed manually by downloading the
`NPU driver package <https://www.intel.com/content/www/us/en/download-center/home.html>`__ and following the
`Windows driver installation guide <https://support.microsoft.com/en-us/windows/update-drivers-manually-in-windows-ec62f46c-ff14-c91d-eead-d7126dc1f7b6>`__.
If a driver has already been installed, you should be able to find
'Intel(R) NPU Accelerator' in Windows Device Manager. If you
cannot find such a device, the NPU is most likely listed in "Other devices"
as "Multimedia Video Controller."


@ -14,17 +14,18 @@ Inference Device Support
:maxdepth: 1
:hidden:
openvino_docs_OV_UG_query_api
openvino_docs_OV_UG_supported_plugins_CPU
openvino_docs_OV_UG_supported_plugins_GPU
openvino_docs_OV_UG_supported_plugins_NPU
openvino_docs_OV_UG_supported_plugins_GNA
openvino_docs_OV_UG_query_api
OpenVINO™ Runtime can infer deep learning models using the following device types:
* :doc:`CPU <openvino_docs_OV_UG_supported_plugins_CPU>`
* :doc:`GPU <openvino_docs_OV_UG_supported_plugins_GPU>`
* :doc:`NPU <openvino_docs_OV_UG_supported_plugins_NPU>`
* :doc:`GNA <openvino_docs_OV_UG_supported_plugins_GNA>`
* :doc:`Arm® CPU <openvino_docs_OV_UG_supported_plugins_CPU>`
@ -33,15 +34,14 @@ For a more detailed list of hardware, see :doc:`Supported Devices <openvino_docs
.. _devicesupport-feature-support-matrix:
Feature Support Matrix
#######################################
The table below demonstrates support of key features by OpenVINO device plugins.
========================================================================================= ============================ =============== ===============
Capability CPU GPU GNA
========================================================================================= ============================ =============== ===============
========================================================================================= ============================ ========== =========== ===========
Capability CPU GPU NPU GNA
========================================================================================= ============================ ========== =========== ===========
:doc:`Heterogeneous execution <openvino_docs_OV_UG_Hetero_execution>` Yes Yes No
:doc:`Multi-device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` Yes Yes Partial
:doc:`Automatic batching <openvino_docs_OV_UG_Automatic_Batching>` No Yes No
@ -52,7 +52,7 @@ The table below demonstrates support of key features by OpenVINO device plugins.
:doc:`Preprocessing acceleration <openvino_docs_OV_UG_Preprocessing_Overview>` Yes Yes No
:doc:`Stateful models <openvino_docs_OV_UG_model_state_intro>` Yes No Yes
:doc:`Extensibility <openvino_docs_Extensibility_UG_Intro>` Yes Yes No
========================================================================================= ============================ =============== ===============
========================================================================================= ============================ ========== =========== ===========
For more details on plugin-specific feature limitations, see the corresponding plugin pages.


@ -3,27 +3,158 @@
NPU Device
==========
.. meta::
:description: OpenVINO™ supports the Neural Processing Unit,
              a low-power processing device dedicated to running AI inference.
The Neural Processing Unit is a low-power hardware solution, introduced with the
Intel® Core™ Ultra generation of CPUs (formerly known as Meteor Lake). It enables
you to offload certain neural network computation tasks from other devices,
for more streamlined resource management. The NPU plugin is a core part of the
OpenVINO™ toolkit.
For an in-depth description of the NPU plugin, see:
`NPU plugin developer documentation <https://github.com/openvinotoolkit/npu_plugin/blob/develop/docs/VPUX_DG/index.md>`__
`OpenVINO Runtime NPU plugin source files <https://github.com/openvinotoolkit/npu_plugin>`__
| **Supported Platforms:**
| Host: Intel® Core™ Ultra (formerly known as Meteor Lake)
| NPU device: NPU 3720
| OS: Ubuntu* 20, MS Windows* 11 (both 64-bit)
| **Supported Inference Data Types**
| The NPU plugin supports the following data types as inference precision of internal primitives:
| Floating-point data types: f32, f16
| Quantized data types: u8 (quantized models may be int8 or mixed fp16-int8)
| Computation precision for the HW is fp16.
|
| For more details on how to get a quantized model, refer to the
:doc:`Model Optimization guide <openvino_docs_model_optimization_guide>`, and
:doc:`NNCF tool quantization guide <basic_quantization_flow>`.
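
As an illustration, the sketch below shows how a quantized model could be obtained with
NNCF post-training quantization. It is only a sketch: the model path, the input shape,
and the random calibration data are placeholders you would replace with your own.

.. code-block:: python

   # Sketch of post-training quantization with NNCF. "model.xml", the input
   # shape, and the random calibration data are placeholders.
   import numpy as np
   import nncf
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")

   # A handful of representative inputs; real calibration data should come
   # from your dataset.
   calibration_data = [
       np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)
   ]
   calibration_dataset = nncf.Dataset(calibration_data)

   quantized_model = nncf.quantize(model, calibration_dataset)
   ov.save_model(quantized_model, "model_int8.xml")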
Model Caching
#############################
Model Caching helps reduce application startup delays by exporting and reusing the compiled
model automatically. The following two compilation-related metrics are crucial in this area:
| **First Ever Inference Latency (FEIL)**
| Measures all steps required to compile and execute a model on the device for the
first time. It includes the model compilation time, the time required to load and
initialize the model on the device, and the first inference execution.
| **First Inference Latency (FIL)**
| Measures the time required to load and initialize the pre-compiled model on the
device and the first inference execution.
UMD Dynamic Model Caching
+++++++++++++++++++++++++++++
UMD model caching is a solution enabled by default in the current NPU driver.
It improves the first inference latency (FIL) by storing the model in the cache
after compilation (a step included in FEIL), based on a hash key. The process
may be summarized in three stages:
1. UMD generates the key from the input IR model and build arguments.
2. UMD requests the DirectX Shader cache session to store the model
with the computed key.
3. All subsequent requests to compile the same IR model with the same arguments
use the pre-compiled model, reading it from the cache instead of recompiling.
OpenVINO Model Caching
+++++++++++++++++++++++++++++
OpenVINO Model Caching is a common mechanism for all OpenVINO device plugins and
can be enabled by setting the ``ov::cache_dir`` property. This way, the UMD model
caching is automatically bypassed by the NPU plugin, which means the model
will only be stored in the OpenVINO cache after compilation. When a cache hit
occurs for subsequent compilation requests, the plugin will import the model
instead of recompiling it.
For more details about OpenVINO model caching, see the
:doc:`Model Caching Overview <openvino_docs_OV_UG_Model_caching_overview>`.
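
For illustration, enabling the cache from the Python API could look like the sketch
below; the cache directory and model path are arbitrary examples:

.. code-block:: python

   # Sketch: enable OpenVINO model caching for a model compiled on NPU.
   # "model_cache" and "model.xml" are example paths, not required names.
   import openvino as ov

   core = ov.Core()
   core.set_property({"CACHE_DIR": "model_cache"})  # corresponds to ov::cache_dir

   # The first call compiles the model and stores the result in the cache;
   # subsequent calls with the same model and configuration import it from
   # the cache instead of recompiling.
   compiled_model = core.compile_model("model.xml", "NPU")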
Supported Features and Properties
#######################################
The NPU device is currently supported by AUTO and MULTI inference modes.
The NPU support in OpenVINO is still under active development and may
offer a limited set of supported OpenVINO features.
**Supported Properties:**
.. tab-set::
.. tab-item:: Read-write properties
.. code-block::
ov::caching_properties
ov::enable_profiling
ov::hint::performance_mode
ov::hint::num_requests
ov::hint::model_priority
ov::hint::enable_cpu_pinning
ov::log::level
ov::device::id
ov::cache_dir
ov::internal::exclusive_async_requests
ov::intel_vpux::dpu_groups
ov::intel_vpux::dma_engines
ov::intel_vpux::compilation_mode
ov::intel_vpux::compilation_mode_params
ov::intel_vpux::print_profiling
ov::intel_vpux::profiling_output_file
ov::intel_vpux::vpux_platform
ov::intel_vpux::use_elf_compiler_backend
.. tab-item:: Read-only properties
.. code-block::
ov::supported_properties
ov::streams::num
ov::optimal_number_of_infer_requests
ov::range_for_async_infer_requests
ov::range_for_streams
ov::available_devices
ov::device::uuid
ov::device::architecture
ov::device::full_name
ov::intel_vpux::device_total_mem_size
ov::intel_vpux::driver_version
.. note::
The optimal number of inference requests returned by the plugin
based on the performance mode is **4 for THROUGHPUT** and **1 for LATENCY**.
The default mode for the NPU device is LATENCY.
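
For example, the value suggested by the plugin can be read back from the compiled
model, as in the sketch below (the model path is a placeholder):

.. code-block:: python

   # Sketch: compile for NPU with a performance hint and query the number of
   # infer requests suggested by the plugin. "model.xml" is a placeholder.
   import openvino as ov

   core = ov.Core()
   compiled_model = core.compile_model(
       "model.xml", "NPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
   )

   # Expected to be 4 for THROUGHPUT and 1 for LATENCY, as noted above.
   print(compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"))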
Limitations
#############################
* Currently, only models with static shapes are supported on NPU.
* If the path to the model file includes non-Unicode symbols, such as Chinese characters,
  the model cannot be used for inference on NPU. An error will be returned.
* Running the AlexNet model with NPU may result in a drop in accuracy.
  At the moment, the googlenet-v4 model is recommended for classification tasks instead.
Additional Resources
#############################
* `Vision colorization Notebook <notebooks/222-vision-image-colorization-with-output.html>`__
* `Classification Benchmark C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/classification_benchmark_demo/cpp>`__
* `3D Human Pose Estimation Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/3d_segmentation_demo/python>`__
* `Object Detection C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/cpp>`__
* `Object Detection Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/python>`__
* `POT-specific sample with sparse resnet-50 generation <https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/prune_and_quantize>`__


@ -3,35 +3,37 @@
Query Device Properties - Configuration
=======================================
.. meta::
:description: Learn the details on the process of querying different device
properties and configuration values at runtime.
The OpenVINO™ toolkit supports inference with several types of devices (processors or accelerators).
This article provides an overview of how to query different device properties
and configuration values at runtime.
OpenVINO runtime has two types of properties:
- **Read only properties** which provide information about devices, such as device
  name and execution capabilities, and information about configuration values
  used to compile the model - ``ov::CompiledModel``.
- **Mutable properties**, primarily used to configure the ``ov::Core::compile_model``
  process and affect final inference on a specific set of devices. Such properties
  can be set globally per device via ``ov::Core::set_property`` or locally for a
  particular model in the ``ov::Core::compile_model`` and ``ov::Core::query_model``
  calls.
An OpenVINO property is represented as a named constexpr variable with a given string
name and a type. The following example represents a read-only property with the C++ name
of ``ov::available_devices``, the string name of ``AVAILABLE_DEVICES`` and the type of
``std::vector<std::string>``:
.. code-block:: cpp
static constexpr Property<std::vector<std::string>, PropertyMutability::RO> available_devices{"AVAILABLE_DEVICES"};
Refer to the :doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` sources and
the :doc:`Multi-Device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` documentation for examples
of setting and getting properties in user applications.
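
As a quick illustration of both property types, consider the Python sketch below;
the devices present and the model path depend on your system:

.. code-block:: python

   # Sketch: read some read-only properties and set a mutable one.
   import openvino as ov

   core = ov.Core()

   # Read-only properties: the device list and a per-device property.
   for device in core.available_devices:
       print(device, core.get_property(device, "FULL_DEVICE_NAME"))

   # Mutable property set globally for a device...
   core.set_property("CPU", {"PERFORMANCE_HINT": "LATENCY"})
   # ...or locally, for a single compiled model ("model.xml" is a placeholder).
   compiled_model = core.compile_model(
       "model.xml", "CPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
   )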


@ -3,6 +3,11 @@
Automatic Device Selection
==========================
.. meta::
:description: The Automatic Device Selection mode in OpenVINO™ Runtime
detects available devices and selects the optimal processing
unit for inference automatically.
.. toctree::
:maxdepth: 1
@ -10,34 +15,25 @@ Automatic Device Selection
Debugging Auto-Device Plugin <openvino_docs_OV_UG_supported_plugins_AUTO_debugging>
This article introduces how Automatic Device Selection works and how to use it for inference.
.. _how-auto-works:
How AUTO Works
##############
The Automatic Device Selection mode, or AUTO for short, uses a "virtual" or a "proxy" device,
which does not bind to a specific type of hardware, but rather selects the processing unit
for inference automatically. It detects available devices, picks the one best-suited for the
task, and configures its optimization settings. This way, you can write the application once
and deploy it anywhere.
The selection also depends on your performance requirements, defined by the “hints”
configuration API, as well as device priority list limitations, if you choose to exclude
some hardware from the process.
The logic behind the choice is as follows:
1. Check what supported devices are available.
2. Check precisions of the input model (for detailed information on precisions, see ``ov::device::capabilities``).
3. Select the highest-priority device capable of supporting the given model, as listed in the table below.
4. If model's precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.
+----------+-----------------------------------------------------+------------------------------------+
@ -53,7 +49,18 @@ The logic behind the choice is as follows:
| 3 | Intel® CPU | FP32, FP16, INT8, BIN |
| | (e.g. Intel® Core™ i7-1165G7) | |
+----------+-----------------------------------------------------+------------------------------------+
| 4 | Intel® NPU | |
| | (e.g. Intel® Core™ Ultra) | |
+----------+-----------------------------------------------------+------------------------------------+
.. note::
NPU is currently excluded from the default priority list. To use it for inference,
you need to specify it explicitly.
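
For instance, NPU can be named explicitly in the device candidate list, as in the
Python sketch below ("model.xml" stands in for your model):

.. code-block:: python

   # Sketch: include NPU explicitly in the AUTO candidate list, with CPU as a
   # fallback device. "model.xml" is a placeholder path.
   import openvino as ov

   core = ov.Core()
   compiled_model = core.compile_model("model.xml", "AUTO:NPU,CPU")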
How AUTO Works
##############
To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.
Importantly, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.
@ -61,12 +68,19 @@ While the CPU is performing inference, AUTO continues to load the model to the d
This way, the devices which are much slower in compiling models, GPU being the best example, do not impact inference at its initial stages.
For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.
Note that if you choose to exclude CPU from the priority list or disable the initial
CPU acceleration feature via ``ov::intel_auto::enable_startup_fallback``, it will be
unable to support the initial model compilation stage. Models with dynamic
input/output or :doc:`stateful <openvino_docs_OV_UG_model_state_intro>`
operations will be loaded to the CPU if it is in the candidate list. Otherwise,
these models will follow the normal flow and be loaded to the device based on priority.
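
For reference, the sketch below shows how this fallback could be disabled when
compiling a model. Note that the string key ``ENABLE_STARTUP_FALLBACK`` is assumed
here to correspond to ``ov::intel_auto::enable_startup_fallback``, and the model
path is a placeholder.

.. code-block:: python

   # Sketch: disable the initial CPU acceleration stage of AUTO.
   # The "ENABLE_STARTUP_FALLBACK" key is assumed to map to
   # ov::intel_auto::enable_startup_fallback; "model.xml" is a placeholder.
   import openvino as ov

   core = ov.Core()
   compiled_model = core.compile_model(
       "model.xml", "AUTO", {"ENABLE_STARTUP_FALLBACK": False}
   )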
.. image:: _static/images/autoplugin_accelerate.svg
This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app sample <using-auto-with-openvino-samples-and-benchmark-app>`
section, showing how the first-inference latency (the time it takes to compile the
model and perform the first inference) is reduced when using AUTO. For example:
.. code-block:: sh
@ -88,8 +102,9 @@ This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app
Using AUTO
##########
Following the OpenVINO™ naming convention, the Automatic Device Selection mode is assigned the label of "AUTO".
It may be defined with no additional parameters, resulting in defaults being used, or configured further with
the following setup options:
+----------------------------------------------+--------------------------------------------------------------------+
| Property(C++ version) | Values and Description |
@ -205,7 +220,6 @@ The code samples on this page assume following import(Python)/using (C++) are in
Device Candidates and Priority
++++++++++++++++++++++++++++++
The device candidate list enables you to customize the priority and limit the choice of devices available to AUTO.
* If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used.
@ -496,3 +510,4 @@ Additional Resources
- :doc:`Running on Multiple Devices Simultaneously <openvino_docs_OV_UG_Running_on_multiple_devices>`
- :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`


@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e0791abad48ec62d3ebcd111cf42139abe4bfb809c84882c0e8aa88ff7b430b7
size 85563
oid sha256:27bff5eb0b93754e6f8cff0ae294d0221cc9184a517d1991da06bea9cc272eb7
size 84550