[DOCS] NPU articles (#21430)

Merging with the reservation that additional changes will be done in a follow-up PR.
This commit is contained in:
Karol Blaszczak 2023-12-14 17:31:44 +01:00 committed by GitHub
parent f4b2f950f2
commit 6367206ea8
6 changed files with 237 additions and 116 deletions

View File

@@ -3,70 +3,43 @@
Configurations for Intel® NPU with OpenVINO™
===============================================

.. meta::
   :description: Learn how to provide additional configuration for Intel®
                 NPU to work with the OpenVINO™ toolkit on your system.


The Intel® NPU device requires a proper driver to be installed on the system.
Make sure you use the most recent supported driver for your hardware setup.

.. tab-set::

   .. tab-item:: Linux

      The driver is maintained as open source and may be found in the following
      repository, together with comprehensive information on installation and
      system requirements:
      `github.com/intel/linux-npu-driver <https://github.com/intel/linux-npu-driver>`__.
      It is recommended to check for the latest version of the driver.

      Make sure you use a supported OS version and have make, gcc, and the
      Linux kernel headers installed. To check the NPU state, use the ``dmesg``
      command in the console. A successful boot-up of the NPU should give you
      a message like this one:

      ``[ 797.193201] [drm] Initialized intel_vpu 0.<version number> for 0000:00:0b.0 on minor 0``

      The current requirement for inference on NPU is Ubuntu 22.04 with kernel
      version 6.6 or higher.

   .. tab-item:: Windows

      The Intel® NPU driver for Windows is available through Windows Update, but
      it may also be installed manually by downloading the
      `NPU driver package <https://www.intel.com/content/www/us/en/download-center/home.html>`__
      and following the
      `Windows driver installation guide <https://support.microsoft.com/en-us/windows/update-drivers-manually-in-windows-ec62f46c-ff14-c91d-eead-d7126dc1f7b6>`__.

      If a driver has already been installed, you should be able to find
      "Intel(R) NPU Accelerator" in Windows Device Manager. If you cannot find
      such a device, the NPU is most likely listed under "Other devices" as
      "Multimedia Video Controller".

View File

@@ -14,17 +14,18 @@ Inference Device Support
   :maxdepth: 1
   :hidden:

   openvino_docs_OV_UG_query_api
   openvino_docs_OV_UG_supported_plugins_CPU
   openvino_docs_OV_UG_supported_plugins_GPU
   openvino_docs_OV_UG_supported_plugins_NPU
   openvino_docs_OV_UG_supported_plugins_GNA

OpenVINO™ Runtime can infer deep learning models using the following device types:

* :doc:`CPU <openvino_docs_OV_UG_supported_plugins_CPU>`
* :doc:`GPU <openvino_docs_OV_UG_supported_plugins_GPU>`
* :doc:`NPU <openvino_docs_OV_UG_supported_plugins_NPU>`
* :doc:`GNA <openvino_docs_OV_UG_supported_plugins_GNA>`
* :doc:`Arm® CPU <openvino_docs_OV_UG_supported_plugins_CPU>`

@@ -33,15 +34,14 @@ For a more detailed list of hardware, see :doc:`Supported Devices <openvino_docs

.. _devicesupport-feature-support-matrix:

Feature Support Matrix
#######################################

The table below demonstrates support of key features by OpenVINO device plugins.

================================================================================== ===== ===== ===== =========
 Capability                                                                         CPU   GPU   NPU   GNA
================================================================================== ===== ===== ===== =========
 :doc:`Heterogeneous execution <openvino_docs_OV_UG_Hetero_execution>`              Yes   Yes         No
 :doc:`Multi-device execution <openvino_docs_OV_UG_Running_on_multiple_devices>`    Yes   Yes         Partial
 :doc:`Automatic batching <openvino_docs_OV_UG_Automatic_Batching>`                 No    Yes         No

@@ -52,7 +52,7 @@ The table below demonstrates support of key features by OpenVINO device plugins.

 :doc:`Preprocessing acceleration <openvino_docs_OV_UG_Preprocessing_Overview>`     Yes   Yes         No
 :doc:`Stateful models <openvino_docs_OV_UG_model_state_intro>`                     Yes   No          Yes
 :doc:`Extensibility <openvino_docs_Extensibility_UG_Intro>`                        Yes   Yes         No
================================================================================== ===== ===== ===== =========

For more details on plugin-specific feature limitations, see the corresponding plugin pages.
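
For a programmatic view of what each installed plugin reports, device capabilities can be
queried through the property API. The following is a small illustrative sketch, assuming
the ``openvino`` Python package is installed:

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   for device in core.available_devices:
       # ov::device::capabilities is a read-only property listing features such as
       # the supported precisions (FP32, FP16, INT8, ...) reported by each plugin.
       print(device, core.get_property(device, "DEVICE_CAPABILITIES"))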

View File

@@ -3,27 +3,158 @@
NPU Device
==========

.. meta::
   :description: OpenVINO™ supports the Neural Processing Unit,
                 a low-power processing device dedicated to running AI inference.


The Neural Processing Unit is a low-power hardware solution, introduced with the
Intel® Core™ Ultra generation of CPUs (formerly known as Meteor Lake). It enables
you to offload certain neural network computation tasks from other devices,
for more streamlined resource management.

.. For an in-depth description of the NPU plugin, see:

   `NPU plugin developer documentation <https://github.com/openvinotoolkit/npu_plugin/blob/develop/docs/VPUX_DG/index.md>`__

   `OpenVINO Runtime NPU plugin source files <https://github.com/openvinotoolkit/npu_plugin>`__

| **Supported Platforms:**
| Host: Intel® Core™ Ultra (former Meteor Lake)
| NPU device: NPU 3720
| OS: Ubuntu* 20, MS Windows* 11 (both 64-bit)

| **Supported Inference Data Types**
| The NPU plugin supports the following data types as inference precision of internal primitives:
| Floating-point data types: f32, f16
| Quantized data types: u8 (quantized models may be int8 or mixed fp16-int8)
| Computation precision for the HW is fp16.
|
| For more details on how to get a quantized model, refer to the
  :doc:`Model Optimization guide <openvino_docs_model_optimization_guide>` and the
  :doc:`NNCF tool quantization guide <basic_quantization_flow>`.
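
As a rough illustration of the quantization path mentioned above, the sketch below runs
NNCF post-training quantization on an IR model so it can run on the NPU with u8 precision.
The names are assumptions: ``model.xml``, the ``(1, 3, 224, 224)`` input shape, and the
random calibration tensors are all hypothetical stand-ins for real inputs:

.. code-block:: python

   import numpy as np
   import nncf
   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")   # hypothetical FP32/FP16 IR model

   # Random samples stand in for a real calibration set; use representative data in practice.
   calibration_data = [np.random.rand(1, 3, 224, 224).astype(np.float32) for _ in range(10)]
   calibration_dataset = nncf.Dataset(calibration_data)

   # Post-training quantization; the resulting model can be compiled for "NPU".
   quantized_model = nncf.quantize(model, calibration_dataset)
   ov.save_model(quantized_model, "model_int8.xml")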
Model Caching
#############################
Model Caching helps reduce application startup delays by exporting and reusing the compiled
model automatically. The following two compilation-related metrics are crucial in this area:
| **First Ever Inference Latency (FEIL)**
| Measures all steps required to compile and execute a model on the device for the
first time. It includes model compilation time, the time required to load and
initialize the model on the device and the first inference execution.
| **First Inference Latency (FIL)**
| Measures the time required to load and initialize the pre-compiled model on the
device and the first inference execution.
UMD Dynamic Model Caching
+++++++++++++++++++++++++++++
UMD model caching is a solution enabled by default in the current NPU driver.
It improves time to first inference (FIL) by storing the model in the cache
after the compilation (included in FEIL), based on a hash key. The process
may be summarized in three stages:
1. UMD generates the key from the input IR model and build arguments
2. UMD requests the DirectX Shader cache session to store the model
with the computed key.
3. All subsequent requests to compile the same IR model with the same arguments
use the pre-compiled model, reading it from the cache instead of recompiling.
OpenVINO Model Caching
+++++++++++++++++++++++++++++
OpenVINO Model Caching is a common mechanism for all OpenVINO device plugins and
can be enabled by setting the ``ov::cache_dir`` property. This way, the UMD model
caching is automatically bypassed by the NPU plugin, which means the model
will only be stored in the OpenVINO cache after compilation. When a cache hit
occurs for subsequent compilation requests, the plugin will import the model
instead of recompiling it.
For more details about OpenVINO model caching, see the
:doc:`Model Caching Overview <openvino_docs_OV_UG_Model_caching_overview>`.
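
In practice, enabling OpenVINO model caching for the NPU comes down to setting the cache
directory before compilation. A minimal sketch, with a hypothetical ``model.xml`` path and
cache folder:

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # With ov::cache_dir set, the compiled blob is stored in (and later imported from)
   # this directory, and the NPU plugin bypasses UMD caching.
   core.set_property({"CACHE_DIR": "npu_cache"})

   compiled_model = core.compile_model("model.xml", "NPU")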
Supported Features and Properties
#######################################
The NPU device is currently supported by AUTO and MULTI inference modes.
The NPU support in OpenVINO is still under active development and may
offer a limited set of supported OpenVINO features.
**Supported Properties:**
.. tab-set::
.. tab-item:: Read-write properties
.. code-block::
ov::caching_properties
ov::enable_profiling
ov::hint::performance_mode
ov::hint::num_requests
ov::hint::model_priority
ov::hint::enable_cpu_pinning
ov::log::level
ov::device::id
ov::cache_dir
ov::internal::exclusive_async_requests
ov::intel_vpux::dpu_groups
ov::intel_vpux::dma_engines
ov::intel_vpux::compilation_mode
ov::intel_vpux::compilation_mode_params
ov::intel_vpux::print_profiling
ov::intel_vpux::profiling_output_file
ov::intel_vpux::vpux_platform
ov::intel_vpux::use_elf_compiler_backend
.. tab-item:: Read-only properties
.. code-block::
ov::supported_properties
ov::streams::num
ov::optimal_number_of_infer_requests
ov::range_for_async_infer_requests
ov::range_for_streams
ov::available_devices
ov::device::uuid
ov::device::architecture
ov::device::full_name
ov::intel_vpux::device_total_mem_size
ov::intel_vpux::driver_version
.. note::
The optimum number of inference requests returned by the plugin
based on the performance mode is **4 for THROUGHPUT** and **1 for LATENCY**.
The default mode for the NPU device is LATENCY.
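
The interplay between the performance mode and the reported number of requests can be
checked directly on a compiled model. A brief sketch, with ``model.xml`` as a hypothetical
model path:

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # Compile with a throughput-oriented hint, then read back the value the plugin
   # recommends for ov::optimal_number_of_infer_requests.
   compiled_model = core.compile_model(
       "model.xml", "NPU", {"PERFORMANCE_HINT": "THROUGHPUT"}
   )
   print(compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS"))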
Limitations
#############################
* Currently, only models with static shapes are supported on NPU.
* If the path to the model file includes non-Unicode symbols, such as Chinese characters,
  the model cannot be used for inference on NPU. An error will be returned instead.
* Running the Alexnet model with NPU may result in a drop in accuracy.
  At this moment, the googlenet-v4 model is recommended for classification tasks.
Additional Resources
#############################
* `Vision colorization Notebook <notebooks/222-vision-image-colorization-with-output.html>`__
* `Classification Benchmark C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/classification_benchmark_demo/cpp>`__
* `3D Human Pose Estimation Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/3d_segmentation_demo/python>`__
* `Object Detection C++ Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/cpp>`__
* `Object Detection Python Demo <https://github.com/openvinotoolkit/open_model_zoo/tree/master/demos/object_detection_demo/python>`__
* `POT-specific sample with sparse resnet-50 generation <https://github.com/openvinotoolkit/openvino/tree/master/tools/pot/openvino/tools/pot/api/samples/prune_and_quantize>`__

View File

@@ -3,35 +3,37 @@
Query Device Properties - Configuration
=======================================

.. meta::
   :description: Learn the details on the process of querying different device
                 properties and configuration values at runtime.


This article provides an overview of how to query different device properties
and configuration values at runtime.

OpenVINO runtime has two types of properties:

- **Read-only properties**, which provide information about devices, such as device
  name and execution capabilities, and information about configuration values
  used to compile the model - ``ov::CompiledModel``.
- **Mutable properties**, primarily used to configure the ``ov::Core::compile_model``
  process and affect final inference on a specific set of devices. Such properties
  can be set globally per device via ``ov::Core::set_property`` or locally for a
  particular model in the ``ov::Core::compile_model`` and ``ov::Core::query_model``
  calls.

An OpenVINO property is represented as a named constexpr variable with a given string
name and a type. The following example represents a read-only property with the C++ name
of ``ov::available_devices``, the string name of ``AVAILABLE_DEVICES`` and the type of
``std::vector<std::string>``:

.. code-block:: cpp

   static constexpr Property<std::vector<std::string>, PropertyMutability::RO> available_devices{"AVAILABLE_DEVICES"};

Refer to the :doc:`Hello Query Device C++ Sample <openvino_inference_engine_samples_hello_query_device_README>` sources and
the :doc:`Multi-Device execution <openvino_docs_OV_UG_Running_on_multiple_devices>` documentation for examples of
setting and getting properties in user applications.
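
The same properties can also be queried from the Python API, where the string names map
directly. A short illustrative sketch; the ``"CPU"`` device name is just an example target:

.. code-block:: python

   import openvino as ov

   core = ov.Core()

   # The AVAILABLE_DEVICES read-only property is exposed in Python as a Core attribute.
   print(core.available_devices)

   # Read a device-specific, read-only property.
   print(core.get_property("CPU", "FULL_DEVICE_NAME"))

   # Set a mutable property globally for a device before compiling models on it.
   core.set_property("CPU", {"PERFORMANCE_HINT": "LATENCY"})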

View File

@@ -3,6 +3,11 @@
Automatic Device Selection
==========================

.. meta::
   :description: The Automatic Device Selection mode in OpenVINO™ Runtime
                 detects available devices and selects the optimal processing
                 unit for inference automatically.

.. toctree::
   :maxdepth: 1

@@ -10,34 +15,25 @@ Automatic Device Selection
   Debugging Auto-Device Plugin <openvino_docs_OV_UG_supported_plugins_AUTO_debugging>

This article introduces how Automatic Device Selection works and how to use it for inference.

.. _how-auto-works:

The Automatic Device Selection mode, or AUTO for short, uses a "virtual" or a "proxy" device,
which does not bind to a specific type of hardware, but rather selects the processing unit
for inference automatically. It detects available devices, picks the one best-suited for the
task, and configures its optimization settings. This way, you can write the application once
and deploy it anywhere.

The selection also depends on your performance requirements, defined by the "hints"
configuration API, as well as device priority list limitations, if you choose to exclude
some hardware from the process.

The logic behind the choice is as follows:

1. Check what supported devices are available.
2. Check the precisions of the input model (for detailed information on precisions, read more on ``ov::device::capabilities``).
3. Select the highest-priority device capable of supporting the given model, as listed in the table below.
4. If the model's precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.
+----------+-----------------------------------------------------+------------------------------------+

@@ -53,7 +49,18 @@ The logic behind the choice is as follows:

| 3        | Intel® CPU                                          | FP32, FP16, INT8, BIN              |
|          | (e.g. Intel® Core™ i7-1165G7)                       |                                    |
+----------+-----------------------------------------------------+------------------------------------+
| 4        | Intel® NPU                                          |                                    |
|          | (e.g. Intel® Core™ Ultra)                           |                                    |
+----------+-----------------------------------------------------+------------------------------------+

.. note::

   NPU is currently excluded from the default priority list. To use it for inference,
   you need to specify it explicitly, as in the example below.
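
A minimal sketch of naming the NPU explicitly in the AUTO candidate list; ``model.xml`` is
a hypothetical model path:

.. code-block:: python

   import openvino as ov

   core = ov.Core()
   model = core.read_model("model.xml")

   # NPU is not in AUTO's default priority list, so it must be listed explicitly;
   # AUTO will then choose between the named candidates.
   compiled_model = core.compile_model(model, "AUTO:NPU,CPU")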
How AUTO Works
##############
To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.

What is important, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.

@@ -61,12 +68,19 @@ While the CPU is performing inference, AUTO continues to load the model to the d

This way, the devices which are much slower in compiling models, GPU being the best example, do not impact inference at its initial stages.
For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.

Note that if you choose to exclude CPU from the priority list or disable the initial
CPU acceleration feature via ``ov::intel_auto::enable_startup_fallback``, it will be
unable to support the initial model compilation stage. Models with dynamic
input/output or :doc:`stateful <openvino_docs_OV_UG_model_state_intro>`
operations will be loaded to the CPU if it is in the candidate list. Otherwise,
these models will follow the normal flow and be loaded to the device based on priority.

.. image:: _static/images/autoplugin_accelerate.svg

This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app sample <using-auto-with-openvino-samples-and-benchmark-app>`
section, showing how the first-inference latency (the time it takes to compile the
model and perform the first inference) is reduced when using AUTO. For example:

.. code-block:: sh
@@ -88,8 +102,9 @@ This mechanism can be easily observed in the :ref:`Using AUTO with Benchmark app

Using AUTO
##########

Following the OpenVINO™ naming convention, the Automatic Device Selection mode is assigned the label of "AUTO".
It may be defined with no additional parameters, resulting in defaults being used, or configured further with
the following setup options:

+----------------------------------------------+--------------------------------------------------------------------+
| Property (C++ version)                       | Values and Description                                              |
@@ -205,7 +220,6 @@ The code samples on this page assume following import(Python)/using (C++) are in

Device Candidates and Priority
++++++++++++++++++++++++++++++

The device candidate list enables you to customize the priority and limit the choice of devices available to AUTO.

* If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used.

@@ -496,3 +510,4 @@ Additional Resources

- :doc:`Running on Multiple Devices Simultaneously <openvino_docs_OV_UG_Running_on_multiple_devices>`
- :doc:`Supported Devices <openvino_docs_OV_UG_supported_plugins_Supported_Devices>`

View File

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:27bff5eb0b93754e6f8cff0ae294d0221cc9184a517d1991da06bea9cc272eb7
size 84550