Cumulative throughput 2022.2 (#11831)

* Add Overview page
* Revert "Add Overview page"
* update auto with cumulative throughput
* update formatting
* update formatting
* update content
* update
* fix formatting
* Update docs/OV_Runtime_UG/auto_device_selection.md
  Co-authored-by: Chen Peter <peter.chen@intel.com>
* update
* Update docs/OV_Runtime_UG/auto_device_selection.md
  Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update docs/OV_Runtime_UG/multi_device.md
  Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update docs/OV_Runtime_UG/auto_device_selection.md
  Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update docs/OV_Runtime_UG/auto_device_selection.md
  Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
  Co-authored-by: Chen Peter <peter.chen@intel.com>
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md
* update indentation of table

Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
2022-06-17 00:23:10 +08:00
commit d7b8d80a61
parent 2fec03024d
5 changed files with 159 additions and 59 deletions
--- a/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md
+++ b/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md
--- a/docs/OV_Runtime_UG/auto_device_selection.md
+++ b/docs/OV_Runtime_UG/auto_device_selection.md
@@ -1,4 +1,4 @@
-# Automatic device selection {#openvino_docs_OV_UG_supported_plugins_AUTO}
+# Automatic Device Selection {#openvino_docs_OV_UG_supported_plugins_AUTO}

@sphinxdirective

@@ -10,13 +10,16 @@

@endsphinxdirective

+This article explains how Automatic Device Selection works and how to use it for inference.
+
+## How AUTO Works
+
 The Automatic Device Selection mode, or AUTO for short, uses a "virtual" or a "proxy" device, 
 which does not bind to a specific type of hardware, but rather selects the processing unit for inference automatically. 
 It detects available devices, picks the one best-suited for the task, and configures its optimization settings. 
 This way, you can write the application once and deploy it anywhere.

-The selection also depends on your performance requirements, defined by the “hints” configuration API, as well as 
-device priority list limitations, if you choose to exclude some hardware from the process.
+The selection also depends on your performance requirements, defined by the “hints” configuration API, as well as device priority list limitations, if you choose to exclude some hardware from the process.

 The logic behind the choice is as follows: 
 1. Check what supported devices are available. 
@@ -44,16 +47,16 @@ The logic behind the choice is as follows:
@endsphinxdirective

 To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds. 
-What is important, **AUTO always starts inference with the CPU**, as it provides very low latency and can start inference with no additional delays. 
+Importantly, **AUTO always starts inference with the CPU of the system**, as it provides very low latency and can start inference with no additional delays. 
 While the CPU is performing inference, AUTO continues to load the model to the device best suited for the purpose and transfers the task to it when ready.
 This way, the devices which are much slower in compiling models, GPU being the best example, do not impede inference at its initial stages.
-For example, if you use a CPU and a GPU, first-inference latency of AUTO will be better than GPU itself.
+For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.

-Note that if you choose to exclude the CPU from the priority list, it will also be unable to support the initial model compilation stage.
+Note that if you exclude CPU from the priority list, AUTO cannot use it to run inference during the initial model compilation stage.
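
The fallback behavior described above (try the first device on the list, then the next one, until loading succeeds) can be sketched in plain Python. This is only an illustration of the described logic, not the plugin's implementation: `load_with_fallback` and the `try_load` callback are hypothetical names that stand in for AUTO's internal model loading.

```python
from typing import Callable, Optional, Sequence


def load_with_fallback(
    candidates: Sequence[str],
    try_load: Callable[[str], bool],
) -> Optional[str]:
    """Return the first device, in priority order, that loads the model.

    `try_load` is a hypothetical callback: it returns True when the model
    was loaded to the given device and False when loading failed.
    """
    for device in candidates:
        if try_load(device):
            return device
    return None  # no candidate could load the model


# Example: pretend the model cannot be loaded to the GPU.
chosen = load_with_fallback(["GPU", "CPU"], lambda device: device != "GPU")
print(chosen)  # CPU
```

The sketch covers only the "next device in line" rule; the CPU-first startup and the later hand-over to the target device are not modeled here.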
     
 ![autoplugin_accelerate]

-This mechanism can be easily observed in our Benchmark Application sample ([see here](#Benchmark App Info)), showing how the first-inference latency (the time it takes to compile the model and perform the first inference) is reduced when using AUTO. For example: 
+This mechanism can be easily observed in the [Using AUTO with Benchmark app sample](#using-auto-with-openvino-samples-and-benchmark-app) section, showing how the first-inference latency (the time it takes to compile the model and perform the first inference) is reduced when using AUTO. For example: 

 ```sh
 benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d GPU -niter 128
@@ -70,43 +73,61 @@ benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO -niter 128
   The longer the process runs, the closer realtime performance will be to that of the best-suited device.
@endsphinxdirective

-## Using the Auto-Device Mode 
+## Using AUTO 

 Following the OpenVINO™ naming convention, the Automatic Device Selection mode is assigned the label of “AUTO.” It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options: 

@sphinxdirective

-+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| Property                  | Property values                               | Description                                               |
-+===========================+===============================================+===========================================================+
-| <device candidate list>   | | AUTO: <device names>                        | | Lists the devices available for selection.              |
-|                           | | comma-separated, no spaces                  | | The device sequence will be taken as priority           |
-|                           | |                                             | | from high to low.                                       |
-|                           | |                                             | | If not specified, “AUTO” will be used as default        |
-|                           | |                                             | | and all devices will be included.                       |
-+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| ov::device:priorities     | | device names                                | | Specifies the devices for Auto-Device plugin to select. |
-|                           | | comma-separated, no spaces                  | | The device sequence will be taken as priority           |
-|                           | |                                             | | from high to low.                                       |
-|                           | |                                             | | This configuration is optional.                         |
-+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| ov::hint::performance_mode| | ov::hint::PerformanceMode::LATENCY          | | Specifies the performance mode preferred                |
-|                           | | ov::hint::PerformanceMode::THROUGHPUT       | | by the application.                                     |
-+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
-| ov::hint::model_priority  | | ov::hint::Priority::HIGH                    | | Indicates the priority for a model.                     |
-|                           | | ov::hint::Priority::MEDIUM                  | | Importantly!                                            |
-|                           | | ov::hint::Priority::LOW                     | | This property is still not fully supported              |
-+---------------------------+-----------------------------------------------+-----------------------------------------------------------+
+--------------------------------+----------------------------------------------------------------------+
+| | Property                     | | Values and Description                                             |
+================================+======================================================================+
+| | <device candidate list>      | | **Values**:                                                        |
+| |                              | |       empty                                                        |
+| |                              | |       `AUTO`                                                       |
+| |                              | |       `AUTO: <device names>` (comma-separated, no spaces)          |
+| |                              | |                                                                    |
+| |                              | | Lists the devices available for selection.                         |
+| |                              | | The device sequence will be taken as priority from high to low.    |
+| |                              | | If not specified, `AUTO` will be used as default,                  |
+| |                              | | and all devices will be "viewed" as candidates.                    |
+--------------------------------+----------------------------------------------------------------------+
+| | `ov::device::priorities`     | | **Values**:                                                        |
+| |                              | |       `<device names>` (comma-separated, no spaces)                |
+| |                              | |                                                                    |
+| |                              | | Specifies the devices for AUTO to select.                          |
+| |                              | | The device sequence will be taken as priority from high to low.    |
+| |                              | | This configuration is optional.                                    |
+--------------------------------+----------------------------------------------------------------------+
+| | `ov::hint::performance_mode` | | **Values**:                                                        |
+| |                              | |       `ov::hint::PerformanceMode::LATENCY`                         |
+| |                              | |       `ov::hint::PerformanceMode::THROUGHPUT`                      |
+| |                              | |       `ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT`           |
+| |                              | |                                                                    |
+| |                              | | Specifies the performance option preferred by the application.     |
+--------------------------------+----------------------------------------------------------------------+
+| | `ov::hint::model_priority`   | | **Values**:                                                        |
+| |                              | |       `ov::hint::Priority::HIGH`                                   |
+| |                              | |       `ov::hint::Priority::MEDIUM`                                 |
+| |                              | |       `ov::hint::Priority::LOW`                                    |
+| |                              | |                                                                    |
+| |                              | | Indicates the priority for a model.                                |
+| |                              | | IMPORTANT: This property is not fully supported yet.               |
+--------------------------------+----------------------------------------------------------------------+

@endsphinxdirective

 Inference with AUTO is configured similarly to when device plugins are used:
 you compile the model on the plugin with configuration and execute inference.

-### Device candidate list
-The device candidate list allows users to customize the priority and limit the choice of devices available to the AUTO plugin. If not specified, the plugin assumes all the devices present in the system can be used. Note, that OpenVINO™ Runtime lets you use “GPU” as an alias for “GPU.0” in function calls. More detail on enumerating devices can be found in [Working with devices](supported_plugins/Device_Plugins.md).
+### Device Candidates and Priority
+The device candidate list enables you to customize the priority and limit the choice of devices available to AUTO. 
+- If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used. 
+- If `AUTO` without any device names is specified, AUTO assumes all the devices present in the system can be used, and will load the network to all devices and run inference based on their default priorities, from high to low.

-The following commands are accepted by the API: 
+To specify the priority of devices, enter the device names in the priority order (from high to low) in `AUTO: <device names>`, or use the `ov::device::priorities` property.
+
+See the following code for using AUTO and specifying devices: 

@sphinxdirective

@@ -124,29 +145,86 @@ The following commands are accepted by the API:

@endsphinxdirective

-To check what devices are present in the system, you can use Device API. For information on how to do it, check [Query device properties and configuration](supported_plugins/config_properties.md)
+Note that OpenVINO Runtime lets you use “GPU” as an alias for “GPU.0” in function calls. More details on enumerating devices can be found in [Working with devices](supported_plugins/Device_Plugins.md).

-For C++
+#### Checking Available Devices

+To check what devices are present in the system, you can use the Device API, as listed below. For information on how to use it, see [Query device properties and configuration](supported_plugins/config_properties.md).
+
+@sphinxdirective
+
+.. tab:: C++   
+
+   .. code-block:: cpp
+
+      ov::Core::get_available_devices() 
+
+   See the Hello Query Device C++ Sample for reference.
+
+.. tab:: Python
+
+   .. code-block:: python
+
+      openvino.runtime.Core.available_devices
+
+   See the Hello Query Device Python Sample for reference.
+
+@endsphinxdirective
+
+#### Excluding Devices from Device Candidate List
+
+You can also exclude hardware devices from AUTO, for example, to reserve CPU for other jobs. AUTO will then not use the excluded device for inference. To do that, add a minus sign (`-`) before the device name in `AUTO: <device names>`, as in the following example, which excludes CPU:
+
+@sphinxdirective
+
+.. tab:: C++
+
+   .. code-block:: cpp
+
+      ov::CompiledModel compiled_model = core.compile_model(model, "AUTO:-CPU"); 
+
+.. tab:: Python
+
+   .. code-block:: python
+
+      compiled_model = core.compile_model(model=model, device_name="AUTO:-CPU")
+
+@endsphinxdirective
+
+AUTO will then query all available devices and remove CPU from the candidate list. 
+
+Note that if you exclude CPU from the device candidate list, AUTO cannot use it to run inference during the initial model compilation stage. See more information in [How AUTO Works](#how-auto-works).
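
To make the device-string forms above concrete, the following sketch resolves `AUTO`, `AUTO:<device names>` and `AUTO:-<device name>` into an ordered candidate list. It is a simplified model written for this note, not OpenVINO's parser: `resolve_candidates` is a hypothetical helper, the `available` list is an assumed query result, and its order is not meant to reflect AUTO's default device priority.

```python
from typing import List, Sequence


def resolve_candidates(device_string: str, available: Sequence[str]) -> List[str]:
    """Turn an AUTO device string into an ordered candidate list.

    Illustrative only: covers the forms "AUTO", "AUTO:<names>"
    and "AUTO:-<name>" (names are comma-separated, no spaces).
    """
    name, _, spec = device_string.partition(":")
    if name != "AUTO":
        raise ValueError(f"not an AUTO device string: {device_string!r}")
    if not spec:
        return list(available)  # no list given: every device is a candidate

    names = spec.split(",")
    excluded = {n[1:] for n in names if n.startswith("-")}
    included = [n for n in names if not n.startswith("-")]
    # With explicit names, their order is the priority (high to low);
    # with exclusions only, start from all available devices.
    base = included if included else list(available)
    return [d for d in base if d not in excluded]


available = ["CPU", "GPU.0", "GPU.1"]  # assumed result of a device query
print(resolve_candidates("AUTO", available))            # ['CPU', 'GPU.0', 'GPU.1']
print(resolve_candidates("AUTO:GPU.1,CPU", available))  # ['GPU.1', 'CPU']
print(resolve_candidates("AUTO:-CPU", available))       # ['GPU.0', 'GPU.1']
```

How a string that mixes included and excluded names is treated is an assumption of this sketch; the text above does not specify it.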
+
+### Performance Hints for AUTO
+The `ov::hint::performance_mode` property enables you to specify a performance option for AUTO to be more efficient for particular use cases.
+
+> **NOTE**: Currently, the `ov::hint` property is supported by CPU and GPU devices only.
+
+#### THROUGHPUT
+This option prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, such as inference of video feeds or large numbers of images.
+
+#### LATENCY
+This option prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, e.g. a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
+
+@sphinxdirective
+
+.. _cumulative throughput:
+
+@endsphinxdirective
+
+#### CUMULATIVE_THROUGHPUT
+While `LATENCY` and `THROUGHPUT` can select one target device with your preferred performance option, the `CUMULATIVE_THROUGHPUT` option enables running inference on multiple devices for higher throughput. With `CUMULATIVE_THROUGHPUT`, AUTO loads the network model to all available devices in the candidate list, and then runs inference on them based on the default or specified priority. 
+
+`CUMULATIVE_THROUGHPUT` behaves similarly to [the Multi-Device execution mode (MULTI)](./multi_device.md). The only difference is that `CUMULATIVE_THROUGHPUT` uses the devices specified by AUTO, so you do not have to add devices manually, whereas with MULTI you need to specify the devices before inference. 
+
+With the CUMULATIVE_THROUGHPUT option:
+- If `AUTO` without any device names is specified, and the system has more than one GPU device, AUTO will remove CPU from the device candidate list to keep GPU running at full capacity.
+- If device priority is specified, AUTO will run inference requests on devices based on the priority. In the following example, AUTO will always try to use GPU first, and then use CPU if GPU is busy:
   ```cpp
-ov::runtime::Core::get_available_devices() (see Hello Query Device C++ Sample)
+   ov::CompiledModel compiled_model = core.compile_model(model, "AUTO:GPU,CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT));
   ```
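
The priority rule in the example above (use GPU first, fall back to CPU while GPU is busy) can be modeled with a small Python sketch. This is a toy model for illustration only: `pick_device` and the `busy` mapping are hypothetical, and the actual request scheduling inside AUTO is not described here.

```python
from typing import Dict, Optional, Sequence


def pick_device(priorities: Sequence[str], busy: Dict[str, bool]) -> Optional[str]:
    """Return the highest-priority device that is not busy, or None."""
    for device in priorities:
        if not busy.get(device, False):
            return device
    return None  # every device is busy; the request has to wait


priorities = ["GPU", "CPU"]  # same order as in "AUTO:GPU,CPU"
print(pick_device(priorities, {"GPU": False, "CPU": False}))  # GPU
print(pick_device(priorities, {"GPU": True, "CPU": False}))   # CPU
print(pick_device(priorities, {"GPU": True, "CPU": True}))    # None
```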

-For Python
-
-```sh
-openvino.runtime.Core.available_devices (see Hello Query Device Python Sample)
-```
-
-### Performance Hints
-The `ov::hint::performance_mode` property enables you to specify a performance mode for the plugin to be more efficient for particular use cases.
-
-#### THROUGHPUT Mode
-This mode prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, like inference of video feeds or large numbers of images.
-
-#### LATENCY Mode
-This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
-Note that currently the `ov::hint` property is supported by CPU and GPU devices only.
+#### Code Examples

 To enable performance hints for your application, use the following code: 
@sphinxdirective
@@ -165,7 +243,7 @@ To enable performance hints for your application, use the following code:

@endsphinxdirective

-### Model Priority
+### Configuring Model Priority

 The `ov::hint::model_priority` property enables you to control the priorities of models in the Auto-Device plugin. A high-priority model will be loaded to a supported high-priority device. A lower-priority model will not be loaded to a device that is occupied by a higher-priority model.

@@ -206,8 +284,8 @@ Although the methods described above are currently the preferred way to execute

@endsphinxdirective

-<a name="Benchmark App Info"></a>
-## Using AUTO with OpenVINO™ Samples and the Benchmark App
+## Using AUTO with OpenVINO Samples and Benchmark app
+
 To see how the Auto-Device plugin is used in practice and test its performance, take a look at OpenVINO™ samples. All samples supporting the "-d" command-line option (which stands for "device") will accept the plugin out-of-the-box. The Benchmark Application will be a perfect place to start – it presents the optimal performance of the plugin without the need for additional settings, like the number of requests or CPU threads. To evaluate the AUTO performance, you can use the following commands:

 For unlimited device choice:
@@ -234,5 +312,11 @@ For more information, refer to the [C++](../../samples/cpp/benchmark_app/README.
   No demos are yet fully optimized for AUTO, by means of selecting the most suitable device, using the GPU streams/throttling, and so on.
@endsphinxdirective

+## See Also
+
+- [Debugging AUTO](AutoPlugin_Debugging.md)
+- [Running on Multiple Devices Simultaneously](./multi_device.md)
+- [Supported Devices](supported_plugins/Supported_Devices.md)
+

 [autoplugin_accelerate]: ../img/autoplugin_accelerate.png
--- a/docs/OV_Runtime_UG/multi_device.md
+++ b/docs/OV_Runtime_UG/multi_device.md
@@ -1,6 +1,15 @@
-# Running on multiple devices simultaneously {#openvino_docs_OV_UG_Running_on_multiple_devices}
+# Running on Multiple Devices Simultaneously {#openvino_docs_OV_UG_Running_on_multiple_devices}

+@sphinxdirective

+To run inference on multiple devices, you can choose either of the following ways:
+
+   - Use the :ref:`CUMULATIVE_THROUGHPUT option <cumulative throughput>` of the Automatic Device Selection mode. This way, you can use all available devices in the system without the need to specify them. 
+   - Use the Multi-Device execution mode. This page will explain how it works and how to use it.
+
+@endsphinxdirective
+
+## How MULTI Works

 The Multi-Device execution mode, or MULTI for short, acts as a "virtual" or a "proxy" device, which does not bind to a specific type of hardware. Instead, it assigns available computing devices to particular inference requests, which are then executed in parallel. 

@@ -154,7 +163,9 @@ To facilitate the copy savings, it is recommended to run the requests in the ord


 ## See Also
-[Supported Devices](supported_plugins/Supported_Devices.md)
+
+- [Supported Devices](supported_plugins/Supported_Devices.md)
+- [Automatic Device Selection](./auto_device_selection.md)

@sphinxdirective
 .. raw:: html
--- a/docs/snippets/AUTO3.cpp
+++ b/docs/snippets/AUTO3.cpp
@@ -9,13 +9,16 @@ ov::Core core;
 // Read a network in IR, PaddlePaddle, or ONNX format:
 std::shared_ptr<ov::Model> model = core.read_model("sample.xml");

-// compile a model on AUTO with Performance Hints enabled:
-// To use the “throughput” mode:
+// Compile a model on AUTO with Performance Hints enabled:
+// To use the “THROUGHPUT” option:
 ov::CompiledModel compiled_model = core.compile_model(model, "AUTO",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
-// or the “latency” mode:
+// To use the “LATENCY” option:
 ov::CompiledModel compiled_mode2 = core.compile_model(model, "AUTO",
    ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
+// To use the “CUMULATIVE_THROUGHPUT” option:
+ov::CompiledModel compiled_mode3 = core.compile_model(model, "AUTO",
+    ov::hint::performance_mode(ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT));    
 //! [part3]
 }
    return 0;
--- a/docs/snippets/ov_auto.py
+++ b/docs/snippets/ov_auto.py
@@ -57,11 +57,13 @@ def part3():
    core = Core()
    # Read a network in IR, PaddlePaddle, or ONNX format:
    model = core.read_model(model_path)
-    # compile a model on AUTO with Performance Hints enabled:
-    # To use the “throughput” mode:
+    # Compile a model on AUTO with Performance Hints enabled:
+    # To use the “THROUGHPUT” option:
    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"THROUGHPUT"})
-    # or the “latency” mode:
+    # To use the “LATENCY” option:
    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})
+    # To use the “CUMULATIVE_THROUGHPUT” option:
+    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"CUMULATIVE_THROUGHPUT"})
 #! [part3]

 def part4():