Running on multiple devices simultaneously
Introducing the Multi-Device Plugin (C++)
@sphinxdirective
.. raw:: html

   <div id="switcher-cpp" class="switcher-anchor">C++</div>
@endsphinxdirective
The Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. By contrast, the Heterogeneous plugin can run different layers on different devices but not in parallel. The potential gains with the Multi-Device plugin are:
- Improved throughput from using multiple devices (compared to single-device execution)
- More consistent performance, since the devices share the inference burden (if one device is too busy, another can take more of the load)
Note that with Multi-Device the application logic is left unchanged, so you don't need to explicitly compile the model on every device, create and balance the inference requests and so on. From the application point of view, this is just another device that handles the actual machinery. The only thing that is required to leverage performance is to provide the multi-device (and hence the underlying devices) with enough inference requests to process. For example, if you were processing 4 cameras on the CPU (with 4 inference requests), it might be desirable to process more cameras (with more requests in flight) to keep CPU and GPU busy via Multi-Device.
The setup of Multi-Device can be described in three major steps:
- Prepare a configuration for each device.
- Compile the model on the Multi-Device plugin created on top of a (prioritized) list of the configured devices, using the configuration prepared in step one.
- As with any other CompiledModel (resulting from compile_model), create as many inference requests as needed to saturate the devices.
These steps are covered below in detail.
Defining and Configuring the Multi-Device Plugin
Following the OpenVINO™ convention of labeling devices, the Multi-Device plugin uses the name "MULTI". The only configuration option for the Multi-Device plugin is a prioritized list of devices to use:
| Parameter name | Parameter values | Default | Description |
|---|---|---|---|
| ov::device::priorities | comma-separated device names with no spaces | N/A | Prioritized list of devices |
You can set the priorities directly as a string.
There are three ways to specify the devices to be used by "MULTI":
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/MULTI0.cpp
:language: cpp
:fragment: [part0]
@endsphinxdirective
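For illustration, here is a minimal self-contained sketch of compiling a model on MULTI with an explicit priority list; the model path is a placeholder:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path, for illustration only

    // Device priorities can be passed inline in the device string...
    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");

    // ...or through the ov::device::priorities property
    auto compiled_model2 = core.compile_model(model, "MULTI",
                                              ov::device::priorities("GPU", "CPU"));
    return 0;
}
```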
Notice that the priorities of the devices can be changed in real time for the compiled model:
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/MULTI1.cpp
:language: cpp
:fragment: [part1]
@endsphinxdirective
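As a rough sketch of that runtime update (assuming both GPU and CPU are present on the machine, and a placeholder model path):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");

    // Reorder the device priorities for subsequent inference requests
    compiled_model.set_property(ov::device::priorities("CPU,GPU"));
    return 0;
}
```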
Finally, there is a way to specify the number of requests that the Multi-Device will internally keep for each device. Suppose your original app was running 4 cameras with 4 inference requests. You would probably want to share these 4 requests between the 2 devices used in MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(2)" and use the same 4 requests in your app. However, such an explicit configuration is not performance-portable and hence not recommended. Instead, the better way is to configure the individual devices and query the resulting number of requests to be used at the application level (see Configuring the Individual Devices and Creating the Multi-Device On Top).
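A minimal sketch of this (not recommended) explicit form, again with a placeholder model path:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    // Explicitly reserve 2 requests on CPU and 2 on GPU; this works,
    // but is not performance-portable across machines
    auto compiled_model = core.compile_model(model, "MULTI:CPU(2),GPU(2)");
    return 0;
}
```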
Enumerating Available Devices
The OpenVINO Runtime API features dedicated methods to enumerate devices and their capabilities. See the Hello Query Device C++ Sample. This is example output from the sample (truncated to device names only):
./hello_query_device
Available devices:
Device: CPU
...
Device: GPU.0
...
Device: GPU.1
...
Device: HDDL
A simple programmatic way to enumerate the devices and use with the multi-device is as follows:
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/MULTI2.cpp
:language: cpp
:fragment: [part2]
@endsphinxdirective
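A hedged sketch of the same idea in plain code: enumerate whatever devices the runtime reports and hand them all to MULTI (the model path is a placeholder):

```cpp
#include <openvino/openvino.hpp>
#include <string>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Build "MULTI:dev1,dev2,..." from every available device
    std::string multi = "MULTI";
    char delimiter = ':';
    for (const std::string& device : core.get_available_devices()) {
        multi += delimiter + device;
        delimiter = ',';
    }
    auto compiled_model = core.compile_model(model, multi);
    return 0;
}
```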
Beyond the trivial "CPU", "GPU", "HDDL" and so on, when multiple instances of a device are available, the names are more qualified. For example, this is how two Intel® Movidius™ Myriad™ X sticks are listed by the Hello Query Device sample:
...
Device: MYRIAD.1.2-ma2480
...
Device: MYRIAD.1.4-ma2480
So the explicit configuration to use both would be "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480". Accordingly, the code that loops over all available devices of "MYRIAD" type only is below:
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/MULTI3.cpp
:language: cpp
:fragment: [part3]
@endsphinxdirective
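A minimal sketch of that loop, filtering by the "MYRIAD" prefix in the qualified device names shown above (placeholder model path):

```cpp
#include <openvino/openvino.hpp>
#include <string>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Keep only fully-qualified MYRIAD devices, e.g. "MYRIAD.1.2-ma2480"
    std::string multi = "MULTI";
    char delimiter = ':';
    for (const std::string& device : core.get_available_devices()) {
        if (device.find("MYRIAD") == 0) {
            multi += delimiter + device;
            delimiter = ',';
        }
    }
    auto compiled_model = core.compile_model(model, multi);
    return 0;
}
```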
Configuring the Individual Devices and Creating the Multi-Device On Top
As discussed in the first section, configure each individual device as usual, then create the "MULTI" device on top:
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/MULTI4.cpp
:language: cpp
:fragment: [part4]
@endsphinxdirective
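As a hedged illustration, the per-device configuration might use the generic performance hint before MULTI is created on top (the exact properties you set are application-specific; the model path is a placeholder):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Configure each device individually (the throughput hint here is illustrative)
    core.set_property("CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
    core.set_property("GPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

    // MULTI picks up the per-device settings configured above
    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");
    return 0;
}
```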
An alternative is to combine all the individual device settings into a single config and load it, allowing the Multi-Device plugin to parse and apply the settings to the right devices. See the code example in the next section.
Note that while accelerators generally combine well under Multi-Device, CPU+GPU execution poses some performance caveats, as these devices share power, bandwidth and other resources. For example, it is recommended to enable the GPU throttling hint (which saves a CPU thread for CPU inference). See the Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance section below.
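A hedged sketch of enabling that hint, assuming the intel_gpu properties header from recent OpenVINO releases; treat the exact property name and value as an assumption to verify against your version:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_gpu/properties.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Lower GPU queue throttling so a CPU thread is not burned on polling
    // (property name/value assumed from the intel_gpu properties header)
    core.set_property("GPU", ov::intel_gpu::hint::queue_throttle(
                                 ov::intel_gpu::hint::ThrottleLevel::LOW));

    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");
    return 0;
}
```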
Querying the Optimal Number of Inference Requests
You can query the optimal number of requests from the configured devices. When using the Multi-Device, you do not need to sum over the included devices yourself; you can query the property directly:
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/MULTI5.cpp
:language: cpp
:fragment: [part5]
@endsphinxdirective
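As a minimal sketch (placeholder model path), the query looks like this:

```cpp
#include <openvino/openvino.hpp>
#include <iostream>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path
    auto compiled_model = core.compile_model(model, "MULTI:GPU,CPU");

    // MULTI aggregates the optimal request count over its devices
    uint32_t nireq = compiled_model.get_property(ov::optimal_number_of_infer_requests);
    std::cout << "Optimal number of requests: " << nireq << std::endl;
    return 0;
}
```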
Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance
Every OpenVINO sample that supports the -d (which stands for "device") command-line option transparently accepts Multi-Device. The Benchmark Application is the best reference for the optimal usage of Multi-Device. As discussed earlier, you do not need to set up the number of requests, CPU streams or threads because the application provides optimal performance out of the box. Below is an example command to evaluate HDDL+GPU performance with the Benchmark Application:
./benchmark_app –d MULTI:HDDL,GPU –m <model> -i <input> -niter 1000
The Multi-Device plugin supports FP16 IR files. The CPU plugin automatically converts them to FP32, and the other devices support FP16 natively. Note that no demos are (yet) fully optimized for Multi-Device, in terms of supporting the ov::optimal_number_of_infer_requests property, using the GPU streams/throttling, and so on.
Video: MULTI Plugin
@sphinxdirective
.. raw:: html

   <iframe allowfullscreen mozallowfullscreen msallowfullscreen oallowfullscreen webkitallowfullscreen width="560" height="315" src="https://www.youtube.com/embed/xbORYFEmrqU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
@endsphinxdirective
Performance Considerations for the Multi-Device Execution
This section covers a few recommendations for the multi-device execution (applicable for both Python and C++):
- MULTI usually performs best when the fastest device is specified first in the list of the devices. This is particularly important when the request-level parallelism is not sufficient (e.g. the number of requests in flight is not enough to saturate all devices).
- Just like with any throughput-oriented execution, it is highly recommended to query the optimal number of inference requests directly from the instance of ov::CompiledModel. Refer to the code of the benchmark_app, which exists in both C++ and Python, for more details.
- Notice that, for example, CPU+GPU execution performs better with certain knobs, which you can find in the code of the same Benchmark App sample. One specific example is disabling GPU driver polling, which in turn requires multiple GPU streams to amortize the slower communication of inference completion from the device to the host.
- The multi-device logic always attempts to save on data copies (e.g. of inputs) between the device-agnostic, user-facing inference requests and the device-specific 'worker' requests that are actually scheduled behind the scenes. To facilitate these savings, it is recommended to run the requests in the order in which they were created.
Introducing the Multi-Device Plugin (Python)
@sphinxdirective
.. raw:: html

   <div id="switcher-python" class="switcher-anchor">Python</div>
@endsphinxdirective
The Multi-Device plugin automatically assigns inference requests to available computational devices to execute the requests in parallel. By contrast, the Heterogeneous plugin can run different layers on different devices but not in parallel. The potential gains with the Multi-Device plugin are:
- Improved throughput from using multiple devices (compared to single-device execution)
- More consistent performance, since the devices share the inference burden (if one device is too busy, another can take more of the load)
Note that with Multi-Device the application logic is left unchanged, so you don't need to explicitly compile the model on every device, create and balance the inference requests and so on. From the application point of view, this is just another device that handles the actual machinery. The only thing that is required to leverage performance is to provide the multi-device (and hence the underlying devices) with enough inference requests to process. For example, if you were processing 4 cameras on the CPU (with 4 inference requests), it might be desirable to process more cameras (with more requests in flight) to keep CPU and GPU busy via Multi-Device.
The setup of Multi-Device can be described in three major steps:
- Configure each device (using the conventional device configuration methods).
- Compile the model on the Multi-Device plugin created on top of a (prioritized) list of the configured devices. This is the only change needed in the application.
- As with any other CompiledModel (resulting from compile_model), create as many inference requests as needed to saturate the devices.
These steps are covered below in detail.
Defining and Configuring the Multi-Device Plugin
Following the OpenVINO™ convention of labeling devices, the Multi-Device plugin uses the name "MULTI". The only configuration option for the Multi-Device plugin is a prioritized list of devices to use:
| Parameter name | Parameter values | Default | Description |
|---|---|---|---|
| "MULTI_DEVICE_PRIORITIES" | comma-separated device names with no spaces | N/A | Prioritized list of devices |
You can set the configuration directly as a string, or use the configuration key MULTI_DEVICE_PRIORITIES from the multi/multi_device_config.hpp file, which defines the same string.
The Three Ways to Specify Device Targets for the MULTI Plugin
- Option 1 - Pass a Prioritized List as a Parameter to compile_model()
@sphinxdirective
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_multi.py
:language: python
:fragment: [Option_1]
@endsphinxdirective
- Option 2 - Pass a List as a Parameter, and Dynamically Change Priorities During Execution

Notice that the priorities of the devices can be changed in real time for the compiled model:
@sphinxdirective
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_multi.py
:language: python
:fragment: [Option_2]
@endsphinxdirective
- Option 3 - Use Explicit Hints for Controlling Request Numbers Executed by Devices

There is a way to specify the number of requests that Multi-Device will internally keep for each device. If the original app was running 4 cameras with 4 inference requests, it might be best to share these 4 requests between the 2 devices used in MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(2)" and use the same 4 requests in the app. However, such an explicit configuration is not performance-portable and is not recommended. The better way is to configure the individual devices and query the resulting number of requests to be used at the application level. See Configuring the Individual Devices and Creating the Multi-Device On Top.
Enumerating Available Devices
The OpenVINO Runtime API features dedicated methods to enumerate devices and their capabilities. See the Hello Query Device Python Sample. This is example output from the sample (truncated to device names only):
./hello_query_device
Available devices:
Device: CPU
...
Device: GPU.0
...
Device: GPU.1
...
Device: HDDL
A simple programmatic way to enumerate the devices and use with the multi-device is as follows:
@sphinxdirective
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_multi.py
:language: python
:fragment: [available_devices_1]
@endsphinxdirective
Beyond the trivial "CPU", "GPU", "HDDL" and so on, when multiple instances of a device are available, the names are more qualified. For example, this is how two Intel® Movidius™ Myriad™ X sticks are listed by the Hello Query Device sample:
...
Device: MYRIAD.1.2-ma2480
...
Device: MYRIAD.1.4-ma2480
So the explicit configuration to use both would be "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480". Accordingly, the code that loops over all available devices of "MYRIAD" type only is below:
@sphinxdirective
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_multi.py
:language: python
:fragment: [available_devices_2]
@endsphinxdirective
Configuring the Individual Devices and Creating the Multi-Device On Top
It is possible to configure each individual device as usual and then create the "MULTI" device on top:
@sphinxdirective
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_multi.py
:language: python
:fragment: [set_property]
@endsphinxdirective
An alternative is to combine all the individual device settings into a single config and load it, allowing the Multi-Device plugin to parse and apply the settings to the right devices. See the code example in the next section.
Note that while accelerators generally combine well under Multi-Device, CPU+GPU execution poses some performance caveats, as these devices share power, bandwidth and other resources. For example, it is recommended to enable the GPU throttling hint (which saves a CPU thread for CPU inferencing). See the section below titled Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance.
Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance
Every OpenVINO sample that supports the -d (which stands for "device") command-line option transparently accepts Multi-Device. The Benchmark application is the best reference for the optimal usage of Multi-Device. As discussed earlier, you do not need to set up the number of requests, CPU streams or threads because the application provides optimal performance out of the box. Below is an example command to evaluate CPU+GPU performance with the Benchmark application:
benchmark_app -d MULTI:CPU,GPU -m <model>
The Multi-Device plugin supports FP16 IR files. The CPU plugin automatically converts them to FP32, and the other devices support FP16 natively. Note that no demos are (yet) fully optimized for Multi-Device, in terms of supporting the ov::optimal_number_of_infer_requests property, using the GPU streams/throttling, and so on.
Video: MULTI Plugin
Note: This video is currently available only for C++, but many of the same concepts apply to Python.
@sphinxdirective
.. raw:: html

   <iframe allowfullscreen mozallowfullscreen msallowfullscreen oallowfullscreen webkitallowfullscreen width="560" height="315" src="https://www.youtube.com/embed/xbORYFEmrqU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
@endsphinxdirective