# Automatic device selection

@sphinxdirective

.. toctree::
   :maxdepth: 1
   :hidden:

   Debugging Auto-Device Plugin <openvino_docs_IE_DG_supported_plugins_AUTO_debugging>

@endsphinxdirective

Auto Device (or AUTO for short) is a new, special "virtual" or "proxy" device in the OpenVINO toolkit: it does not bind to a specific type of hardware device. AUTO removes the complexity of writing application logic that selects a hardware device among the available ones and then deduces the best optimization settings for that device. It does this by self-discovering all available accelerators and their capabilities in the system, and matching them to the user's performance requirements through the new "hints" configuration API, dynamically optimizing for latency or throughput. As a result, a developer can write the application once and deploy it anywhere. Developers who want to limit inference to specific hardware candidates can also provide AUTO with a device priority list as an optional property. When the list is set, AUTO does not discover all available accelerators in the system; it only tries the listed devices, in priority order.

AUTO always chooses the best device. If compiling the model fails on that device, AUTO tries to compile it on the next best device, and so on, until one of them succeeds. If a priority list is set, AUTO selects devices only from that list.
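For illustration, here is a minimal C++ sketch of both modes (the `model.xml` path and the `GPU,CPU` priority order are example assumptions, not requirements):

@sphinxdirective

.. code-block:: cpp

   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;

       // Read a model from an IR file (the path is an example).
       std::shared_ptr<ov::Model> model = core.read_model("model.xml");

       // Let AUTO select among all devices discovered in the system.
       ov::CompiledModel compiled = core.compile_model(model, "AUTO");

       // Limit AUTO to a priority list: try GPU first, fall back to CPU.
       ov::CompiledModel compiled_limited = core.compile_model(model, "AUTO:GPU,CPU");
       return 0;
   }

@endsphinxdirective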

The best device is chosen using the following logic:

  1. Check which supported devices are available.
  2. Check the precision of the input model (for detailed information on precisions, see ov::device::capabilities).
  3. Select the first device capable of supporting the given precision, as presented in the table below.
  4. If the model's precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.

@sphinxdirective

+----------+------------------------------------------------------+-------------------------------------+
| Choice   || Supported                                           || Supported                          |
| Priority || Device                                              || model precision                    |
+==========+======================================================+=====================================+
| 1        || dGPU                                                || FP32, FP16, INT8, BIN              |
|          || (e.g. Intel® Iris® Xe MAX)                          ||                                    |
+----------+------------------------------------------------------+-------------------------------------+
| 2        || iGPU                                                || FP32, FP16, BIN                    |
|          || (e.g. Intel® UHD Graphics 620 (iGPU))               ||                                    |
+----------+------------------------------------------------------+-------------------------------------+
| 3        || Intel® Movidius™ Myriad™ X VPU                      || FP16                               |
|          || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2))  ||                                    |
+----------+------------------------------------------------------+-------------------------------------+
| 4        || Intel® CPU                                          || FP32, FP16, INT8, BIN              |
|          || (e.g. Intel® Core™ i7-1165G7)                       ||                                    |
+----------+------------------------------------------------------+-------------------------------------+

@endsphinxdirective

Importantly, AUTO starts inference with the CPU by default, unless a priority list is set and it does not include CPU. The CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, devices that are much slower at compiling the model, GPU being the best example, do not impede inference at its initial stages.

*(figure: autoplugin_accelerate)*

This mechanism can be easily observed in the Benchmark Application sample (see the Benchmark App section below), showing how the first-inference latency (the time it takes to compile the model and perform the first inference) is reduced when using AUTO. For example:

@sphinxdirective

.. code-block:: sh

   ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d GPU -niter 128

@endsphinxdirective

@sphinxdirective

.. code-block:: sh

   ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO -niter 128

@endsphinxdirective

Assuming that both a CPU and a GPU are present in the machine, the first-inference latency of "AUTO" will be lower than that of "GPU".

@sphinxdirective

.. note::

   The real-time performance will be closer to that of the best suited device the longer the process runs.

@endsphinxdirective

## Using the Auto-Device Plugin

Inference with AUTO is configured similarly to inference with other plugins: compile the model on the plugin with the desired configuration and, finally, execute inference.

Following the OpenVINO™ naming convention, the Auto-Device plugin is assigned the label of “AUTO.” It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:

@sphinxdirective

+----------------------------+---------------------------------------------+-----------------------------------------------------------+
| Property                   | Property values                             | Description                                               |
+============================+=============================================+===========================================================+
| <device candidate list>    | | AUTO: <device names>                      | | Lists the devices available for selection.              |
|                            | | comma-separated, no spaces                | | The device sequence will be taken as priority           |
|                            |                                             | | from high to low.                                       |
|                            |                                             | | If not specified, “AUTO” will be used as default        |
|                            |                                             | | and all devices will be included.                       |
+----------------------------+---------------------------------------------+-----------------------------------------------------------+
| ov::device::priorities     | | device names                              | | Specifies the devices for Auto-Device plugin to select. |
|                            | | comma-separated, no spaces                | | The device sequence will be taken as priority           |
|                            |                                             | | from high to low.                                       |
|                            |                                             | | This configuration is optional.                         |
+----------------------------+---------------------------------------------+-----------------------------------------------------------+
| ov::hint::performance_mode | | ov::hint::PerformanceMode::LATENCY        | | Specifies the performance mode preferred                |
|                            | | ov::hint::PerformanceMode::THROUGHPUT     | | by the application.                                     |
+----------------------------+---------------------------------------------+-----------------------------------------------------------+
| ov::hint::model_priority   | | ov::hint::Priority::HIGH                  | | Indicates the priority for a model.                     |
|                            | | ov::hint::Priority::MEDIUM                | | Note: this property is not yet fully supported.         |
|                            | | ov::hint::Priority::LOW                   |                                                           |
+----------------------------+---------------------------------------------+-----------------------------------------------------------+

@endsphinxdirective
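As an illustration of passing these properties, the sketch below compiles a model on AUTO with an explicit device priority list, a throughput hint, and a model priority (it assumes the `core` and `model` objects from the example above):

@sphinxdirective

.. code-block:: cpp

   // Compile on AUTO: restrict and order the candidate devices,
   // ask for throughput-oriented settings, and set the model priority.
   ov::CompiledModel compiled = core.compile_model(model, "AUTO",
       ov::device::priorities("GPU,CPU"),
       ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
       ov::hint::model_priority(ov::hint::Priority::MEDIUM));

@endsphinxdirective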

## Device candidate list

The device candidate list allows users to customize the priority and limit the choice of devices available to the AUTO plugin. If not specified, the plugin assumes all the devices present in the system can be used. Note that OpenVINO™ Runtime lets you use “GPU” as an alias for “GPU.0” in function calls. The following commands are accepted by the API:

@sphinxdirective

.. tab:: C++

   .. doxygensnippet:: docs/snippets/AUTO0.cpp
      :language: cpp
      :fragment: [part0]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/ov_auto.py
      :language: python
      :fragment: [part0]

@endsphinxdirective

To check what devices are present in the system, you can use the Device API. For information on how to do it, see Query device properties and configuration.

For C++:

@sphinxdirective

.. code-block:: sh

   ov::Core::get_available_devices() (see the Hello Query Device C++ Sample)

@endsphinxdirective

For Python:

@sphinxdirective

.. code-block:: sh

   openvino.runtime.Core.available_devices (see the Hello Query Device Python Sample)

@endsphinxdirective

## Performance Hints

The ov::hint::performance_mode property enables you to specify a performance mode for the plugin to be more efficient for particular use cases.

### ov::hint::PerformanceMode::THROUGHPUT

This mode prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, like inference of video feeds or large numbers of images.

### ov::hint::PerformanceMode::LATENCY

This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles. Note that currently the ov::hint property is supported by CPU and GPU devices only.

To enable performance hints for your application, use the following code:

@sphinxdirective

.. tab:: C++

   .. doxygensnippet:: docs/snippets/AUTO3.cpp
      :language: cpp
      :fragment: [part3]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/ov_auto.py
      :language: python
      :fragment: [part3]

@endsphinxdirective

## ov::hint::model_priority

The property enables you to control the priorities of models in the Auto-Device plugin. A high-priority model will be loaded to a supported high-priority device. A lower-priority model will not be loaded to a device that is occupied by a higher-priority model.

@sphinxdirective

.. tab:: C++

   .. doxygensnippet:: docs/snippets/AUTO4.cpp
      :language: cpp
      :fragment: [part4]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/ov_auto.py
      :language: python
      :fragment: [part4]

@endsphinxdirective

## Configuring Individual Devices and Creating the Auto-Device plugin on Top

Although the methods described above are currently the preferred way to execute inference with AUTO, the following steps can also be used as an alternative. This is currently a legacy feature, used if the device candidate list includes Myriad (a device incapable of utilizing the Performance Hints option).

@sphinxdirective

.. tab:: C++

   .. doxygensnippet:: docs/snippets/AUTO5.cpp
      :language: cpp
      :fragment: [part5]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/ov_auto.py
      :language: python
      :fragment: [part5]

@endsphinxdirective

## Using AUTO with OpenVINO™ Samples and the Benchmark App

To see how the Auto-Device plugin is used in practice and to test its performance, take a look at the OpenVINO™ samples. All samples supporting the "-d" command-line option (which stands for "device") will accept the plugin out-of-the-box. The Benchmark Application is a perfect place to start, as it presents the optimal performance of the plugin without the need for additional settings, like the number of requests or CPU threads. To evaluate the AUTO performance, you can use the following commands:

For unlimited device choice:

@sphinxdirective

.. code-block:: sh

   benchmark_app -d AUTO -m <model> -i <input> -niter 1000

@endsphinxdirective

For limited device choice:

@sphinxdirective

.. code-block:: sh

   benchmark_app -d AUTO:CPU,GPU,MYRIAD -m <model> -i <input> -niter 1000

@endsphinxdirective

For more information, refer to the C++ or Python version instructions.

@sphinxdirective

.. note::

   The default CPU stream is 1 if using “-d AUTO”.

   You can use the FP16 IR to work with auto-device.

   No demos are yet fully optimized for AUTO, by means of selecting the most suitable device, using the GPU streams/throttling, and so on.

@endsphinxdirective