[Auto PLUGIN] update Auto docs (#10889)

* update Auto docs

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* update python snippets

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* remove vpu, fix a mistake in the python code

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* update MYRIAD device full name

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* update API name

the old API uses the name Inference Engine API
the new API uses the name OpenVINO Runtime API 2.0

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* update tab names and code format

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* fix AUTO4 format issue

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* update set_property code

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* auto draft

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* mv code into .cpp and .py

modify the device list part according to the review

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* remove priority list in code and document

modify the beginning of the document
remove performance data
remove old API
use compile_model instead of set_property
add an image about CPU acceleration

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* fix misprints and code not matching the document

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* try to fix doc build issue

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>

* fix snippets code compile issue

Signed-off-by: Hu, Yuan2 <yuan2.hu@intel.com>
Yuan Hu 2022-03-19 23:25:35 +08:00 committed by GitHub
parent 76fde1f7b0
commit 72e8661157
10 changed files with 346 additions and 332 deletions


@ -10,41 +10,41 @@
@endsphinxdirective
The Auto-Device plugin, or AUTO, is a virtual device which automatically selects the processing unit to use for inference with OpenVINO™. It chooses from a list of available devices defined by the user and aims at finding the most suitable hardware for the given model. The best device is chosen using the following logic: Auto Device (or `AUTO` in short) is a new special "virtual" or "proxy" device in the OpenVINO toolkit; it does not bind to a specific type of HW device. AUTO removes the complexity of coding the HW device selection logic in the application (across HW devices) and of deducing the best optimization settings on that device. It does this by self-discovering all available accelerators and capabilities in the system and matching them to the user's performance requirements through the new “hints” configuration API, dynamically optimizing for latency or throughput respectively. Developers can write an application once and deploy it anywhere.
For developers who want to limit inference to specific HW candidates, AUTO also provides a device priority list as an optional property. Once a device priority list is set, AUTO will not discover all available accelerators in the system; it will only try the listed devices, in priority order.
AUTO always chooses the best device; if compiling the model fails on that device, AUTO will try to compile it on the next best device until one of them succeeds.
If a priority list is set, AUTO only selects devices according to the list.
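As a quick illustration of the device candidate list (a minimal sketch using the OpenVINO Runtime 2.0 C++ API, mirroring the snippets referenced later in this document), compiling on AUTO with and without an explicit priority list looks like this:
@sphinxdirective
.. code-block:: cpp

   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
       // Let AUTO pick from all accelerators discovered in the system:
       ov::CompiledModel on_any_device = core.compile_model(model, "AUTO");
       // Restrict AUTO to an explicit priority list (GPU first, then CPU):
       ov::CompiledModel on_listed_devices = core.compile_model(model, "AUTO:GPU,CPU");
       return 0;
   }

@endsphinxdirective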
The best device is chosen using the following logic:
1. Check which supported devices are available.
2. Check the precision of the input model (for detailed information on precisions read more on the [OPTIMIZATION_CAPABILITIES metric](../IE_PLUGIN_DG/Plugin.md)) 2. Check the precision of the input model (for detailed information on precisions read more on the `ov::device::capabilities`)
3. From the priority list, select the first device capable of supporting the given precision. 3. Select the first device capable of supporting the given precision, as presented in the table below.
4. If the network's precision is FP32 but there is no device capable of supporting it, offload the network to a device supporting FP16. 4. If the model's precision is FP32 but there is no device capable of supporting it, offload the model to a device supporting FP16.
@sphinxdirective +----------+------------------------------------------------------+-------------------------------------+
+----------+-------------------------------------------------+-------------------------------------+
| Choice || Supported || Supported | | Choice || Supported || Supported |
| Priority || Device || model precision | | Priority || Device || model precision |
+==========+=================================================+=====================================+ +==========+======================================================+=====================================+
| 1 || dGPU | FP32, FP16, INT8, BIN | | 1 || dGPU | FP32, FP16, INT8, BIN |
| || (e.g. Intel® Iris® Xe MAX) | | | || (e.g. Intel® Iris® Xe MAX) | |
+----------+-------------------------------------------------+-------------------------------------+ +----------+------------------------------------------------------+-------------------------------------+
| 2 | | VPUX | INT8 | | 2 || iGPU | FP32, FP16, BIN |
| | | (e.g. Intel® Movidius® VPU 3700VE) | |
+----------+-------------------------------------------------+-------------------------------------+
| 3 | | iGPU | FP32, FP16, BIN, |
| || (e.g. Intel® UHD Graphics 620 (iGPU)) | | | || (e.g. Intel® UHD Graphics 620 (iGPU)) | |
+----------+-------------------------------------------------+-------------------------------------+ +----------+------------------------------------------------------+-------------------------------------+
| 4 | | Intel® Neural Compute Stick 2 (Intel® NCS2) | FP16 | | 3 || Intel® Movidius™ Myriad™ X VPU | FP16 |
| | | | | || (e.g. Intel® Neural Compute Stick 2 (Intel® NCS2)) | |
+----------+-------------------------------------------------+-------------------------------------+ +----------+------------------------------------------------------+-------------------------------------+
| 5 | | Intel® CPU | FP32, FP16, INT8, BIN | | 4 || Intel® CPU | FP32, FP16, INT8, BIN |
| || (e.g. Intel® Core™ i7-1165G7) | | | || (e.g. Intel® Core™ i7-1165G7) | |
+----------+-------------------------------------------------+-------------------------------------+ +----------+------------------------------------------------------+-------------------------------------+
@endsphinxdirective
To put it simply, when loading the network to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds. For example: What is important, **AUTO starts inference with the CPU by default, unless a priority list is set that does not include the CPU**. The CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, the devices which are much slower to compile the model, the GPU being the best example, do not impede inference at its initial stages.
If you have dGPU in your system, it will be selected for most jobs (first on the priority list and supports multiple precisions). But if you want to run a WINOGRAD-enabled IR, your CPU will be selected (WINOGRAD optimization is not supported by dGPU). If you have Myriad and IA CPU in your system, Myriad will be selected for FP16 models, but IA CPU will be chosen for FP32 ones.
What is important, **AUTO always starts inference with the CPU**. CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, the devices which are much slower in loading the network, GPU being the best example, do not impede inference at its initial stages. ![autoplugin_accelerate]
This mechanism can be easily observed in our Benchmark Application sample ([see here](#Benchmark App Info)), showing how the first-inference latency (the time it takes to load the network and perform the first inference) is reduced when using AUTO. For example: This mechanism can be easily observed in our Benchmark Application sample ([see here](#Benchmark App Info)), showing how the first-inference latency (the time it takes to compile the model and perform the first inference) is reduced when using AUTO. For example:
@sphinxdirective
.. code-block:: sh
@ -52,15 +52,13 @@ This mechanism can be easily observed in our Benchmark Application sample ([see
./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d GPU -niter 128
@endsphinxdirective
first-inference latency: **2594.29 ms + 9.21 ms**
@sphinxdirective
.. code-block:: sh
./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO:CPU,GPU -niter 128 ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO -niter 128
@endsphinxdirective
first-inference latency: **173.13 ms + 13.20 ms** Assuming there are a CPU and a GPU on the machine, the first-inference latency of "AUTO" will be better than that of "GPU".
@sphinxdirective
.. note::
@ -69,45 +67,32 @@ first-inference latency: **173.13 ms + 13.20 ms**
## Using the Auto-Device Plugin
Inference with AUTO is configured similarly to other plugins: first you configure devices, then load a network to the plugin, and finally, execute inference. Inference with AUTO is configured similarly to other plugins: compile the model on the plugin with the desired configuration and, finally, execute inference.
Following the OpenVINO™ naming convention, the Auto-Device plugin is assigned the label of “AUTO.” It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
@sphinxdirective
+-------------------------+-----------------------------------------------+-----------------------------------------------------------+ +---------------------------+-----------------------------------------------+-----------------------------------------------------------+
| Property | Property values | Description | | Property | Property values | Description |
+=========================+===============================================+===========================================================+ +===========================+===============================================+===========================================================+
| <device candidate list> | | AUTO: <device names> | | Lists the devices available for selection. | | <device candidate list> | | AUTO: <device names> | | Lists the devices available for selection. |
| | | comma-separated, no spaces | | The device sequence will be taken as priority | | | | comma-separated, no spaces | | The device sequence will be taken as priority |
| | | | | from high to low. | | | | | | from high to low. |
| | | | | If not specified, “AUTO” will be used as default | | | | | | If not specified, “AUTO” will be used as default |
| | | | | and all devices will be included. | | | | | | and all devices will be included. |
+-------------------------+-----------------------------------------------+-----------------------------------------------------------+ +---------------------------+-----------------------------------------------+-----------------------------------------------------------+
| ov::device:priorities | | device names | | Specifies the devices for Auto-Device plugin to select. | | ov::device:priorities | | device names | | Specifies the devices for Auto-Device plugin to select. |
| | | comma-separated, no spaces | | The device sequence will be taken as priority | | | | comma-separated, no spaces | | The device sequence will be taken as priority |
| | | | | from high to low. | | | | | | from high to low. |
| | | | | This configuration is optional. | | | | | | This configuration is optional. |
+-------------------------+-----------------------------------------------+-----------------------------------------------------------+ +---------------------------+-----------------------------------------------+-----------------------------------------------------------+
| ov::hint | | THROUGHPUT | | Specifies the performance mode preferred | | ov::hint::performance_mode| | ov::hint::PerformanceMode::LATENCY | | Specifies the performance mode preferred |
| | | LATENCY | | by the application. | | | | ov::hint::PerformanceMode::THROUGHPUT | | by the application. |
+-------------------------+-----------------------------------------------+-----------------------------------------------------------+ +---------------------------+-----------------------------------------------+-----------------------------------------------------------+
| ov::hint:model_priority | | MODEL_PRIORITY_HIGH | | Indicates the priority for a network. | | ov::hint::model_priority | | ov::hint::Priority::HIGH | | Indicates the priority for a model. |
| | | MODEL_PRIORITY_MED | | Importantly! | | | | ov::hint::Priority::MEDIUM | | Importantly! |
| | | MODEL_PRIORITY_LOW | | This property is still not fully supported | | | | ov::hint::Priority::LOW | | This property is still not fully supported |
+-------------------------+-----------------------------------------------+-----------------------------------------------------------+ +---------------------------+-----------------------------------------------+-----------------------------------------------------------+
@endsphinxdirective
@sphinxdirective
.. dropdown:: Click for information on Legacy APIs
For legacy APIs like LoadNetwork/SetConfig/GetConfig/GetMetric:
- replace {ov::device:priorities, "GPU,CPU"} with {"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}
- replace {ov::hint:model_priority, "LOW"} with {"MODEL_PRIORITY", "LOW"}
- InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES is defined as same string "MULTI_DEVICE_PRIORITIES"
- CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU is equal to "GPU,CPU"
- InferenceEngine::PluginConfigParams::KEY_MODEL_PRIORITY is defined as same string "MODEL_PRIORITY"
- InferenceEngine::PluginConfigParams::MODEL_PRIORITY_LOW is defined as same string "LOW"
@endsphinxdirective
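To tie these setup options together, below is a minimal sketch (C++, OpenVINO Runtime API 2.0; not one of the shipped snippets) that passes a device candidate list, a performance hint, and a model priority in a single compile_model call:
@sphinxdirective
.. code-block:: cpp

   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
       // Combine the AUTO setup options from the table above in one call:
       ov::CompiledModel compiled_model = core.compile_model(model, "AUTO",
           ov::device::priorities("GPU,CPU"),                                  // device candidate list
           ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),  // performance hint
           ov::hint::model_priority(ov::hint::Priority::MEDIUM));              // model priority
       return 0;
   }

@endsphinxdirective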
### Device candidate list
@ -115,117 +100,31 @@ The device candidate list allows users to customize the priority and limit the c
The following commands are accepted by the API:
@sphinxdirective
.. tab:: C++ API
.. code-block:: cpp .. tab:: C++
/*** With Inference Engine 2.0 API ***/ .. doxygensnippet:: docs/snippets/AUTO0.cpp
ov::Core core; :language: cpp
:fragment: [part0]
// Read a network in IR, PaddlePaddle, or ONNX format: .. tab:: Python
std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
// Load a network to AUTO using the default list of device candidates. .. doxygensnippet:: docs/snippets/ov_auto.py
// The following lines are equivalent: :language: python
ov::CompiledModel model0 = core.compile_model(model); :fragment: [part0]
ov::CompiledModel model1 = core.compile_model(model, "AUTO");
ov::CompiledModel model2 = core.compile_model(model, "AUTO", {});
// You can also specify the devices to be used by AUTO in its selection process.
// The following lines are equivalent:
ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
ov::CompiledModel model4 = core.compile_model(model, "AUTO", {{ov::device::priorities.name(), "GPU,CPU"}});
// the AUTO plugin is pre-configured (globally) with the explicit option:
core.set_property("AUTO", ov::device::priorities("GPU,CPU"));
.. tab:: C++ legacy API
.. code-block:: cpp
/*** With API Prior to 2022.1 Release ***/
InferenceEngine::Core ie;
// Read a network in IR, PaddlePaddle, or ONNX format:
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// Load a network to AUTO using the default list of device candidates.
// The following lines are equivalent:
InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network);
InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO");
InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO", {});
// You can also specify the devices to be used by AUTO in its selection process.
// The following lines are equivalent:
InferenceEngine::ExecutableNetwork exec3 = ie.LoadNetwork(network, "AUTO:GPU,CPU");
InferenceEngine::ExecutableNetwork exec4 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}});
// the AUTO plugin is pre-configured (globally) with the explicit option:
ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}}, "AUTO");
.. tab:: Python API
.. code-block:: python
### New IE 2.0 API ###
from openvino.runtime import Core
core = Core()
# Read a network in IR, PaddlePaddle, or ONNX format:
model = core.read_model(model_path)
# Load a network to AUTO using the default list of device candidates.
# The following lines are equivalent:
model = core.compile_model(model=model)
compiled_model = core.compile_model(model=model, device_name="AUTO")
compiled_model = core.compile_model(model=model, device_name="AUTO", config={})
# You can also specify the devices to be used by AUTO in its selection process.
# The following lines are equivalent:
compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU")
compiled_model = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"})
# the AUTO plugin is pre-configured (globally) with the explicit option:
core.set_config(config={"MULTI_DEVICE_PRIORITIES":"CPU,GPU"}, device_name="AUTO")
.. tab:: Python legacy API
.. code-block:: python
### API before 2022.1 ###
from openvino.inference_engine import IECore
ie = IECore()
# Read a network in IR, PaddlePaddle, or ONNX format:
net = ie.read_network(model=path_to_model)
# Load a network to AUTO using the default list of device candidates.
# The following lines are equivalent:
exec_net = ie.load_network(network=net)
exec_net = ie.load_network(network=net, device_name="AUTO")
exec_net = ie.load_network(network=net, device_name="AUTO", config={})
# You can also specify the devices to be used by AUTO in its selection process.
# The following lines are equivalent:
exec_net = ie.load_network(network=net, device_name="AUTO:CPU,GPU")
exec_net = ie.load_network(network=net, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"})
# the AUTO plugin is pre-configured (globally) with the explicit option:
ie.SetConfig(config={"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}, device_name="AUTO");
@endsphinxdirective
To check what devices are present in the system, you can use Device API: To check what devices are present in the system, you can use the Device API. For information on how to do it, see [Query device properties and configuration](supported_plugins/config_properties.md).
For C++ API For C++
@sphinxdirective
.. code-block:: sh
ov::runtime::Core::get_available_devices() (see Hello Query Device C++ Sample)
@endsphinxdirective
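For a standalone illustration, a minimal sketch of this device query with the C++ API (assuming the same ov::Core used in the snippets above) could look like this:
@sphinxdirective
.. code-block:: cpp

   #include <openvino/openvino.hpp>
   #include <iostream>

   int main() {
       ov::Core core;
       // Print every device AUTO could include in its candidate list:
       for (const std::string& device : core.get_available_devices()) {
           std::cout << device << std::endl;
       }
       return 0;
   }

@endsphinxdirective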
For Python API For Python
@sphinxdirective
.. code-block:: sh
@ -234,7 +133,7 @@ For Python API
### Performance Hints
The `ov::hint` property enables you to specify a performance mode for the plugin to be more efficient for particular use cases. The `ov::hint::performance_mode` property enables you to specify a performance mode for the plugin to be more efficient for particular use cases.
#### ov::hint::PerformanceMode::THROUGHPUT
This mode prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, like inference of video feeds or large numbers of images.
@ -243,140 +142,59 @@ This mode prioritizes high throughput, balancing between latency and power. It i
This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.
Note that currently the `ov::hint` property is supported by CPU and GPU devices only.
To enable Performance Hints for your application, use the following code: To enable performance hints for your application, use the following code:
@sphinxdirective
.. tab:: C++ API
.. code-block:: cpp .. tab:: C++
ov::Core core; .. doxygensnippet:: docs/snippets/AUTO3.cpp
:language: cpp
:fragment: [part3]
// Read a network in IR, PaddlePaddle, or ONNX format: .. tab:: Python
std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
// Load a network to AUTO with Performance Hints enabled: .. doxygensnippet:: docs/snippets/ov_auto.py
// To use the “throughput” mode: :language: python
ov::CompiledModel compiled_model = core.compile_model(model, "AUTO:GPU,CPU", {{ov::hint::performance_mode.name(), "THROUGHPUT"}}); :fragment: [part3]
// or the “latency” mode:
ov::CompiledModel compiledModel1 = core.compile_model(model, "AUTO:GPU,CPU", {{ov::hint::performance_mode.name(), "LATENCY"}});
.. tab:: Python API
.. code-block:: python
from openvino.runtime import Core
core = Core()
# Read a network in IR, PaddlePaddle, or ONNX format:
model = core.read_model(model_path)
# Load a network to AUTO with Performance Hints enabled:
# To use the “throughput” mode:
compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU", config={"PERFORMANCE_HINT":"THROUGHPUT"})
# or the “latency” mode:
compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU", config={"PERFORMANCE_HINT":"LATENCY"})
@endsphinxdirective
### ov::hint::model_priority
The property enables you to control the priorities of networks in the Auto-Device plugin. A high-priority network will be loaded to a supported high-priority device. A lower-priority network will not be loaded to a device that is occupied by a higher-priority network. The property enables you to control the priorities of models in the Auto-Device plugin. A high-priority model will be loaded to a supported high-priority device. A lower-priority model will not be loaded to a device that is occupied by a higher-priority model.
@sphinxdirective
.. tab:: C++ API
.. code-block:: cpp .. tab:: C++
// Example 1 .. doxygensnippet:: docs/snippets/AUTO4.cpp
// Compile and load networks: :language: cpp
ov::CompiledModel compiled_model0 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "HIGH"}}); :fragment: [part4]
ov::CompiledModel compiled_model1 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "MEDIUM"}});
ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}});
/************ .. tab:: Python
Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU.
************/
// Example 2 .. doxygensnippet:: docs/snippets/ov_auto.py
// Compile and load networks: :language: python
ov::CompiledModel compiled_model3 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}}); :fragment: [part4]
ov::CompiledModel compiled_model4 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "MEDIUM"}});
ov::CompiledModel compiled_model5 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}});
/************
Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD.
************/
.. tab:: Python API
.. code-block:: python
# Example 1
# Compile and load networks:
compiled_model0 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"0"})
compiled_model1 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"1"})
compiled_model2 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"})
# Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
# Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model3 will use CPU.
# Example 2
# Compile and load networks:
compiled_model0 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"})
compiled_model1 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"1"})
compiled_model2 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"})
# Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
# Result: compiled_model0 will use GPU, compiled_model1 will use GPU, compiled_model3 will use MYRIAD.
@endsphinxdirective
## Configuring Individual Devices and Creating the Auto-Device plugin on Top
Although the methods described above are currently the preferred way to execute inference with AUTO, the following steps can be also used as an alternative. It is currently available as a legacy feature and used if the device candidate list includes VPUX or Myriad (devices uncapable of utilizing the Performance Hints option). Although the methods described above are currently the preferred way to execute inference with AUTO, the following steps can also be used as an alternative. It is currently available as a legacy feature and is used if the device candidate list includes Myriad (a device incapable of utilizing the Performance Hints option).
@sphinxdirective
.. tab:: C++ API
.. code-block:: cpp .. tab:: C++
ovCore core; .. doxygensnippet:: docs/snippets/AUTO5.cpp
:language: cpp
:fragment: [part5]
// Read a network in IR, PaddlePaddle, or ONNX format .. tab:: Python
stdshared_ptrovModel model = core.read_model(sample.xml);
// Configure the VPUX and the Myriad devices separately and load the network to the Auto-Device plugin .. doxygensnippet:: docs/snippets/ov_auto.py
set VPU config :language: python
core.set_property(VPUX, {}); :fragment: [part5]
// set MYRIAD config
core.set_property(MYRIAD, {});
ovCompiledModel compiled_model = core.compile_model(model, AUTO);
.. tab:: Python API
.. code-block:: python
from openvino.runtime import Core
core = Core()
# Read a network in IR, PaddlePaddle, or ONNX format:
model = core.read_model(model_path)
# Configure the VPUX and the Myriad devices separately and load the network to the Auto-Device plugin:
core.set_config(config=vpux_config, device_name="VPUX")
core.set_config (config=vpux_config, device_name="MYRIAD")
compiled_model = core.compile_model(model=model)
# Alternatively, you can combine the individual device settings into one configuration and load the network.
# The AUTO plugin will parse and apply the settings to the right devices.
# The 'device_name' of "AUTO:VPUX,MYRIAD" will configure auto-device to use devices.
compiled_model = core.compile_model(model=model, device_name=device_name, config=full_config)
# To query the optimization capabilities:
device_cap = core.get_metric("CPU", "OPTIMIZATION_CAPABILITIES")
@endsphinxdirective
<a name="Benchmark App Info"></a> 
@ -408,3 +226,6 @@ For more information, refer to the [C++](../../samples/cpp/benchmark_app/README.
No demos are yet fully optimized for AUTO, by means of selecting the most suitable device, using the GPU streams/throttling, and so on.
@endsphinxdirective
[autoplugin_accelerate]: ../img/autoplugin_accelerate.png


@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ba092e65d9c5c6fb585c4a394ba3e6a913bf4f129a386b7a8664b94aeb47878b
size 61218


@ -1,12 +1,28 @@
#include <ie_core.hpp> #include <openvino/openvino.hpp>
int main() { int main() {
{
//! [part0] //! [part0]
InferenceEngine::Core ie; ov::Core core;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// these 2 lines below are equivalent // Read a network in IR, PaddlePaddle, or ONNX format:
InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network, "AUTO"); std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "");
// compile a model on AUTO using the default list of device candidates.
// The following lines are equivalent:
ov::CompiledModel model0 = core.compile_model(model);
ov::CompiledModel model1 = core.compile_model(model, "AUTO");
// Optional
// You can also specify the devices to be used by AUTO.
// The following lines are equivalent:
ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
ov::CompiledModel model4 = core.compile_model(model, "AUTO", ov::device::priorities("GPU,CPU"));
//Optional
// the AUTO plugin is pre-configured (globally) with the explicit option:
core.set_property("AUTO", ov::device::priorities("GPU,CPU"));
//! [part0] //! [part0]
}
return 0; return 0;
} }


@ -1,15 +1,30 @@
#include <ie_core.hpp> #include <ie_core.hpp>
int main() { int main() {
{
//! [part1] //! [part1]
// Inference Engine API
InferenceEngine::Core ie; InferenceEngine::Core ie;
// Read a network in IR, PaddlePaddle, or ONNX format:
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml"); InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// "AUTO" plugin is (globally) pre-configured with the explicit option:
ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}}, "AUTO"); // Load a network to AUTO using the default list of device candidates.
// the below 3 lines are equivalent (the first line leverages the pre-configured AUTO, while second and third explicitly pass the same settings) // The following lines are equivalent:
InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network, "AUTO", {}); InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network);
InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "CPU,GPU"}}); InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO");
InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO:CPU,GPU"); InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO", {});
// Optional
// You can also specify the devices to be used by AUTO in its selection process.
// The following lines are equivalent:
InferenceEngine::ExecutableNetwork exec3 = ie.LoadNetwork(network, "AUTO:GPU,CPU");
InferenceEngine::ExecutableNetwork exec4 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}});
// Optional
// the AUTO plugin is pre-configured (globally) with the explicit option:
ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}}, "AUTO");
//! [part1] //! [part1]
}
return 0; return 0;
} }


@ -1,10 +1,12 @@
#include <ie_core.hpp> #include <ie_core.hpp>
int main() { int main() {
{
//! [part2] //! [part2]
InferenceEngine::Core ie; InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml"); InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO"); InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
//! [part2] //! [part2]
}
return 0; return 0;
} }


@ -1,10 +1,22 @@
#include <ie_core.hpp> #include <openvino/openvino.hpp>
int main() { int main() {
{
//! [part3] //! [part3]
InferenceEngine::Core ie; ov::Core core;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO:CPU,GPU"); // Read a network in IR, PaddlePaddle, or ONNX format:
std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
// compile a model on AUTO with Performance Hints enabled:
// To use the “throughput” mode:
ov::CompiledModel compiled_model = core.compile_model(model, "AUTO",
ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
// or the “latency” mode:
ov::CompiledModel compiled_mode2 = core.compile_model(model, "AUTO",
ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
//! [part3] //! [part3]
}
return 0; return 0;
} }


@ -1,19 +1,36 @@
#include <ie_core.hpp> #include <openvino/openvino.hpp>
int main() { int main() {
const std::map<std::string, std::string> cpu_config = { { InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, InferenceEngine::PluginConfigParams::YES } }; ov::Core core;
const std::map<std::string, std::string> gpu_config = { { InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, InferenceEngine::PluginConfigParams::YES } };
// Read a network in IR, PaddlePaddle, or ONNX format:
std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
{
//! [part4] //! [part4]
InferenceEngine::Core ie; // Example 1
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml"); ov::CompiledModel compiled_model0 = core.compile_model(model, "AUTO",
// configure the CPU device first ov::hint::model_priority(ov::hint::Priority::HIGH));
ie.SetConfig(cpu_config, "CPU"); ov::CompiledModel compiled_model1 = core.compile_model(model, "AUTO",
// configure the GPU device ov::hint::model_priority(ov::hint::Priority::MEDIUM));
ie.SetConfig(gpu_config, "GPU"); ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO",
// load the network to the auto-device ov::hint::model_priority(ov::hint::Priority::LOW));
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO"); /************
// new metric allows to query the optimization capabilities Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
std::vector<std::string> device_cap = exeNetwork.GetMetric(METRIC_KEY(OPTIMIZATION_CAPABILITIES)); Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU.
************/
// Example 2
ov::CompiledModel compiled_model3 = core.compile_model(model, "AUTO",
ov::hint::model_priority(ov::hint::Priority::LOW));
ov::CompiledModel compiled_model4 = core.compile_model(model, "AUTO",
ov::hint::model_priority(ov::hint::Priority::MEDIUM));
ov::CompiledModel compiled_model5 = core.compile_model(model, "AUTO",
ov::hint::model_priority(ov::hint::Priority::LOW));
/************
Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD.
************/
//! [part4] //! [part4]
}
return 0; return 0;
} }


@ -1,15 +1,18 @@
#include <ie_core.hpp> #include <openvino/openvino.hpp>
int main() { int main() {
std::string device_name = "AUTO:CPU,GPU"; ov::AnyMap cpu_config = {};
const std::map< std::string, std::string > full_config = {}; ov::AnyMap myriad_config = {};
//! [part5] //! [part5]
InferenceEngine::Core ie; ov::Core core;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// 'device_name' can be "AUTO:CPU,GPU" to configure the auto-device to use CPU and GPU // Read a network in IR, PaddlePaddle, or ONNX format:
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, device_name, full_config); std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
// new metric allows to query the optimization capabilities
std::vector<std::string> device_cap = exeNetwork.GetMetric(METRIC_KEY(OPTIMIZATION_CAPABILITIES)); // Configure the CPU and MYRIAD devices separately when compiling the model on AUTO
ov::CompiledModel compiled_model = core.compile_model(model, "AUTO",
ov::device::properties("CPU", cpu_config),
ov::device::properties("MYRIAD", myriad_config));
//! [part5] //! [part5]
return 0; return 0;
} }

docs/snippets/AUTO6.cpp Normal file

@ -0,0 +1,20 @@
#include <openvino/openvino.hpp>
int main() {
    {
        //! [part6]
        ov::Core core;

        // read a model in IR, PaddlePaddle, or ONNX format
        std::shared_ptr<ov::Model> model = core.read_model("sample.xml");

        // compile a model on AUTO and set the log level to DEBUG
        ov::CompiledModel compiled_model = core.compile_model(model, "AUTO", ov::log::level(ov::log::Level::DEBUG));

        // or set the log level with set_property, then compile the model
        core.set_property("AUTO", ov::log::level(ov::log::Level::DEBUG));
        ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO");
        //! [part6]
    }
    return 0;
}

docs/snippets/ov_auto.py Normal file

@ -0,0 +1,105 @@
import sys
from openvino.runtime import Core
from openvino.inference_engine import IECore
model_path = "/openvino_CI_CD/result/install_pkg/tests/test_model_zoo/core/models/ir/add_abc.xml"
path_to_model = "/openvino_CI_CD/result/install_pkg/tests/test_model_zoo/core/models/ir/add_abc.xml"
def part0():
    #! [part0]
    core = Core()

    # Read a network in IR, PaddlePaddle, or ONNX format:
    model = core.read_model(model_path)

    # compile a model on AUTO using the default list of device candidates.
    # The following lines are equivalent:
    compiled_model = core.compile_model(model=model)
    compiled_model = core.compile_model(model=model, device_name="AUTO")

    # Optional
    # You can also specify the devices to be used by AUTO.
    # The following lines are equivalent:
    compiled_model = core.compile_model(model=model, device_name="AUTO:GPU,CPU")
    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})

    # Optional
    # the AUTO plugin is pre-configured (globally) with the explicit option:
    core.set_property(device_name="AUTO", properties={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})
    #! [part0]

def part1():
    #! [part1]
    ### IE API ###
    ie = IECore()

    # Read a network in IR, PaddlePaddle, or ONNX format:
    net = ie.read_network(model=path_to_model)

    # Load a network to AUTO using the default list of device candidates.
    # The following lines are equivalent:
    exec_net = ie.load_network(network=net)
    exec_net = ie.load_network(network=net, device_name="AUTO")
    exec_net = ie.load_network(network=net, device_name="AUTO", config={})

    # Optional
    # You can also specify the devices to be used by AUTO in its selection process.
    # The following lines are equivalent:
    exec_net = ie.load_network(network=net, device_name="AUTO:GPU,CPU")
    exec_net = ie.load_network(network=net, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})

    # Optional
    # the AUTO plugin is pre-configured (globally) with the explicit option:
    ie.set_config(config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"}, device_name="AUTO")
    #! [part1]

def part3():
    #! [part3]
    core = Core()

    # Read a network in IR, PaddlePaddle, or ONNX format:
    model = core.read_model(model_path)

    # compile a model on AUTO with Performance Hints enabled:
    # To use the “throughput” mode:
    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "THROUGHPUT"})
    # or the “latency” mode:
    compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT": "LATENCY"})
    #! [part3]

def part4():
    #! [part4]
    core = Core()
    model = core.read_model(model_path)

    # Example 1
    compiled_model0 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY": "HIGH"})
    compiled_model1 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY": "MEDIUM"})
    compiled_model2 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY": "LOW"})
    # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
    # Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU.

    # Example 2
    compiled_model3 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY": "LOW"})
    compiled_model4 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY": "MEDIUM"})
    compiled_model5 = core.compile_model(model=model, device_name="AUTO", config={"MODEL_PRIORITY": "LOW"})
    # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks.
    # Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD.
    #! [part4]

def part5():
    #! [part5]
    core = Core()
    model = core.read_model(model_path)
    core.set_property(device_name="CPU", properties={})
    core.set_property(device_name="MYRIAD", properties={})
    compiled_model = core.compile_model(model=model)
    compiled_model = core.compile_model(model=model, device_name="AUTO")
    #! [part5]

def main():
    part0()
    part1()
    part3()
    part4()
    part5()

if __name__ == '__main__':
    sys.exit(main())