[Doc] Add doc for Auto-Device Plugin 2021.4 (#5982)

* [Doc] Add doc for Auto-Device Plugin 2021.4 (#6190)

* Add doc for Auto-Device Plugin

Signed-off-by: Zhengtian Xie <zhengtian.xie@intel.com>

* Update doc for auto-device plugin

Signed-off-by: Zhengtian Xie <zhengtian.xie@intel.com>

* Update auto-device plugin doc

* Add openvino_docs_IE_DG_supported_plugins_AUTO into web page

Signed-off-by: Zhengtian Xie <zhengtian.xie@intel.com>

* Update AUTO.md

Co-authored-by: Maxim Shevtsov <maxim.y.shevtsov@intel.com>
Xie Zhengtian 2021-08-13 19:23:44 +08:00 committed by GitHub
parent d1e5e848b4
commit 79290a7dc0
9 changed files with 212 additions and 1 deletion


@@ -0,0 +1,128 @@
# Auto-Device Plugin {#openvino_docs_IE_DG_supported_plugins_AUTO}
## Auto-Device Plugin Execution
Auto-device is a new special "virtual" or "proxy" device in the OpenVINO™ toolkit.
Use "AUTO" as the device name to delegate selection of an actual accelerator to OpenVINO.
With the 2021.4 release, Auto-device internally recognizes and selects devices from CPU,
integrated GPU and discrete Intel GPUs (when available) depending on the device capabilities and the characteristics of the CNN model,
for example, its precision. Then Auto-device assigns inference requests to the selected device.
From the application point of view, this is just another device that handles all accelerators in the full system.
With the 2021.4 release, Auto-device setup is done in three major steps:
* Step 1: Configure each device as usual (for example, via the conventional <code>SetConfig</code> method)
* Step 2: Load a network to the Auto-device plugin. This is the only change needed in your application
* Step 3: Just like with any other executable network (resulting from <code>LoadNetwork</code>), create as many requests as needed to saturate the devices.
These steps are covered in detail below; the sketch right after this list strings them together.
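For illustration, here is a minimal sketch that combines the three steps. The per-device option used in Step 1 is just an example setting, not a requirement of the Auto-device:
```cpp
#include <ie_core.hpp>

int main() {
    InferenceEngine::Core ie;
    InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
    // Step 1: configure the individual devices as usual (example: enable performance counters on CPU)
    ie.SetConfig({{InferenceEngine::PluginConfigParams::KEY_PERF_COUNT,
                   InferenceEngine::PluginConfigParams::YES}}, "CPU");
    // Step 2: load the network to the Auto-device plugin instead of a specific device
    InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
    // Step 3: create as many infer requests as needed to saturate the selected device
    InferenceEngine::InferRequest request = exeNetwork.CreateInferRequest();
    return 0;
}
```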
## Defining and Configuring the Auto-Device Plugin
Following the OpenVINO notions of “devices”, the Auto-device has the “AUTO” name. The only configuration option for the Auto-device is a limited device list:
| Parameter name | Parameter values | Default | Description |
| :--- | :--- | :--- |:-----------------------------------------------------------------------------|
| "AUTO_DEVICE_LIST" | comma-separated device names <span style="color:red">with no spaces</span>| N/A | Device candidate list to be selected |
You can use the configuration name directly as a string or use <code>IE::KEY_AUTO_DEVICE_LIST</code> from <code>ie_plugin_config.hpp</code>,
which defines the same string.
There are two ways to use Auto-device:
1. Directly indicate the device by “AUTO” or an empty string:
@snippet snippets/AUTO0.cpp part0
2. Use the Auto-device configuration to limit the list of device candidates to select from:
@snippet snippets/AUTO1.cpp part1
Auto-device supports querying device optimization capabilities as a metric:
| Parameter name | Parameter values |
| :--- | :--- |
| "OPTIMIZATION_CAPABILITIES" | Auto-Device capabilities |
## Enumerating Available Devices and Auto-Device Selecting Logic
### Enumerating Available Devices
Inference Engine now features a dedicated API to enumerate devices and their capabilities.
See [Hello Query Device C++ Sample](../../../inference-engine/samples/hello_query_device/README.md).
This is the example output from the sample (truncated to the devices' names only):
```sh
./hello_query_device
Available devices:
Device: CPU
...
Device: GPU.0
...
Device: GPU.1
```
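The same information can also be retrieved programmatically; a minimal sketch using the <code>Core::GetAvailableDevices</code> API:
```cpp
#include <ie_core.hpp>
#include <iostream>

int main() {
    InferenceEngine::Core ie;
    // enumerate all devices visible to the Inference Engine, for example "CPU", "GPU.0", "GPU.1"
    std::vector<std::string> devices = ie.GetAvailableDevices();
    for (const std::string& device : devices) {
        std::cout << "Device: " << device << std::endl;
    }
    return 0;
}
```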
### Default Auto-Device selecting logic
With the 2021.4 release, Auto-Device selects the most suitable device with the following default logic:
1. Check if dGPU, iGPU and CPU devices are available
2. Get the precision of the input model, such as FP32
3. According to the priority of dGPU, iGPU and CPU (in this order), if the device supports the precision of the input network, select it as the most suitable device
For example, CPU, dGPU and iGPU support the following precisions and optimization capabilities:
| Device | OPTIMIZATION_CAPABILITIES |
| :--- | :--- |
| CPU | WINOGRAD FP32 FP16 INT8 BIN |
| dGPU | FP32 BIN BATCHED_BLOB FP16 INT8 |
| iGPU | FP32 BIN BATCHED_BLOB FP16 INT8 |
When an application uses the Auto-device to run an FP16 IR on a system with CPU, dGPU and iGPU, the Auto-device offloads this workload to the dGPU.
When an application uses the Auto-device to run an FP16 IR on a system with CPU and iGPU, the Auto-device offloads this workload to the iGPU.
When an application uses the Auto-device to run a WINOGRAD-enabled IR on a system with CPU, dGPU and iGPU, the Auto-device offloads this workload to the CPU.
In any case, when loading the network to the dGPU or iGPU fails, the network falls back to the CPU as the last choice.
### Limit Auto Target Devices Logic
According to the Auto-device selection logic from the previous section,
the most suitable device is selected from the available devices to load the model, as follows:
@snippet snippets/AUTO2.cpp part2
Another way to load the model to a device from a limited choice of devices is to use the Auto-device:
@snippet snippets/AUTO3.cpp part3
## Configuring the Individual Devices and Creating the Auto-Device on Top
As described in the first section, configure each individual device as usual and then just create the "AUTO" device on top:
@snippet snippets/AUTO4.cpp part4
Alternatively, you can combine all the individual device settings into a single config and load it,
allowing the Auto-device plugin to parse and apply it to the right devices. See the code example here:
@snippet snippets/AUTO5.cpp part5
## Using the Auto-Device with OpenVINO Samples and Benchmark App
Note that every OpenVINO sample that supports the "-d" (which stands for "device") command-line option transparently accepts the Auto-device.
The Benchmark Application is the best example of the optimal usage of the Auto-device.
You do not need to set the number of requests and CPU threads, as the application provides optimal out-of-the-box performance.
Below is an example command line to evaluate AUTO performance with the Benchmark App:
```sh
./benchmark_app -d AUTO -m <model> -i <input> -niter 1000
```
You can also use the Auto-device with a limited device choice:
```sh
./benchmark_app -d AUTO:CPU,GPU -m <model> -i <input> -niter 1000
```
Note that the default number of CPU streams is 1 when using “-d AUTO”.
Note that you can use the FP16 IR to work with the Auto-device.
Also note that no demos are (yet) fully optimized for the Auto-device, by means of selecting the most suitable device,
using the GPU streams/throttling, and so on.


@@ -13,7 +13,8 @@ The Inference Engine provides unique capabilities to infer deep learning models
|[CPU plugin](CPU.md) |Intel&reg; Xeon&reg; with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel&reg; Core&trade; Processors with Intel&reg; AVX2, Intel&reg; Atom&reg; Processors with Intel® Streaming SIMD Extensions (Intel® SSE) |
|[VPU plugins](VPU.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs |
|[GNA plugin](GNA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel&reg; Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel&reg; Pentium&reg; Silver J5005 Processor, Intel&reg; Pentium&reg; Silver N5000 Processor, Intel&reg; Celeron&reg; J4005 Processor, Intel&reg; Celeron&reg; J4105 Processor, Intel&reg; Celeron&reg; Processor N4100, Intel&reg; Celeron&reg; Processor N4000, Intel&reg; Core&trade; i3-8121U Processor, Intel&reg; Core&trade; i7-1065G7 Processor, Intel&reg; Core&trade; i7-1060G7 Processor, Intel&reg; Core&trade; i5-1035G4 Processor, Intel&reg; Core&trade; i5-1035G7 Processor, Intel&reg; Core&trade; i5-1035G1 Processor, Intel&reg; Core&trade; i5-1030G7 Processor, Intel&reg; Core&trade; i5-1030G4 Processor, Intel&reg; Core&trade; i3-1005G1 Processor, Intel&reg; Core&trade; i3-1000G1 Processor, Intel&reg; Core&trade; i3-1000G4 Processor|
|[Multi-Device plugin](MULTI.md) |Multi-Device plugin enables simultaneous inference of the same network on several Intel&reg; devices in parallel |
|[Auto-Device plugin](AUTO.md) |Auto-Device plugin enables selecting Intel&reg; device for inference automatically |
|[Heterogeneous plugin](HETERO.md) |Heterogeneous plugin enables automatic inference splitting between several Intel&reg; devices (for example if a device doesn't [support certain layers](#supported-layers)). |
Devices similar to the ones we have used for benchmarking can be accessed using [Intel® DevCloud for the Edge](https://devcloud.intel.com/edge/), a remote development environment with access to Intel® hardware and the latest versions of the Intel® Distribution of the OpenVINO™ Toolkit. [Learn more](https://devcloud.intel.com/edge/get_started/devcloud/) or [Register here](https://inteliot.force.com/DevcloudForEdge/s/).


@@ -326,6 +326,7 @@ limitations under the License.
</tab>
<tab type="user" title="Heterogeneous Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_HETERO"/>
<tab type="user" title="Multi-Device Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_MULTI"/>
<tab type="user" title="Auto-Device Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_AUTO"/>
<tab type="user" title="GNA Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_GNA"/>
</tab>
<tab type="user" title="Known Issues" url="@ref openvino_docs_IE_DG_Known_Issues_Limitations"/>

docs/snippets/AUTO0.cpp

@@ -0,0 +1,12 @@
#include <ie_core.hpp>
int main() {
//! [part0]
InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// these 2 lines below are equivalent
InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network, "AUTO");
InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "");
//! [part0]
return 0;
}

docs/snippets/AUTO1.cpp

@@ -0,0 +1,15 @@
#include <ie_core.hpp>
int main() {
//! [part1]
InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// "AUTO" plugin is (globally) pre-configured with the explicit option:
ie.SetConfig({{"AUTO_DEVICE_LIST", "CPU,GPU"}}, "AUTO");
// the three lines below are equivalent (the first uses the pre-configured AUTO, while the second and third explicitly pass the same settings)
InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network, "AUTO", {});
InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO", {{"AUTO_DEVICE_LIST", "CPU,GPU"}});
InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO:CPU,GPU");
//! [part1]
return 0;
}

docs/snippets/AUTO2.cpp

@@ -0,0 +1,10 @@
#include <ie_core.hpp>
int main() {
//! [part2]
InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
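// load the network to the Auto-device; the plugin selects the most suitable available device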
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
//! [part2]
return 0;
}

docs/snippets/AUTO3.cpp

@@ -0,0 +1,10 @@
#include <ie_core.hpp>
int main() {
//! [part3]
InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
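// limit the device candidates for the Auto-device selection to CPU and GPU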
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO:CPU,GPU");
//! [part3]
return 0;
}

docs/snippets/AUTO4.cpp

@@ -0,0 +1,19 @@
#include <ie_core.hpp>
int main() {
const std::map<std::string, std::string> cpu_config = { { InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, InferenceEngine::PluginConfigParams::YES } };
const std::map<std::string, std::string> gpu_config = { { InferenceEngine::PluginConfigParams::KEY_PERF_COUNT, InferenceEngine::PluginConfigParams::YES } };
//! [part4]
InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// configure the CPU device first
ie.SetConfig(cpu_config, "CPU");
// configure the GPU device
ie.SetConfig(gpu_config, "GPU");
// load the network to the auto-device
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, "AUTO");
// the new metric allows querying the optimization capabilities
std::vector<std::string> device_cap = exeNetwork.GetMetric(METRIC_KEY(OPTIMIZATION_CAPABILITIES));
//! [part4]
return 0;
}

docs/snippets/AUTO5.cpp

@@ -0,0 +1,15 @@
#include <ie_core.hpp>
int main() {
std::string device_name = "AUTO:CPU,GPU";
const std::map< std::string, std::string > full_config = {};
//! [part5]
InferenceEngine::Core ie;
InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
// 'device_name' can be "AUTO:CPU,GPU" to configure the auto-device to use CPU and GPU
InferenceEngine::ExecutableNetwork exeNetwork = ie.LoadNetwork(network, device_name, full_config);
// the new metric allows querying the optimization capabilities
std::vector<std::string> device_cap = exeNetwork.GetMetric(METRIC_KEY(OPTIMIZATION_CAPABILITIES));
//! [part5]
return 0;
}