diff --git a/docs/OV_Runtime_UG/auto_device_selection.md b/docs/OV_Runtime_UG/auto_device_selection.md index e127f7e31b8..c3a667a65bc 100644 --- a/docs/OV_Runtime_UG/auto_device_selection.md +++ b/docs/OV_Runtime_UG/auto_device_selection.md @@ -1,332 +1,410 @@ # Automatic device selection {#openvino_docs_IE_DG_supported_plugins_AUTO} -## Auto-Device Plugin Execution (C++) - @sphinxdirective -.. raw:: html -
C++
-@endsphinxdirective +.. toctree:: + :maxdepth: 1 + :hidden: -The AUTO device is a new, special "virtual" or "proxy" device in the OpenVINO™ toolkit. - -Use "AUTO" as the device name to delegate selection of an actual accelerator to OpenVINO. The Auto-device plugin internally recognizes and selects devices from among CPU, integrated GPU and discrete Intel GPUs (when available) depending on the device capabilities and the characteristics of CNN models (for example, precision). Then the Auto-device assigns inference requests to the selected device. - -From the application's point of view, this is just another device that handles all accelerators in the full system. - -With the 2021.4 release, Auto-device setup is done in three major steps: -1. Configure each device as usual (for example, via the conventional `SetConfig()` method) -2. Load a network to the Auto-device plugin. This is the only change needed in your application. -3. As with any other executable network resulting from `LoadNetwork()`, create as many requests as needed to saturate the devices. - -These steps are covered below in detail. - -### Defining and Configuring the Auto-Device Plugin -Following the OpenVINO convention for devices names, the Auto-device uses the label "AUTO". The only configuration option for Auto-device is a limited device list: - -| Parameter name | Parameter values | Default | Description | -| :--- | :--- | :--- |:-----------------------------------------------------------------------------| -| "MULTI_DEVICE_PRIORITIES" | comma-separated device names with no spaces| N/A | Device candidate list to be selected | - -You can use the configuration name directly as a string or use `InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES` from `multi-device/multi_device_config.hpp`, which defines the same string. - -There are two ways to use Auto-device: -1. Directly indicate device by "AUTO" or an empty string: -@snippet snippets/AUTO0.cpp part0 - -2. Use the Auto-device configuration: -@snippet snippets/AUTO1.cpp part1 - -Both methods allow limiting the list of device candidates for the AUTO plugin. - -> **NOTE**: The OpenVINO Runtime lets you use "GPU" as an alias for "GPU.0" in function calls. - -The Auto-device plugin supports query device optimization capabilities in metric. - -| Parameter name | Parameter values | -| :--- | :--- | -| "OPTIMIZATION_CAPABILITIES" | Auto-Device capabilities | - -### Enumerating Devices and Selection Logic - -The OpenVINO Runtime API now features a dedicated methods to enumerate devices and their capabilities. -See [Hello Query Device C++ Sample](../../samples/cpp/hello_query_device/README.md). -This is the example output from the sample (truncated to device names only): - -```sh -./hello_query_device -Available devices: - Device: CPU -... - Device: GPU.0 -... - Device: GPU.1 -``` - -### Default Auto-Device Selection Logic - -With the 2021.4 release, the Auto-Device selects the most suitable device using the following default logic: - -1. Check if dGPU (discrete), iGPU (integrated) and CPU devices are available -2. Get the precision of the input model, such as FP32 -3. 
According to the priority of dGPU, iGPU, and CPU (in this order), if the device supports the precision of the input network, select it as the most suitable device - -For example, CPU, dGPU and iGPU can support the following precision and optimization capabilities: - -| Device | OPTIMIZATION_CAPABILITIES | -| :--- | :--- | -| CPU | WINOGRAD FP32 FP16 INT8 BIN | -| dGPU | FP32 BIN BATCHED_BLOB FP16 INT8 | -| iGPU | FP32 BIN BATCHED_BLOB FP16 INT8 | - -* When the application uses the Auto-device to run FP16 IR on a system with CPU, dGPU and iGPU, Auto-device will offload this workload to dGPU. -* When the application uses the Auto-device to run FP16 IR on a system with CPU and iGPU, Auto-device will offload this workload to iGPU. -* When the application uses the Auto-device to run WINOGRAD-enabled IR on a system with CPU, dGPU and iGPU, Auto-device will offload this workload to CPU. - -In cases when loading the network to dGPU or iGPU fails, CPU is the fall-back choice. - -According to the Auto-device selection logic from the previous section, tell the OpenVINO Runtime -to use the most suitable device from available devices as follows: - -@snippet snippets/AUTO2.cpp part2 - -You can also use the Auto-device plugin to choose a device from a limited choice of devices, in this example CPU and GPU: - -@snippet snippets/AUTO3.cpp part3 - -### Configuring the Individual Devices and Creating the Auto-Device on Top - -It is possible to configure each individual device as usual and create the "AUTO" device on top: - -@snippet snippets/AUTO4.cpp part4 - -Alternatively, you can combine all the individual device settings into single config file and load it, allowing the Auto-device plugin to parse and apply it to the right devices. See the code example here: - -@snippet snippets/AUTO5.cpp part5 - -### Using the Auto-Device with OpenVINO Samples and Benchmark App - -Note that every OpenVINO sample or application that supports the "-d" (which stands for "device") command-line option transparently accepts the Auto-device. The Benchmark Application is the best example of the optimal usage of the Auto-device. You do not need to set the number of requests and CPU threads, as the application provides optimal out-of-the-box performance. Below is the example command-line to evaluate AUTO performance with that: - -@sphinxdirective -.. tab:: Package, Docker, open-source installation - - .. code-block:: sh - - ./benchmark_app.py –d AUTO –m - -.. tab:: pip installation - - .. code-block:: sh - - benchmark_app –d AUTO –m + Debugging Auto-Device Plugin @endsphinxdirective +The Auto-Device plugin, or AUTO, is a virtual device which automatically selects the processing unit to use for inference with OpenVINO™. It chooses from a list of available devices defined by the user and aims at finding the most suitable hardware for the given model. The best device is chosen using the following logic: -You can also use the auto-device with limit device choice: +1. Check which supported devices are available. +2. Check the precision of the input model (for detailed information on precisions read more on the [OPTIMIZATION_CAPABILITIES metric](../IE_PLUGIN_DG/Plugin.md)) +3. From the priority list, select the first device capable of supporting the given precision. +4. If the network’s precision is FP32 but there is no device capable of supporting it, offload the network to a device supporting FP16. @sphinxdirective -.. 
tab:: Package, Docker, open-source installation ++----------+-------------------------------------------------+-------------------------------------+ +| Choice | | Supported | | Supported | +| Priority | | Device | | model precision | ++==========+=================================================+=====================================+ +| 1 | | dGPU | FP32, FP16, INT8, BIN | +| | | (e.g. Intel® Iris® Xe MAX) | | ++----------+-------------------------------------------------+-------------------------------------+ +| 2 | | VPUX | INT8 | +| | | (e.g. Intel® Movidius® VPU 3700VE) | | ++----------+-------------------------------------------------+-------------------------------------+ +| 3 | | iGPU | FP32, FP16, BIN, | +| | | (e.g. Intel® UHD Graphics 620 (iGPU)) | | ++----------+-------------------------------------------------+-------------------------------------+ +| 4 | | Intel® Neural Compute Stick 2 (Intel® NCS2) | FP16 | +| | | | ++----------+-------------------------------------------------+-------------------------------------+ +| 5 | | Intel® CPU | FP32, FP16, INT8, BIN | +| | | (e.g. Intel® Core™ i7-1165G7) | | ++----------+-------------------------------------------------+-------------------------------------+ +@endsphinxdirective - .. code-block:: sh +To put it simply, when loading the network to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds. For example: +If you have dGPU in your system, it will be selected for most jobs (first on the priority list and supports multiple precisions). But if you want to run a WINOGRAD-enabled IR, your CPU will be selected (WINOGRAD optimization is not supported by dGPU). If you have Myriad and IA CPU in your system, Myriad will be selected for FP16 models, but IA CPU will be chosen for FP32 ones. - ./benchmark_app.py –d AUTO:CPU,GPU –m +What is important, **AUTO always starts inference with the CPU**. CPU provides very low latency and can start inference with no additional delays. While it performs inference, the Auto-Device plugin continues to load the model to the device best suited for the purpose and transfers the task to it when ready. This way, the devices which are much slower in loading the network, GPU being the best example, do not impede inference at its initial stages. -.. tab:: pip installation +This mechanism can be easily observed in our Benchmark Application sample ([see here](#Benchmark App Info)), showing how the first-inference latency (the time it takes to load the network and perform the first inference) is reduced when using AUTO. For example: - .. code-block:: sh +@sphinxdirective +.. code-block:: sh - benchmark_app –d AUTO:CPU,GPU –m + ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d GPU -niter 128 +@endsphinxdirective + +first-inference latency: **2594.29 ms + 9.21 ms** + +@sphinxdirective +.. code-block:: sh + + ./benchmark_app -m ../public/alexnet/FP32/alexnet.xml -d AUTO:CPU,GPU -niter 128 +@endsphinxdirective + +first-inference latency: **173.13 ms + 13.20 ms** + +@sphinxdirective +.. note:: + The realtime performance will be closer to the best suited device the longer the process runs. +@endsphinxdirective + +## Using the Auto-Device Plugin + +Inference with AUTO is configured similarly to other plugins: first you configure devices, then load a network to the plugin, and finally, execute inference. 
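+
+The configure / load / infer flow maps onto only a few API calls. The following is a minimal, illustrative sketch of that workflow in C++ (the model path "sample.xml" is a placeholder, and input-data handling and error checking are omitted):
+
+@sphinxdirective
+.. code-block:: cpp
+
+   #include <openvino/openvino.hpp>
+
+   int main() {
+       // Step 1: create the runtime core (individual devices could be configured here).
+       ov::Core core;
+
+       // Step 2: read the model and load it to the AUTO virtual device.
+       std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+       ov::CompiledModel compiled_model = core.compile_model(model, "AUTO");
+
+       // Step 3: create an inference request and run inference.
+       ov::InferRequest request = compiled_model.create_infer_request();
+       // Fill request.get_input_tensor() with application data here.
+       request.infer();
+       ov::Tensor output = request.get_output_tensor();
+       // Process 'output' here.
+       return 0;
+   }
+@endsphinxdirective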
+
+Following the OpenVINO™ naming convention, the Auto-Device plugin is assigned the label of “AUTO.” It may be defined with no additional parameters, resulting in defaults being used, or configured further with the following setup options:
+
+@sphinxdirective
++--------------------------+------------------------------+------------------------------------------------------------+
+| Property                 | Property values              | Description                                                |
++==========================+==============================+============================================================+
+|                          | | AUTO:                      | | Lists the devices available for selection.               |
+|                          | | comma-separated, no spaces | | The device sequence will be taken as priority            |
+|                          | |                            | | from high to low.                                        |
+|                          | |                            | | If not specified, “AUTO” will be used as default         |
+|                          | |                            | | and all devices will be included.                        |
++--------------------------+------------------------------+------------------------------------------------------------+
+| ov::device::priorities   | | device names               | | Specifies the devices for Auto-Device plugin to select.  |
+|                          | | comma-separated, no spaces | | The device sequence will be taken as priority            |
+|                          | |                            | | from high to low.                                        |
+|                          | |                            | | This configuration is optional.                          |
++--------------------------+------------------------------+------------------------------------------------------------+
+| ov::hint                 | | THROUGHPUT                 | | Specifies the performance mode preferred                 |
+|                          | | LATENCY                    | | by the application.                                      |
++--------------------------+------------------------------+------------------------------------------------------------+
+| ov::hint::model_priority | | MODEL_PRIORITY_HIGH        | | Indicates the priority for a network.                    |
+|                          | | MODEL_PRIORITY_MED         | | Note: this property is not yet fully supported.          |
+|                          | | MODEL_PRIORITY_LOW         | |                                                          |
++--------------------------+------------------------------+------------------------------------------------------------+
+@endsphinxdirective
+
+@sphinxdirective
+.. dropdown:: Click for information on Legacy APIs
+
+   For legacy APIs like LoadNetwork/SetConfig/GetConfig/GetMetric:
+
+   - replace {ov::device::priorities, "GPU,CPU"} with {"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}
+   - replace {ov::hint::model_priority, "LOW"} with {"MODEL_PRIORITY", "LOW"}
+   - InferenceEngine::MultiDeviceConfigParams::KEY_MULTI_DEVICE_PRIORITIES is defined as the same string "MULTI_DEVICE_PRIORITIES"
+   - CommonTestUtils::DEVICE_GPU + std::string(",") + CommonTestUtils::DEVICE_CPU is equal to "GPU,CPU"
+   - InferenceEngine::PluginConfigParams::KEY_MODEL_PRIORITY is defined as the same string "MODEL_PRIORITY"
+   - InferenceEngine::PluginConfigParams::MODEL_PRIORITY_LOW is defined as the same string "LOW"
+@endsphinxdirective
+
+### Device candidate list
+The device candidate list allows users to customize the priority and limit the choice of devices available to the AUTO plugin. If not specified, the plugin assumes all the devices present in the system can be used. Note that OpenVINO™ Runtime lets you use “GPU” as an alias for “GPU.0” in function calls.
+The following commands are accepted by the API:
+
+@sphinxdirective
+.. tab:: C++ API
+
+   .. code-block:: cpp
+
+      /*** With Inference Engine 2.0 API ***/
+      ov::Core core;
+
+      // Read a network in IR, PaddlePaddle, or ONNX format:
+      std::shared_ptr<ov::Model> model = core.read_model("sample.xml");
+
+      // Load a network to AUTO using the default list of device candidates.
+      // The following lines are equivalent:
+      ov::CompiledModel model0 = core.compile_model(model);
+      ov::CompiledModel model1 = core.compile_model(model, "AUTO");
+      ov::CompiledModel model2 = core.compile_model(model, "AUTO", {});
+
+      // You can also specify the devices to be used by AUTO in its selection process.
+      // The following lines are equivalent:
+      ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
+      ov::CompiledModel model4 = core.compile_model(model, "AUTO", {{ov::device::priorities.name(), "GPU,CPU"}});
+
+      // the AUTO plugin is pre-configured (globally) with the explicit option:
+      core.set_property("AUTO", ov::device::priorities("GPU,CPU"));
+
+.. tab:: C++ legacy API
+
+   .. code-block:: cpp
+
+      /*** With API Prior to 2022.1 Release ***/
+      InferenceEngine::Core ie;
+
+      // Read a network in IR, PaddlePaddle, or ONNX format:
+      InferenceEngine::CNNNetwork network = ie.ReadNetwork("sample.xml");
+
+      // Load a network to AUTO using the default list of device candidates.
+      // The following lines are equivalent:
+      InferenceEngine::ExecutableNetwork exec0 = ie.LoadNetwork(network);
+      InferenceEngine::ExecutableNetwork exec1 = ie.LoadNetwork(network, "AUTO");
+      InferenceEngine::ExecutableNetwork exec2 = ie.LoadNetwork(network, "AUTO", {});
+
+      // You can also specify the devices to be used by AUTO in its selection process.
+      // The following lines are equivalent:
+      InferenceEngine::ExecutableNetwork exec3 = ie.LoadNetwork(network, "AUTO:GPU,CPU");
+      InferenceEngine::ExecutableNetwork exec4 = ie.LoadNetwork(network, "AUTO", {{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}});
+
+      // the AUTO plugin is pre-configured (globally) with the explicit option:
+      ie.SetConfig({{"MULTI_DEVICE_PRIORITIES", "GPU,CPU"}}, "AUTO");
+
+.. tab:: Python API
+
+   .. code-block:: python
+
+      ### New IE 2.0 API ###
+
+      from openvino.runtime import Core
+      core = Core()
+
+      # Read a network in IR, PaddlePaddle, or ONNX format:
+      model = core.read_model(model_path)
+
+      # Load a network to AUTO using the default list of device candidates.
+      # The following lines are equivalent:
+      compiled_model = core.compile_model(model=model)
+      compiled_model = core.compile_model(model=model, device_name="AUTO")
+      compiled_model = core.compile_model(model=model, device_name="AUTO", config={})
+
+      # You can also specify the devices to be used by AUTO in its selection process.
+      # The following lines are equivalent:
+      compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU")
+      compiled_model = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"})
+
+      # the AUTO plugin is pre-configured (globally) with the explicit option:
+      core.set_config(config={"MULTI_DEVICE_PRIORITIES":"CPU,GPU"}, device_name="AUTO")
+
+.. tab:: Python legacy API
+
+   .. code-block:: python
+
+      ### API before 2022.1 ###
+      from openvino.inference_engine import IECore
+      ie = IECore()
+
+      # Read a network in IR, PaddlePaddle, or ONNX format:
+      net = ie.read_network(model=path_to_model)
+
+      # Load a network to AUTO using the default list of device candidates.
+      # The following lines are equivalent:
+      exec_net = ie.load_network(network=net)
+      exec_net = ie.load_network(network=net, device_name="AUTO")
+      exec_net = ie.load_network(network=net, device_name="AUTO", config={})
+
+      # You can also specify the devices to be used by AUTO in its selection process.
+      # The following lines are equivalent:
+      exec_net = ie.load_network(network=net, device_name="AUTO:CPU,GPU")
+      exec_net = ie.load_network(network=net, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"})
+
+      # the AUTO plugin is pre-configured (globally) with the explicit option:
+      ie.set_config(config={"MULTI_DEVICE_PRIORITIES": "CPU,GPU"}, device_name="AUTO")
 @endsphinxdirective
 
-**NOTES:**
-* The default CPU stream is 1 if using `-d AUTO`.
-* You can use the FP16 IR to work with Auto-device.
-* No demos are fully optimized for Auto-device yet to select the most suitable device,
-use GPU streams/throttling, and so on.
-
-## Auto-Device Plugin Execution (Python)
+To check what devices are present in the system, you can use Device API:
+For C++ API
 @sphinxdirective
-.. raw:: html
+.. code-block:: sh
 
-
Python
+ ov::runtime::Core::get_available_devices() (see Hello Query Device C++ Sample) @endsphinxdirective -The AUTO device is a new, special "virtual" or "proxy" device in the OpenVINO™ toolkit. - -Use "AUTO" as the device name to delegate selection of an actual accelerator to OpenVINO. The Auto-device plugin internally recognizes and selects devices from among CPU, integrated GPU and discrete Intel GPUs (when available) depending on the device capabilities and the characteristics of CNN models (for example, precision). Then the Auto-device assigns inference requests to the selected device. - -From the application's point of view, this is just another device that handles all accelerators in the full system. - -With the 2021.4 release, Auto-device setup is done in three major steps: - -1. Configure each device as usual (for example, via the conventional [IECore.set_config](https://docs.openvino.ai/latest/ie_python_api/classie__api_1_1IECore.html#a2c738cee90fca27146e629825c039a05) method). -2. Load a network to the Auto-device plugin. This is the only change needed in your application. -3. As with any other executable network resulting from [IECore.load_network](https://docs.openvino.ai/latest/ie_python_api/classie__api_1_1IECore.html#ac9a2e043d14ccfa9c6bbf626cfd69fcc), create as many requests as needed to saturate the devices. - -These steps are covered below in detail. - -### Defining and Configuring the Auto-Device Plugin -Following the OpenVINO convention for devices names, the Auto-device uses the label "AUTO". The only configuration option for Auto-device is a limited device list: - -| Parameter name | Parameter values | Default | Description | -| -------------- | ---------------- | ------- | ----------- | -| "AUTO_DEVICE_LIST" | comma-separated device names with no spaces | N/A | Device candidate list to be selected - -There are two ways to use the Auto-device plugin: - -1. Directly indicate device by "AUTO" or an empty string. -2. Use the Auto-device configuration - -Both methods allow limiting the list of device candidates for the AUTO plugin. - -```python -from openvino.inference_engine import IECore - -ie = IECore() -# Read a network in IR or ONNX format -net = ie.read_network(model=path_to_model) - -# Load a network on the "AUTO" device -exec_net = ie.load_network(network=net, device_name="AUTO") - -# Optionally specify the list of device candidates for the AUTO plugin -# The following two lines are equivalent -exec_net = ie.load_network(network=net, device_name="AUTO:CPU,GPU") -exec_net = ie.load_network(network=net, device_name="AUTO", - config={"AUTO_DEVICE_LIST": "CPU,GPU"}) -``` - -The Auto-device plugin supports query device optimization capabilities in metric. - -| Parameter name | Parameter values | -| --- | --- | -| "OPTIMIZATION_CAPABILITIES" | Auto-Device capabilities | - -### Enumerating Devices and Selection Logic - -The OpenVINO Runtime API now features a dedicated methods to enumerate devices and their capabilities. See the [Hello Query Device Python Sample](../../samples/python/hello_query_device/README.md) for code. - -This is the example output from the sample (truncated to device names only): - -```python -./hello_query_device - -Available devices: - Device: CPU -... - Device: GPU.0 -... - Device: GPU.1 -``` - -### Default Auto-Device Selection Logic - -With the 2021.4 release, the Auto-Device selects the most suitable device using the following default logic: - -1. Check if dGPU (discrete), iGPU (integrated) and CPU devices are available -2. 
Get the precision of the input model, such as FP32 -3. According to the priority of dGPU, iGPU, and CPU (in this order), if the device supports the precision of the input network, select it as the most suitable device - -For example, CPU, dGPU and iGPU can support the following precision and optimization capabilities: - -| Device | OPTIMIZATION_CAPABILITIES | -| --- | --- | -| CPU | WINOGRAD FP32 FP16 INT8 BIN | -| dGPU | FP32 BIN BATCHED_BLOB FP16 INT8 | -| iGPU | FP32 BIN BATCHED_BLOB FP16 INT8 | - -* When the application uses the Auto-device to run FP16 IR on a system with CPU, dGPU and iGPU, Auto-device will offload this workload to dGPU. -* When the application uses the Auto-device to run FP16 IR on a system with CPU and iGPU, Auto-device will offload this workload to iGPU. -* When the application uses the Auto-device to run WINOGRAD-enabled IR on a system with CPU, dGPU and iGPU, Auto-device will offload this workload to CPU. - -In cases when loading the network to dGPU or iGPU fails, CPU is the fall-back choice. - -To show the capabilities for a specific device, query the OPTIMIZATION_CAPABILITIES metric: - - -```python -from openvino.inference_engine import IECore - -ie = IECore() -ie.get_metric(device_name=device, - metric_name="OPTIMIZATION_CAPABILITIES") -``` - -### Configuring the Individual Devices and Creating the Auto-Device on Top - -It is possible to configure each individual device as usual and create the "AUTO" device on top: - -```python -from openvino.inference_engine import IECore - -ie = IECore() -net = ie.read_network(model=path_to_model) - -cpu_config = {} -gpu_config = {} - -ie.set_config(config=cpu_config, device_name="CPU") -ie.set_config(config=gpu_config, device_name="GPU") - -# Load the network to the AUTO device -exec_net = ie.load_network(network=net, device_name="AUTO") -``` - -Alternatively, you can combine all the individual device settings into single config file and load it, allowing the Auto-device plugin to parse and apply it to the right devices. See the code example here: - -```python -from openvino.inference_engine import IECore - -# Init the Inference Engine Core -ie = IECore() - -# Read a network in IR or ONNX format -net = ie.read_network(model=path_to_model) - -full_config = {} - -# Load the network to the AUTO device -exec_net = ie.load_network(network=net, device_name="AUTO", config=full_config) -``` - -### Using the Auto-Device with OpenVINO Samples and Benchmark App - -Note that every OpenVINO sample or application that supports the "-d" (which stands for "device") command-line option transparently accepts the Auto-device. The Benchmark Application is the best example of the optimal usage of the Auto-device. You do not need to set the number of requests and CPU threads, as the application provides optimal out-of-the-box performance. Below is the example command-line to evaluate AUTO performance with that: - +For Python API @sphinxdirective -.. tab:: Package, Docker, open-source installation - - .. code-block:: sh - - ./benchmark_app.py –d AUTO –m - -.. tab:: pip installation - - .. code-block:: sh - - benchmark_app –d AUTO –m +.. code-block:: sh + openvino.runtime.Core.available_devices (see Hello Query Device Python Sample) @endsphinxdirective -You can also use the auto-device with limit device choice: +### Performance Hints +The `ov::hint` property enables you to specify a performance mode for the plugin to be more efficient for particular use cases. 
+ +#### ov::hint::PerformanceMode::THROUGHPUT +This mode prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, like inference of video feeds or large numbers of images. + +#### ov::hint::PerformanceMode::LATENCY +This mode prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, like a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles. +Note that currently the `ov::hint` property is supported by CPU and GPU devices only. + +To enable Performance Hints for your application, use the following code: @sphinxdirective -.. tab:: Package, Docker, open-source installation +.. tab:: C++ API - .. code-block:: sh + .. code-block:: cpp - ./benchmark_app.py –d AUTO:CPU,GPU –m + ov::Core core; -.. tab:: pip installation + // Read a network in IR, PaddlePaddle, or ONNX format: + std::shared_ptr model = core.read_model("sample.xml"); + + // Load a network to AUTO with Performance Hints enabled: + // To use the “throughput” mode: + ov::CompiledModel compiled_model = core.compile_model(model, "AUTO:GPU,CPU", {{ov::hint::performance_mode.name(), "THROUGHPUT"}}); + + // or the “latency” mode: + ov::CompiledModel compiledModel1 = core.compile_model(model, "AUTO:GPU,CPU", {{ov::hint::performance_mode.name(), "LATENCY"}}); + +.. tab:: Python API - .. code-block:: sh - - benchmark_app –d AUTO:CPU,GPU –m + .. code-block:: python + from openvino.runtime import Core + + core = Core() + + # Read a network in IR, PaddlePaddle, or ONNX format: + model = core.read_model(model_path) + + # Load a network to AUTO with Performance Hints enabled: + # To use the “throughput” mode: + compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU", config={"PERFORMANCE_HINT":"THROUGHPUT"}) + + # or the “latency” mode: + compiled_model = core.compile_model(model=model, device_name="AUTO:CPU,GPU", config={"PERFORMANCE_HINT":"LATENCY"}) @endsphinxdirective -> **NOTE**: If you installed OpenVINO with pip, use `benchmark_app -d AUTO:CPU,GPU -m ` +### ov::hint::model_priority +The property enables you to control the priorities of networks in the Auto-Device plugin. A high-priority network will be loaded to a supported high-priority device. A lower-priority network will not be loaded to a device that is occupied by a higher-priority network. + +@sphinxdirective +.. tab:: C++ API + + .. code-block:: cpp + + // Example 1 + // Compile and load networks: + ov::CompiledModel compiled_model0 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "HIGH"}}); + ov::CompiledModel compiled_model1 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "MEDIUM"}}); + ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}}); + + /************ + Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks. + Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU. 
 + ************/ + + // Example 2 + // Compile and load networks: + ov::CompiledModel compiled_model3 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}}); + ov::CompiledModel compiled_model4 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "MEDIUM"}}); + ov::CompiledModel compiled_model5 = core.compile_model(model, "AUTO:GPU,MYRIAD,CPU", {{ov::hint::model_priority.name(), "LOW"}}); + + /************ + Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks. + Result: compiled_model3 will use GPU, compiled_model4 will use GPU, compiled_model5 will use MYRIAD. + ************/ + +.. tab:: Python API + + .. code-block:: python + + # Example 1 + # Compile and load networks: + compiled_model0 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"0"}) + compiled_model1 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"1"}) + compiled_model2 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"}) + + # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks. + # Result: compiled_model0 will use GPU, compiled_model1 will use MYRIAD, compiled_model2 will use CPU. + + # Example 2 + # Compile and load networks: + compiled_model0 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"}) + compiled_model1 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"1"}) + compiled_model2 = core.compile_model(model=model, device_name="AUTO:CPU,GPU,MYRIAD", config={"AUTO_NETWORK_PRIORITY":"2"}) + + # Assume that all the devices (CPU, GPU, and MYRIAD) can support all the networks. + # Result: compiled_model0 will use GPU, compiled_model1 will use GPU, compiled_model2 will use MYRIAD. +@endsphinxdirective + +## Configuring Individual Devices and Creating the Auto-Device plugin on Top +Although the methods described above are currently the preferred way to execute inference with AUTO, the following steps can also be used as an alternative. It is currently available as a legacy feature and is used if the device candidate list includes VPUX or Myriad (devices incapable of utilizing the Performance Hints option). + +@sphinxdirective +.. tab:: C++ API + + .. code-block:: cpp + + ov::Core core; + + // Read a network in IR, PaddlePaddle, or ONNX format: + std::shared_ptr<ov::Model> model = core.read_model("sample.xml"); + + // Configure the VPUX and the Myriad devices separately and load the network to the Auto-Device plugin: + // set VPUX config + core.set_property("VPUX", {}); + + // set MYRIAD config + core.set_property("MYRIAD", {}); + ov::CompiledModel compiled_model = core.compile_model(model, "AUTO"); + +.. tab:: Python API + + .. code-block:: python + + from openvino.runtime import Core + + core = Core() + + # Read a network in IR, PaddlePaddle, or ONNX format: + model = core.read_model(model_path) + + # Configure the VPUX and the Myriad devices separately and load the network to the Auto-Device plugin: + core.set_config(config=vpux_config, device_name="VPUX") + core.set_config(config=myriad_config, device_name="MYRIAD") + compiled_model = core.compile_model(model=model) + + # Alternatively, you can combine the individual device settings into one configuration and load the network. + # The AUTO plugin will parse and apply the settings to the right devices. 
 + # A 'device_name' of "AUTO:VPUX,MYRIAD" will configure the Auto-Device plugin to use those devices. + compiled_model = core.compile_model(model=model, device_name=device_name, config=full_config) + + # To query the optimization capabilities: + device_cap = core.get_metric("CPU", "OPTIMIZATION_CAPABILITIES") +@endsphinxdirective + + +## Using AUTO with OpenVINO™ Samples and the Benchmark App +To see how the Auto-Device plugin is used in practice and test its performance, take a look at OpenVINO™ samples. All samples supporting the "-d" command-line option (which stands for "device") will accept the plugin out-of-the-box. The Benchmark Application is a perfect place to start, as it presents the optimal performance of the plugin without the need for additional settings, like the number of requests or CPU threads. To evaluate the AUTO performance, you can use the following commands: + +For unlimited device choice: +@sphinxdirective +.. code-block:: sh + + ./benchmark_app -d AUTO -m <path_to_model> -i <path_to_input> -niter 1000 +@endsphinxdirective + +For limited device choice: +@sphinxdirective +.. code-block:: sh + + ./benchmark_app -d AUTO:CPU,GPU,MYRIAD -m <path_to_model> -i <path_to_input> -niter 1000 +@endsphinxdirective + +For more information, refer to the [C++](../../samples/cpp/benchmark_app/README.md) or [Python](../../tools/benchmark_tool/README.md) version instructions. + +@sphinxdirective +.. note:: + + The default number of CPU streams is 1 if using “-d AUTO”. + + You can use the FP16 IR to work with auto-device. + + No demos are yet fully optimized for AUTO in terms of selecting the most suitable device, using the GPU streams/throttling, and so on. +@endsphinxdirective diff --git a/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md b/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md new file mode 100644 index 00000000000..b0c38920e55 --- /dev/null +++ b/docs/OV_Runtime_UG/supported_plugins/AutoPlugin_Debugging.md @@ -0,0 +1,136 @@ +# Debugging Auto-Device Plugin {#openvino_docs_IE_DG_supported_plugins_AUTO_debugging} + +## Using Debug Log +In case of execution problems, just like all other plugins, Auto-Device provides the user with information on exceptions and error values. If the returned data is not enough for debugging purposes, more information may be acquired by means of `ov::log::Level`. + +There are six levels of logs, which can be called explicitly or set via the `OPENVINO_LOG_LEVEL` environment variable (can be overwritten by `compile_model()` or `set_property()`): + +0 - ov::log::Level::NO +1 - ov::log::Level::ERR +2 - ov::log::Level::WARNING +3 - ov::log::Level::INFO +4 - ov::log::Level::DEBUG +5 - ov::log::Level::TRACE + +@sphinxdirective +.. tab:: C++ API + + .. code-block:: cpp + + ov::Core core; + + // read a network in IR, PaddlePaddle, or ONNX format + std::shared_ptr<ov::Model> model = core.read_model("sample.xml"); + + // load a network to AUTO and set log level to debug + ov::CompiledModel compiled_model = core.compile_model(model, "AUTO", {{ov::log::level.name(), "LOG_DEBUG"}}); + + // or set the log level with set_property and load the network + core.set_property("AUTO", {{ov::log::level.name(), "LOG_DEBUG"}}); + ov::CompiledModel compiled_model2 = core.compile_model(model, "AUTO"); + +.. tab:: Python API + + .. code-block:: python 
 + + from openvino.runtime import Core + core = Core() + + # read a network in IR, PaddlePaddle, or ONNX format + model = core.read_model(model_path) + + # load a network to AUTO and set log level to debug + compiled_model = core.compile_model(model=model, device_name="AUTO", config={"LOG_LEVEL":"LOG_DEBUG"}) + + # or set the log level with set_config and load the network + core.set_config(config={"LOG_LEVEL":"LOG_DEBUG"}, device_name="AUTO") + compiled_model = core.compile_model(model=model, device_name="AUTO") + +.. tab:: OS environment variable + + .. code-block:: sh + + When defining it via the variable, + a number needs to be used instead of a log level name, e.g.: + + Linux + export OPENVINO_LOG_LEVEL=0 + + Windows + set OPENVINO_LOG_LEVEL=0 +@endsphinxdirective + +The property returns information in the following format: + +@sphinxdirective +.. code-block:: sh + + [time]LOG_LEVEL[file] [PLUGIN]: message +@endsphinxdirective + +in which the `LOG_LEVEL` is represented by the first letter of its name (ERROR being an exception and using its full name). For example: + +@sphinxdirective +.. code-block:: sh + + [17:09:36.6188]D[plugin.cpp:167] deviceName:MYRIAD, defaultDeviceID:, uniqueName:MYRIAD_ + [17:09:36.6242]I[executable_network.cpp:181] [AUTOPLUGIN]:select device:MYRIAD + [17:09:36.6809]ERROR[executable_network.cpp:384] [AUTOPLUGIN] load failed, MYRIAD:[ GENERAL_ERROR ] +@endsphinxdirective + + +## Instrumentation and Tracing Technology + +All major performance calls of both OpenVINO™ Runtime and the AUTO plugin are instrumented with Instrumentation and Tracing Technology (ITT) APIs. To enable ITT in OpenVINO™ Runtime, compile it with the following option: +@sphinxdirective +.. code-block:: sh + + -DENABLE_PROFILING_ITT=ON +@endsphinxdirective + +For more information, you can refer to: +* [OpenVINO profiling](https://docs.openvino.ai/latest/groupie_dev_profiling.html) +* [Intel® VTune™ Profiler User Guide](https://www.intel.com/content/www/us/en/develop/documentation/vtune-help/top/api-support/instrumentation-and-tracing-technology-apis.html) + +### Analyze Code Performance on Linux + +You can analyze code performance using Intel® VTune™ Profiler. For more information and installation instructions, refer to the [installation guide (PDF)](https://software.intel.com/content/www/us/en/develop/download/intel-vtune-install-guide-linux-os.html). +With Intel® VTune™ Profiler installed you can configure your analysis with the following steps: + +1. Open the Intel® VTune™ Profiler GUI on the host machine with the following command: +@sphinxdirective + +.. code-block:: sh + + cd /vtune install dir/intel/oneapi/vtune/2021.6.0/env + source vars.sh + vtune-gui +@endsphinxdirective + +2. Select **Configure Analysis**. +3. In the **where** pane, select **Local Host**. +@sphinxdirective +.. image:: _static/images/IE_DG_supported_plugins_AUTO_debugging-img01-localhost.png + :align: center +@endsphinxdirective +4. In the **what** pane, specify your target application/script on the local system. +@sphinxdirective
+.. image:: _static/images/IE_DG_supported_plugins_AUTO_debugging-img02-launch.png
+   :align: center
+@endsphinxdirective
+5. In the **how** pane, choose and configure the analysis type you want to perform, for example, **Hotspots Analysis**:
+identify the most time-consuming functions and drill down to see time spent on each line of source code. Focus optimization efforts on hot code for the greatest performance impact.
+@sphinxdirective
+.. 
image:: _static/images/IE_DG_supported_plugins_AUTO_debugging-img03-hotspots.png + :align: center +@endsphinxdirective +6. Start the analysis by clicking the start button. When it is done, you will get a summary of the run, including top hotspots and top tasks in your application: +@sphinxdirective +.. image:: _static/images/IE_DG_supported_plugins_AUTO_debugging-img04-vtunesummary.png + :align: center +@endsphinxdirective +7. To analyze ITT info related to the Auto plugin, click on the **Bottom-up** tab, choose the **Task Domain/Task Type/Function/Call Stack** from the dropdown list - Auto plugin-related ITT info is under the MULTIPlugin task domain: +@sphinxdirective +.. image:: _static/images/IE_DG_supported_plugins_AUTO_debugging-img05-vtunebottomup.png + :align: center +@endsphinxdirective diff --git a/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img01-localhost.png b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img01-localhost.png new file mode 100644 index 00000000000..c5b186c02e7 --- /dev/null +++ b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img01-localhost.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:36f4b9e0714e819b0c98a30f3c08d6ce1f9206906be42e80cb1fa746e6354ad6 +size 25333 diff --git a/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img02-launch.png b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img02-launch.png new file mode 100644 index 00000000000..b00d9f0dcb8 --- /dev/null +++ b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img02-launch.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a5a0022bda018ae7e5261bbb9f5e8cc28374254c272dd3cbc2ab2f872381e2c5 +size 21106 diff --git a/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img03-hotspots.png b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img03-hotspots.png new file mode 100644 index 00000000000..dc1f7d7c0b1 --- /dev/null +++ b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img03-hotspots.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:0319b56afe702fb09957f4d3c996155be79efc672edc0e780d34441eaa660b2c +size 43521 diff --git a/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img04-vtunesummary.png b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img04-vtunesummary.png new file mode 100644 index 00000000000..9769b6eb0e6 --- /dev/null +++ b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img04-vtunesummary.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:813e629fe674b676c8484a92f94d55ffc13cacc1e077fa19a2c82cd528819d72 +size 256217 diff --git a/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img05-vtunebottomup.png b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img05-vtunebottomup.png new file mode 100644 index 00000000000..feecb907c87 --- /dev/null +++ b/docs/_static/images/IE_DG_supported_plugins_AUTO_debugging-img05-vtunebottomup.png @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:62fb4b08191499cfe765f3777dd7ae543739232cfd8861e21f976ba09aa9797b +size 176560