Docs: Update the doc on execution devices property and enable_startup_fallback property (#14750)
* Docs: Update the doc on default hint and execution devices property (#14836)
* Docs: Update to LATENCY as default hint
* Docs: Update the doc on execution devices property
* Update auto_device_selection.md
* Update docs/OV_Runtime_UG/auto_device_selection.md

Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
This commit is contained in:
parent
a5ec5f5476
commit
9e3b3e0566
@@ -44,12 +44,12 @@ The logic behind the choice is as follows:
@endsphinxdirective

To put it simply, when loading the model to the first device on the list fails, AUTO will try to load it to the next device in line, until one of them succeeds.

What is important, **AUTO starts inference with the CPU of the system by default**, as it provides very low latency and can start inference with no additional delays.

While the CPU is performing inference, AUTO continues to load the model to the device best suited for the purpose and transfers the task to it when ready.

This way, the devices which are much slower in compiling models, GPU being the best example, do not impede inference at its initial stages.

For example, if you use a CPU and a GPU, the first-inference latency of AUTO will be better than that of using GPU alone.

Note that if you choose to exclude CPU from the priority list or disable the initial CPU acceleration feature via `ov::intel_auto::enable_startup_fallback`, it will be unable to support the initial model compilation stage.
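The fallback behavior described above can be sketched in plain Python. This is a hypothetical illustration of the selection logic only, not OpenVINO API; the helper names `load_with_fallback` and `fake_try_load` are invented for this sketch.

```python
# Hypothetical sketch of AUTO's fallback logic: try each device in
# priority order (high to low) until one accepts the model.
def load_with_fallback(priority_list, try_load):
    """try_load(device) returns a compiled model or raises RuntimeError."""
    errors = {}
    for device in priority_list:
        try:
            return device, try_load(device)
        except RuntimeError as err:
            errors[device] = str(err)  # remember why this device was skipped
    raise RuntimeError(f"no device could load the model: {errors}")

# Example: GPU fails to compile, so the logic falls back to CPU.
def fake_try_load(device):
    if device == "GPU":
        raise RuntimeError("unsupported layer")
    return f"compiled-on-{device}"

print(load_with_fallback(["GPU", "CPU"], fake_try_load))  # → ('CPU', 'compiled-on-CPU')
```

The real AUTO plugin additionally starts inference on the CPU while the preferred device is still compiling, which this sketch does not model.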
@@ -76,41 +76,56 @@ Following the OpenVINO™ naming convention, the Automatic Device Selection mode
@sphinxdirective
+---------------------------------------------+----------------------------------------------------------------------+
| | Property                                  | | Values and Description                                             |
+=============================================+======================================================================+
| | <device candidate list>                   | | **Values**:                                                        |
| |                                           | | empty                                                              |
| |                                           | | `AUTO`                                                             |
| |                                           | | `AUTO: <device names>` (comma-separated, no spaces)                |
| |                                           | |                                                                    |
| |                                           | | Lists the devices available for selection.                         |
| |                                           | | The device sequence will be taken as priority from high to low.    |
| |                                           | | If not specified, `AUTO` will be used as default,                  |
| |                                           | | and all devices will be "viewed" as candidates.                    |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::device::priorities`                  | | **Values**:                                                        |
| |                                           | | `<device names>` (comma-separated, no spaces)                      |
| |                                           | |                                                                    |
| |                                           | | Specifies the devices for AUTO to select.                          |
| |                                           | | The device sequence will be taken as priority from high to low.    |
| |                                           | | This configuration is optional.                                    |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::hint::performance_mode`              | | **Values**:                                                        |
| |                                           | | `ov::hint::PerformanceMode::LATENCY`                               |
| |                                           | | `ov::hint::PerformanceMode::THROUGHPUT`                            |
| |                                           | | `ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT`                 |
| |                                           | |                                                                    |
| |                                           | | Specifies the performance option preferred by the application.     |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::hint::model_priority`                | | **Values**:                                                        |
| |                                           | | `ov::hint::Priority::HIGH`                                         |
| |                                           | | `ov::hint::Priority::MEDIUM`                                       |
| |                                           | | `ov::hint::Priority::LOW`                                          |
| |                                           | |                                                                    |
| |                                           | | Indicates the priority for a model.                                |
| |                                           | | IMPORTANT: This property is not fully supported yet.               |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::execution_devices`                   | | Lists the runtime target devices on which the inferences are being |
| |                                           | | executed.                                                          |
| |                                           | | Examples of returning results could be `(CPU)` (`(CPU)` is a       |
| |                                           | | temporary device, indicating that CPU is used for acceleration at  |
| |                                           | | the model compilation stage), `CPU`, `GPU`, `CPU GPU`, `GPU.0`,    |
| |                                           | | etc.                                                               |
+---------------------------------------------+----------------------------------------------------------------------+
| | `ov::intel_auto::enable_startup_fallback` | | **Values**:                                                        |
| |                                           | | `true`                                                             |
| |                                           | | `false`                                                            |
| |                                           | |                                                                    |
| |                                           | | Enables/disables CPU as acceleration (or the helper device) in the |
| |                                           | | beginning. The default value is `true`, indicating that CPU is     |
| |                                           | | used as acceleration by default.                                   |
+---------------------------------------------+----------------------------------------------------------------------+
@endsphinxdirective
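The `AUTO: <device names>` format from the table above (comma-separated, no spaces, priority from high to low) can be illustrated with a small parser. This is a hypothetical helper written for this doc, not part of the OpenVINO API.

```python
# Hypothetical parser for the "AUTO: <device names>" device string format:
# comma-separated, no spaces, listed in priority order from high to low.
def parse_device_string(device_string):
    if device_string in ("", "AUTO"):
        return []  # empty priority list: all devices are candidates
    if not device_string.startswith("AUTO:"):
        raise ValueError("expected 'AUTO' or 'AUTO:<device names>'")
    return device_string[len("AUTO:"):].split(",")

print(parse_device_string("AUTO:GPU,CPU"))  # → ['GPU', 'CPU']
```

With `AUTO:GPU,CPU`, GPU has the higher priority; with plain `AUTO`, every device present in the system is viewed as a candidate.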
@@ -122,7 +137,7 @@ The device candidate list enables you to customize the priority and limit the ch
- If <device candidate list> is not specified, AUTO assumes all the devices present in the system can be used.
- If `AUTO` without any device names is specified, AUTO assumes all the devices present in the system can be used, and will load the network to all devices and run inference based on their default priorities, from high to low.

To specify the priority of devices, enter the device names in the priority order (from high to low) in `AUTO: <device names>`, or use the `ov::device::priorities` property.

See the following code for using AUTO and specifying devices:
@@ -192,25 +207,43 @@ AUTO will then query all available devices and remove CPU from the candidate lis
Note that if you choose to exclude CPU from device candidate list, CPU will not be able to support the initial model compilation stage. See more information in [How AUTO Works](#how-auto-works).

### Checking Target Runtime Devices

To query the runtime target devices on which the inferences are being executed using AUTO, you can use the `ov::execution_devices` property. It must be used with `get_property`, for example:

@sphinxdirective

.. tab:: C++

   .. doxygensnippet:: docs/snippets/AUTO7.cpp
      :language: cpp
      :fragment: [part7]

.. tab:: Python

   .. doxygensnippet:: docs/snippets/ov_auto.py
      :language: python
      :fragment: [part7]

@endsphinxdirective

### Performance Hints for AUTO
The `ov::hint::performance_mode` property enables you to specify a performance option for AUTO to be more efficient for particular use cases. The default hint for AUTO is `LATENCY`.

> **NOTE**: Currently, the `ov::hint` property is supported by CPU and GPU devices only.

#### LATENCY
This option prioritizes low latency, providing short response time for each inference job. It performs best for tasks where inference is required for a single input image, e.g. a medical analysis of an ultrasound scan image. It also fits the tasks of real-time or nearly real-time applications, such as an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles.

> **NOTE**: If no performance hint is set explicitly, AUTO will set LATENCY for devices that have not set `ov::device::properties`, for example, `ov::device::properties(<DEVICE_NAME>, ov::hint::performance_mode(ov::hint::LATENCY))`.

@sphinxdirective

.. _cumulative throughput:

@endsphinxdirective

#### THROUGHPUT
This option prioritizes high throughput, balancing between latency and power. It is best suited for tasks involving multiple jobs, such as inference of video feeds or large numbers of images.

#### CUMULATIVE_THROUGHPUT
While `LATENCY` and `THROUGHPUT` can select one target device with your preferred performance option, the `CUMULATIVE_THROUGHPUT` option enables running inference on multiple devices for higher throughput. With `CUMULATIVE_THROUGHPUT`, AUTO loads the network model to all available devices in the candidate list, and then runs inference on them based on the default or specified priority.
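The default-hint rule in the NOTE above (devices without explicit per-device properties fall back to the default hint) can be sketched in plain Python. This is a hypothetical illustration of the rule, not OpenVINO API; the function and dictionary shapes are invented for this sketch.

```python
# Hypothetical sketch of the default-hint rule: a device keeps its
# explicitly configured PERFORMANCE_HINT, otherwise it gets the default.
DEFAULT_HINT = "LATENCY"

def effective_hints(candidate_devices, per_device_properties):
    hints = {}
    for device in candidate_devices:
        explicit = per_device_properties.get(device, {})
        hints[device] = explicit.get("PERFORMANCE_HINT", DEFAULT_HINT)
    return hints

# GPU was configured explicitly; CPU falls back to the default hint.
print(effective_hints(["CPU", "GPU"],
                      {"GPU": {"PERFORMANCE_HINT": "THROUGHPUT"}}))
# → {'CPU': 'LATENCY', 'GPU': 'THROUGHPUT'}
```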
docs/snippets/AUTO7.cpp (new file, 18 lines)

@@ -0,0 +1,18 @@
#include <openvino/openvino.hpp>

int auto7() {
    {
        //! [part7]
        ov::Core core;

        // read a network in IR, PaddlePaddle, or ONNX format
        std::shared_ptr<ov::Model> model = core.read_model("sample.xml");

        // compile a model on AUTO
        ov::CompiledModel compiled_model = core.compile_model(model, "AUTO");
        // query the runtime target devices on which the inferences are being executed
        ov::Any execution_devices = compiled_model.get_property(ov::execution_devices);
        //! [part7]
    }
    return 0;
}
@@ -108,6 +108,17 @@ def part6():
    compiled_model = core.compile_model(model=model, device_name="AUTO")
    #! [part6]

def part7():
    #! [part7]
    core = Core()
    # read a network in IR, PaddlePaddle, or ONNX format
    model = core.read_model(model_path)
    # compile a model on AUTO
    compiled_model = core.compile_model(model=model, device_name="AUTO")
    # query the runtime target devices on which the inferences are being executed
    execution_devices = compiled_model.get_property("EXECUTION_DEVICES")
    #! [part7]

def main():
    part0()
    part1()
@@ -115,6 +126,7 @@ def main():
    part4()
    part5()
    part6()
    part7()

if __name__ == '__main__':
    sys.exit(main())
Loading…
Reference in New Issue
Block a user