Updated AsyncInferRequest documentation + leftovers (#16420)

This commit is contained in:
Ilya Churaev
2023-03-21 10:52:45 +04:00
committed by GitHub
parent 5cb20f8858
commit 60436dee5a
8 changed files with 33 additions and 37 deletions

@@ -1,49 +1,45 @@
-# Asynchronous Inference Request {#openvino_docs_ie_plugin_dg_async_infer_request}
+# Asynchronous Inference Request {#openvino_docs_ov_plugin_dg_async_infer_request}
 Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors depending on a device pipeline structure.
-OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class:
+OpenVINO Runtime Plugin API provides the base ov::IAsyncInferRequest class:
-- The class has the `_pipeline` field of `std::vector<std::pair<ITaskExecutor::Ptr, Task> >`, which contains pairs of an executor and the task it executes.
+- The class has the `m_pipeline` field of `std::vector<std::pair<std::shared_ptr<ov::threading::ITaskExecutor>, ov::threading::Task> >`, which contains pairs of an executor and the task it executes.
 - All executors are passed as arguments to a class constructor and they are in the running state and ready to run tasks.
-- The class has the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method, which waits for `_pipeline` to finish in a class destructor. The method does not stop task executors and they are still in the running stage, because they belong to the executable network instance and are not destroyed.
+- The class has the ov::IAsyncInferRequest::stop_and_wait method, which waits for `m_pipeline` to finish in a class destructor. The method does not stop task executors and they are still in the running stage, because they belong to the compiled model instance and are not destroyed.
-`AsyncInferRequest` Class
+AsyncInferRequest Class
 ------------------------
-OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
+OpenVINO Runtime Plugin API provides the base ov::IAsyncInferRequest class for a custom asynchronous inference request implementation:
 @snippet src/async_infer_request.hpp async_infer_request:header
-#### Class Fields
+### Class Fields
 - `_inferRequest` - a reference to the [synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
-- `_waitExecutor` - a task executor that waits for a response from a device about the completion of device tasks
+- `m_wait_executor` - a task executor that waits for a response from a device about the completion of device tasks
-> **NOTE**: If a plugin can work with several instances of a device, `_waitExecutor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
+> **NOTE**: If a plugin can work with several instances of a device, `m_wait_executor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
-### `AsyncInferRequest()`
+### AsyncInferRequest()
-The main goal of the `AsyncInferRequest` constructor is to define a device pipeline `_pipeline`. The example below demonstrates `_pipeline` creation with the following stages:
+The main goal of the `AsyncInferRequest` constructor is to define a device pipeline `m_pipeline`. The example below demonstrates `m_pipeline` creation with the following stages:
-- `inferPreprocess` is a CPU compute task.
-- `startPipeline` is a CPU lightweight task to submit tasks to a remote device.
-- `waitPipeline` is a CPU non-compute task that waits for a response from a remote device.
-- `inferPostprocess` is a CPU compute task.
+- `infer_preprocess_and_start_pipeline` is a CPU lightweight task that submits tasks to a remote device.
+- `wait_pipeline` is a CPU non-compute task that waits for a response from a remote device.
+- `infer_postprocess` is a CPU compute task.
 @snippet src/async_infer_request.cpp async_infer_request:ctor
 The stages are distributed among two task executors in the following way:
-- `inferPreprocess` and `startPipeline` are combined into a single task and run on `_requestExecutor`, which computes CPU tasks.
+- `infer_preprocess_and_start_pipeline` prepares input tensors and runs on `m_request_executor`, which computes CPU tasks.
 - You need at least two executors to overlap compute tasks of a CPU and a remote device the plugin works with. Otherwise, CPU and device tasks are executed serially one by one.
-- `waitPipeline` is sent to `_waitExecutor`, which works with the device.
+- `wait_pipeline` is sent to `m_wait_executor`, which works with the device.
-> **NOTE**: `callbackExecutor` is also passed to the constructor and it is used in the base InferenceEngine::AsyncInferRequestThreadSafeDefault class, which adds a pair of `callbackExecutor` and a callback function set by the user to the end of the pipeline.
+> **NOTE**: `m_callback_executor` is also passed to the constructor and it is used in the base ov::IAsyncInferRequest class, which adds a pair of `callback_executor` and a callback function set by the user to the end of the pipeline.
 Inference request stages are also profiled using IE_PROFILING_AUTO_SCOPE, which shows how pipelines of multiple asynchronous inference requests are run in parallel via the [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) tool.
-### ~AsyncInferRequest()
+### `~AsyncInferRequest()`
-In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class.
+In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the ov::IAsyncInferRequest::stop_and_wait method of the base class.
 @snippet src/async_infer_request.cpp async_infer_request:dtor
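The `m_pipeline` idea above, a vector of executor/task pairs executed stage by stage, can be sketched outside of OpenVINO roughly as follows. `ImmediateExecutor`, `run_pipeline`, and `run_demo` are illustrative names invented for this sketch, not part of the plugin API:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal stand-ins for ov::threading::Task and ov::threading::ITaskExecutor.
using Task = std::function<void()>;

struct ITaskExecutor {
    virtual ~ITaskExecutor() = default;
    virtual void run(Task task) = 0;
};

// Toy executor: runs each task on a fresh thread and joins it immediately,
// so stages still complete one after another in this sketch.
struct ImmediateExecutor : ITaskExecutor {
    void run(Task task) override {
        std::thread worker(std::move(task));
        worker.join();
    }
};

// Shaped like the m_pipeline field: pairs of an executor and the task it executes.
using Pipeline = std::vector<std::pair<std::shared_ptr<ITaskExecutor>, Task>>;

void run_pipeline(const Pipeline& pipeline) {
    for (const auto& stage : pipeline)
        stage.first->run(stage.second);
}

// Builds a pipeline with the stage names used in this document and
// returns the order in which the stages ran.
std::vector<std::string> run_demo() {
    auto request_executor = std::make_shared<ImmediateExecutor>();
    auto wait_executor = std::make_shared<ImmediateExecutor>();
    std::vector<std::string> order;
    Pipeline pipeline = {
        {request_executor, [&] { order.push_back("infer_preprocess_and_start_pipeline"); }},
        {wait_executor, [&] { order.push_back("wait_pipeline"); }},
        {request_executor, [&] { order.push_back("infer_postprocess"); }},
    };
    run_pipeline(pipeline);
    return order;
}
```

A real plugin would use stream or thread-pool executors and chain stages through callbacks rather than joining a thread per task; the point here is only the (executor, task) pairing.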

@@ -54,7 +54,7 @@ The method creates a synchronous inference request and returns it.
 While the public OpenVINO API has a single interface for an inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
 - [Synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request), which defines pipeline stages and runs them synchronously in the `infer` method.
-- [Asynchronous inference request](@ref openvino_docs_ie_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can have one or several stages:
+- [Asynchronous inference request](@ref openvino_docs_ov_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can have one or several stages:
 - For single-stage pipelines, there is no need to define this method and create a class derived from ov::IAsyncInferRequest. For single-stage pipelines, a default implementation of this method creates ov::IAsyncInferRequest wrapping a synchronous inference request and runs it asynchronously in the `m_request_executor` executor.
 - For pipelines with multiple stages, such as performing some preprocessing on host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device use and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with preprocessing and postprocessing stages, giving better performance.
 > **IMPORTANT**: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
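For the single-stage case, the default behavior described above (wrap the synchronous request and run it on a single executor) can be sketched as follows. `SyncRequest`, `AsyncRequestSketch`, and `start_async` are illustrative names, with `std::async` standing in for `m_request_executor`; this is not the actual plugin API:

```cpp
#include <atomic>
#include <cassert>
#include <future>

// Illustrative synchronous request: infer() runs the whole pipeline serially.
struct SyncRequest {
    std::atomic<int> completed{0};
    void infer() { ++completed; }
};

// Single-stage asynchronous wrapper: submit the synchronous infer() to an
// executor (std::async here) and let callers wait for completion.
class AsyncRequestSketch {
public:
    explicit AsyncRequestSketch(SyncRequest& sync) : m_sync(sync) {}

    void start_async() {
        m_done = std::async(std::launch::async, [this] { m_sync.infer(); });
    }

    void wait() {
        if (m_done.valid())
            m_done.get();
    }

private:
    SyncRequest& m_sync;
    std::future<void> m_done;
};
```

Multiple such requests started concurrently are what lets preprocessing of one request overlap with device execution of another.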

@@ -2,7 +2,7 @@
 `InferRequest` class functionality:
 - Allocate input and output tensors needed for a backend-dependent network inference.
-- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) implementation.
+- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref openvino_docs_ov_plugin_dg_async_infer_request) implementation.
 - Call inference stages one by one synchronously.
 InferRequest Class
@@ -81,4 +81,4 @@ The method returns the profiling info which was measured during pipeline stages
 @snippet src/sync_infer_request.cpp infer_request:get_profiling_info
-The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) class.
+The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ov_plugin_dg_async_infer_request) class.

@@ -10,7 +10,7 @@
 Implement Plugin Functionality <openvino_docs_ov_plugin_dg_plugin>
 Implement Compiled Model Functionality <openvino_docs_ov_plugin_dg_compiled_model>
 Implement Synchronous Inference Request <openvino_docs_ov_plugin_dg_infer_request>
-Implement Asynchronous Inference Request <openvino_docs_ie_plugin_dg_async_infer_request>
+Implement Asynchronous Inference Request <openvino_docs_ov_plugin_dg_async_infer_request>
 Implement Remote Context <openvino_docs_ov_plugin_dg_remote_context>
 Implement Remote Tensor <openvino_docs_ov_plugin_dg_remote_tensor>
 openvino_docs_ov_plugin_dg_plugin_build
@@ -43,7 +43,7 @@ OpenVINO plugin dynamic library consists of several main components:
 3. [Inference Request class](@ref openvino_docs_ov_plugin_dg_infer_request):
 - Runs an inference pipeline serially.
 - Can extract performance counters for an inference pipeline execution profiling.
-4. [Asynchronous Inference Request class](@ref openvino_docs_ie_plugin_dg_async_infer_request):
+4. [Asynchronous Inference Request class](@ref openvino_docs_ov_plugin_dg_async_infer_request):
 - Wraps the [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class and runs pipeline stages in parallel on several task executors based on a device-specific pipeline structure.
 5. [Remote Context](@ref openvino_docs_ov_plugin_dg_remote_context):
 - Provides the device-specific remote context. The context allows creating remote tensors.
@@ -61,7 +61,7 @@ Detailed guides
 * [Build](@ref openvino_docs_ov_plugin_dg_plugin_build) a plugin library using CMake
 * Plugin and its components [testing](@ref openvino_docs_ov_plugin_dg_plugin_testing)
-* [Quantized networks](@ref openvino_docs_ie_plugin_dg_quantized_networks)
+* [Quantized models](@ref openvino_docs_ov_plugin_dg_quantized_models)
 * [Low precision transformations](@ref openvino_docs_OV_UG_lpt) guide
 * [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide

@@ -85,7 +85,7 @@ Actual model compilation is done in the `CompiledModel` constructor. Refer to th
 The function accepts a const shared pointer to an `ov::Model` object and applies common and device-specific transformations on a copied model to make it more friendly to hardware operations. For details on how to write custom device-specific transformations, refer to the [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide. See detailed topics about model representation:
 * [Intermediate Representation and Operation Sets](@ref openvino_docs_MO_DG_IR_and_opsets)
-* [Quantized models](@ref openvino_docs_ie_plugin_dg_quantized_networks).
+* [Quantized models](@ref openvino_docs_ov_plugin_dg_quantized_models).
 @snippet template/src/plugin.cpp plugin:transform_model

@@ -8,7 +8,7 @@ OpenVINO Plugin tests are included in the `openvino::funcSharedTests` CMake targ
 Test definitions are split into a test class declaration (see `src/tests/functional/plugin/shared/include`) and a test class implementation (see `src/tests/functional/plugin/shared/src`) and include the following scopes of plugin conformance tests:
-1. **Behavior tests** (`behavior` sub-folder), which are a separate test group to check that a plugin satisfies basic OpenVINO concepts: plugin creation, multiple executable networks support, multiple synchronous and asynchronous inference requests support, and so on. See the next section for details on how to instantiate the test definition class with plugin-specific parameters.
+1. **Behavior tests** (`behavior` sub-folder), which are a separate test group to check that a plugin satisfies basic OpenVINO concepts: plugin creation, multiple compiled models support, multiple synchronous and asynchronous inference requests support, and so on. See the next section for details on how to instantiate the test definition class with plugin-specific parameters.
 2. **Single layer tests** (`single_layer_tests` sub-folder). This group of tests checks that a particular single layer can be inferred on a device. An example of test instantiation based on a test definition from the `openvino::funcSharedTests` library:

@@ -1,8 +1,8 @@
-# Quantized networks compute and restrictions {#openvino_docs_ie_plugin_dg_quantized_networks}
+# Quantized models compute and restrictions {#openvino_docs_ov_plugin_dg_quantized_models}
-One of the features of Inference Engine is the support of quantized networks with different precisions: INT8, INT4, etc.
+One of the features of OpenVINO is the support of quantized models with different precisions: INT8, INT4, etc.
 However, it is up to the plugin to define what exact precisions are supported by the particular HW.
-All quantized networks which can be expressed in IR have a unified representation by means of the *FakeQuantize* operation.
+All quantized models which can be expressed in IR have a unified representation by means of the *FakeQuantize* operation.
 For more details about the low-precision model representation, please refer to this [document](@ref openvino_docs_ie_plugin_dg_lp_representation).
 ### Interpreting FakeQuantize at runtime
@@ -44,6 +44,6 @@ Below we define these rules as follows:
 - Per-channel quantization of activations for channel-wise and element-wise operations, e.g. Depthwise Convolution, Eltwise Add/Mul, ScaleShift.
 - Symmetric and asymmetric quantization of weights and activations with the support of per-channel scales and zero-points.
 - Non-unified quantization parameters for Eltwise and Concat operations.
-- Non-quantized network output, i.e. there are no quantization parameters for it.
+- Non-quantized model output, i.e. there are no quantization parameters for it.
 [qdq_propagation]: images/qdq_propagation.png
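The *FakeQuantize* semantics this file relies on can be sketched as a scalar reference function, assuming the usual clamp / normalize / round / rescale definition of the operation (real kernels fold the parameters into scales and zero-points, and the exact tie-breaking rounding mode may differ):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Scalar reference for FakeQuantize: clamp the value to the input range,
// quantize it to `levels` evenly spaced points, then map to the output range.
double fake_quantize(double x, double in_low, double in_high,
                     double out_low, double out_high, int levels) {
    const double clamped = std::min(std::max(x, in_low), in_high);
    const double steps = static_cast<double>(levels - 1);
    const double q = std::round((clamped - in_low) / (in_high - in_low) * steps);
    return q / steps * (out_high - out_low) + out_low;
}
```

With `levels = 256` and matching input/output ranges this produces the familiar INT8 grid; a plugin interprets these parameters instead of executing the operation literally.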

@@ -6,13 +6,13 @@
 :maxdepth: 1
 :hidden:
-openvino_docs_ie_plugin_dg_quantized_networks
+openvino_docs_ov_plugin_dg_quantized_models
 openvino_docs_OV_UG_lpt
 @endsphinxdirective
 The guides below provide extra information about specific features of OpenVINO that are useful to understand during OpenVINO plugin development:
-* [Quantized networks](@ref openvino_docs_ie_plugin_dg_quantized_networks)
+* [Quantized models](@ref openvino_docs_ov_plugin_dg_quantized_models)
 * [Low precision transformations](@ref openvino_docs_OV_UG_lpt) guide
 * [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide