# Asynchronous Inference Request
Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors depending on a device pipeline structure. OpenVINO Runtime Plugin API provides the base ov::IAsyncInferRequest class:
- The class has the `m_pipeline` field of type `std::vector<std::pair<std::shared_ptr<ov::threading::ITaskExecutor>, ov::threading::Task>>`, which contains pairs of an executor and the task it executes.
- All executors are passed as arguments to the class constructor; they are in the running state and ready to run tasks.
- The class has the ov::IAsyncInferRequest::stop_and_wait method, which waits for `m_pipeline` to finish in the class destructor. The method does not stop the task executors, so they remain in the running state, because they belong to the compiled model instance and are not destroyed.
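The shape of the pipeline field can be sketched as follows. This is a simplified, self-contained illustration: `Task`, `ITaskExecutor`, `InlineExecutor`, and `run_pipeline` here are stand-ins mirroring the real `ov::threading` API, not the actual OpenVINO types.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for ov::threading::Task and ov::threading::ITaskExecutor.
using Task = std::function<void()>;

struct ITaskExecutor {
    virtual ~ITaskExecutor() = default;
    virtual void run(Task task) = 0;
};

// Trivial executor that runs a task on the calling thread;
// real plugin executors dispatch tasks to worker threads.
struct InlineExecutor : ITaskExecutor {
    void run(Task task) override { task(); }
};

// Builds an m_pipeline-like structure and runs it, returning the order
// in which the stages executed.
std::vector<std::string> run_pipeline() {
    std::vector<std::string> log;
    // m_pipeline: pairs of an executor and the task it must execute.
    std::vector<std::pair<std::shared_ptr<ITaskExecutor>, Task>> pipeline;

    auto executor = std::make_shared<InlineExecutor>();
    pipeline.emplace_back(executor, [&log] { log.push_back("preprocess"); });
    pipeline.emplace_back(executor, [&log] { log.push_back("postprocess"); });

    // The real base class chains the stages asynchronously; here they run in order.
    for (auto& [exec, task] : pipeline)
        exec->run(task);
    return log;
}
```

Each stage runs on the executor paired with it, which is what lets the base class place different stages on different threads.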
## AsyncInferRequest Class
OpenVINO Runtime Plugin API provides the base ov::IAsyncInferRequest class for a custom asynchronous inference request implementation:
@snippet src/async_infer_request.hpp async_infer_request:header
### Class Fields
- `m_cancel_callback` - a callback that allows interrupting the execution
- `m_wait_executor` - a task executor that waits for a response from a device about device task completion
> **NOTE**: If a plugin can work with several instances of a device, `m_wait_executor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
## AsyncInferRequest()
The main goal of the `AsyncInferRequest` constructor is to define the device pipeline `m_pipeline`. The example below demonstrates `m_pipeline` creation with the following stages:

- `infer_preprocess_and_start_pipeline` is a lightweight CPU task that submits tasks to a remote device.
- `wait_pipeline` is a non-compute CPU task that waits for a response from a remote device.
- `infer_postprocess` is a CPU compute task.
@snippet src/async_infer_request.cpp async_infer_request:ctor
The stages are distributed among two task executors in the following way:
- `infer_preprocess_and_start_pipeline` prepares input tensors and runs on `m_request_executor`, which computes CPU tasks.
  - You need at least two executors to overlap the compute tasks of the CPU and the remote device the plugin works with. Otherwise, CPU and device tasks are executed serially, one by one.
- `wait_pipeline` is sent to `m_wait_executor`, which works with the device.
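The stage-to-executor mapping described above can be illustrated with a short sketch. The `NamedExecutor` mock and the `build_and_run` helper are assumptions made for this example (they only record which executor handled which stage); a real plugin would use the `ov::threading` executors:

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Task = std::function<void()>;

// Mock executor that records its name next to each executed stage;
// a real plugin executor would dispatch the task to a worker thread.
struct NamedExecutor {
    std::string name;
    std::vector<std::pair<std::string, std::string>>& trace;
    void run(const std::string& stage, const Task& task) {
        trace.emplace_back(name, stage);
        task();
    }
};

// Mirrors the stage-to-executor mapping from the constructor described above.
std::vector<std::pair<std::string, std::string>> build_and_run() {
    std::vector<std::pair<std::string, std::string>> trace;
    NamedExecutor request_executor{"m_request_executor", trace};  // CPU tasks
    NamedExecutor wait_executor{"m_wait_executor", trace};        // device waits

    request_executor.run("infer_preprocess_and_start_pipeline", [] {});
    wait_executor.run("wait_pipeline", [] {});
    request_executor.run("infer_postprocess", [] {});
    return trace;
}
```

Because the device wait runs on a separate executor, the CPU executor is free to start the next request's preprocessing while the device is busy.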
> **NOTE**: `m_callback_executor` is also passed to the constructor and is used in the base ov::IAsyncInferRequest class, which adds a pair of `callback_executor` and a callback function set by the user to the end of the pipeline.
## ~AsyncInferRequest()
In the asynchronous request destructor, it is necessary to wait for the pipeline to finish. This can be done using the ov::IAsyncInferRequest::stop_and_wait method of the base class.
@snippet src/async_infer_request.cpp async_infer_request:dtor
## cancel()
The method allows cancelling the execution of the infer request:
@snippet src/async_infer_request.cpp async_infer_request:cancel
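One common way to implement cancellation is an atomic flag that pipeline stages check before running; `cancel()` sets the flag, and a cancelled stage invokes the cancel callback instead of doing work. The `MiniRequest` type below is a hypothetical minimal sketch of this pattern, not the template plugin's actual implementation:

```cpp
#include <atomic>
#include <functional>

// Hypothetical minimal request with a cancel flag and a cancel callback;
// the real ov::IAsyncInferRequest machinery is considerably richer.
struct MiniRequest {
    std::atomic<bool> m_cancelled{false};
    std::function<void()> m_cancel_callback;

    // Corresponds to the cancel() method described above.
    void cancel() { m_cancelled = true; }

    // Runs a stage unless the request was cancelled; returns whether it ran.
    bool run_stage(const std::function<void()>& stage) {
        if (m_cancelled) {
            if (m_cancel_callback)
                m_cancel_callback();  // notify instead of running the stage
            return false;
        }
        stage();
        return true;
    }
};
```

Checking the flag between stages keeps cancellation cheap: an in-flight stage finishes, but no further stages are started.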