Added documentation for InferRequest (#16350)
* Added documentation for InferRequest
* Updated documentation for methods
* Fixed doc
parent 4411a6ea45
commit c472b020b7
@@ -16,7 +16,7 @@ OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequest

#### Class Fields

- `_inferRequest` - a reference to the [synchronous inference request](@ref openvino_docs_ie_plugin_dg_infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
- `_inferRequest` - a reference to the [synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
- `_waitExecutor` - a task executor that waits for a response from a device about device tasks completion

> **NOTE**: If a plugin can work with several instances of a device, `_waitExecutor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
@@ -53,7 +53,7 @@ The method creates a synchronous inference request and returns it.

While the public OpenVINO API has a single inference request interface, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:

- [Synchronous inference request](@ref openvino_docs_ie_plugin_dg_infer_request), which defines pipeline stages and runs them synchronously in the `infer` method.
- [Synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request), which defines pipeline stages and runs them synchronously in the `infer` method.
- [Asynchronous inference request](@ref openvino_docs_ie_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on the device pipeline structure, it can have one or several stages:
  - For single-stage pipelines, there is no need to define this method or create a class derived from ov::IAsyncInferRequest: the default implementation creates an ov::IAsyncInferRequest that wraps the synchronous inference request and runs it asynchronously in the `m_request_executor` executor.
  - For pipelines with multiple stages, such as performing some preprocessing on the host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device utilization and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with preprocessing and postprocessing stages, giving better performance.
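From the application's perspective this split is invisible: the single ov::InferRequest object drives either mode. A minimal usage sketch (model path and device name are placeholders):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // "model.xml" and "TEMPLATE" are placeholder names for this example.
    auto compiled_model = core.compile_model("model.xml", "TEMPLATE");
    ov::InferRequest request = compiled_model.create_infer_request();

    // Synchronous mode: blocks until the pipeline finishes.
    request.infer();

    // Asynchronous mode: the plugin's AsyncInferRequest schedules the stages
    // on its task executors; wait() blocks until they complete.
    request.start_async();
    request.wait();
    return 0;
}
```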
@@ -86,4 +86,4 @@ The method returns the runtime model with backend specific information.

@snippet src/compiled_model.cpp compiled_model:get_runtime_model
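On the application side, the runtime model is available through ov::CompiledModel::get_runtime_model; for example:

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

// Prints the execution-time operation names of a compiled model.
void dump_runtime_model(const ov::CompiledModel& compiled_model) {
    std::shared_ptr<const ov::Model> runtime_model = compiled_model.get_runtime_model();
    for (const auto& op : runtime_model->get_ordered_ops()) {
        std::cout << op->get_friendly_name() << " (" << op->get_type_name() << ")\n";
    }
}
```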
The next step in plugin library implementation is the [Synchronous Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) class.
The next step in plugin library implementation is the [Synchronous Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class.
@@ -1,83 +1,84 @@

# Synchronous Inference Request {#openvino_docs_ie_plugin_dg_infer_request}
# Synchronous Inference Request {#openvino_docs_ov_plugin_dg_infer_request}
`InferRequest` class functionality:
- Allocate input and output blobs needed for a backend-dependent network inference.
- Allocate input and output tensors needed for a backend-dependent network inference.
- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) implementation.
- Call inference stages one by one synchronously.
`InferRequest` Class
InferRequest Class
------------------------
Inference Engine Plugin API provides the helper InferenceEngine::IInferRequestInternal class recommended
to use as a base class for a synchronous inference request implementation. Based of that, a declaration
OpenVINO Plugin API provides the interface ov::ISyncInferRequest, which should be
used as a base class for a synchronous inference request implementation. Based on that, a declaration
of a synchronous request class can look as follows:

@snippet src/sync_infer_request.hpp infer_request:header
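For orientation only, a derived class usually overrides the pure virtual methods of ov::ISyncInferRequest. The sketch below uses illustrative names (`my_plugin`, the forward-declared `CompiledModel`) and an assumed header path; the real declaration is in the snippet referenced above.

```cpp
#include "openvino/runtime/isync_infer_request.hpp"

namespace my_plugin {  // placeholder namespace for this sketch

class CompiledModel;  // the plugin's compiled model class

class InferRequest : public ov::ISyncInferRequest {
public:
    explicit InferRequest(const std::shared_ptr<const CompiledModel>& compiled_model);
    ~InferRequest() override;

    // Pure virtual methods of ov::ISyncInferRequest that a plugin implements.
    void infer() override;
    std::vector<ov::ProfilingInfo> get_profiling_info() const override;
    std::vector<std::shared_ptr<ov::IVariableState>> query_state() const override;
};

}  // namespace my_plugin
```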
#### Class Fields
### Class Fields

The example class has several fields:
- `_executableNetwork` - reference to an executable network instance. From this reference, an inference request instance can take a task executor, use counter for a number of created inference requests, and so on.
- `_profilingTask` - array of the `std::array<InferenceEngine::ProfilingTask, numOfStages>` type. Defines names for pipeline stages. Used to profile an inference pipeline execution with the Intel® instrumentation and tracing technology (ITT).
- `_durations` - array of durations of each pipeline stage.
- `_networkInputBlobs` - input blob map.
- `_networkOutputBlobs` - output blob map.
- `_parameters` - `ngraph::Function` parameter operations.
- `_results` - `ngraph::Function` result operations.
- `m_profiling_task` - array of the `std::array<openvino::itt::handle_t, numOfStages>` type. Defines names for pipeline stages. Used to profile an inference pipeline execution with the Intel® instrumentation and tracing technology (ITT).
- `m_durations` - array of durations of each pipeline stage.
- backend specific fields:
  - `_inputTensors` - inputs tensors which wrap `_networkInputBlobs` blobs. They are used as inputs to backend `_executable` computational graph.
  - `_outputTensors` - output tensors which wrap `_networkOutputBlobs` blobs. They are used as outputs from backend `_executable` computational graph.
  - `_executable` - an executable object / backend computational graph.
  - `m_backend_input_tensors` - input backend tensors.
  - `m_backend_output_tensors` - output backend tensors.
  - `m_executable` - an executable object / backend computational graph.
### `InferRequest` Constructor
### InferRequest Constructor

The constructor initializes helper fields and calls methods which allocate blobs:
The constructor initializes helper fields and calls methods which allocate tensors:
@snippet src/sync_infer_request.cpp infer_request:ctor

> **NOTE**: Call InferenceEngine::CNNNetwork::getInputsInfo and InferenceEngine::CNNNetwork::getOutputsInfo to specify both layout and precision of blobs, which you can set with InferenceEngine::InferRequest::SetBlob and get with InferenceEngine::InferRequest::GetBlob. A plugin uses these hints to determine its internal layouts and precisions for input and output blobs if needed.
> **NOTE**: Use the inputs/outputs information from the compiled model to determine the shape and element type of tensors, which you can set with ov::InferRequest::set_tensor and get with ov::InferRequest::get_tensor. A plugin uses these hints to determine its internal layouts and element types for input and output tensors if needed.
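For reference, this is how an application supplies and retrieves tensors through the public API mentioned in the note (a single-input, single-output model with static shapes is assumed):

```cpp
#include <openvino/openvino.hpp>

void run_with_external_input(ov::InferRequest& request, const ov::CompiledModel& compiled_model) {
    // Allocate an input tensor with the element type and shape reported by the compiled model.
    const auto& input_port = compiled_model.input();
    ov::Tensor input(input_port.get_element_type(), input_port.get_shape());
    // ... fill input.data() with application data ...

    request.set_tensor(input_port, input);
    request.infer();

    // The output tensor is owned by the request unless the application set its own.
    ov::Tensor output = request.get_tensor(compiled_model.output());
}
```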
### `~InferRequest` Destructor
### ~InferRequest Destructor

Decrements a number of created inference requests:
The destructor can contain plugin-specific logic to finish and destroy the infer request.
@snippet src/sync_infer_request.cpp infer_request:dtor

### `InferImpl()`
### set_tensors_impl()

**Implementation details:** Base IInferRequestInternal class implements the public InferenceEngine::IInferRequestInternal::Infer method as following:
- Checks blobs set by users
- Calls the `InferImpl` method defined in a derived class to call actual pipeline stages synchronously
The method allows setting batched tensors if the plugin supports it.

@snippet src/sync_infer_request.cpp infer_request:infer_impl
@snippet src/sync_infer_request.cpp infer_request:set_tensors_impl
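On the application side, batched tensors reach this method through ov::InferRequest::set_tensors. A sketch, assuming the port accepts a batch supplied as separate tensors and using an example shape:

```cpp
#include <openvino/openvino.hpp>

void set_batched_input(ov::InferRequest& request, const ov::CompiledModel& compiled_model) {
    const auto& input_port = compiled_model.input();

    // One tensor per batch element; the plugin's set_tensors_impl() decides how to store them.
    std::vector<ov::Tensor> batch;
    for (size_t i = 0; i < 4; ++i) {
        ov::Shape single{1, 3, 224, 224};  // example shape for a single batch element
        batch.emplace_back(input_port.get_element_type(), single);
    }
    request.set_tensors(input_port, batch);
}
```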
#### 1. `inferPreprocess`
### query_state()

Below is the code of the `inferPreprocess` method to demonstrate Inference Engine common preprocessing step handling:
The method returns variable states from the model.

@snippet src/sync_infer_request.cpp infer_request:query_state
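From the public API, the states are exposed as ov::VariableState objects; for example, an application can reset them between sequences:

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

void reset_all_states(ov::InferRequest& request) {
    // Each state corresponds to a ReadValue/Assign variable pair in the model.
    for (ov::VariableState& state : request.query_state()) {
        std::cout << "resetting state: " << state.get_name() << "\n";
        state.reset();  // returns the variable to its initial value
    }
}
```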
### infer()

The method calls the actual pipeline stages synchronously. Inside the method, the plugin should check input/output tensors, move external tensors to the backend, and run the inference.

@snippet src/sync_infer_request.cpp infer_request:infer
#### 1. infer_preprocess()

Below is the code of the `infer_preprocess()` method. The method checks user input/output tensors and demonstrates conversion from a user tensor to a backend-specific representation:

@snippet src/sync_infer_request.cpp infer_request:infer_preprocess
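As an illustration of the kind of check-and-copy work this stage does, here is a simplified, stand-alone sketch. It is not the Template plugin code: `backend_tensor` stands in for whatever representation the backend expects, and real plugins typically also convert precision and layout instead of throwing.

```cpp
#include <cstring>
#include <openvino/core/except.hpp>
#include <openvino/openvino.hpp>

// A simplified sketch of a per-port input check and copy.
void copy_user_input(const ov::Output<const ov::Node>& port,
                     const ov::Tensor& user_tensor,
                     ov::Tensor& backend_tensor) {
    OPENVINO_ASSERT(user_tensor.get_element_type() == port.get_element_type(),
                    "Element type mismatch for port ", port);
    if (user_tensor.get_element_type() == backend_tensor.get_element_type() &&
        user_tensor.get_byte_size() == backend_tensor.get_byte_size()) {
        // Same type and size: a plain copy is enough.
        std::memcpy(backend_tensor.data(), user_tensor.data(), user_tensor.get_byte_size());
    } else {
        // A real plugin would convert precision/layout here.
        OPENVINO_THROW("Conversion for port ", port, " is not implemented in this sketch");
    }
}
```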
**Details:**
* `InferImpl` must call the InferenceEngine::IInferRequestInternal::execDataPreprocessing function, which executes common Inference Engine preprocessing step (for example, applies resize or color conversion operations) if it is set by the user. The output dimensions, layout and precision matches the input information set via InferenceEngine::CNNNetwork::getInputsInfo.
* If `inputBlob` passed by user differs in terms of precisions from precision expected by plugin, `blobCopy` is performed which does actual precision conversion.
#### 2. start_pipeline()
#### 2. `startPipeline`

Executes a pipeline synchronously using the `_executable` object:
Executes a pipeline synchronously using the `m_executable` object:
@snippet src/sync_infer_request.cpp infer_request:start_pipeline

#### 3. `inferPostprocess`
#### 3. infer_postprocess()

Converts output blobs if precisions of backend output blobs and blobs passed by user are different:
Converts backend-specific tensors to the tensors passed by the user:

@snippet src/sync_infer_request.cpp infer_request:infer_postprocess
### `GetPerformanceCounts()`
### get_profiling_info()

The method sets performance counters which were measured during pipeline stages execution:
The method returns the profiling information measured during pipeline stage execution:

@snippet src/sync_infer_request.cpp infer_request:get_performance_counts
@snippet src/sync_infer_request.cpp infer_request:get_profiling_info
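The profiling information reaches the application through ov::InferRequest::get_profiling_info (profiling has to be enabled when compiling the model, for example with ov::enable_profiling); a short sketch:

```cpp
#include <iostream>
#include <openvino/openvino.hpp>

void print_profiling(ov::InferRequest& request) {
    for (const ov::ProfilingInfo& info : request.get_profiling_info()) {
        std::cout << info.node_name << ": "
                  << info.real_time.count() << " us (" << info.exec_type << ")\n";
    }
}
```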
The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) class.
@@ -9,7 +9,7 @@

Implement Plugin Functionality <openvino_docs_ov_plugin_dg_plugin>
Implement Compiled Model Functionality <openvino_docs_ov_plugin_dg_compiled_model>
Implement Synchronous Inference Request <openvino_docs_ie_plugin_dg_infer_request>
Implement Synchronous Inference Request <openvino_docs_ov_plugin_dg_infer_request>
Implement Asynchronous Inference Request <openvino_docs_ie_plugin_dg_async_infer_request>
openvino_docs_ov_plugin_dg_plugin_build
openvino_docs_ov_plugin_dg_plugin_testing
@@ -37,13 +37,13 @@ OpenVINO plugin dynamic library consists of several main components:

2. [Compiled Model class](@ref openvino_docs_ov_plugin_dg_compiled_model):
    - Is an execution configuration compiled for a particular device and takes into account its capabilities.
    - Holds a reference to a particular device and a task executor for this device.
    - Can create several instances of [Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request).
    - Can create several instances of [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request).
    - Can export an internal backend specific graph structure to an output stream.
3. [Inference Request class](@ref openvino_docs_ie_plugin_dg_infer_request):
3. [Inference Request class](@ref openvino_docs_ov_plugin_dg_infer_request):
    - Runs an inference pipeline serially.
    - Can extract performance counters for profiling an inference pipeline execution.
4. [Asynchronous Inference Request class](@ref openvino_docs_ie_plugin_dg_async_infer_request):
    - Wraps the [Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) class and runs pipeline stages in parallel
    - Wraps the [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class and runs pipeline stages in parallel
      on several task executors based on a device-specific pipeline structure.
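A minimal application-side sketch that exercises these components (device name and model path are placeholders):

```cpp
#include <vector>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;                                                       // talks to the Plugin class
    auto compiled_model = core.compile_model("model.xml", "TEMPLATE");   // Compiled Model class

    // A Compiled Model can create several Inference Requests; their pipeline
    // stages may overlap thanks to the Asynchronous Inference Request wrapper.
    std::vector<ov::InferRequest> requests;
    for (int i = 0; i < 4; ++i)
        requests.push_back(compiled_model.create_infer_request());

    for (auto& request : requests)
        request.start_async();
    for (auto& request : requests)
        request.wait();
    return 0;
}
```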
> **NOTE**: This documentation is written based on the `Template` plugin, which demonstrates plugin
@@ -39,7 +39,7 @@ The provided plugin class also has several fields:

As an example, a plugin configuration has four value parameters:

- `device_id` - particular device ID to work with. Applicable if a plugin supports more than one `Template` device. In this case, some plugin methods, like `set_property`, `query_model`, and `compile_model`, must support the ov::device::id property.
- `perf_counts` - boolean value to identify whether to collect performance counters during [Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) execution.
- `perf_counts` - boolean value to identify whether to collect performance counters during [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) execution.
- `streams_executor_config` - configuration of `ov::threading::IStreamsExecutor` to handle settings of multi-threaded context.
- `performance_mode` - configuration of `ov::hint::PerformanceMode` to set the performance mode.
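These fields map onto standard properties that an application can pass when compiling a model; a sketch with placeholder model path and device name:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder model path
    auto compiled_model = core.compile_model(
        model,
        "TEMPLATE",
        ov::device::id("0"),          // picked up by the plugin's device_id field
        ov::enable_profiling(true),   // picked up by the plugin's perf_counts field
        ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));  // performance_mode
    return 0;
}
```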
@@ -95,9 +95,28 @@ ov::template_plugin::InferRequest::InferRequest(const std::shared_ptr<const ov::
}
// ! [infer_request:ctor]

// ! [infer_request:dtor]
ov::template_plugin::InferRequest::~InferRequest() = default;
// ! [infer_request:dtor]

// ! [infer_request:set_tensors_impl]
void ov::template_plugin::InferRequest::set_tensors_impl(const ov::Output<const ov::Node> port,
                                                         const std::vector<ov::Tensor>& tensors) {
    for (const auto& input : get_inputs()) {
        if (input == port) {
            m_batched_tensors[input.get_tensor_ptr()] = tensors;
            return;
        }
    }
    OPENVINO_THROW("Cannot find input tensors for port ", port);
}
// ! [infer_request:set_tensors_impl]

// ! [infer_request:query_state]
std::vector<std::shared_ptr<ov::IVariableState>> ov::template_plugin::InferRequest::query_state() const {
    return m_variable_states;
}
// ! [infer_request:query_state]

std::shared_ptr<const ov::template_plugin::CompiledModel> ov::template_plugin::InferRequest::get_template_model()
    const {
@@ -107,11 +126,7 @@ std::shared_ptr<const ov::template_plugin::CompiledModel> ov::template_plugin::I
    return template_model;
}

// ! [infer_request:dtor]
ov::template_plugin::InferRequest::~InferRequest() = default;
// ! [infer_request:dtor]
// ! [infer_request:infer_impl]
// ! [infer_request:infer]
void ov::template_plugin::InferRequest::infer() {
    // TODO: fill with actual list of pipeline stages, which are executed synchronously for sync infer requests
    infer_preprocess();
@@ -119,7 +134,7 @@ void ov::template_plugin::InferRequest::infer() {
    wait_pipeline();  // does nothing in current implementation
    infer_postprocess();
}
// ! [infer_request:infer_impl]
// ! [infer_request:infer]

// ! [infer_request:infer_preprocess]
void ov::template_plugin::InferRequest::infer_preprocess() {
@@ -235,20 +250,7 @@ void ov::template_plugin::InferRequest::infer_postprocess() {
}
// ! [infer_request:infer_postprocess]

// ! [infer_request:set_blobs_impl]
void ov::template_plugin::InferRequest::set_tensors_impl(const ov::Output<const ov::Node> port,
                                                         const std::vector<ov::Tensor>& tensors) {
    for (const auto& input : get_inputs()) {
        if (input == port) {
            m_batched_tensors[input.get_tensor_ptr()] = tensors;
            return;
        }
    }
    OPENVINO_THROW("Cannot find input tensors for port ", port);
}
// ! [infer_request:set_blobs_impl]
// ! [infer_request:get_performance_counts]
// ! [infer_request:get_profiling_info]
std::vector<ov::ProfilingInfo> ov::template_plugin::InferRequest::get_profiling_info() const {
    std::vector<ov::ProfilingInfo> info;
    const auto fill_profiling_info = [](const std::string& name,
@@ -264,4 +266,4 @@ std::vector<ov::ProfilingInfo> ov::template_plugin::InferRequest::get_profiling_
    info.emplace_back(fill_profiling_info("output postprocessing", m_durations[Postprocess]));
    return info;
}
// ! [infer_request:get_performance_counts]
// ! [infer_request:get_profiling_info]