# Executable Network
`ExecutableNetwork` class functionality:
- Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
- Create an arbitrary number of `InferRequest` objects
- Hold some common resources shared between different instances of `InferRequest`. For example:
    - InferenceEngine::ExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
    - InferenceEngine::ExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread
## `ExecutableNetwork` Class
The Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class, which is recommended to be used as a base class for an executable network. Based on it, a declaration of an executable network class can look as follows:
@snippet src/template_executable_network.hpp executable_network:header
### Class Fields
The example class has several fields:
- `_requestId` - Tracks the number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
- `_cfg` - Defines a configuration the executable network was compiled with.
- `_plugin` - Refers to a plugin instance.
- `_function` - Keeps a reference to the transformed `ngraph::Function`, which is used in ngraph reference backend computations. Note that for other backends with a backend specific graph representation, `_function` has a different type and represents the backend specific graph or just a set of computational kernels used to perform an inference.
- `_inputIndex` - Maps the name of an input to its index among all network inputs.
- `_outputIndex` - Maps the name of an output to its index among all network outputs.
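To make the field descriptions concrete, here is a minimal, hypothetical sketch of such a declaration. The `TemplatePlugin` namespace, the `Configuration` and `Plugin` types, the method signatures, and the include paths are assumptions for illustration only; the actual declaration is the one pulled in by the snippet above.

```cpp
// Hypothetical executable network declaration; TemplatePlugin, Plugin,
// Configuration, and the include paths are illustrative assumptions.
#include <atomic>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <string>

#include <ie_icnn_network.hpp>
#include <ie_parameter.hpp>
#include <cpp_interfaces/impl/ie_executable_network_thread_safe_default.hpp>
#include <ngraph/function.hpp>

namespace TemplatePlugin {

class Plugin;  // the plugin class (assumed), described in the Plugin documentation

// Placeholder for a plugin-specific configuration (structure is assumed).
struct Configuration {
    std::map<std::string, std::string> values;
    std::string Get(const std::string& key) const { return values.at(key); }
};

class ExecutableNetwork : public InferenceEngine::ExecutableNetworkThreadSafeDefault {
public:
    // Compiles an ICNNNetwork into a backend specific graph representation.
    ExecutableNetwork(const InferenceEngine::ICNNNetwork& network,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);

    // Restores a backend specific graph previously exported to a stream.
    ExecutableNetwork(std::istream& model,
                      const Configuration& cfg,
                      const std::shared_ptr<Plugin>& plugin);

    // Methods discussed in the sections below; the signatures are modeled on the
    // ExecutableNetworkThreadSafeDefault base class and may differ per API version.
    void ExportImpl(std::ostream& model);
    InferenceEngine::IInferRequest::Ptr CreateInferRequest();
    InferenceEngine::InferRequestInternal::Ptr CreateInferRequestImpl(
        InferenceEngine::InputsDataMap networkInputs,
        InferenceEngine::OutputsDataMap networkOutputs);
    InferenceEngine::Parameter GetMetric(const std::string& name) const;
    InferenceEngine::Parameter GetConfig(const std::string& name) const;

private:
    void CompileNetwork(const std::shared_ptr<const ngraph::Function>& function);

    std::atomic<std::size_t>           _requestId{0};  // distinguishes requests during ITT profiling
    Configuration                      _cfg;           // configuration the network was compiled with
    std::shared_ptr<Plugin>            _plugin;        // owning plugin instance
    std::shared_ptr<ngraph::Function>  _function;      // transformed graph (backend specific in real plugins)
    std::map<std::string, std::size_t> _inputIndex;    // input name -> index among all network inputs
    std::map<std::string, std::size_t> _outputIndex;   // output name -> index among all network outputs
};

}  // namespace TemplatePlugin
```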
### `ExecutableNetwork` Constructor with `ICNNNetwork`
This constructor accepts a generic representation of a neural network as an InferenceEngine::ICNNNetwork reference, which is then compiled into a backend specific graph:
@snippet src/template_executable_network.cpp executable_network:ctor_cnnnetwork
The implementation of `CompileNetwork` is fully device-specific.
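As a rough illustration, a constructor matching the hypothetical sketch above could store the configuration and plugin reference and delegate all graph compilation to `CompileNetwork`. The `Configuration` and `Plugin` types are assumptions, and InferenceEngine::ICNNNetwork is assumed to expose the underlying ngraph::Function via its getFunction method:

```cpp
// Continuation of the hypothetical sketch above; error handling is omitted.
ExecutableNetwork::ExecutableNetwork(const InferenceEngine::ICNNNetwork& network,
                                     const Configuration& cfg,
                                     const std::shared_ptr<Plugin>& plugin)
    : _cfg(cfg), _plugin(plugin) {
    // Compile the generic graph representation into the backend specific one
    // (CompileNetwork is shown in the next section).
    CompileNetwork(network.getFunction());
}
```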
### `CompileNetwork()`
The function accepts a const shared pointer to an `ngraph::Function` object and performs the following steps:
- Applies ngraph passes using the `TransformNetwork` function, which defines a plugin-specific conversion pipeline.
- Maps the transformed graph to a backend specific graph representation (for example, to the MKLDNN graph for Intel CPU).
- Allocates and fills memory for graph weights, backend specific memory handles, and so on.
@snippet src/template_executable_network.cpp executable_network:map_graph
> **NOTE**: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
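A hedged sketch of these steps for a reference ngraph backend, continuing the hypothetical class above, might look as follows. `TransformNetwork` is a placeholder for the plugin-specific transformation pipeline, and the input/output bookkeeping mirrors the `_inputIndex`/`_outputIndex` fields described earlier:

```cpp
// Assumed helper that runs the plugin-specific nGraph passes (declared only here).
std::shared_ptr<ngraph::Function> TransformNetwork(const std::shared_ptr<const ngraph::Function>& function);

void ExecutableNetwork::CompileNetwork(const std::shared_ptr<const ngraph::Function>& function) {
    // 1. Apply plugin-specific ngraph passes.
    _function = TransformNetwork(function);

    // 2. Remember the order of inputs and outputs of the transformed graph.
    std::size_t index = 0;
    for (const auto& parameter : _function->get_parameters())
        _inputIndex[parameter->get_friendly_name()] = index++;
    index = 0;
    for (const auto& result : _function->get_results())
        _outputIndex[result->get_friendly_name()] = index++;

    // 3. A real backend would now build its own graph representation here and
    //    allocate memory for weights and other backend specific resources.
}
```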
### `ExecutableNetwork` Constructor Importing from Stream
This constructor creates a backend specific graph by importing from a stream object:
> **NOTE**: The export of the backend specific graph is done in the `ExportImpl` method, and data formats must be the same for both import and export.
@snippet src/template_executable_network.cpp executable_network:ctor_import_stream
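Continuing the hypothetical sketch, such a constructor could read back whatever `ExportImpl` wrote and then reuse the same compilation path. `DeserializeFunction` is an assumed helper, not a real Inference Engine API, and the stream layout is purely illustrative:

```cpp
// Assumed helper that restores an ngraph::Function from a stream; it must read
// back exactly what ExportImpl (see below) writes.
std::shared_ptr<ngraph::Function> DeserializeFunction(std::istream& model);

ExecutableNetwork::ExecutableNetwork(std::istream& model,
                                     const Configuration& cfg,
                                     const std::shared_ptr<Plugin>& plugin)
    : _cfg(cfg), _plugin(plugin) {
    // Restore the graph from the stream and rebuild the backend specific
    // structures the same way the ICNNNetwork constructor does.
    CompileNetwork(DeserializeFunction(model));
}
```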
### `ExportImpl()`
Implementation details:
The base InferenceEngine::ExecutableNetworkThreadSafeDefault class implements the public InferenceEngine::ExecutableNetworkThreadSafeDefault::Export method as follows:
- Writes `_plugin->GetName()` to the `model` stream.
- Calls the `ExportImpl` method defined in a derived class to dump a backend specific graph.
The implementation of the method should write to the `model` stream all data that is required to import the backend specific graph later in the Plugin::Import method:
@snippet src/template_executable_network.cpp executable_network:export_impl
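A matching hedged sketch of `ExportImpl` for the hypothetical class could simply serialize the transformed graph; `SerializeFunction` is an assumed helper whose format must mirror what the importing constructor reads:

```cpp
// Assumed helper that writes the graph in the format DeserializeFunction reads.
void SerializeFunction(std::ostream& model, const std::shared_ptr<ngraph::Function>& function);

void ExecutableNetwork::ExportImpl(std::ostream& model) {
    // The plugin name has already been written to the stream by the base class
    // Export method; only the backend specific graph data goes here.
    SerializeFunction(model, _function);
}
```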
### `CreateInferRequest()`
The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for an inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
- [Synchronous inference request](@ref infer_request), which defines pipeline stages and runs them synchronously in the `Infer` method.
- [Asynchronous inference request](@ref async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on the device pipeline structure, it can have one or several stages:
    - For single-stage pipelines, there is no need to define this method and create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. For single-stage pipelines, a default implementation of this method creates InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping a synchronous inference request and runs it asynchronously in the `_taskExecutor` executor.
    - For pipelines with multiple stages, such as performing some preprocessing on the host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device utilization and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with preprocessing and postprocessing stages, giving better performance.
> **IMPORTANT**: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
@snippet src/template_executable_network.cpp executable_network:create_infer_request
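For the single-stage case, a hedged sketch of such an override might look as follows. The `_networkInputs`, `_networkOutputs`, `_taskExecutor`, and `_callbackExecutor` members come from the base classes, the InferenceEngine::AsyncInferRequestThreadSafeDefault constructor arguments shown are an assumption, and `WrapAsyncRequest` stands in for the version-specific plumbing that produces the final IInferRequest object:

```cpp
// Assumed helper that wraps the internal implementation into an IInferRequest
// object; in a real plugin this is done with version-specific wrapper classes.
InferenceEngine::IInferRequest::Ptr WrapAsyncRequest(
    const std::shared_ptr<InferenceEngine::AsyncInferRequestThreadSafeDefault>& impl);

InferenceEngine::IInferRequest::Ptr ExecutableNetwork::CreateInferRequest() {
    // Create the synchronous request first, then wrap it so that the pipeline
    // runs asynchronously in the _taskExecutor executor.
    auto syncRequestImpl = CreateInferRequestImpl(_networkInputs, _networkOutputs);
    auto asyncRequestImpl = std::make_shared<InferenceEngine::AsyncInferRequestThreadSafeDefault>(
        syncRequestImpl, _taskExecutor, _callbackExecutor);
    return WrapAsyncRequest(asyncRequestImpl);
}
```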
### `CreateInferRequestImpl()`
This is a helper method used by CreateInferRequest to create a [synchronous inference request](@ref infer_request), which is later wrapped with the asynchronous inference request class:
@snippet src/template_executable_network.cpp executable_network:create_infer_request_impl
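A hedged sketch of this helper, assuming a `TemplateInferRequest` synchronous request class (covered in the next document) whose constructor takes the network inputs, the network outputs, and a pointer back to the executable network:

```cpp
// TemplateInferRequest is assumed to derive from InferenceEngine::InferRequestInternal
// and to be defined elsewhere; its constructor signature is an assumption.
InferenceEngine::InferRequestInternal::Ptr ExecutableNetwork::CreateInferRequestImpl(
        InferenceEngine::InputsDataMap networkInputs,
        InferenceEngine::OutputsDataMap networkOutputs) {
    // The synchronous request gets everything it needs to run the pipeline:
    // input/output information and a pointer to this executable network.
    return std::make_shared<TemplateInferRequest>(networkInputs, networkOutputs, this);
}
```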
### `GetMetric()`
Returns a metric value for a metric with the name `name`. A metric is a static type of information about an executable network. Examples of metrics:
- EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
- EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
- Any other executable network metric specific to a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, `template/template_config.hpp`.
@snippet src/template_executable_network.cpp executable_network:get_metric
The IE_SET_METRIC_RETURN helper macro sets a metric value and checks that the actual metric type matches the type of the specified value.
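As a hedged sketch, a `GetMetric` implementation for the hypothetical class could support the two metrics listed above. The heuristic value and the use of `std::logic_error` instead of the Inference Engine exception macros are simplifications for illustration:

```cpp
#include <stdexcept>

InferenceEngine::Parameter ExecutableNetwork::GetMetric(const std::string& name) const {
    if (EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) == name) {
        // The network name metric is a std::string.
        IE_SET_METRIC_RETURN(NETWORK_NAME, _function->get_friendly_name());
    } else if (EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) == name) {
        // A trivial heuristic: a single request keeps the reference backend busy.
        IE_SET_METRIC_RETURN(OPTIMAL_NUMBER_OF_INFER_REQUESTS, 1u);
    } else {
        // A real plugin would use the Inference Engine exception macros here.
        throw std::logic_error("Unsupported ExecutableNetwork metric: " + name);
    }
}
```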
### `GetConfig()`
Returns the current value for a configuration key with the name `name`. The method extracts configuration values the executable network is compiled with.
@snippet src/template_executable_network.cpp executable_network:get_config
This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the Compile tool).
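A hedged sketch of `GetConfig` for the hypothetical class; `Configuration::Get` is the assumed accessor from the class sketch above:

```cpp
InferenceEngine::Parameter ExecutableNetwork::GetConfig(const std::string& name) const {
    // No recomputation happens here: the value the network was compiled
    // (or imported) with is returned as-is.
    return _cfg.Get(name);
}
```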
The next step in plugin library implementation is the [Synchronous Inference Request](@ref infer_request) class.