Inference Engine Developer Guide

This Guide provides an overview of the Inference Engine describing the typical workflow for performing inference of a pre-trained and optimized deep learning model and a set of sample applications.

NOTE: Before you perform inference with the Inference Engine, your models should be converted to the Inference Engine format using the Model Optimizer or built directly in runtime using nGraph API. To learn about how to use Model Optimizer, refer to the Model Optimizer Developer Guide. To learn about the pre-trained and optimized models delivered with the OpenVINO™ toolkit, refer to [Pre-Trained Models](@ref omz_models_group_intel).

After you have used the Model Optimizer to create an Intermediate Representation (IR), use the Inference Engine to infer results from the given input data.

Inference Engine is a set of C++ libraries providing a common API to deliver inference solutions on the platform of your choice: CPU, GPU, or VPU. Use the Inference Engine API to read the Intermediate Representation, set the input and output formats, and execute the model on devices. While the C++ libraries are the primary implementation, C libraries and Python bindings are also available.

For Intel® Distribution of OpenVINO™ toolkit, Inference Engine binaries are delivered within release packages.

The open source version is available in the OpenVINO™ toolkit GitHub repository and can be built for supported platforms using the Inference Engine Build Instructions.

To learn about how to use the Inference Engine API for your application, see the Integrating Inference Engine in Your Application documentation.

For complete API Reference, see the Inference Engine API References section.

Inference Engine uses a plugin architecture. An Inference Engine plugin is a software component that contains a complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, etc. Each plugin implements the unified API and provides additional hardware-specific APIs.

Modules in the Inference Engine component

Core Inference Engine Libraries

Your application must link to the core Inference Engine libraries:

  • Linux* OS:
    • libinference_engine.so, which depends on libinference_engine_transformations.so, libtbb.so, libtbbmalloc.so and libngraph.so
  • Windows* OS:
    • inference_engine.dll, which depends on inference_engine_transformations.dll, tbb.dll, tbbmalloc.dll and ngraph.dll
  • macOS*:
    • libinference_engine.dylib, which depends on libinference_engine_transformations.dylib, libtbb.dylib, libtbbmalloc.dylib and libngraph.dylib

The required C++ header files are located in the include directory.

This library contains the classes to:

  • Create Inference Engine Core object to work with devices and read network (InferenceEngine::Core)
  • Manipulate network information (InferenceEngine::CNNNetwork)
  • Execute and pass inputs and outputs (InferenceEngine::ExecutableNetwork and InferenceEngine::InferRequest)
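
As a brief, hedged illustration of how these classes relate (the model path model.xml is a placeholder, and error handling is omitted):

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;                                                // works with devices and plugins
    InferenceEngine::CNNNetwork network = core.ReadNetwork("model.xml");       // network information in host memory
    InferenceEngine::ExecutableNetwork exec = core.LoadNetwork(network, "CPU");
    InferenceEngine::InferRequest request = exec.CreateInferRequest();         // executes and passes inputs/outputs
    return 0;
}
```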

Plugin Libraries to Read a Network Object

Starting with the 2022.1 release, OpenVINO Runtime introduced the concept of frontend plugins. Such plugins are loaded by OpenVINO Runtime dynamically, depending on the model file format (see the example after this list):

  • Linux* OS:
    • libir_ov_frontend.so to read a network from IR
    • libpaddlepaddle_ov_frontend.so to read a network from PaddlePaddle model format
    • libonnx_ov_frontend.so to read a network from ONNX model format
  • Windows* OS:
    • ir_ov_frontend.dll to read a network from IR
    • paddlepaddle_ov_frontend.dll to read a network from PaddlePaddle model format
    • onnx_ov_frontend.dll to read a network from ONNX model format
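
For example, assuming the IR and ONNX frontend libraries are present, a model can be read directly from either format through the same API call; the file names below are placeholders:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    // The matching frontend library is located and loaded automatically
    // based on the model file being read.
    InferenceEngine::CNNNetwork from_ir   = core.ReadNetwork("model.xml");   // IR frontend
    InferenceEngine::CNNNetwork from_onnx = core.ReadNetwork("model.onnx");  // ONNX frontend
    return 0;
}
```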

Device-Specific Plugin Libraries

For each supported target device, Inference Engine provides a plugin — a DLL/shared library that contains complete implementation for inference on this particular device. The following plugins are available:

| Plugin | Device Type |
| ------ | ----------- |
| CPU | Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE |
| GPU | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics |
| MYRIAD | Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X |
| GNA | Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor |
| HETERO | Automatic splitting of a network inference between several devices (for example, if a device doesn't support certain layers) |
| MULTI | Simultaneous inference of the same network on several devices in parallel |
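
To check which of these plugins can actually be used on a given machine, the Core object can enumerate the available devices. A small sketch:

```cpp
#include <inference_engine.hpp>
#include <iostream>
#include <string>

int main() {
    InferenceEngine::Core core;
    // Prints device names such as "CPU", "GPU", "MYRIAD", or "GNA" when the
    // corresponding plugin and hardware are available.
    for (const std::string& device : core.GetAvailableDevices()) {
        std::cout << device << std::endl;
    }
    return 0;
}
```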

The table below shows the plugin libraries and additional dependencies for Linux, Windows and macOS platforms.

| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows | Library name for macOS | Dependency libraries for macOS |
| ------ | ---------------------- | ------------------------------ | ------------------------ | -------------------------------- | ---------------------- | ------------------------------ |
| CPU | libMKLDNNPlugin.so | libinference_engine_lp_transformations.so | MKLDNNPlugin.dll | inference_engine_lp_transformations.dll | libMKLDNNPlugin.so | inference_engine_lp_transformations.dylib |
| GPU | libov_intel_gpu_plugin.so | libinference_engine_lp_transformations.so, libOpenCL.so | ov_intel_gpu_plugin.dll | OpenCL.dll, inference_engine_lp_transformations.dll | Is not supported | - |
| MYRIAD | libmyriadPlugin.so | libusb.so | myriadPlugin.dll | usb.dll | libmyriadPlugin.so | libusb.dylib |
| HDDL | libHDDLPlugin.so | libbsl.so, libhddlapi.so, libmvnc-hddl.so | HDDLPlugin.dll | bsl.dll, hddlapi.dll, json-c.dll, libcrypto-1_1-x64.dll, libssl-1_1-x64.dll, mvnc-hddl.dll | Is not supported | - |
| GNA | libov_intel_gna_plugin.so | libgna.so | ov_intel_gna_plugin.dll | gna.dll | Is not supported | - |
| HETERO | libov_hetero_plugin.so | Same as for selected plugins | ov_hetero_plugin.dll | Same as for selected plugins | libov_hetero_plugin.so | Same as for selected plugins |
| MULTI | libov_auto_plugin.so | Same as for selected plugins | ov_auto_plugin.dll | Same as for selected plugins | libov_auto_plugin.so | Same as for selected plugins |
| AUTO | libov_auto_plugin.so | Same as for selected plugins | ov_auto_plugin.dll | Same as for selected plugins | libov_auto_plugin.so | Same as for selected plugins |

NOTE: All plugin libraries also depend on core Inference Engine libraries.

Make sure those libraries are in your computer's path or in the location you specified in the plugin loader. Make sure each plugin's related dependencies are listed in:

  • Linux: LD_LIBRARY_PATH
  • Windows: PATH
  • macOS: DYLD_LIBRARY_PATH

On Linux and macOS, use the script setupvars.sh to set the environment variables.

On Windows, run the setupvars.bat batch file to set the environment variables.

To learn more about supported devices and corresponding plugins, see the Supported Devices chapter.

Common Workflow for Using the Inference Engine API

The common workflow contains the following steps (a complete minimal sketch follows the list):

  1. Create Inference Engine Core object - Create an InferenceEngine::Core object to work with different devices; all device plugins are managed internally by the Core object. Register extensions with custom nGraph operations (InferenceEngine::Core::AddExtension).

  2. Read the Intermediate Representation - Using the InferenceEngine::Core class, read an Intermediate Representation file into an object of the InferenceEngine::CNNNetwork class. This class represents the network in the host memory.

  3. Prepare input and output formats - After loading the network, specify the input and output precision and layout on the network. For this specification, use the InferenceEngine::CNNNetwork::getInputsInfo() and InferenceEngine::CNNNetwork::getOutputsInfo() methods.

  4. Pass device-specific loading configurations to the device (InferenceEngine::Core::SetConfig), and register extensions for this device (InferenceEngine::Core::AddExtension).

  5. Compile and Load Network to device - Use the InferenceEngine::Core::LoadNetwork() method with a specific device (e.g. CPU, GPU, etc.) to compile and load the network on the device. Pass in the per-target load configuration for this compilation and load operation.

  6. Set input data - With the network loaded, you have an InferenceEngine::ExecutableNetwork object. Use this object to create an InferenceEngine::InferRequest in which you signal the input buffers to use for input and output. You can specify device-allocated memory and copy it into the device memory directly, or tell the device to use your application memory to save a copy.

  7. Execute - With the input and output memory now defined, choose your execution mode:

    • Synchronously - InferenceEngine::InferRequest::Infer() method. Blocks until inference is completed.
    • Asynchronously - InferenceEngine::InferRequest::StartAsync() method. Check status with the InferenceEngine::InferRequest::Wait() method (0 timeout), wait, or specify a completion callback.
  8. Get the output - After inference is completed, get the output memory or read the memory you provided earlier. Do this with the InferenceEngine::InferRequest::GetBlob() method.
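
Put together, the following is a minimal C++ sketch of this workflow. It assumes an IR model at a placeholder path model.xml with a single image-like (NCHW) input, targets the CPU plugin, and omits error handling and actual input filling.

```cpp
#include <inference_engine.hpp>
#include <string>

int main() {
    // 1. Create the Inference Engine Core object.
    InferenceEngine::Core core;

    // 2. Read the Intermediate Representation (model.xml + model.bin).
    InferenceEngine::CNNNetwork network = core.ReadNetwork("model.xml");

    // 3. Prepare input and output formats.
    std::string input_name = network.getInputsInfo().begin()->first;
    InferenceEngine::InputInfo::Ptr input_info = network.getInputsInfo().begin()->second;
    input_info->setPrecision(InferenceEngine::Precision::U8);
    input_info->setLayout(InferenceEngine::Layout::NCHW);

    std::string output_name = network.getOutputsInfo().begin()->first;
    network.getOutputsInfo().begin()->second->setPrecision(InferenceEngine::Precision::FP32);

    // 4. (Optional) pass device-specific configuration via core.SetConfig(...)
    //    and register extensions via core.AddExtension(...).

    // 5. Compile and load the network to the device.
    InferenceEngine::ExecutableNetwork executable_network = core.LoadNetwork(network, "CPU");

    // 6. Create an infer request and access the input blob.
    InferenceEngine::InferRequest infer_request = executable_network.CreateInferRequest();
    InferenceEngine::Blob::Ptr input_blob = infer_request.GetBlob(input_name);
    // ... fill input_blob with your input data here ...

    // 7. Execute synchronously (or use StartAsync() followed by
    //    Wait(InferenceEngine::InferRequest::WaitMode::RESULT_READY)).
    infer_request.Infer();

    // 8. Get the output.
    InferenceEngine::Blob::Ptr output_blob = infer_request.GetBlob(output_name);
    (void)output_blob;
    return 0;
}
```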

Video: Inference Engine Concept

Further Reading

For more details on the Inference Engine API, refer to the Integrating Inference Engine in Your Application documentation.