Docs 2021 1 (#901)

* Initial state of dev docs

* Ported docs for quantized networks

* Integrate quantization guide + transformations template

* Fixes
Ilya Lavrenov
2020-06-15 12:20:42 +03:00
committed by GitHub
parent 36be9e4031
commit b058948763
17 changed files with 3192 additions and 15 deletions


@@ -0,0 +1,49 @@
# Asynchronous Inference Request {#async_infer_request}
Asynchronous Inference Request runs an inference pipeline asynchronously in one or several task executors depending on a device pipeline structure.
Inference Engine Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class:
- The class has the `_pipeline` field of `std::vector<std::pair<ITaskExecutor::Ptr, Task> >`, which contains pairs of an executor and executed task.
- All executors are passed as arguments to a class constructor and they are in the running state and ready to run tasks.
- The class has the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method, which waits for `_pipeline` to finish in a class destructor. The method does not stop task executors, and they are still in the running state, because they belong to the executable network instance and are not destroyed.
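The `_pipeline` layout described above can be sketched with a toy model. This is a hypothetical simplification, not the real Plugin API: the executors here run tasks inline and record which executor ran which stage, whereas real executors enqueue tasks to worker threads.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Simplified model of `_pipeline`: each stage is an (executor, task) pair;
// the executor decides *where* a task runs.
using Task = std::function<void()>;

struct ITaskExecutor {
    virtual ~ITaskExecutor() = default;
    virtual void run(Task task) = 0;
};

struct InlineExecutor : ITaskExecutor {
    std::string name;
    std::vector<std::string>* log;
    InlineExecutor(std::string n, std::vector<std::string>* l) : name(std::move(n)), log(l) {}
    void run(Task task) override {
        log->push_back(name);  // record which executor handled the stage
        task();                // a real executor would enqueue to a worker thread
    }
};

// Traverses all stages in order, mirroring how the pipeline is run.
void RunPipeline(std::vector<std::pair<ITaskExecutor*, Task>>& pipeline) {
    for (auto& stage : pipeline) stage.first->run(stage.second);
}
```

A usage sketch: two executors, two stages, run in order; the second stage observes the result of the first.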
`AsyncInferRequest` Class
------------------------
Inference Engine Plugin API provides the base InferenceEngine::AsyncInferRequestThreadSafeDefault class for a custom asynchronous inference request implementation:
@snippet src/template_async_infer_request.hpp async_infer_request:header
#### Class Fields
- `_inferRequest` - a reference to the [synchronous inference request](@ref infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
- `_waitExecutor` - a task executor that waits for a response from a device about device task completion.
> **NOTE**: If a plugin can work with several instances of a device, `_waitExecutor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.
### `AsyncInferRequest()`
The main goal of the `AsyncInferRequest` constructor is to define a device pipeline `_pipeline`. The example below demonstrates `_pipeline` creation with the following stages:
- `inferPreprocess` is a CPU compute task.
- `startPipeline` is a CPU lightweight task to submit tasks to a remote device.
- `waitPipeline` is a CPU non-compute task that waits for a response from a remote device.
- `inferPostprocess` is a CPU compute task.
@snippet src/template_async_infer_request.cpp async_infer_request:ctor
The stages are distributed among two task executors in the following way:
- `inferPreprocess` and `startPipeline` are combined into a single task and run on `_requestExecutor`, which computes CPU tasks.
- You need at least two executors to overlap compute tasks of a CPU and a remote device the plugin works with. Otherwise, CPU and device tasks are executed serially one by one.
- `waitPipeline` is sent to `_waitExecutor`, which works with the device.
> **NOTE**: `callbackExecutor` is also passed to the constructor and it is used in the base InferenceEngine::AsyncInferRequestThreadSafeDefault class, which adds a pair of `callbackExecutor` and a callback function set by the user to the end of the pipeline.
Inference request stages are also profiled using IE_PROFILING_AUTO_SCOPE, which makes it possible to see, via the [Intel® VTune™ Profiler](https://software.intel.com/en-us/vtune) tool, how pipelines of multiple asynchronous inference requests run in parallel.
### `~AsyncInferRequest()`
In the asynchronous request destructor, it is necessary to wait for a pipeline to finish. It can be done using the InferenceEngine::AsyncInferRequestThreadSafeDefault::StopAndWait method of the base class.
@snippet src/template_async_infer_request.cpp async_infer_request:dtor


@@ -0,0 +1,100 @@
# Build Plugin Using CMake* {#plugin_build}
Inference Engine build infrastructure provides the Inference Engine Developer Package for plugin development.
Inference Engine Developer Package
------------------------
To automatically generate the Inference Engine Developer Package, run the `cmake` tool during a DLDT build:
```bash
$ mkdir dldt-release-build
$ cd dldt-release-build
$ cmake -DCMAKE_BUILD_TYPE=Release ../dldt
```
Once the commands above are executed, the Inference Engine Developer Package is generated in the `dldt-release-build` folder. It consists of several files:
- `InferenceEngineDeveloperPackageConfig.cmake` - the main CMake script which imports targets and provides compilation flags and CMake options.
- `InferenceEngineDeveloperPackageConfig-version.cmake` - a file with a package version.
- `targets_developer.cmake` - an automatically generated file which contains all targets exported from the Deep Learning Deployment Toolkit (DLDT) build tree. This file is included by `InferenceEngineDeveloperPackageConfig.cmake` to import the following targets:
- Libraries for plugin development:
* `IE::ngraph` - shared nGraph library
* `IE::inference_engine` - shared Inference Engine library
* `IE::inference_engine_preproc` - shared library with Inference Engine preprocessing plugin
* `IE::inference_engine_plugin_api` - interface library with Inference Engine Plugin API headers
* `IE::inference_engine_lp_transformations` - shared library with low-precision transformations
* `IE::pugixml` - static Pugixml library
* `IE::xbyak` - interface library with Xbyak headers
- Libraries for tests development:
* `IE::gtest`, `IE::gtest_main`, `IE::gmock` - Google Tests framework libraries
* `IE::commonTestUtils` - static library with common tests utilities
* `IE::funcTestUtils` - static library with functional tests utilities
* `IE::unitTestUtils` - static library with unit tests utilities
* `IE::ngraphFunctions` - static library with the set of Ngraph Functions builders
* `IE::funcSharedTests` - static library with common functional tests
> **NOTE**: To build only targets from the Inference Engine Developer Package, it is enough to run the `cmake --build . --target ie_dev_targets` command.
Build Plugin using Inference Engine Developer Package
------------------------
To build a plugin source tree using the Inference Engine Developer Package, run the commands below:
```bash
$ mkdir template-plugin-release-build
$ cd template-plugin-release-build
$ cmake -DInferenceEngineDeveloperPackage_DIR=../dldt-release-build ../template-plugin
```
A common plugin consists of the following components:
1. Plugin code in the `src` folder
2. Code of tests in the `tests` folder
To build a plugin and its tests, run the following CMake scripts:
- Root `CMakeLists.txt`, which finds the Inference Engine Developer Package using the `find_package` CMake command and adds the `src` and `tests` subdirectories with plugin sources and their tests respectively:
```cmake
cmake_minimum_required(VERSION 3.13.3)
project(InferenceEngineTemplatePlugin)
set(IE_MAIN_TEMPLATE_PLUGIN_SOURCE_DIR ${InferenceEngineTemplatePlugin_SOURCE_DIR})
find_package(InferenceEngineDeveloperPackage REQUIRED)
add_subdirectory(src)
if(ENABLE_TESTS)
include(CTest)
enable_testing()
if(ENABLE_FUNCTIONAL_TESTS)
add_subdirectory(tests/functional)
endif()
if(ENABLE_BEH_TESTS)
add_subdirectory(tests/behavior)
endif()
endif()
```
> **NOTE**: The default values of the `ENABLE_TESTS`, `ENABLE_FUNCTIONAL_TESTS`, `ENABLE_BEH_TESTS` options are shared via the Inference Engine Developer Package and they are the same as for the main DLDT build tree. You can override them during plugin build using the command below:
```bash
$ cmake -DENABLE_FUNCTIONAL_TESTS=OFF -DInferenceEngineDeveloperPackage_DIR=../dldt-release-build ../template-plugin
```
- `src/CMakeLists.txt` to build a plugin shared library from sources:
@snippet src/CMakeLists.txt cmake:plugin
> **NOTE**: `IE::inference_engine` target is imported from the Inference Engine Developer Package.
- `tests/functional/CMakeLists.txt` to build a set of functional plugin tests:
@snippet tests/functional/CMakeLists.txt cmake:functional_tests
> **NOTE**: The `IE::funcSharedTests` static library with common functional Inference Engine Plugin tests is imported via the Inference Engine Developer Package.

docs/IE_PLUGIN_DG/Doxyfile: new file, 2440 lines (diff suppressed because it is too large)


@@ -0,0 +1,107 @@
# Executable Network {#executable_network}
`ExecutableNetwork` class functionality:
- Compile an InferenceEngine::ICNNNetwork instance to a hardware-specific graph representation
- Create an arbitrary number of `InferRequest` objects
- Hold some common resources shared between different instances of `InferRequest`. For example:
- InferenceEngine::ExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
- InferenceEngine::ExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread
`ExecutableNetwork` Class
------------------------
Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class recommended to use as a base class for an executable network. Based on that, a declaration of an executable network class can look as follows:
@snippet src/template_executable_network.hpp executable_network:header
#### Class Fields
The example class has several fields:
- `_requestId` - Tracks a number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
- `_name` - Provides a network name.
- `_cfg` - Defines a configuration an executable network was compiled with.
- `_plugin` - Refers to a plugin instance.
### `ExecutableNetwork` Constructor with `ICNNNetwork`
This constructor accepts a generic representation of a neural network as an InferenceEngine::ICNNNetwork reference, which is compiled into a hardware-specific device graph:
@snippet src/template_executable_network.cpp executable_network:ctor_cnnnetwork
The implementation of `CompileGraph` is fully device-specific.
### `CompileGraph()`
The function accepts a const shared pointer to `const ngraph::Function` object and performs the following steps:
1. Deep copies a const object to a local object, which can later be modified.
2. Applies common and plugin-specific transformations on the copied graph to make it more friendly to hardware operations. For details on how to write custom plugin-specific transformations, refer to the [Writing ngraph transformations](@ref new_ngraph_transformation) guide.
3. Maps the transformed graph to a plugin-specific graph representation (for example, to an MKLDNN graph for CPU). See the following topics about network representation for details:
* [Intermediate Representation and Operation Sets](../_docs_MO_DG_IR_and_opsets.html)
* [Quantized networks](@ref quantized_networks).
4. Allocates and fills memory for graph weights.
@snippet src/template_executable_network.cpp executable_network:compile_graph
> **NOTE**: After all these steps, the hardware-specific graph is ready to create inference requests and perform inference.
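The steps above can be sketched on a toy graph. This is a hedged illustration under the assumption that a "graph" is just an ordered list of operation names; the fusion rule and device kernel names are invented for the example, not part of the real API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for ngraph::Function: an ordered list of operation names.
using Graph = std::vector<std::string>;

Graph CompileGraph(const std::shared_ptr<const Graph>& function) {
    Graph local = *function;  // 1. deep copy of the const object
    // 2. plugin-specific transformation: fuse Convolution followed by ReLU
    Graph transformed;
    for (size_t i = 0; i < local.size(); ++i) {
        if (local[i] == "Convolution" && i + 1 < local.size() && local[i + 1] == "ReLU") {
            transformed.push_back("ConvolutionReLU");
            ++i;  // skip the fused ReLU
        } else {
            transformed.push_back(local[i]);
        }
    }
    // 3. map each operation to a device-specific kernel name
    Graph deviceGraph;
    for (const auto& op : transformed) deviceGraph.push_back("device::" + op);
    // 4. a real plugin would also allocate and fill weight memory here
    return deviceGraph;
}
```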
### `ExecutableNetwork` Constructor Importing from Stream
This constructor creates a hardware-specific graph by importing from a stream object:
> **NOTE**: The export of hardware-specific graph is done in the `ExportImpl` method, and data formats must be the same for both import and export.
@snippet src/template_executable_network.cpp executable_network:ctor_import_stream
### `ExportImpl()`
**Implementation details:**
Base InferenceEngine::ExecutableNetworkThreadSafeDefault class implements the public InferenceEngine::ExecutableNetworkThreadSafeDefault::Export method as follows:
- Writes `_plugin->GetName()` to the `model` stream.
- Calls the `ExportImpl` method defined in a derived class to dump a hardware-specific graph.
The implementation of the method should write all data to the `model` stream, which is required to import a hardware-specific graph later in the `Plugin::Import` method:
@snippet src/template_executable_network.cpp executable_network:export_impl
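The export/import contract described above (the base class writes the plugin name, `ExportImpl` writes the graph, and import reads the same fields in the same order) can be sketched as follows. The field layout here is illustrative only, assuming the graph is serialized as a length-prefixed byte string:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Writes the plugin name (as the base class does), then the graph blob
// (as ExportImpl would).
void ExportNetwork(std::ostream& model, const std::string& pluginName,
                   const std::string& compiledGraph) {
    model << pluginName << '\n';
    model << compiledGraph.size() << '\n';
    model << compiledGraph;
}

// Reads back exactly the fields written by ExportNetwork, in the same order.
std::string ImportNetwork(std::istream& model, std::string& pluginName) {
    std::size_t size = 0;
    model >> pluginName >> size;
    model.get();  // skip the '\n' after the size field
    std::string graph(size, '\0');
    model.read(&graph[0], static_cast<std::streamsize>(size));
    return graph;
}
```

A round trip through a `std::stringstream` recovers both the device name and the graph bytes, which is the property the import constructor relies on.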
### `CreateInferRequest()`
The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
- [Synchronous inference request](@ref infer_request), which defines pipeline stages and runs them synchronously in the `Infer` method.
- [Asynchronous inference request](@ref async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can have one or several stages:
  - For single-stage pipelines, there is no need to define this method and create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. For single-stage pipelines, a default implementation of this method creates InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping a synchronous inference request and runs it asynchronously in the `_taskExecutor` executor.
  - For pipelines with multiple stages, such as performing some preprocessing on host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device utilization and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with preprocessing and postprocessing stages, giving better performance.
> **IMPORTANT**: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
@snippet src/template_executable_network.cpp executable_network:create_infer_request
### `CreateInferRequestImpl()`
This is a helper method used by `CreateInferRequest` to create a [synchronous inference request](@ref infer_request), which is later wrapped with the asynchronous inference request class:
@snippet src/template_executable_network.cpp executable_network:create_infer_request_impl
### `GetMetric()`
Returns a metric value for a metric with the name `name`. A metric is a static type of information about an executable network. Examples of metrics:
- EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
- EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
- Any other executable network metric specific for a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, `template/template_config.hpp`
@snippet src/template_executable_network.cpp executable_network:get_metric
The IE_SET_METRIC helper macro sets the metric value and checks that the actual metric type matches the type of the specified value.
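The dispatch-by-name contract described above can be sketched as follows. This is a hypothetical simplification: the metric names are illustrative stand-ins for the real EXEC_NETWORK_METRIC_KEY values, and the real API returns a variant-like InferenceEngine::Parameter rather than typed getters.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Known metric names return a value; an unsupported name throws,
// mirroring the contract the guide describes.
struct Metrics {
    std::string networkName;
    unsigned optimalRequests;

    unsigned GetUnsigned(const std::string& name) const {
        if (name == "OPTIMAL_NUMBER_OF_INFER_REQUESTS") return optimalRequests;
        throw std::out_of_range("Unsupported metric: " + name);
    }
    std::string GetString(const std::string& name) const {
        if (name == "NETWORK_NAME") return networkName;
        throw std::out_of_range("Unsupported metric: " + name);
    }
};
```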
### `GetConfig()`
Returns a current value for a configuration key with the name `name`. The method extracts configuration values an executable network is compiled with.
@snippet src/template_executable_network.cpp executable_network:get_config
This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the [Compile tool](../_inference_engine_tools_compile_tool_README.html)).
The next step in plugin library implementation is the [Synchronous Inference Request](@ref infer_request) class.


@@ -0,0 +1,69 @@
# Synchronous Inference Request {#infer_request}
`InferRequest` class functionality:
- Allocate input and output blobs needed for a hardware-dependent network inference.
- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref async_infer_request) implementation.
- Call inference stages one by one synchronously.
`InferRequest` Class
------------------------
Inference Engine Plugin API provides the helper InferenceEngine::InferRequestInternal class recommended
to use as a base class for a synchronous inference request implementation. Based on that, a declaration
of a synchronous request class can look as follows:
@snippet src/template_infer_request.hpp infer_request:header
#### Class Fields
The example class has several fields:
- `_executableNetwork` - reference to an executable network instance. From this reference, an inference request instance can take a task executor, use counter for a number of created inference requests, and so on.
- `_profilingTask` - array of the `std::array<InferenceEngine::ProfilingTask, numOfStages>` type. Defines names for pipeline stages. Used to profile an inference pipeline execution with the Intel® Instrumentation and Tracing Technology (ITT).
- `_inputsNCHW` - input blob map
- `_outputsNCHW` - output blob map
- Several double values to hold an execution time for pipeline stages.
### `InferRequest` Constructor
The constructor initializes helper fields and calls methods which allocate blobs:
@snippet src/template_infer_request.cpp infer_request:ctor
The implementation of the function that allocates device buffers is fully device-specific and is not provided in this guide.
The implementation of the function that allocates host buffers assumes that the `Template` device works
natively only with the InferenceEngine::NCHW input and output layout, while the user can specify InferenceEngine::NHWC as a layout
of InferenceEngine::CNNNetwork inputs and outputs and set InferenceEngine::NHWC blobs via the InferenceEngine::InferRequest::SetBlob method.
> **NOTE**: Call InferenceEngine::CNNNetwork::getInputsInfo and InferenceEngine::CNNNetwork::getOutputsInfo to specify both layout and precision of blobs, which you can set with InferenceEngine::InferRequest::SetBlob and get with InferenceEngine::InferRequest::GetBlob. A plugin uses these hints to determine its internal layouts and precisions for input and output blobs if needed.
### `~InferRequest` Destructor
Decrements a number of created inference requests:
@snippet src/template_infer_request.cpp infer_request:dtor
### `InferImpl()`
**Implementation details:** Base InferRequestInternal class implements the public InferenceEngine::InferRequestInternal::Infer method as follows:
- Checks blobs set by users
- Calls the `InferImpl` method defined in a derived class to call actual pipeline stages synchronously
@snippet src/template_infer_request.cpp infer_request:infer_impl
Below is the code of the `inferPreprocess` method, which demonstrates how the common Inference Engine preprocessing step is handled:
@snippet src/template_infer_request.cpp infer_request:infer_preprocess
**Details:**
* `InferImpl` must call the InferenceEngine::InferRequestInternal::execDataPreprocessing function, which executes the common Inference Engine preprocessing step (for example, applies resize or color conversion operations) if it is set by the user. The output dimensions, layout, and precision match the input information set via InferenceEngine::CNNNetwork::getInputsInfo.
* To handle both InferenceEngine::NCHW and InferenceEngine::NHWC input layouts, the `TemplateInferRequest` class has the `_inputsNCHW` field, which holds blobs in the InferenceEngine::NCHW layout. During Inference Request execution, `InferImpl` copies from the input InferenceEngine::NHWC layout to `_inputsNCHW` if needed.
* The rest of the `InferImpl` logic works with `_inputsNCHW`.
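The NHWC to NCHW repacking mentioned above can be sketched for a dense float blob. This is a minimal illustration, not the plugin's actual copy routine: element (n, c, h, w) moves from NHWC index `((n*H + h)*W + w)*C + c` to NCHW index `((n*C + c)*H + h)*W + w`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Repacks a dense NHWC float blob into NCHW layout.
std::vector<float> NHWCtoNCHW(const std::vector<float>& src,
                              std::size_t N, std::size_t C,
                              std::size_t H, std::size_t W) {
    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t h = 0; h < H; ++h)
            for (std::size_t w = 0; w < W; ++w)
                for (std::size_t c = 0; c < C; ++c)
                    dst[((n * C + c) * H + h) * W + w] =
                        src[((n * H + h) * W + w) * C + c];
    return dst;
}
```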
### `GetPerformanceCounts()`
The method sets performance counters which were measured during pipeline stages execution:
@snippet src/template_infer_request.cpp infer_request:get_performance_counts
The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref async_infer_request) class.


@@ -0,0 +1,47 @@
@mainpage Overview of Inference Engine Plugin Library
The plugin architecture of the Inference Engine makes it possible to develop and plug in independent inference
solutions dedicated to different devices. Physically, a plugin is represented as a dynamic library
exporting the single `CreatePluginEngine` function that creates a new plugin instance.
Inference Engine Plugin Library
-----------------------
Inference Engine plugin dynamic library consists of several main components:
1. [Plugin class](@ref plugin):
- Provides information about devices of a specific type.
- Can create an [executable network](@ref executable_network) instance which represents a Neural
Network hardware-specific graph structure for a particular device, as opposed to the InferenceEngine::ICNNNetwork
interface which is hardware-independent.
- Can import an already compiled graph structure from an input stream to an
[executable network](@ref executable_network) object.
2. [Executable Network class](@ref executable_network):
- Is an execution configuration compiled for a particular device and takes into account its capabilities.
- Holds a reference to a particular device and a task executor for this device.
- Can create several instances of [Inference Request](@ref infer_request).
- Can export an internal hardware-specific graph structure to an output stream.
3. [Inference Request class](@ref infer_request):
- Runs an inference pipeline serially.
- Can extract performance counters for an inference pipeline execution profiling.
4. [Asynchronous Inference Request class](@ref async_infer_request):
- Wraps the [Inference Request](@ref infer_request) class and runs pipeline stages in parallel
on several task executors based on a device-specific pipeline structure.
> **NOTE**: This documentation is written based on the `Template` plugin, which demonstrates plugin
development details. Find the complete code of the `Template`, which is fully compilable and up-to-date,
at `<dldt source dir>/docs_developer/template_plugin`.
Detailed guides
-----------------------
* [Build](@ref plugin_build) a plugin library using CMake\*
* Plugin and its components [testing](@ref plugin_testing)
* [Quantized networks](@ref quantized_networks)
* [Writing ngraph transformations](@ref new_ngraph_transformation) guide
API References
-----------------------
* [Inference Engine Plugin API](group__ie__dev__api.html)
* [Inference Engine Transformation API](group__ie__transformation__api.html)


@@ -0,0 +1,18 @@
# Representation of low-precision models
The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance on interpretation rules for such models at runtime.
Currently, there are two groups of optimization methods that can influence the IR after applying them to the full-precision model:
- **Sparsity**. It is represented by zeros inside the weights, and it is up to the hardware plugin to decide how to interpret these zeros (use the weights as is or apply special compression algorithms and sparse arithmetic). No additional mask is provided with the model.
- **Quantization**. The rest of this document is dedicated to the representation of quantized models.
## Representation of quantized models
The OpenVINO Toolkit represents all the quantized models using the so-called [FakeQuantize](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_Legacy_IR_Layers_Catalog_Spec.html#FakeQuantize) operation. This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
To execute a particular DL operation in low precision, all its inputs should be quantized, i.e. should have a FakeQuantize between the operation and the data blobs. The figure below shows an example of a quantized Convolution, which contains two FakeQuantize nodes: one for weights and one for activations (the bias is quantized using the same parameters).
![quantized_convolution]
<div align="center">Figure 1. Example of quantized Convolution operation.</div>
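The projection described above can be sketched as a scalar function. This is a hedged emulation of the FakeQuantize semantics (clamp the input to its range, discretize to `levels` values, project to the output range), not a runtime implementation; broadcasting over tensors and per-channel ranges are omitted.

```cpp
#include <cassert>
#include <cmath>

// Emulates FakeQuantize on a single value: clamp to [inLow, inHigh],
// discretize to `levels` steps, then map to [outLow, outHigh].
float FakeQuantize(float x, float inLow, float inHigh,
                   float outLow, float outHigh, int levels) {
    if (x <= inLow)  return outLow;
    if (x >  inHigh) return outHigh;
    const float scaleIn  = (inHigh - inLow) / (levels - 1);
    const float scaleOut = (outHigh - outLow) / (levels - 1);
    const float q = std::round((x - inLow) / scaleIn);  // discrete level index
    return q * scaleOut + outLow;                       // back to output range
}
```

With identical input and output ranges of [0, 255] and 256 levels, the function reduces to rounding with saturation, which matches the intuition of INT8 quantization emulated in floating point.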
Starting from the OpenVINO 2020.2 release, all quantized models are represented in the compressed form. This means that the weights of low-precision operations are converted into the target precision (for example, INT8), which helps to substantially reduce the model size. The rest of the parameters can be represented in FLOAT32 or FLOAT16 precision, depending on the input full-precision model used in the quantization process. Fig. 2 below shows an example of a part of the compressed IR.
![quantized_model_example]
<div align="center">Figure 2. Example of compressed quantized model.</div>
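The weight compression idea can be sketched as symmetric per-tensor INT8 compression: store int8 weights plus a single float scale instead of float32 weights. The scale choice below (max-abs divided by 127) is an illustrative assumption, not the exact rule any particular tool uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Compressed form: int8 values plus one scale;
// dequantized value = data[i] * scale.
struct CompressedWeights {
    std::vector<std::int8_t> data;
    float scale;
};

CompressedWeights Compress(const std::vector<float>& weights) {
    float maxAbs = 0.f;
    for (float w : weights) maxAbs = std::max(maxAbs, std::fabs(w));
    CompressedWeights out;
    out.scale = maxAbs > 0.f ? maxAbs / 127.f : 1.f;
    for (float w : weights)
        out.data.push_back(static_cast<std::int8_t>(std::round(w / out.scale)));
    return out;
}
```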
[quantized_convolution]: ../images/quantized_convolution.png
[quantized_model_example]: ../images/quantized_model_example.png


@@ -0,0 +1,15 @@
# Writing ngraph transformations {#new_ngraph_transformation}
1. The code of such a transformation must be placed directly in the template plugin sources.
2. It must be marked with doxygen snippet markers, for example:
// ! [new_transformation:part1]
// transformation code goes here
// ! [new_transformation:part1]
This file must then refer to that code via
@snippet src/template_transformation.cpp new_transformation:part1

docs/IE_PLUGIN_DG/Plugin.md: new file, 165 lines

@@ -0,0 +1,165 @@
# Plugin {#plugin}
In addition to the Inference Engine Public API, the Inference Engine provides the Plugin API, which is a set of functions and helper classes that simplify new plugin development:
- header files in the `inference_engine/src/plugin_api` directory
- implementations in the `inference_engine/src/inference_engine` directory
- symbols in the Inference Engine Core shared library
To build an Inference Engine plugin with the Plugin API, see the [Inference Engine Plugin Building](@ref plugin_build) guide.
Plugin Class
------------------------
Inference Engine Plugin API provides the helper InferenceEngine::InferencePluginInternal class recommended to use as a base class for a plugin.
Based on that, declaration of a plugin class can look as follows:
@snippet src/template_plugin.hpp plugin:header
#### Class Fields
The provided plugin class also has a single field:
* `_cfg` of type `Configuration`:
@snippet src/template_config.hpp configuration:header
As an example, a plugin configuration includes the following value parameters:
- `deviceId` - particular device ID to work with. Applicable if a plugin supports more than one `Template` device. In this case, some plugin methods, like `SetConfig`, `QueryNetwork`, and `LoadNetwork`, must support the CONFIG_KEY(KEY_DEVICE_ID) parameter.
- `perfCounts` - boolean value to identify whether to collect performance counters during [Inference Request](@ref infer_request) execution.
### Engine Constructor
A plugin constructor must contain code that checks the ability to work with a device of the `Template`
type. For example, if some drivers are required, the code must check
driver availability. If a driver is not available (for example, the OpenCL runtime is not installed in
the case of a GPU device, or an improper version of a driver is on the host machine), an exception
must be thrown from the plugin constructor.
A plugin must define a device name enabled via the `_pluginName` field of a base class:
@snippet src/template_plugin.cpp plugin:ctor
### `LoadExeNetworkImpl()`
**Implementation details:** The base InferenceEngine::InferencePluginInternal class provides a common implementation
of the public InferenceEngine::InferencePluginInternal::LoadNetwork method that calls plugin-specific `LoadExeNetworkImpl`, which is defined in a derived class.
This is the most important function of the `Plugin` class and creates an instance of compiled `ExecutableNetwork`,
which holds a hardware-dependent compiled graph in an internal representation:
@snippet src/template_plugin.cpp plugin:load_exe_network_impl
Before a creation of an `ExecutableNetwork` instance via a constructor, a plugin may check if a provided
InferenceEngine::ICNNNetwork object is supported by a device. In the example above, the plugin checks precision information.
Actual graph compilation is done in the `ExecutableNetwork` constructor. Refer to the [ExecutableNetwork Implementation Guide](@ref executable_network) for details.
> **NOTE**: Actual configuration map used in `ExecutableNetwork` is constructed as a base plugin
> configuration set via `Plugin::SetConfig`, where some values are overwritten with `config` passed to `Plugin::LoadExeNetworkImpl`.
> Therefore, the config of `Plugin::LoadExeNetworkImpl` has a higher priority.
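The configuration merge described in the note above can be sketched as a simple map merge, where the values passed to `LoadNetwork` override the base values set earlier via `SetConfig`. Key names below are illustrative:

```cpp
#include <cassert>
#include <map>
#include <string>

using ConfigMap = std::map<std::string, std::string>;

// Base plugin config merged with the per-load config; the per-load
// values have higher priority and overwrite base values on key clash.
ConfigMap MergeConfigs(const ConfigMap& base, const ConfigMap& local) {
    ConfigMap merged = base;
    for (const auto& kv : local) merged[kv.first] = kv.second;  // local wins
    return merged;
}
```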
### `QueryNetwork()`
Use the method with the `HETERO` mode, which allows distributing network execution between different
devices based on the `ngraph::Node::get_rt_info()` map, which can contain the `"affinity"` key.
The `QueryNetwork` method analyzes operations of provided `network` and returns a list of supported
operations via the InferenceEngine::QueryNetworkResult structure:
@snippet src/template_plugin.cpp plugin:query_network
### `AddExtension()`
Adds an extension of the InferenceEngine::IExtensionPtr type to a plugin. If a plugin does not
support extensions, the method must throw an exception:
@snippet src/template_plugin.cpp plugin:add_extension
### `SetConfig()`
Sets new values for plugin configuration keys:
@snippet src/template_plugin.cpp plugin:set_config
In the snippet above, the `Configuration` class overrides previous configuration values with the new
ones. All these values are used during hardware-specific graph compilation and execution of inference requests.
> **NOTE**: The function must throw an exception if it receives an unsupported configuration key.
### `GetConfig()`
Returns a current value for a specified configuration key:
@snippet src/template_plugin.cpp plugin:get_config
The function is implemented with the `Configuration::Get` method, which wraps an actual configuration
key value to the InferenceEngine::Parameter and returns it.
> **NOTE**: The function must throw an exception if it receives an unsupported configuration key.
### `GetMetric()`
Returns a metric value for a metric with the name `name`. A device metric is a static type of information
from a plugin about its devices or device capabilities.
Examples of metrics:
- METRIC_KEY(AVAILABLE_DEVICES) - list of available devices; a plugin is required to implement this metric. In this case, you can use
all devices of the same `Template` type with the automatic logic of the `MULTI` device plugin.
- METRIC_KEY(FULL_DEVICE_NAME) - full device name. In this case, a particular device ID is specified
in the `option` parameter as `{ CONFIG_KEY(KEY_DEVICE_ID), "deviceID" }`.
- METRIC_KEY(SUPPORTED_METRICS) - list of metrics supported by a plugin
- METRIC_KEY(SUPPORTED_CONFIG_KEYS) - list of configuration keys supported by a plugin that
affect its behavior during hardware-specific graph compilation or inference request execution
- METRIC_KEY(OPTIMIZATION_CAPABILITIES) - list of optimization capabilities of a device.
For example, supported data types and special optimizations for them.
- Any other device-specific metrics. In this case, place metrics declaration and possible values to
a plugin-specific public header file, for example, `template/template_config.hpp`. The example below
demonstrates the definition of a new optimization capability value specific for a device:
@snippet template/template_config.hpp public_header:metrics
The snippet below provides an example of the implementation for `GetMetric`:
@snippet src/template_plugin.cpp plugin:get_metric
> **NOTE**: If an unsupported metric key is passed to the function, it must throw an exception.
### `ImportNetworkImpl()`
The importing network mechanism allows importing a previously exported hardware-specific graph and wrapping it
using an [ExecutableNetwork](@ref executable_network) object. This functionality is useful if
hardware-specific graph compilation takes significant time and/or cannot be done on a target host
device due to other reasons.
**Implementation details:** The base plugin class InferenceEngine::InferencePluginInternal implements InferenceEngine::InferencePluginInternal::ImportNetwork
as follows: exports a device type (InferenceEngine::InferencePluginInternal::_pluginName) and then calls `ImportNetworkImpl`,
which is implemented in a derived class.
If a plugin cannot use the base implementation of InferenceEngine::InferencePluginInternal::ImportNetwork, it can override it
and define an output blob structure that suits its needs. This
can be useful if a plugin exports a blob in a special format for integration with other frameworks,
where the common Inference Engine header written by the base class implementation is not appropriate.
During the export of a hardware-specific graph via `ExecutableNetwork::Export`, a plugin may export any
type of information it needs to import the compiled graph properly and check its correctness.
For example, the export information may include:
- Compilation options (state of `Plugin::_cfg` structure)
- Information about a plugin and a device type to check this information later during the import and
throw an exception if the `model` stream contains wrong data. For example, if devices have different
capabilities and a graph compiled for a particular device cannot be used for another, such type of
information must be stored and checked during the import.
- Compiled hardware-specific graph itself
- Information about precisions and shapes set by the user
@snippet src/template_plugin.cpp plugin:import_network_impl
Create Instance of Plugin Class
------------------------
An Inference Engine plugin library must export only one function, which creates a plugin instance:
@snippet src/template_plugin.cpp plugin:create_plugin_engine
The next step in a plugin library implementation is the [ExecutableNetwork](@ref executable_network) class.


@@ -0,0 +1,40 @@
# Plugin Testing {#plugin_testing}
The Inference Engine (IE) test infrastructure provides a predefined set of functional tests and utilities exported via the Inference
Engine developer package. They are used to verify a plugin against the Inference Engine public API.
All the tests are written in the [Google Test C++ framework](https://github.com/google/googletest).
To build test binaries together with other build artifacts, use the `make all` command. For details, see
[Build Plugin Using CMake*](@ref plugin_build).
Inference Engine Plugin tests are included in the `funcSharedTests` CMake target which is built within the Deep Learning Deployment Toolkit (DLDT) repository
(see [Build Plugin Using CMake](@ref plugin_build) guide).
Test definitions:
1. **Conformance tests**, a separate test group that checks that a plugin satisfies basic Inference
Engine concepts: plugin creation, support for multiple executable networks, support for multiple synchronous and asynchronous inference requests, and so on.
2. **Other API tests**, which contain the following types of tests:
- Per-layer tests. Located in the `single_layer_tests` and `subgraph_tests` folders.
- Tests for integration with the `InferenceEngine::Core` class. Located in the `ie_class` folder.
- Tests to check that IE common preprocessing works with your plugin. Located in the `io_blob_tests` folder.
To use these tests for your own plugin development, link the `funcSharedTests` library to your test binary and
instantiate the required test cases with the desired parameter values.
> **NOTE**: A plugin may contain its own tests for use cases that are specific to hardware or need to be extensively
> tested. Depending on your device positioning, you can implement more specific tests for your device. Such tests can
> be defined both for conformance and other API tests groups within your own test binary.
How to Extend Inference Engine Plugin Tests
========================
Inference Engine Plugin tests are open for contribution.
Add common test case definitions applicable for all plugins to the `funcSharedTests` target within the DLDT repository. Then, any other plugin supporting corresponding functionality can instantiate the new test.
All Inference Engine per-layer tests check the functionality of individual layers. They are developed using nGraph functions
as input graphs. To test a new layer with the layer tests, extend
the `ngraphFunctions` CMake target, which is also included in the Inference Engine Developer package, with a new nGraph function
that includes the corresponding operation.
> **NOTE**: When implementing a new subgraph test, add new single-layer tests for each operation of the subgraph.


@@ -0,0 +1,53 @@
# Quantized networks compute and restrictions {#quantized_networks}
One of the features of the Inference Engine is support for quantized networks with different precisions: INT8, INT4, etc.
However, it is up to the plugin to define which exact precisions are supported by the particular HW.
All quantized networks which can be expressed in IR have a unified representation by means of *FakeQuantize* operation.
For more details about low-precision model representation please refer to this [document](LowPrecisionModelRepresentation.md).
### Interpreting FakeQuantize at runtime
During model load, each plugin can interpret the quantization rules expressed in *FakeQuantize* operations:
- Independently based on the definition of *FakeQuantize* operation.
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](https://docs.openvinotoolkit.org/latest/_docs_IE_DG_Int8Inference.html).
Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
The former transforms the input data into the target precision, while the latter transforms the resulting values back to the original range and precision.
In practice *Dequantize* operations can be propagated forward through the linear operations, such as *Convolution* or *Fully-Connected*,
and in some cases fused with the following *Quantize* operation for the next layer into the so-called *Requantize* operation (see Fig. 1).
![qdq_propagation]
<div align="center">Figure 1. Quantization operations propagation at runtime. Q, DQ, RQ stand for Quantize, Dequantize, and Requantize correspondingly.</div>
From the calculation standpoint, the FakeQuantize formula is also split into two parts accordingly:
`output = round((x - input_low) / (input_high - input_low) * (levels-1)) / (levels-1) * (output_high - output_low) + output_low`
The first part of this formula represents *Quantize* operation:
`q = round((x - input_low) / (input_high - input_low) * (levels-1))`
The second is responsible for the dequantization:
`r = q / (levels-1) * (output_high - output_low) + output_low`
From the scale/zero-point notation standpoint, the latter formula can be rewritten as follows:
`r = (output_high - output_low) / (levels-1) * (q + output_low / (output_high - output_low) * (levels-1))`
Thus we can define:
- **Scale** as `(output_high - output_low) / (levels-1)`
- **Zero-point** as `-output_low / (output_high - output_low) * (levels-1)`
**Note**: During the quantization process, the values `input_low`, `input_high`, `output_low`, `output_high` are selected so as to map a floating-point zero exactly to an integer value (the zero-point) and vice versa.
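The formulas above can be checked with a small stand-alone sketch (illustrative code, not taken from the Inference Engine; the parameter values assume an INT8-like range chosen so that the zero-point is an integer, as the note describes):

```cpp
#include <cmath>

// Parameters of a single FakeQuantize operation, as defined in the text.
struct FakeQuantizeParams {
    float input_low, input_high;    // range of the input data
    float output_low, output_high;  // range to dequantize back into
    int   levels;                   // e.g. 256 for 8-bit quantization
};

// Quantize: q = round((x - input_low) / (input_high - input_low) * (levels - 1))
float quantize(float x, const FakeQuantizeParams& p) {
    return std::round((x - p.input_low) / (p.input_high - p.input_low) * (p.levels - 1));
}

// Dequantize: r = q / (levels - 1) * (output_high - output_low) + output_low
float dequantize(float q, const FakeQuantizeParams& p) {
    return q / (p.levels - 1) * (p.output_high - p.output_low) + p.output_low;
}

// Scale and zero-point as defined above, so that r = scale * (q - zero_point).
float scale(const FakeQuantizeParams& p) {
    return (p.output_high - p.output_low) / (p.levels - 1);
}
float zero_point(const FakeQuantizeParams& p) {
    return -p.output_low / (p.output_high - p.output_low) * (p.levels - 1);
}
```

For example, with `levels = 256` and the range `[-1.28, 1.27]`, the scale is `0.01` and the zero-point is exactly `128`, and `dequantize(quantize(x))` reproduces any value representable on the quantization grid.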
## Quantization specifics and restrictions
In general, OpenVINO can represent and execute quantized models from different sources. However, the Post-training Optimization Toolkit (POT)
is considered the default way to get optimized models. Since the POT supports HW-aware quantization, HW-specific rules can be implemented in it for
particular hardware. However, it is reasonable to stay compatible with general-purpose HW, such as CPU and GPU, and to support their quantization schemes.
Below we define these rules as follows:
- Support of mixed-precision models where some layers can be kept in the floating-point precision.
- Per-channel quantization of weights of Convolutional and Fully-Connected layers.
- Per-channel quantization of activations for channel-wise and element-wise operations, e.g. Depthwise Convolution, Eltwise Add/Mul, ScaleShift.
- Symmetric and asymmetric quantization of weights and activations with the support of per-channel scales and zero-points.
- Non-unified quantization parameters for Eltwise and Concat operations.
- Non-quantized network output, i.e. there are no quantization parameters for it.
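To make the per-channel weight quantization rule concrete, here is a small stand-alone illustration (not plugin code) of computing a symmetric per-output-channel INT8 scale, so that channels with small weight magnitudes keep their precision:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Symmetric per-channel quantization of weights: each output channel gets
// its own scale max|w| / 127, where 127 is the maximum INT8 magnitude
// usable in symmetric mode.
std::vector<float> perChannelScales(const std::vector<std::vector<float>>& weights) {
    std::vector<float> scales;
    for (const auto& channel : weights) {
        float maxAbs = 0.0f;
        for (float w : channel)
            maxAbs = std::max(maxAbs, std::fabs(w));
        scales.push_back(maxAbs / 127.0f);
    }
    return scales;
}
```

In the asymmetric case a per-channel zero-point would be computed alongside each scale, as in the scale/zero-point formulas of the previous section.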
## Quantized model inference
!!! Need details from the runtime team.
[qdq_propagation]: ../images/qdq_propagation.png


@@ -0,0 +1,26 @@
<doxygenlayout version="1.0">
<navindex>
<!-- Steps -->
<tab type="usergroup" url="index.html" visible="yes" title="GUIDE">
<tab type="usergroup" url="index.html" title="Developer Guide for Inference Engine Plugin Library">
<tab type="user" url="@ref plugin" visible="yes" title="Implement Plugin Functionality"/>
<tab type="user" url="@ref executable_network" visible="yes" title="Implement Executable Network Functionality"/>
<tab type="user" url="@ref infer_request" visible="yes" title="Implement Synchronous Inference Request"/>
<tab type="user" url="@ref async_infer_request" visible="yes" title="Implement Asynchronous Inference Request"/>
</tab>
</tab>
<!-- Additional resources -->
<tab type="usergroup" visible="no" title="DETAILED GUIDES">
<tab type="user" url="@ref plugin_build" visible="yes" title="Build Your Plugin with CMake*"/>
<tab type="user" url="@ref plugin_testing" visible="yes" title="Test Your Plugin"/>
<tab type="user" url="@ref quantized_networks" visible="yes" title="Quantized Networks Guide"/>
<tab type="user" url="@ref new_ngraph_transformation" visible="yes" title="Writing nGraph Transformations"/>
</tab>
<!-- API References -->
<tab type="usergroup" title="API REFERENCE">
<!-- IE Developer Package -->
<tab type="modules" visible="yes" title="Inference Engine Plugin API Reference"/>
</tab>
<tab type="usergroup" title="MAIN OPENVINO™ DOCS" url="../index.html"/>
</navindex>
</doxygenlayout>


@@ -4,6 +4,11 @@
#pragma once
/**
* @brief Defines initialize node runtime information pass
* @file init_node_info.hpp
*/
#include <vector>
#include <memory>
@@ -11,7 +16,33 @@
#include <ngraph/pass/graph_rewrite.hpp>
/**
* @defgroup ie_transformation_api Inference Engine Transformation API
* @brief Defines Inference Engine Transformations API which is used to transform ngraph::Function
*
* @{
* @defgroup ie_runtime_attr_api Runtime information
* @brief A mechanism of runtime information extension
*
* @defgroup ie_transformation_common_api Common optimization passes
* @brief A set of common optimization passes
*
* @defgroup ie_transformation_to_opset1_api Conversion from opset2 to opset1
* @brief A set of conversion passes from opset2 to opset1
* @defgroup ie_transformation_to_opset2_api Conversion from opset3 to opset2
* @brief A set of conversion passes from opset3 to opset2
* @}
*/
/**
* @brief ngraph namespace
*/
namespace ngraph {
/**
* @brief ngraph::passes namespace
*/
namespace pass {
class TRANSFORMATIONS_API InitNodeInfo;
@@ -19,18 +50,21 @@ class TRANSFORMATIONS_API InitNodeInfo;
} // namespace pass
} // namespace ngraph
/**
* @ingroup ie_transformation_common_api
* @brief InitNodeInfo transformation helps to set runtime info attributes in a single place.
*
* Every runtime info attribute that needs to be initialized should be registered
* in the run_on_function method. Also, do not forget to override the init methods for a registered
* attribute.
* This transformation should be called first in the transformation pipeline. If an attribute was
* already set, initialization is skipped for this node.
*/
class ngraph::pass::InitNodeInfo: public ngraph::pass::FunctionPass {
public:
/**
* Constructor
*/
InitNodeInfo() : FunctionPass() {}
bool run_on_function(std::shared_ptr<ngraph::Function> f) override;


@@ -2,6 +2,11 @@
// SPDX-License-Identifier: Apache-2.0
//
/**
* @brief Defines fused names attribute
* @file fused_names_attribute.hpp
*/
#include <assert.h>
#include <functional>
#include <memory>
@@ -15,24 +20,33 @@
namespace ngraph {
/**
* @ingroup ie_runtime_attr_api
* @brief FusedNames class represents a runtime info attribute that stores
* all operation names that were fully or partially fused into a node
*/
class TRANSFORMATIONS_API FusedNames {
private:
std::set<std::string> fused_names;
public:
/**
* A default constructor
*/
FusedNames() = default;
/**
* @brief Constructs a new object consisting of a single name
* @param[in] name The name
*/
explicit FusedNames(const std::string &name) {
fused_names.insert(name);
}
/**
* @brief Unites current set of already fused names with another FusedNames object
* @param[in] names Another object to fuse with
*/
void fuseWith(const FusedNames &names);
// Returns a string with operation names separated by commas, in alphabetical order