Compare commits
90 Commits
2022.1.0.d
...
releases/2
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
b4d9b68609 | ||
|
|
a8984f3761 | ||
|
|
a8a6e8f60a | ||
|
|
71ebd41475 | ||
|
|
3e8f2cc58b | ||
|
|
e8c1a59329 | ||
|
|
b371b52680 | ||
|
|
7ea2d668ad | ||
|
|
07f95e37b1 | ||
|
|
cd9aece3d7 | ||
|
|
2e6a746722 | ||
|
|
b37376100a | ||
|
|
0ed9857b97 | ||
|
|
6f801e5b34 | ||
|
|
5d5bbce492 | ||
|
|
4795391b73 | ||
|
|
f507ac0364 | ||
|
|
3347f930cb | ||
|
|
4dfeba454b | ||
|
|
f128545a2a | ||
|
|
2376b3b1a4 | ||
|
|
ca6f8fc9cf | ||
|
|
c6f2f9739c | ||
|
|
7d58dd59b6 | ||
|
|
5e81820a5b | ||
|
|
1d8112bba6 | ||
|
|
08794e70fe | ||
|
|
99384c7846 | ||
|
|
fc1e019ff5 | ||
|
|
8b4a953993 | ||
|
|
99c746f2c3 | ||
|
|
6b8f778d46 | ||
|
|
3d48b7f078 | ||
|
|
5c683584fc | ||
|
|
cf3e15bed5 | ||
|
|
106d5653c2 | ||
|
|
39447e08e7 | ||
|
|
176bdf5137 | ||
|
|
f730bf9ae9 | ||
|
|
422127e33d | ||
|
|
1e1eafe09d | ||
|
|
d2e3e1fc00 | ||
|
|
2a57c7bbe4 | ||
|
|
10274c0908 | ||
|
|
cb71090389 | ||
|
|
0335cdbbc6 | ||
|
|
ff878ffe92 | ||
|
|
8f507a07f5 | ||
|
|
a09de25dd2 | ||
|
|
86f96a8f3a | ||
|
|
6966e383d3 | ||
|
|
ab1a3e8d84 | ||
|
|
29d9462a9b | ||
|
|
c067ed1b9a | ||
|
|
bff2fe0b2f | ||
|
|
fd532c2611 | ||
|
|
1d68caf96f | ||
|
|
7bd3738afd | ||
|
|
5767a54fa2 | ||
|
|
56326a3d2d | ||
|
|
1baf72a780 | ||
|
|
c22e65e8a6 | ||
|
|
3964168205 | ||
|
|
a416a80c86 | ||
|
|
81152e07f3 | ||
|
|
a14fdef9b6 | ||
|
|
ce3bea3c07 | ||
|
|
707a0833d8 | ||
|
|
71cde81827 | ||
|
|
f4ed2572a5 | ||
|
|
ec02bc4dbd | ||
|
|
5a05e52d40 | ||
|
|
fee727f415 | ||
|
|
01c3951887 | ||
|
|
7715d033d2 | ||
|
|
1da3539bc1 | ||
|
|
ec8527f21e | ||
|
|
1c35875a60 | ||
|
|
1e392b0180 | ||
|
|
a05b7c76b2 | ||
|
|
3a9c731c2b | ||
|
|
641f1425df | ||
|
|
e67dffa6a3 | ||
|
|
7b85eef15f | ||
|
|
ea0f6fce5a | ||
|
|
6959f6671f | ||
|
|
322bc3498b | ||
|
|
6effe67358 | ||
|
|
759c91d547 | ||
|
|
5abbe2fec5 |
@@ -111,7 +111,7 @@ jobs:
|
||||
continueOnError: false
|
||||
|
||||
- script: |
|
||||
git clone https://github.com/openvinotoolkit/testdata.git
|
||||
git clone --single-branch --branch releases/2021/2 https://github.com/openvinotoolkit/testdata.git
|
||||
workingDirectory: $(WORK_DIR)
|
||||
displayName: 'Clone testdata'
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
# [OpenVINO™ Toolkit](https://01.org/openvinotoolkit) - Deep Learning Deployment Toolkit repository
|
||||
[](https://github.com/openvinotoolkit/openvino/releases/tag/2021.1)
|
||||
[](https://github.com/openvinotoolkit/openvino/releases/tag/2021.2)
|
||||
[](LICENSE)
|
||||

|
||||
|
||||
|
||||
@@ -132,14 +132,6 @@ function(build_docs)
|
||||
COMMAND ${Python3_EXECUTABLE} ${PYX_FILTER} ${PYTHON_API_OUT}
|
||||
COMMENT "Pre-process Python API")
|
||||
|
||||
# Plugin API
|
||||
|
||||
add_custom_target(plugin_api
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${PLUGIN_CONFIG_BINARY}
|
||||
WORKING_DIRECTORY ${DOCS_BINARY_DIR}
|
||||
COMMENT "Generating Plugin API Reference"
|
||||
VERBATIM)
|
||||
|
||||
# Preprocess docs
|
||||
|
||||
add_custom_target(preprocess_docs
|
||||
@@ -165,6 +157,15 @@ function(build_docs)
|
||||
WORKING_DIRECTORY ${DOCS_BINARY_DIR}
|
||||
VERBATIM)
|
||||
|
||||
# Plugin API
|
||||
|
||||
add_custom_target(plugin_api
|
||||
DEPENDS ie_docs
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${PLUGIN_CONFIG_BINARY}
|
||||
WORKING_DIRECTORY ${DOCS_BINARY_DIR}
|
||||
COMMENT "Generating Plugin API Reference"
|
||||
VERBATIM)
|
||||
|
||||
# Umbrella OpenVINO target
|
||||
|
||||
add_custom_target(openvino_docs
|
||||
|
||||
@@ -1,212 +1,382 @@
|
||||
# Custom Layers Guide {#openvino_docs_HOWTO_Custom_Layers_Guide}
|
||||
# Custom Operations Guide {#openvino_docs_HOWTO_Custom_Layers_Guide}
|
||||
|
||||
The Intel® Distribution of OpenVINO™ toolkit supports neural network model layers in multiple frameworks including TensorFlow*, Caffe*, MXNet*, Kaldi* and ONNX*. The list of known layers is different for each of the supported frameworks. To see the layers supported by your framework, refer to [supported frameworks](../MO_DG/prepare_model/Supported_Frameworks_Layers.md).
|
||||
The Intel® Distribution of OpenVINO™ toolkit supports neural network models trained with multiple frameworks including
|
||||
TensorFlow*, Caffe*, MXNet*, Kaldi* and ONNX* file format. The list of supported operations (layers) is different for
|
||||
each of the supported frameworks. To see the operations supported by your framework, refer to
|
||||
[Supported Framework Layers](../MO_DG/prepare_model/Supported_Frameworks_Layers.md).
|
||||
|
||||
Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom.
|
||||
Custom operations are operations that are not included in the list of known operations. If your model contains any
|
||||
operation that is not in the list of known operations, the Model Optimizer is not able to generate an Intermediate
|
||||
Representation (IR) for this model.
|
||||
|
||||
This guide illustrates the workflow for running inference on topologies featuring custom layers, allowing you to plug in your own implementation for existing or completely new layers.
|
||||
For a step-by-step example of creating and executing a custom layer, see the [Custom Layer Implementation Tutorials for Linux and Windows.](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0)
|
||||
This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to
|
||||
plug in your own implementation for existing or completely new operations.
|
||||
|
||||
## Terms used in this guide
|
||||
> **NOTE:** *Layer* — The legacy term for an *operation* which came from Caffe\* framework. Currently it is not used.
|
||||
> Refer to the [Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™](../MO_DG/IR_and_opsets.md)
|
||||
> for more information on the topic.
|
||||
|
||||
- *Layer* — The abstract concept of a math function that is selected for a specific purpose (relu, sigmoid, tanh, convolutional). This is one of a sequential series of building blocks within the neural network.
|
||||
- *Kernel* — The implementation of a layer function, in this case, the math programmed (in C++ and Python) to perform the layer operation for target hardware (CPU or GPU).
|
||||
- *Intermediate Representation (IR)* — Neural Network used only by the Inference Engine in OpenVINO abstracting the different frameworks and describing topology, layer parameters and weights.
|
||||
The original format will be a supported framework such as TensorFlow, Caffe, or MXNet.
|
||||
## Terms Used in This Guide
|
||||
|
||||
- *Model Extension Generator* — Generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine.
|
||||
- *Intermediate Representation (IR)* — Neural Network used only by the Inference Engine in OpenVINO abstracting the
|
||||
different frameworks and describing the model topology, operations parameters and weights.
|
||||
|
||||
- *Operation* — The abstract concept of a math function that is selected for a specific purpose. Operations supported by
|
||||
OpenVINO™ are listed in the supported operation set provided in the [Available Operations Sets](../ops/opset.md).
|
||||
Examples of the operations are: [ReLU](../ops/activation/ReLU_1.md), [Convolution](../ops/convolution/Convolution_1.md),
|
||||
[Add](../ops/arithmetic/Add_1.md), etc.
|
||||
|
||||
- *Kernel* — The implementation of an operation function in the OpenVINO™ plugin, in this case, the math programmed (in
|
||||
C++ and OpenCL) to perform the operation for a target hardware (CPU or GPU).
|
||||
|
||||
- *Inference Engine Extension* — Device-specific module implementing custom operations (a set of kernels).
|
||||
|
||||
## Custom Operation Support Overview
|
||||
|
||||
There are three steps to support inference of a model with custom operation(s):
|
||||
1. Add support for a custom operation in the [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) so
|
||||
the Model Optimizer can generate the IR with the operation.
|
||||
2. Create an operation set and implement a custom nGraph operation in it as described in the
|
||||
[Custom nGraph Operation](../IE_DG/Extensibility_DG/AddingNGraphOps.md).
|
||||
3. Implement a customer operation in one of the [Inference Engine](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md)
|
||||
plugins to support inference of this operation using a particular target hardware (CPU, GPU or VPU).
|
||||
|
||||
To see the operations that are supported by each device plugin for the Inference Engine, refer to the
|
||||
[Supported Devices](../IE_DG/supported_plugins/Supported_Devices.md).
|
||||
|
||||
> **NOTE:** If a device doesn't support a particular operation, an alternative to creating a new operation is to target
|
||||
> an additional device using the HETERO plugin. The [Heterogeneous Plugin](../IE_DG/supported_plugins/HETERO.md) may be
|
||||
> used to run an inference model on multiple devices allowing the unsupported operations on one device to "fallback" to
|
||||
> run on another device (e.g., CPU) that does support those operations.
|
||||
|
||||
### Custom Operation Support for the Model Optimizer
|
||||
|
||||
Model Optimizer model conversion pipeline is described in details in "Model Conversion Pipeline" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
|
||||
It is recommended to read that article first for a better understanding of the following material.
|
||||
|
||||
Model Optimizer provides extensions mechanism to support new operations and implement custom model transformations to
|
||||
generate optimized IR. This mechanism is described in the "Model Optimizer Extensions" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
|
||||
|
||||
Two types of the Model Optimizer extensions should be implemented to support custom operation at minimum:
|
||||
1. Operation class for a new operation. This class stores information about the operation, its attributes, shape
|
||||
inference function, attributes to be saved to an IR and some others internally used attributes. Refer to the
|
||||
"Model Optimizer Operation" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for the
|
||||
detailed instruction on how to implement it.
|
||||
2. Operation attributes extractor. The extractor is responsible for parsing framework-specific representation of the
|
||||
operation and uses corresponding operation class to update graph node attributes with necessary attributes of the
|
||||
operation. Refer to the "Operation Extractor" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for the
|
||||
detailed instruction on how to implement it.
|
||||
|
||||
> **NOTE:** In some cases you may need to implement some transformation to support the operation. This topic is covered
|
||||
> in the "Graph Transformation Extensions" section on the
|
||||
> [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
|
||||
|
||||
## Custom Operations Extensions for the Inference Engine
|
||||
|
||||
Inference Engine provides extensions mechanism to support new operations. This mechanism is described in the
|
||||
[Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md).
|
||||
|
||||
Each device plugin includes a library of optimized implementations to execute known operations which must be extended to
|
||||
execute a custom operation. The custom operation extension is implemented according to the target device:
|
||||
|
||||
- Custom Operation CPU Extension
|
||||
- A compiled shared library (`.so`, `.dylib` or `.dll`) needed by the CPU Plugin for executing the custom operation
|
||||
on a CPU. Refer to the [How to Implement Custom CPU Operations](../IE_DG/Extensibility_DG/CPU_Kernel.md) for more
|
||||
details.
|
||||
- Custom Operation GPU Extension
|
||||
- OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the GPU along with a
|
||||
operation description file (.xml) needed by the GPU Plugin for the custom operation kernel. Refer to the
|
||||
[How to Implement Custom GPU Operations](../IE_DG/Extensibility_DG/GPU_Kernel.md) for more details.
|
||||
- Custom Operation VPU Extension
|
||||
- OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the VPU along with a
|
||||
operation description file (.xml) needed by the VPU Plugin for the custom operation kernel. Refer to the
|
||||
[How to Implement Custom Operations for VPU](../IE_DG/Extensibility_DG/VPU_Kernel.md) for more details.
|
||||
|
||||
Also, it is necessary to implement nGraph custom operation according to the
|
||||
[Custom nGraph Operation](../IE_DG/Extensibility_DG/AddingNGraphOps.md) so the Inference Engine can read an IR with this
|
||||
operation and correctly infer output tensors shape and type.
|
||||
|
||||
## Enabling Magnetic Resonance Image Reconstruction Model
|
||||
This chapter provides a step-by-step instruction on how to enable the magnetic resonance image reconstruction model
|
||||
implemented in the [repository](https://github.com/rmsouza01/Hybrid-CS-Model-MRI/) using a custom operation on CPU. The
|
||||
example is prepared for a model generated from the repository with hash `2ede2f96161ce70dcdc922371fe6b6b254aafcc8`.
|
||||
|
||||
### Download and Convert the Model to a Frozen TensorFlow\* Model Format
|
||||
The original pre-trained model is provided in the hdf5 format which is not supported by OpenVINO directly and needs to
|
||||
be converted to TensorFlow\* frozen model format first.
|
||||
|
||||
1. Download repository `https://github.com/rmsouza01/Hybrid-CS-Model-MRI`:<br>
|
||||
```bash
|
||||
git clone https://github.com/rmsouza01/Hybrid-CS-Model-MRI
|
||||
git checkout 2ede2f96161ce70dcdc922371fe6b6b254aafcc8
|
||||
```
|
||||
|
||||
2. Convert pre-trained `.hdf5` to a frozen `.pb` graph using the following script (tested with TensorFlow==1.15.0 and
|
||||
Keras==2.2.4) which should be executed from the root of the cloned repository:<br>
|
||||
```py
|
||||
import keras as K
|
||||
import numpy as np
|
||||
import Modules.frequency_spatial_network as fsnet
|
||||
import tensorflow as tf
|
||||
|
||||
under_rate = '20'
|
||||
|
||||
stats = np.load("Data/stats_fs_unet_norm_" + under_rate + ".npy")
|
||||
var_sampling_mask = np.load("Data/sampling_mask_" + under_rate + "perc.npy")
|
||||
|
||||
model = fsnet.wnet(stats[0], stats[1], stats[2], stats[3], kshape = (5,5), kshape2=(3,3))
|
||||
model_name = "Models/wnet_" + under_rate + ".hdf5"
|
||||
model.load_weights(model_name)
|
||||
|
||||
inp = np.random.standard_normal([1, 256, 256, 2]).astype(np.float32)
|
||||
np.save('inp', inp)
|
||||
|
||||
sess = K.backend.get_session()
|
||||
sess.as_default()
|
||||
graph_def = sess.graph.as_graph_def()
|
||||
graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, ['conv2d_44/BiasAdd'])
|
||||
with tf.gfile.FastGFile('wnet_20.pb', 'wb') as f:
|
||||
f.write(graph_def.SerializeToString())
|
||||
```
|
||||
|
||||
- *Inference Engine Extension* — Device-specific module implementing custom layers (a set of kernels).
|
||||
As a result the TensorFlow\* frozen model file "wnet_20.pb" is generated.
|
||||
|
||||
### Convert the Frozen TensorFlow\* Model to Intermediate Representation
|
||||
|
||||
## Custom Layer Overview
|
||||
|
||||
The [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) searches the list of known layers for each layer contained in the input model topology before building the model's internal representation, optimizing the model, and producing the Intermediate Representation files.
|
||||
|
||||
The [Inference Engine](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) loads the layers from the input model IR files into the specified device plugin, which will search a list of known layer implementations for the device. If your topology contains layers that are not in the list of known layers for the device, the Inference Engine considers the layer to be unsupported and reports an error. To see the layers that are supported by each device plugin for the Inference Engine, refer to the [Supported Devices](../IE_DG/supported_plugins/Supported_Devices.md) documentation.
|
||||
<br>
|
||||
> **NOTE:** If a device doesn't support a particular layer, an alternative to creating a new custom layer is to target an additional device using the HETERO plugin. The [Heterogeneous Plugin](../IE_DG/supported_plugins/HETERO.md) may be used to run an inference model on multiple devices allowing the unsupported layers on one device to "fallback" to run on another device (e.g., CPU) that does support those layers.
|
||||
|
||||
## Custom Layer Implementation Workflow
|
||||
|
||||
When implementing a custom layer for your pre-trained model in the Intel® Distribution of OpenVINO™ toolkit, you will need to add extensions to both the Model Optimizer and the Inference Engine.
|
||||
|
||||
## Custom Layer Extensions for the Model Optimizer
|
||||
|
||||
The following figure shows the basic processing steps for the Model Optimizer highlighting the two necessary custom layer extensions, the Custom Layer Extractor and the Custom Layer Operation.
|
||||
|
||||

|
||||
|
||||
|
||||
The Model Optimizer first extracts information from the input model which includes the topology of the model layers along with parameters, input and output format, etc., for each layer. The model is then optimized from the various known characteristics of the layers, interconnects, and data flow which partly comes from the layer operation providing details including the shape of the output for each layer. Finally, the optimized model is output to the model IR files needed by the Inference Engine to run the model.
|
||||
|
||||
The Model Optimizer starts with a library of known extractors and operations for each [supported model framework](../MO_DG/prepare_model/Supported_Frameworks_Layers.md) which must be extended to use each unknown custom layer. The custom layer extensions needed by the Model Optimizer are:
|
||||
|
||||
- Custom Layer Extractor
|
||||
- Responsible for identifying the custom layer operation and extracting the parameters for each instance of the custom layer. The layer parameters are stored per instance and used by the layer operation before finally appearing in the output IR. Typically the input layer parameters are unchanged, which is the case covered by this tutorial.
|
||||
- Custom Layer Operation
|
||||
- Responsible for specifying the attributes that are supported by the custom layer and computing the output shape for each instance of the custom layer from its parameters. <br> The `--mo-op` command-line argument shown in the examples below generates a custom layer operation for the Model Optimizer.
|
||||
|
||||
## Custom Layer Extensions for the Inference Engine
|
||||
|
||||
The following figure shows the basic flow for the Inference Engine highlighting two custom layer extensions for the CPU and GPU Plugins, the Custom Layer CPU extension and the Custom Layer GPU Extension.
|
||||
|
||||

|
||||
|
||||
Each device plugin includes a library of optimized implementations to execute known layer operations which must be extended to execute a custom layer. The custom layer extension is implemented according to the target device:
|
||||
|
||||
- Custom Layer CPU Extension
|
||||
- A compiled shared library (.so or .dll binary) needed by the CPU Plugin for executing the custom layer on the CPU.
|
||||
- Custom Layer GPU Extension
|
||||
- OpenCL source code (.cl) for the custom layer kernel that will be compiled to execute on the GPU along with a layer description file (.xml) needed by the GPU Plugin for the custom layer kernel.
|
||||
|
||||
## Model Extension Generator
|
||||
|
||||
Using answers to interactive questions or a *.json* configuration file, the Model Extension Generator tool generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine. To complete the implementation of each extension, the template functions may need to be edited to fill-in details specific to the custom layer or the actual custom layer functionality itself.
|
||||
|
||||
### Command-line
|
||||
|
||||
The Model Extension Generator is included in the Intel® Distribution of OpenVINO™ toolkit installation and is run using the command (here with the "--help" option):
|
||||
Firstly, open the model in the TensorBoard or other TensorFlow* model visualization tool. The model supports dynamic
|
||||
batch dimension because the value for the batch dimension is not hardcoded in the model. Model Optimizer needs to set all
|
||||
dynamic dimensions to some specific value to create the IR, therefore specify the command line parameter `-b 1` to set
|
||||
the batch dimension equal to 1. The actual batch size dimension can be changed at runtime using the Inference Engine API
|
||||
described in the [Using Shape Inference](../IE_DG/ShapeInference.md). Also refer to
|
||||
[Converting a Model Using General Conversion Parameters](../MO_DG/prepare_model/convert_model/Converting_Model_General.md)
|
||||
and [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md)
|
||||
for more details and command line parameters used for the model conversion.
|
||||
|
||||
```bash
|
||||
python3 /opt/intel/openvino/deployment_tools/tools/extension_generator/extgen.py new --help
|
||||
./<MO_INSTALL_DIR>/mo.py --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1
|
||||
```
|
||||
|
||||
where the output will appear similar to:
|
||||
|
||||
```
|
||||
usage: You can use any combination of the following arguments:
|
||||
|
||||
Arguments to configure extension generation in the interactive mode:
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
--mo-caffe-ext generate a Model Optimizer Caffe* extractor
|
||||
--mo-mxnet-ext generate a Model Optimizer MXNet* extractor
|
||||
--mo-tf-ext generate a Model Optimizer TensorFlow* extractor
|
||||
--mo-op generate a Model Optimizer operation
|
||||
--ie-cpu-ext generate an Inference Engine CPU extension
|
||||
--ie-gpu-ext generate an Inference Engine GPU extension
|
||||
--output_dir OUTPUT_DIR
|
||||
set an output directory. If not specified, the current
|
||||
directory is used by default.
|
||||
Model Optimizer produces the following error:
|
||||
```bash
|
||||
[ ERROR ] List of operations that cannot be converted to Inference Engine IR:
|
||||
[ ERROR ] Complex (1)
|
||||
[ ERROR ] lambda_2/Complex
|
||||
[ ERROR ] IFFT2D (1)
|
||||
[ ERROR ] lambda_2/IFFT2D
|
||||
[ ERROR ] ComplexAbs (1)
|
||||
[ ERROR ] lambda_2/Abs
|
||||
[ ERROR ] Part of the nodes was not converted to IR. Stopped.
|
||||
```
|
||||
|
||||
The available command-line arguments are used to specify which extension(s) to generate templates for the Model Optimizer or Inference Engine. The generated extension files for each argument will appear starting from the top of the output directory as follows:
|
||||
The error means that the Model Optimizer doesn't know how to handle 3 types of TensorFlow\* operations: "Complex",
|
||||
"IFFT2D" and "ComplexAbs". In order to see more details about the conversion process run the model conversion with
|
||||
additional parameter `--log_level DEBUG`. It is worth mentioning the following lines from the detailed output:
|
||||
|
||||
Command-line Argument | Output Directory Location |
|
||||
--------------------- | ------------------------------ |
|
||||
`--mo-caffe-ext` | user_mo_extensions/front/caffe |
|
||||
`--mo-mxnet-ext` | user_mo_extensions/front/mxnet |
|
||||
`--mo-tf-ext` | user_mo_extensions/front/tf |
|
||||
`--mo-op` | user_mo_extensions/ops |
|
||||
`--ie-cpu-ext` | user_ie_extensions/cpu |
|
||||
`--ie-gpu-ext` | user_ie_extensions/gpu |
|
||||
```bash
|
||||
[ INFO ] Called "tf_native_tf_node_infer" for node "lambda_2/Complex"
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_3/strided_slice' with input 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '0' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_4/strided_slice' with input 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '1' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:148 ] Inferred shape of the output tensor with index '0' of the node 'lambda_2/Complex': '[ 1 256 256]'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:145 ] Outputs:
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] output[0]: shape = [ 1 256 256], value = <UNKNOWN>
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:129 ] --------------------
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:130 ] Partial infer for lambda_2/IFFT2D
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:131 ] Op: IFFT2D
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:132 ] Inputs:
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] input[0]: shape = [ 1 256 256], value = <UNKNOWN>
|
||||
```
|
||||
|
||||
### Extension Workflow
|
||||
This is a part of the log of the partial inference phase of the model conversion. See the "Partial Inference" section on
|
||||
the [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for
|
||||
more information about this phase. Model Optimizer inferred output shape for the unknown operation of type "Complex"
|
||||
using a "fallback" to TensorFlow\*. However, it is not enough to generate the IR because Model Optimizer doesn't know
|
||||
which attributes of the operation should be saved to IR. So it is necessary to implement Model Optimizer extensions to
|
||||
support these operations.
|
||||
|
||||
The workflow for each generated extension follows the same basic steps:
|
||||
Before going into the extension development it is necessary to understand what these unsupported operations do according
|
||||
to the TensorFlow\* framework specification.
|
||||
|
||||

|
||||
* "Complex" - returns a tensor of complex type constructed from two real input tensors specifying real and imaginary
|
||||
part of a complex number.
|
||||
* "IFFT2D" - returns a tensor with inverse 2-dimensional discrete Fourier transform over the inner-most 2 dimensions of
|
||||
an input.
|
||||
* "ComplexAbs" - returns a tensor with absolute values of input tensor with complex numbers.
|
||||
|
||||
**Step 1: Generate:** Use the Model Extension Generator to generate the Custom Layer Template Files.
|
||||
The part of the model with all three unsupported operations is depicted below:
|
||||
|
||||
**Step 2: Edit:** Edit the Custom Layer Template Files as necessary to create the specialized Custom Layer Extension Source Code.
|
||||

|
||||
|
||||
**Step 3: Specify:** Specify the custom layer extension locations to be used by the Model Optimizer or Inference Engine.
|
||||
This model uses complex numbers during the inference but Inference Engine does not support tensors of this data type. So
|
||||
it is necessary to find a way how to avoid using tensors of such a type in the model. Fortunately, the complex tensor
|
||||
appears as a result of the "Complex" operation, is used as input in the "IFFT2D" operation then is passed to "ComplexAbs"
|
||||
which produces real value tensor as output. So there are just 3 operations consuming/producing complex tensors in the
|
||||
model.
|
||||
|
||||
## Caffe\* Models with Custom Layers <a name="caffe-models-with-custom-layers"></a>
|
||||
Let's design an OpenVINO operation "FFT" which gets a single real number tensor describing the complex number and
|
||||
produces a single real number tensor describing output complex tensor. This way the fact that the model uses complex
|
||||
numbers is hidden inside the "FFT" operation implementation. The operation gets a tensor of shape `[N, H, W, 2]` and
|
||||
produces the output tensor with the same shape, where the innermost dimension contains pairs of real numbers describing
|
||||
the complex number (its real and imaginary part). As we will see further this operation will allow us to support the
|
||||
model. The implementation of the Model Optimizer operation should be saved to `mo_extensions/ops/FFT.py` file:
|
||||
|
||||
If your Caffe\* model has custom layers:
|
||||
@snippet FFT.py fft:operation
|
||||
|
||||
**Register the custom layers as extensions to the Model Optimizer**. For instructions, see [Extending Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You will need a bit of Python\* code that lets the Model Optimizer;
|
||||
The attribute `inverse` is a flag specifying type of the FFT to apply: forward or inverse.
|
||||
|
||||
- Generate a valid Intermediate Representation according to the rules you specified.
|
||||
- Be independent from the availability of Caffe on your computer.
|
||||
|
||||
If your model contains Custom Layers, it is important to understand the internal workflow of the Model Optimizer. Consider the following example.
|
||||
See the "Model Optimizer Operation" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for the
|
||||
detailed instruction on how to implement the operation.
|
||||
|
||||
**Example**:
|
||||
Now it is necessary to implement extractor for the "IFFT2D" operation according to the
|
||||
"Operation Extractor" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md). The
|
||||
following snippet provides two extractors: one for "IFFT2D", another one for "FFT2D", however only one of them is used
|
||||
in this example. The implementation should be saved to the file `mo_extensions/front/tf/FFT_ext.py`.
|
||||
|
||||
The network has:
|
||||
@snippet FFT_ext.py fft_ext:extractor
|
||||
|
||||
* One input layer (#1)
|
||||
* One output Layer (#5)
|
||||
* Three internal layers (#2, 3, 4)
|
||||
> **NOTE:** The graph is in inconsistent state after extracting node attributes because according to original operation
|
||||
> "IFFT2D" semantic it should have an input consuming a tensor of complex numbers, but the extractor instantiated an
|
||||
> operation "FFT" which expects a real tensor with specific layout. But the inconsistency will be resolved during
|
||||
> applying front phase transformations discussed below.
|
||||
|
||||
The custom and standard layer types are:
|
||||
The output shape of the operation "AddV2" from the picture above is `[N, H, W, 2]`. Where the innermost dimension
|
||||
contains pairs of real numbers describing the complex number (its real and imaginary part). The following "StridedSlice"
|
||||
operations split the input tensor into 2 parts to get a tensor of real and a tensor of imaginary parts which are then
|
||||
consumed with the "Complex" operation to produce a tensor of complex numbers. These "StridedSlice" and "Complex"
|
||||
operations can be removed so the "FFT" operation will get a real value tensor encoding complex numbers. To achieve this
|
||||
we implement the front phase transformation which searches for a pattern of two "StridedSlice" operations with specific
|
||||
attributes producing data to "Complex" operation and removes it from the graph. Refer to the
|
||||
"Pattern-Defined Front Phase Transformations" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for more
|
||||
information on how this type of transformation works. The code snippet should be saved to the file
|
||||
`mo_extensions/front/tf/Complex.py`.
|
||||
|
||||
* Layers #2 and #5 are implemented as Model Optimizer extensions.
|
||||
* Layers #1 and #4 are supported in Model Optimizer out-of-the box.
|
||||
* Layer #3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml.
|
||||
@snippet Complex.py complex:transformation
|
||||
|
||||
> **NOTE**: If any of the layers are not in one of three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in [Model Optimizer FAQ](../MO_DG/prepare_model/Model_Optimizer_FAQ.md).
|
||||
> **NOTE:** The graph is in inconsistent state because the "ComplexAbs" operation consumes complex value tensor but
|
||||
> "FFT" produces real value tensor.
|
||||
|
||||
The general process is as shown:
|
||||
Now let's implement a transformation which replaces a "ComplexAbs" operation with a sub-graph of primitive operations
|
||||
which calculate the result using the following formula: \f$module(z) = \sqrt{real(z) \cdot real(z) + imag(z) \cdot imag(z)}\f$.
|
||||
The original "IFFT2D" operation produces a tensor of complex values, but the "FFT" operation produces a real value tensor with
|
||||
the same format and shape as the input for the operation. So the input shape for the "ComplexAbs" will be `[N, H, W, 2]`
|
||||
with the innermost dimension containing a tuple with the real and imaginary parts of a complex number. In order to calculate
|
||||
absolute values for the complex tensor we do the following:
|
||||
1. Raise all elements in the power of 2.
|
||||
2. Calculate a reduced sum over the innermost dimension.
|
||||
3. Calculate a square root.
|
||||
|
||||

|
||||
<br>
|
||||
The implementation should be saved to the file `mo_extensions/front/tf/ComplexAbs.py` and provided below:
|
||||
|
||||
**Step 1:** The example model is fed to the Model Optimizer that **loads the model** with the special parser built on top of the `caffe.proto` file. In case of failure, the Model Optimizer asks you to prepare the parser that can read the model. For more information, refer to the Model Optimizer, <a href="MO_FAQ.html#FAQ1">FAQ #1</a>.
|
||||
@snippet ComplexAbs.py complex_abs:transformation
|
||||
|
||||
**Step 2:** The Model Optimizer **extracts the attributes of all layers** by going through the list of layers and attempting to find the appropriate extractor. In order of priority, the Model Optimizer checks if the layer is:
|
||||
|
||||
* A. Registered as a Model Optimizer extension
|
||||
* B. Registered as a standard Model Optimizer layer
|
||||
|
||||
When the Model Optimizer finds a satisfying condition from the list above, it extracts the attributes according to the following rules:
|
||||
|
||||
* For A. - takes only the parameters specified in the extension
|
||||
* For B. - takes only the parameters specified in the standard extractor
|
||||
<br>
|
||||
Now it is possible to convert the model using the following command line:
|
||||
```bash
|
||||
./<MO_INSTALL_DIR>/mo.py --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1 --extensions mo_extensions/
|
||||
```
|
||||
|
||||
**Step 3:** The Model Optimizer **calculates the output shape of all layers**. The logic is the same as it is for the priorities. **Important:** the Model Optimizer always takes the first available option.
|
||||
The sub-graph corresponding to the originally non-supported one is depicted on the image below:
|
||||
|
||||
**Step 4:** The Model Optimizer **optimizes the original model and produces the two Intermediate Representation (IR) files in .xml and .bin**.
|
||||
<br>
|
||||

|
||||
|
||||
## TensorFlow\* Models with Custom Layers <a name="Tensorflow-models-with-custom-layers"></a>
|
||||
> **NOTE:** Model Optimizer performed conversion of the model from NHWC to NCHW layout, which is why the dimension with
|
||||
> the value 2 moved to another position.
|
||||
|
||||
You have two options for TensorFlow\* models with custom layers:
|
||||
<br>
|
||||
### Inference Engine Extension Implementation
|
||||
Now it is necessary to implement the extension for the CPU plugin with operation "FFT" introduced previously. The code
|
||||
below is based on the template extension described on the
|
||||
[Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md).
|
||||
|
||||
* **Register those layers as extensions to the Model Optimizer.** In this case, the Model Optimizer generates a valid and optimized Intermediate Representation.
|
||||
* **If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option.** This feature is helpful for many TensorFlow models. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
|
||||
|
||||
## MXNet\* Models with Custom Layers <a name="mxnet-models-with-custom-layers"></a>
|
||||
#### CMake Build File
|
||||
The first step is to create a CMake configuration file which builds the extension. The content of the "CMakeLists.txt"
|
||||
file is the following:
|
||||
|
||||
There are two options to convert your MXNet* model that contains custom layers:
|
||||
@snippet ie_cpu_extension/CMakeLists.txt fft_cmake_list:cmake
|
||||
|
||||
1. Register the custom layers as extensions to the Model Optimizer. For instructions, see [Extending MXNet Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You can create Model Optimizer extensions for both MXNet layers with op `Custom` and layers which are not standard MXNet layers.
|
||||
The CPU FFT kernel implementation uses OpenCV to perform the FFT, which is why the extension library is linked with
|
||||
"opencv_core", which comes with OpenVINO.
|
||||
|
||||
2. If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option. In MXNet, this function is actively used for SSD models; it provides an opportunity to search for the necessary subgraph sequences and replace them. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
|
||||
#### Custom nGraph Operation "FFT" Implementation
|
||||
The next step is to create the nGraph operation FFT. The header file "fft_op.hpp" has the following content:
|
||||
|
||||
## Kaldi\* Models with Custom Layers <a name="Kaldi-models-with-custom-layers"></a>
|
||||
For information on converting your Kaldi* model containing custom layers see [Converting a Kaldi Model in the Model Optimizer Developer Guide](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md).
|
||||
@snippet fft_op.hpp fft_op:header
|
||||
|
||||
## ONNX\* Models with Custom Layers <a name="ONNX-models-with-custom-layers"></a>
|
||||
For information on converting your ONNX* model containing custom layers see [Converting an ONNX Model in the Model Optimizer Developer Guide](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md).
|
||||
The operation has just one boolean attribute `inverse`. Implementation of the necessary nGraph operation functions are
|
||||
in the "fft_op.cpp" file with the following content:
|
||||
|
||||
## Step-by-Step Custom Layers Tutorial
|
||||
For a step-by-step walk-through creating and executing a custom layer, see [Custom Layer Implementation Tutorial for Linux and Windows.](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0)
|
||||
@snippet fft_op.cpp fft_op:implementation
|
||||
|
||||
Refer to the [Custom nGraph Operation](../IE_DG/Extensibility_DG/AddingNGraphOps.md) for more details.
|
||||
|
||||
#### CPU FFT Kernel Implementation
|
||||
The operation implementation for CPU plugin uses OpenCV to perform the FFT. The header file "fft_kernel.hpp" has the
|
||||
following content:
|
||||
|
||||
@snippet fft_kernel.hpp fft_kernel:header
|
||||
|
||||
The "fft_kernel.cpp" with the implementation of the CPU has the following content:
|
||||
|
||||
@snippet fft_kernel.cpp fft_kernel:implementation
|
||||
|
||||
Refer to the [How to Implement Custom CPU Operations](../IE_DG/Extensibility_DG/CPU_Kernel.md) for more details.
|
||||
|
||||
#### Extension Implementation
|
||||
The source code of the extension itself contains the "extension.hpp" and "extension.cpp" files.
|
||||
|
||||
**extension.hpp**:
|
||||
|
||||
@snippet ie_cpu_extension/extension.hpp fft_extension:header
|
||||
|
||||
**extension.cpp**:
|
||||
|
||||
@snippet ie_cpu_extension/extension.cpp fft_extension:implementation
|
||||
|
||||
### Building and Running the Custom Extension
|
||||
In order to build the extension run the following:<br>
|
||||
```bash
|
||||
mkdir build && cd build
|
||||
source /opt/intel/openvino/bin/setupvars.sh
|
||||
cmake .. -DCMAKE_BUILD_TYPE=Release
|
||||
make --jobs=$(nproc)
|
||||
```
|
||||
|
||||
The result of this command is a compiled shared library (`.so`, `.dylib` or `.dll`). It should be loaded in the
|
||||
application using `Core` class instance method `AddExtension` like this
|
||||
`core.AddExtension(make_so_pointer<IExtension>(compiled_library_file_name), "CPU");`.
|
||||
|
||||
To test that the extension is implemented correctly we can run the [Benchmark App](../../inference-engine/tools/benchmark_tool/README.md)
|
||||
the following way:
|
||||
```bash
|
||||
python3 $INTEL_OPENVINO_DIR/deployment_tools/tools/benchmark_tool/benchmark_app.py \
|
||||
-m <PATH_TO_IR>/wnet_20.xml \
|
||||
-l <PATH_TO_BUILD_DIR>/libfft_cpu_extension.so \
|
||||
-d CPU
|
||||
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit)
|
||||
- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org)
|
||||
- [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
|
||||
- [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
- [Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md)
|
||||
- [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md)
|
||||
- [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index)
|
||||
- [Inference Engine Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic)
|
||||
- For IoT Libraries and Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit).
|
||||
|
||||
## Converting Models:
|
||||
|
||||
- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md)
|
||||
- [Convert Your Kaldi* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md)
|
||||
- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md)
|
||||
- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md)
|
||||
- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md)
|
||||
|
||||
|
||||
|
||||
|
||||
36
docs/HOWTO/ie_cpu_extension/CMakeLists.txt
Normal file
36
docs/HOWTO/ie_cpu_extension/CMakeLists.txt
Normal file
@@ -0,0 +1,36 @@
|
||||
#
# Copyright (C) 2018-2019 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# ===============================================================================
# Generated file for building library with user generated CPU extensions
#
# Contains implementation of the basic layer methods
#
# Refer to the section "Adding Your Own Kernels to the Inference Engine" in
# OpenVINO* documentation (either online or offline in
# <INSTALL_DIR>/deployment_tools/documentation/docs/index.html and then navigate
# to the corresponding section).
# ===============================================================================

# [fft_cmake_list:cmake]
# Remember where the extension sources live so the glob below works regardless
# of where this file is included from.
set(CPU_EXTENSIONS_BASE_DIR ${CMAKE_CURRENT_SOURCE_DIR} CACHE INTERNAL "")

# Require C++11: without CMAKE_CXX_STANDARD_REQUIRED the build could silently
# fall back to an older standard on compilers that do not support C++11.
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

find_package(ngraph REQUIRED OPTIONAL_COMPONENTS onnx_importer)
find_package(InferenceEngine REQUIRED)
# Only the OpenCV core module is needed (cv::dft / cv::idft and cv::Mat).
find_package(OpenCV REQUIRED COMPONENTS core)

set(TARGET_NAME fft_cpu_extension)

# The glob is intentional here: this is a generated file that picks up every
# user extension source placed in the directory. Re-run CMake after adding
# new .cpp files so they are discovered.
file(GLOB SRC ${CPU_EXTENSIONS_BASE_DIR}/*.cpp)

add_library(${TARGET_NAME} SHARED ${SRC})

# IMPLEMENT_INFERENCE_EXTENSION_API marks symbols for export from the plugin.
target_compile_definitions(${TARGET_NAME} PRIVATE IMPLEMENT_INFERENCE_EXTENSION_API)
target_link_libraries(${TARGET_NAME} PRIVATE ${InferenceEngine_LIBRARIES}
                                             ${NGRAPH_LIBRARIES}
                                             opencv_core)
# [fft_cmake_list:cmake]
|
||||
|
||||
67
docs/HOWTO/ie_cpu_extension/extension.cpp
Normal file
67
docs/HOWTO/ie_cpu_extension/extension.cpp
Normal file
@@ -0,0 +1,67 @@
|
||||
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
// source: https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension

//! [fft_extension:implementation]
#include "extension.hpp"
#include "fft_kernel.hpp"
#include "fft_op.hpp"
#include <ngraph/factory.hpp>
#include <ngraph/opsets/opset.hpp>

#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

using namespace FFTExtension;

// Reports the extension API version and a human-readable description.
void Extension::GetVersion(const InferenceEngine::Version *&versionInfo) const noexcept {
    static InferenceEngine::Version ExtensionDescription = {
        {1, 0},  // extension API version
        "1.0",
        "The CPU plugin extension with FFT operation"  // extension description message
    };

    versionInfo = &ExtensionDescription;
}

// Exposes the custom opset "fft_extension" containing the FFT operation so the
// Inference Engine can construct FFT nodes when reading an IR.
std::map<std::string, ngraph::OpSet> Extension::getOpSets() {
    std::map<std::string, ngraph::OpSet> opsets;
    ngraph::OpSet opset;
    opset.insert<FFTOp>();
    opsets["fft_extension"] = opset;
    return opsets;
}

// Only FFT nodes are handled, and only on CPU.
std::vector<std::string> Extension::getImplTypes(const std::shared_ptr<ngraph::Node> &node) {
    if (std::dynamic_pointer_cast<FFTOp>(node)) {
        return {"CPU"};
    }
    return {};
}

// Creates the OpenCV-based CPU kernel for FFT nodes; nullptr for anything else.
InferenceEngine::ILayerImpl::Ptr Extension::getImplementation(const std::shared_ptr<ngraph::Node> &node, const std::string &implType) {
    if (std::dynamic_pointer_cast<FFTOp>(node) && implType == "CPU") {
        return std::make_shared<FFTImpl>(node);
    }
    return nullptr;
}

// Factory entry point called when the library is loaded through
// Core::AddExtension. Ownership of the created object passes to the caller.
INFERENCE_EXTENSION_API(InferenceEngine::StatusCode) InferenceEngine::CreateExtension(InferenceEngine::IExtension *&ext,
                                                                                      InferenceEngine::ResponseDesc *resp) noexcept {
    try {
        ext = new Extension();
        return OK;
    } catch (const std::exception &ex) {  // catch by const reference, not by value
        if (resp) {
            std::string err = ((std::string) "Couldn't create extension: ") + ex.what();
            // Bug fix: std::string::copy does NOT append a null terminator, so
            // the original left resp->msg unterminated. Terminate explicitly
            // and bound the copy by the actual buffer size.
            size_t copied = err.copy(resp->msg, sizeof(resp->msg) - 1);
            resp->msg[copied] = '\0';
        }
        return InferenceEngine::GENERAL_ERROR;
    }
}
//! [fft_extension:implementation]
|
||||
|
||||
32
docs/HOWTO/ie_cpu_extension/extension.hpp
Normal file
32
docs/HOWTO/ie_cpu_extension/extension.hpp
Normal file
@@ -0,0 +1,32 @@
|
||||
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
// source: https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension

//! [fft_extension:header]
#pragma once

#include <ie_iextension.h>
#include <ie_api.h>
#include <ngraph/ngraph.hpp>
#include <memory>
#include <vector>
#include <string>
#include <map>

namespace FFTExtension {

// Inference Engine extension that registers the custom "FFT" nGraph operation
// and provides its CPU implementation (see fft_op.hpp / fft_kernel.hpp).
class Extension : public InferenceEngine::IExtension {
public:
    Extension() = default;
    // Fills versionInfo with the extension API version and description string.
    void GetVersion(const InferenceEngine::Version*& versionInfo) const noexcept override;
    // No resources to free on unload.
    void Unload() noexcept override {}
    // The plugin holds the extension through a raw pointer; Release() deletes it.
    void Release() noexcept override { delete this; }

    // Returns the custom opset ("fft_extension") containing the FFT operation.
    std::map<std::string, ngraph::OpSet> getOpSets() override;
    // Lists device implementations available for the node ("CPU" for FFT nodes).
    std::vector<std::string> getImplTypes(const std::shared_ptr<ngraph::Node>& node) override;
    // Creates the CPU kernel for a supported node; nullptr otherwise.
    InferenceEngine::ILayerImpl::Ptr getImplementation(const std::shared_ptr<ngraph::Node>& node, const std::string& implType) override;
};

}
//! [fft_extension:header]
|
||||
119
docs/HOWTO/ie_cpu_extension/fft_kernel.cpp
Normal file
119
docs/HOWTO/ie_cpu_extension/fft_kernel.cpp
Normal file
@@ -0,0 +1,119 @@
|
||||
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

//! [fft_kernel:implementation]
#include "fft_kernel.hpp"
#include "fft_op.hpp"
#include <details/ie_exception.hpp>
#include <ie_layouts.h>

#include <opencv2/opencv.hpp>

#include <cstring>  // strncpy (was only available transitively before)
#include <numeric>  // std::iota (was only available transitively before)

using namespace FFTExtension;

// Validates the node (single static-shape FP32 input/output) and caches its
// shapes and the "inverse" attribute for execution time.
FFTImpl::FFTImpl(const std::shared_ptr<ngraph::Node> &node) {
    auto castedNode = std::dynamic_pointer_cast<FFTOp>(node);
    if (!castedNode)
        THROW_IE_EXCEPTION << "Cannot create implementation for unknown operation!";
    if (castedNode->inputs().size() != 1 || castedNode->outputs().size() != 1)
        THROW_IE_EXCEPTION << "Cannot create implementation for operation with incorrect number of inputs or outputs!";
    if (castedNode->get_input_partial_shape(0).is_dynamic() || castedNode->get_output_partial_shape(0).is_dynamic())
        THROW_IE_EXCEPTION << "Cannot create implementation for op with dynamic shapes!";
    if (castedNode->get_input_element_type(0) != ngraph::element::f32 || castedNode->get_output_element_type(0) != ngraph::element::f32)
        THROW_IE_EXCEPTION << "Operation supports only FP32 tensors.";
    inpShape = castedNode->get_input_shape(0);
    outShape = castedNode->get_output_shape(0);
    inverse = castedNode->inverse;
}

// Advertises a single supported configuration: FP32 tensors in the plain
// (identity-order) layout for both input and output.
InferenceEngine::StatusCode FFTImpl::getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig> &conf,
                                                                InferenceEngine::ResponseDesc *resp) noexcept {
    std::vector<InferenceEngine::DataConfig> inDataConfig;
    std::vector<InferenceEngine::DataConfig> outDataConfig;
    // Identity dimension order == plain row-major layout.
    InferenceEngine::SizeVector order(inpShape.size());
    std::iota(order.begin(), order.end(), 0);

    // Allow any offset before data
    size_t offset((std::numeric_limits<size_t>::max)());

    // Input shape
    InferenceEngine::DataConfig inpConf;
    inpConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, inpShape, {inpShape, order, offset});
    inDataConfig.push_back(inpConf);

    // Output shape
    InferenceEngine::DataConfig outConf;
    outConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, outShape, {outShape, order, offset});
    outDataConfig.push_back(outConf);

    InferenceEngine::LayerConfig layerConfig;
    layerConfig.inConfs = inDataConfig;
    layerConfig.outConfs = outDataConfig;

    conf.push_back(layerConfig);
    return InferenceEngine::StatusCode::OK;
}

// Verifies the configuration selected by the plugin: exactly one FP32 input
// and one FP32 output.
InferenceEngine::StatusCode FFTImpl::init(InferenceEngine::LayerConfig &config, InferenceEngine::ResponseDesc *resp) noexcept {
    try {
        if (config.inConfs.size() != 1 || config.outConfs.size() != 1) {
            THROW_IE_EXCEPTION << "Operation cannot be initialized with incorrect number of inputs/outputs!";
        }

        if (config.outConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32 ||
            config.inConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32) {
            THROW_IE_EXCEPTION << "Operation supports only FP32 precisions!";
        }
    } catch (InferenceEngine::details::InferenceEngineException& ex) {
        if (resp) {
            // Bug fix: the original copied the member `error`, which is never
            // assigned anywhere, so callers always received an empty message.
            // Store and report the actual exception text instead.
            error = ex.what();
            strncpy(resp->msg, error.c_str(), sizeof(resp->msg) - 1);
            resp->msg[sizeof(resp->msg) - 1] = 0;
        }
        return InferenceEngine::GENERAL_ERROR;
    }
    return InferenceEngine::OK;
}

// Wraps an IE blob's buffer in a cv::Mat header (no data copy).
static cv::Mat infEngineBlobToMat(const InferenceEngine::Blob::Ptr& blob)
{
    // NOTE: Inference Engine sizes are reversed.
    std::vector<size_t> dims = blob->getTensorDesc().getDims();
    std::vector<int> size(dims.begin(), dims.end());
    auto precision = blob->getTensorDesc().getPrecision();
    CV_Assert(precision == InferenceEngine::Precision::FP32);
    return cv::Mat(size, CV_32F, (void*)blob->buffer());
}

// Runs the (inverse) 2D DFT batch-wise. Indexing below assumes NCHW data with
// C == 2: channel 0 holds the real parts and channel 1 the imaginary parts.
InferenceEngine::StatusCode FFTImpl::execute(std::vector<InferenceEngine::Blob::Ptr> &inputs,
                                             std::vector<InferenceEngine::Blob::Ptr> &outputs,
                                             InferenceEngine::ResponseDesc *resp) noexcept {
    cv::Mat inp = infEngineBlobToMat(inputs[0]);
    cv::Mat out = infEngineBlobToMat(outputs[0]);

    const int n = inp.size[0];
    const int h = inp.size[2];
    const int w = inp.size[3];
    cv::Mat complex(h, w, CV_32FC2), interleavedOut(h, w, CV_32FC2);
    for (int i = 0; i < n; ++i) {
        // Interleave the two planes into one 2-channel matrix for OpenCV.
        std::vector<cv::Mat> components = {
            cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 0)),
            cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 1))
        };
        cv::merge(components, complex);

        if (!inverse)
            cv::dft(complex, interleavedOut);
        else
            cv::idft(complex, interleavedOut, cv::DFT_SCALE);  // DFT_SCALE normalizes the inverse

        // De-interleave the result directly into the output blob's planes.
        components = {
            cv::Mat(h, w, CV_32F, out.ptr<float>(i, 0)),
            cv::Mat(h, w, CV_32F, out.ptr<float>(i, 1))
        };
        cv::split(interleavedOut, components);
    }
    return InferenceEngine::OK;
}
//! [fft_kernel:implementation]
|
||||
|
||||
32
docs/HOWTO/ie_cpu_extension/fft_kernel.hpp
Normal file
32
docs/HOWTO/ie_cpu_extension/fft_kernel.hpp
Normal file
@@ -0,0 +1,32 @@
|
||||
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
// source: https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension

//! [fft_kernel:header]
#pragma once

#include <ie_iextension.h>
#include <ngraph/ngraph.hpp>

namespace FFTExtension {

// CPU implementation of the custom FFT operation. The transform itself is
// delegated to OpenCV (cv::dft / cv::idft) in fft_kernel.cpp.
class FFTImpl : public InferenceEngine::ILayerExecImpl {
public:
    // Validates the node (single static-shape FP32 input/output) and caches
    // its shapes and the "inverse" attribute.
    explicit FFTImpl(const std::shared_ptr<ngraph::Node>& node);
    // Advertises a single configuration: FP32 tensors, plain layout.
    InferenceEngine::StatusCode getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig> &conf,
                                                           InferenceEngine::ResponseDesc *resp) noexcept override;
    // Re-checks the configuration chosen by the plugin (one FP32 input/output).
    InferenceEngine::StatusCode init(InferenceEngine::LayerConfig &config,
                                     InferenceEngine::ResponseDesc *resp) noexcept override;
    // Runs the forward or inverse 2D DFT over each batch element.
    InferenceEngine::StatusCode execute(std::vector<InferenceEngine::Blob::Ptr> &inputs,
                                        std::vector<InferenceEngine::Blob::Ptr> &outputs,
                                        InferenceEngine::ResponseDesc *resp) noexcept override;
private:
    ngraph::Shape inpShape;  // static input shape captured from the node
    ngraph::Shape outShape;  // static output shape captured from the node
    bool inverse;            // true -> inverse DFT (idft with DFT_SCALE)
    // NOTE(review): error is read in init()'s failure path but not assigned
    // anywhere visible in this snippet — confirm it is set before use.
    std::string error;
};

}
//! [fft_kernel:header]
|
||||
34
docs/HOWTO/ie_cpu_extension/fft_op.cpp
Normal file
34
docs/HOWTO/ie_cpu_extension/fft_op.cpp
Normal file
@@ -0,0 +1,34 @@
|
||||
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

//! [fft_op:implementation]
#include "fft_op.hpp"

using namespace FFTExtension;

constexpr ngraph::NodeTypeInfo FFTOp::type_info;

// Assign the attribute BEFORE running shape/type inference so validation never
// observes a half-constructed node (the original set `inverse` only after
// constructor_validate_and_infer_types() had already run).
FFTOp::FFTOp(const ngraph::Output<ngraph::Node>& inp, bool _inverse) : Op({inp}) {
    inverse = _inverse;
    constructor_validate_and_infer_types();
}

// Output mirrors the input: same element type, same (possibly partial) shape.
void FFTOp::validate_and_infer_types() {
    auto outShape = get_input_partial_shape(0);
    set_output_type(0, get_input_element_type(0), outShape);
}

// Required by nGraph for graph copying; preserves the `inverse` attribute.
std::shared_ptr<ngraph::Node> FFTOp::clone_with_new_inputs(const ngraph::OutputVector &new_args) const {
    if (new_args.size() != 1) {
        throw ngraph::ngraph_error("Incorrect number of new arguments");
    }
    return std::make_shared<FFTOp>(new_args.at(0), inverse);
}

// Serializes/deserializes the single "inverse" attribute.
bool FFTOp::visit_attributes(ngraph::AttributeVisitor &visitor) {
    visitor.on_attribute("inverse", inverse);
    return true;
}
//! [fft_op:implementation]
|
||||
|
||||
28
docs/HOWTO/ie_cpu_extension/fft_op.hpp
Normal file
28
docs/HOWTO/ie_cpu_extension/fft_op.hpp
Normal file
@@ -0,0 +1,28 @@
|
||||
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

//! [fft_op:header]
#pragma once

#include <ngraph/ngraph.hpp>

namespace FFTExtension {

// Custom nGraph operation "FFT". It takes a single real FP32 tensor that
// encodes complex numbers (see the tutorial text) and produces an output with
// the same element type and shape; the `inverse` attribute selects the
// forward vs. inverse transform.
class FFTOp : public ngraph::op::Op {
public:
    static constexpr ngraph::NodeTypeInfo type_info{"FFT", 0};
    const ngraph::NodeTypeInfo& get_type_info() const override { return type_info; }

    FFTOp() = default;
    FFTOp(const ngraph::Output<ngraph::Node>& inp, bool inverse);
    // Output type/shape copied from the input (see fft_op.cpp).
    void validate_and_infer_types() override;
    std::shared_ptr<ngraph::Node> clone_with_new_inputs(const ngraph::OutputVector& new_args) const override;
    bool visit_attributes(ngraph::AttributeVisitor& visitor) override;

    // true -> inverse transform; serialized as the "inverse" attribute.
    bool inverse;
};

}
//! [fft_op:header]
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:c2f362a39ae6c2af080e4f055b6fdba4954f918f85731545d1df3d687d9213d5
|
||||
size 421056
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:cb5c700d003936779455353bfa4ed9432410c0975c46e2dfd30c6a1abccd1727
|
||||
size 23320
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:99d6b5146be85fa408dc5432883c3e2745cffe890133854a97dcf22f5c5962d4
|
||||
size 47564
|
||||
3
docs/HOWTO/img/converted_subgraph.png
Normal file
3
docs/HOWTO/img/converted_subgraph.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f7c8ab4f15874d235968471bcf876c89c795d601e69891208107b8b72aa58eb1
|
||||
size 70014
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:0a4de6e502cae7542f1f311bcdbea6bb145f960f0d27d86a03160d1a60133778
|
||||
size 301310
|
||||
3
docs/HOWTO/img/unsupported_subgraph.png
Normal file
3
docs/HOWTO/img/unsupported_subgraph.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:3d5ccf51fe1babb93d96d042494695a6a6e055d1f8ebf7eef5083d54d8987a23
|
||||
size 58789
|
||||
57
docs/HOWTO/mo_extensions/front/tf/Complex.py
Normal file
57
docs/HOWTO/mo_extensions/front/tf/Complex.py
Normal file
@@ -0,0 +1,57 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
#! [complex:transformation]
|
||||
import logging as log
|
||||
|
||||
import numpy as np
|
||||
|
||||
from mo.front.common.replacement import FrontReplacementSubgraph
|
||||
from mo.graph.graph import Graph
|
||||
|
||||
|
||||
class Complex(FrontReplacementSubgraph):
    """Front transformation removing a "StridedSlice x2 -> Complex" sub-graph.

    Per the tutorial text, the two StridedSlice nodes split a real tensor
    whose innermost dimension holds (real, imag) pairs and "Complex"
    recombines them; the custom "FFT" operation consumes the packed real
    tensor directly, so the whole sub-graph can be bypassed.
    """
    enabled = True

    def pattern(self):
        # Two StridedSlice nodes feeding inputs 0 and 1 of a Complex node.
        return dict(
            nodes=[
                ('strided_slice_real', dict(op='StridedSlice')),
                ('strided_slice_imag', dict(op='StridedSlice')),
                ('complex', dict(op='Complex')),
            ],
            edges=[
                ('strided_slice_real', 'complex', {'in': 0}),
                ('strided_slice_imag', 'complex', {'in': 1}),
            ])

    @staticmethod
    def replace_sub_graph(graph: Graph, match: dict):
        """Reconnects consumers of the Complex node to the slices' common source."""
        strided_slice_real = match['strided_slice_real']
        strided_slice_imag = match['strided_slice_imag']
        complex_node = match['complex']

        # make sure that both strided slice operations get the same data as input
        assert strided_slice_real.in_port(0).get_source() == strided_slice_imag.in_port(0).get_source()

        # identify the output port of the operation producing data for strided slice nodes
        input_node_output_port = strided_slice_real.in_port(0).get_source()
        input_node_output_port.disconnect()

        # change the connection so now all consumers of "complex_node" get data from input node of strided slice nodes
        complex_node.out_port(0).get_connection().set_source(input_node_output_port)
#! [complex:transformation]
|
||||
|
||||
40
docs/HOWTO/mo_extensions/front/tf/ComplexAbs.py
Normal file
40
docs/HOWTO/mo_extensions/front/tf/ComplexAbs.py
Normal file
@@ -0,0 +1,40 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
#! [complex_abs:transformation]
|
||||
import numpy as np
|
||||
|
||||
from extensions.ops.elementwise import Pow
|
||||
from extensions.ops.ReduceOps import ReduceSum
|
||||
from mo.front.common.replacement import FrontReplacementOp
|
||||
from mo.graph.graph import Graph, Node
|
||||
from mo.ops.const import Const
|
||||
|
||||
|
||||
class ComplexAbs(FrontReplacementOp):
    """Replaces "ComplexAbs" with sqrt(reduce_sum(x ** 2, axis=-1)).

    The input is a real tensor whose innermost dimension holds (real, imag)
    pairs, so |z| = sqrt(real^2 + imag^2) decomposes into a square, a
    reduce-sum over the last axis and a square root (power 0.5).
    """
    op = "ComplexAbs"
    enabled = True

    def replace_op(self, graph: Graph, node: Node):
        # Constants feeding the exponent/axis inputs of the elementwise ops.
        pow_2 = Const(graph, {'value': np.float32(2.0)}).create_node()
        reduce_axis = Const(graph, {'value': np.int32(-1)}).create_node()
        pow_0_5 = Const(graph, {'value': np.float32(0.5)}).create_node()

        squared = Pow(graph, dict(name=node.in_node(0).name + '/sq', power=2.0)).create_node([node.in_node(0), pow_2])
        # Renamed from `sum` to avoid shadowing the Python builtin.
        reduced = ReduceSum(graph, dict(name=squared.name + '/sum')).create_node([squared, reduce_axis])
        sqrt = Pow(graph, dict(name=reduced.name + '/sqrt', power=0.5)).create_node([reduced, pow_0_5])
        return [sqrt.id]
#! [complex_abs:transformation]
|
||||
47
docs/HOWTO/mo_extensions/front/tf/FFT_ext.py
Normal file
47
docs/HOWTO/mo_extensions/front/tf/FFT_ext.py
Normal file
@@ -0,0 +1,47 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
# ! [fft_ext:extractor]
|
||||
from ...ops.FFT import FFT
|
||||
from mo.front.extractor import FrontExtractorOp
|
||||
from mo.utils.error import Error
|
||||
|
||||
|
||||
class FFT2DFrontExtractor(FrontExtractorOp):
    """Extractor for the TensorFlow FFT2D operation.

    Maps the framework node onto the custom FFT operation with the
    'inverse' attribute set to 0 (forward transform).
    """
    op = 'FFT2D'
    enabled = True

    @classmethod
    def extract(cls, node):
        # Forward transform: 'inverse' flag is 0.
        FFT.update_node_stat(node, {'inverse': 0})
        return cls.enabled
|
||||
|
||||
|
||||
class IFFT2DFrontExtractor(FrontExtractorOp):
    """Extractor for the TensorFlow IFFT2D operation.

    Maps the framework node onto the custom FFT operation with the
    'inverse' attribute set to 1 (inverse transform).
    """
    op = 'IFFT2D'
    enabled = True

    @classmethod
    def extract(cls, node):
        # Inverse transform: 'inverse' flag is 1.
        FFT.update_node_stat(node, {'inverse': 1})
        return cls.enabled
|
||||
# ! [fft_ext:extractor]
|
||||
40
docs/HOWTO/mo_extensions/ops/FFT.py
Normal file
40
docs/HOWTO/mo_extensions/ops/FFT.py
Normal file
@@ -0,0 +1,40 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
#! [fft:operation]
|
||||
from mo.front.common.partial_infer.elemental import copy_shape_infer
|
||||
from mo.graph.graph import Node, Graph
|
||||
from mo.ops.op import Op
|
||||
|
||||
|
||||
class FFT(Op):
    """Custom FFT operation used by the extension tutorial.

    Forward/inverse behavior is selected by the 'inverse' attribute
    (set by the front extractors); the output shape is copied from the
    input via copy_shape_infer.
    """
    op = 'FFT'
    enabled = False

    def __init__(self, graph: Graph, attrs: dict):
        # Default properties of the operation; values from the caller-supplied
        # 'attrs' dict take precedence over these.
        mandatory_props = {
            'type': self.op,
            'op': self.op,
            'version': 'fft_extension',
            'inverse': None,
            'in_ports_count': 1,
            'out_ports_count': 1,
            'infer': copy_shape_infer,
        }
        super().__init__(graph, mandatory_props, attrs)

    def backend_attrs(self):
        # Attributes serialized into the generated IR for this operation.
        return ['inverse']
|
||||
#! [fft:operation]
|
||||
@@ -2,6 +2,29 @@
|
||||
|
||||
The sections below contain detailed list of changes made to the Inference Engine API in recent releases.
|
||||
|
||||
## 2021.2
|
||||
|
||||
### New API
|
||||
|
||||
**State API**
|
||||
|
||||
* InferenceEngine::InferRequest::QueryState query state value of network on current infer request
|
||||
* InferenceEngine::IVariableState class instead of IMemoryState (rename)
|
||||
* InferenceEngine::IVariableState::GetState instead of IMemoryState::GetLastState (rename)
|
||||
|
||||
**BatchedBlob** - represents a InferenceEngine::BatchedBlob containing other blobs - one per batch.
|
||||
|
||||
**Transformations API** - added a new header `ie_transformations.hpp` which contains transformations for InferenceEngine::CNNNetwork object. Such transformations can be called prior to loading network for compilation for particular device:
|
||||
|
||||
* InferenceEngine::LowLatency
|
||||
|
||||
### Deprecated API
|
||||
|
||||
**State API**
|
||||
|
||||
* InferenceEngine::ExecutableNetwork::QueryState - use InferenceEngine::InferRequest::QueryState
|
||||
* InferenceEngine::IVariableState::GetLastState - use InferenceEngine::IVariableState::GetState
|
||||
|
||||
## 2021.1
|
||||
|
||||
### Deprecated API
|
||||
@@ -133,7 +156,7 @@ The sections below contain detailed list of changes made to the Inference Engine
|
||||
|
||||
### Deprecated API
|
||||
|
||||
**Myriad Plugin API:**
|
||||
**MYRIAD Plugin API:**
|
||||
|
||||
* VPU_CONFIG_KEY(IGNORE_IR_STATISTIC)
|
||||
|
||||
|
||||
@@ -20,7 +20,7 @@ There are two ways to check if CPU device can support bfloat16 computations for
|
||||
1. Query the instruction set via system `lscpu | grep avx512_bf16` or `cat /proc/cpuinfo | grep avx512_bf16`.
|
||||
2. Use [Query API](InferenceEngine_QueryAPI.md) with `METRIC_KEY(OPTIMIZATION_CAPABILITIES)`, which should return `BF16` in the list of CPU optimization options:
|
||||
|
||||
@snippet openvino/docs/snippets/Bfloat16Inference0.cpp part0
|
||||
@snippet snippets/Bfloat16Inference0.cpp part0
|
||||
|
||||
Current Inference Engine solution for bfloat16 inference uses Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) and supports inference of the following layers in BF16 computation mode:
|
||||
* Convolution
|
||||
@@ -46,11 +46,11 @@ Bfloat16 data usage provides the following benefits that increase performance:
|
||||
For default optimization on CPU, source model converts from FP32 or FP16 to BF16 and executes internally on platforms with native BF16 support. In that case, `KEY_ENFORCE_BF16` is set to `YES`.
|
||||
The code below demonstrates how to check if the key is set:
|
||||
|
||||
@snippet openvino/docs/snippets/Bfloat16Inference1.cpp part1
|
||||
@snippet snippets/Bfloat16Inference1.cpp part1
|
||||
|
||||
To disable BF16 internal transformations, set the `KEY_ENFORCE_BF16` to `NO`. In this case, the model infers AS IS without modifications with precisions that were set on each layer edge.
|
||||
|
||||
@snippet openvino/docs/snippets/Bfloat16Inference2.cpp part2
|
||||
@snippet snippets/Bfloat16Inference2.cpp part2
|
||||
|
||||
An exception with message `Platform doesn't support BF16 format` is formed in case of setting `KEY_ENFORCE_BF16` to `YES` on CPU without native BF16 support.
|
||||
|
||||
|
||||
@@ -86,3 +86,7 @@ inference of a pre-trained and optimized deep learning model and a set of sample
|
||||
* [Known Issues](Known_Issues_Limitations.md)
|
||||
|
||||
**Typical Next Step:** [Introduction to Inference Engine](inference_engine_intro.md)
|
||||
|
||||
## Video: Inference Engine Concept
|
||||
[](https://www.youtube.com/watch?v=e6R13V8nbak)
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/e6R13V8nbak" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
@@ -18,7 +18,7 @@ The batch size that was set in passed <code>CNNNetwork</code> object will be use
|
||||
|
||||
Here is a code example:
|
||||
|
||||
@snippet openvino/docs/snippets/DynamicBatching.cpp part0
|
||||
@snippet snippets/DynamicBatching.cpp part0
|
||||
|
||||
|
||||
## Limitations
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Add Custom nGraph Operations {#openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps}
|
||||
# Custom nGraph Operation {#openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps}
|
||||
|
||||
Inference Engine Extension API allows to register operation sets (opsets) with custom nGraph operations, it allows to support Networks with unknown operations.
|
||||
|
||||
@@ -20,7 +20,7 @@ To add your custom nGraph operation, create a new class that extends `ngraph::Op
|
||||
|
||||
Based on that, declaration of a operation class can look as follows:
|
||||
|
||||
@snippet op.hpp op:header
|
||||
@snippet template_extension/op.hpp op:header
|
||||
|
||||
### Class Fields
|
||||
|
||||
@@ -33,37 +33,37 @@ The provided implementation has several fields:
|
||||
|
||||
nGraph operation contains two constructors: a default constructor, which allows to create operation without attributes and a constructor that creates and validates operation with specified inputs and attributes.
|
||||
|
||||
@snippet op.cpp op:ctor
|
||||
@snippet template_extension/op.cpp op:ctor
|
||||
|
||||
### `validate_and_infer_types()`
|
||||
|
||||
`ngraph::Node::validate_and_infer_types` method validates operation attributes and calculates output shapes using attributes of operation.
|
||||
|
||||
@snippet op.cpp op:validate
|
||||
@snippet template_extension/op.cpp op:validate
|
||||
|
||||
### `clone_with_new_inputs()`
|
||||
|
||||
`ngraph::Node::clone_with_new_inputs` method creates a copy of nGraph operation with new inputs.
|
||||
|
||||
@snippet op.cpp op:copy
|
||||
@snippet template_extension/op.cpp op:copy
|
||||
|
||||
### `visit_attributes()`
|
||||
|
||||
`ngraph::Node::visit_attributes` method allows to visit all operation attributes.
|
||||
|
||||
@snippet op.cpp op:visit_attributes
|
||||
@snippet template_extension/op.cpp op:visit_attributes
|
||||
|
||||
### `evaluate()`
|
||||
|
||||
`ngraph::Node::evaluate` method allows to apply constant folding to an operation.
|
||||
|
||||
@snippet op.cpp op:evaluate
|
||||
@snippet template_extension/op.cpp op:evaluate
|
||||
|
||||
## Register Custom Operations in Extension Class
|
||||
|
||||
To add custom operations to the [Extension](Extension.md) class, create an operation set with custom operations and implement the `InferenceEngine::IExtension::getOpSets` method:
|
||||
|
||||
@snippet extension.cpp extension:getOpSets
|
||||
@snippet template_extension/extension.cpp extension:getOpSets
|
||||
|
||||
This method returns a map of opsets that exist in the extension library.
|
||||
|
||||
@@ -71,10 +71,9 @@ nGraph provides opsets mechanism for operation versioning. Different opsets dist
|
||||
|
||||
When specifying opset names, follow the rules below:
|
||||
* Use unique opset names.
|
||||
* Do not use the following built-in opset names: `extension`, `experimental`, `opset1`, `opest2`.
|
||||
* Do not use the following built-in opset names: `extension`, `experimental`, `opset1`, `opset2`, `opset3`, ... , `opsetN`.
|
||||
* Make sure that the Model Optimizer and your extension use the same opset names.
|
||||
* IR v10 layers have the mandatory `version` attribute specifying the opset.
|
||||
* `opset1` is the name of default operations set.
|
||||
* IR v10 operations have the mandatory `version` attribute specifying the opset.
|
||||
Operations from the default opset cannot be redefined.
|
||||
|
||||
Use a custom opset to create a new operation or extend functionality of an existing operation from another opset.
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# How to Implement Custom CPU Layers {#openvino_docs_IE_DG_Extensibility_DG_CPU_Kernel}
|
||||
# How to Implement Custom CPU Operations {#openvino_docs_IE_DG_Extensibility_DG_CPU_Kernel}
|
||||
|
||||
The primary vehicle for the performance of the CPU codepath in the Inference Engine is the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), and new CPU kernels extend the Inference Engine plugin for the Intel MKL-DNN. Implementing the InferenceEngine::ILayerExecImpl defines a general CPU-side extension. There are no Intel MKL-DNN specifics in the way you need to implement a kernel.
|
||||
|
||||
@@ -7,7 +7,7 @@ The primary vehicle for the performance of the CPU codepath in the Inference Eng
|
||||
All custom kernels for the CPU plugin should be inherited from the InferenceEngine::ILayerExecImpl interface.
|
||||
Based on that, declaration of a kernel implementation class can look as follows:
|
||||
|
||||
@snippet cpu_kernel.hpp cpu_implementation:header
|
||||
@snippet template_extension/cpu_kernel.hpp cpu_implementation:header
|
||||
|
||||
### Class Fields
|
||||
|
||||
@@ -22,25 +22,25 @@ The provided implementation has several fields:
|
||||
|
||||
An implementation constructor checks parameters of nGraph operation, stores needed attributes, and stores an error message in the case of an error.
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:ctor
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:ctor
|
||||
|
||||
### `getSupportedConfigurations`
|
||||
|
||||
InferenceEngine::ILayerExecImpl::getSupportedConfigurations method returns all supported configuration formats (input/output tensor layouts) for your implementation. To specify formats of data, use InferenceEngine::TensorDesc. Refer to the [Memory Primitives](../Memory_primitives.md) section for instructions on how to do it.
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:getSupportedConfigurations
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:getSupportedConfigurations
|
||||
|
||||
### `init`
|
||||
|
||||
InferenceEngine::ILayerExecImpl::init method gets a runtime-selected configuration from a vector that is populated from the `getSupportedConfigurations` method and checks the parameters:
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:init
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:init
|
||||
|
||||
### `execute`
|
||||
|
||||
InferenceEngine::ILayerExecImpl::execute method accepts and processes the actual tenors as input/output blobs:
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:execute
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:execute
|
||||
|
||||
## Register Implementation in `Extension` Class
|
||||
|
||||
@@ -52,18 +52,18 @@ To register custom kernel implementation in the [Extension](Extension.md) class,
|
||||
|
||||
InferenceEngine::IExtension::getImplTypes returns a vector of implementation types for an operation.
|
||||
|
||||
@snippet extension.cpp extension:getImplTypes
|
||||
@snippet template_extension/extension.cpp extension:getImplTypes
|
||||
|
||||
### <a name="getImplementation"><code>getImplementation</code></a>
|
||||
|
||||
InferenceEngine::IExtension::getImplementation returns the kernel implementation with a specified type for an operation.
|
||||
|
||||
@snippet extension.cpp extension:getImplementation
|
||||
@snippet template_extension/extension.cpp extension:getImplementation
|
||||
|
||||
|
||||
## Load Extension with Executable Kernels to Plugin
|
||||
|
||||
Use the `AddExtension` method of the general plugin interface to load your primitives:
|
||||
|
||||
@snippet openvino/docs/snippets/CPU_Kernel.cpp part0
|
||||
@snippet snippets/CPU_Kernel.cpp part0
|
||||
|
||||
|
||||
@@ -24,11 +24,11 @@ The `ngraph::onnx_import::Node` class represents a node in ONNX model. It provid
|
||||
New operator registration must happen before the ONNX model is read, for example, if an ONNX model uses the 'CustomRelu' operator, `register_operator("CustomRelu", ...)` must be called before InferenceEngine::Core::ReadNetwork.
|
||||
Re-registering ONNX operators within the same process is supported. During registration of the existing operator, a warning is printed.
|
||||
|
||||
The example below demonstrates an examplary model that requires previously created 'CustomRelu' operator:
|
||||
The example below demonstrates an exemplary model that requires previously created 'CustomRelu' operator:
|
||||
@snippet onnx_custom_op/onnx_custom_op.cpp onnx_custom_op:model
|
||||
|
||||
|
||||
For a reference on how to create a graph with nGraph operations, visit [nGraph tutorial](../nGraphTutorial.md).
|
||||
For a reference on how to create a graph with nGraph operations, visit [Custom nGraph Operation](AddingNGraphOps.md).
|
||||
For a complete list of predefined nGraph operators, visit [available operations sets](../../ops/opset.md).
|
||||
|
||||
If operator is no longer needed, it can be unregistered by calling `unregister_operator`. The function takes three arguments `op_type`, `version`, and `domain`.
|
||||
@@ -38,12 +38,12 @@ If operator is no longer needed, it can be unregistered by calling `unregister_o
|
||||
|
||||
The same principles apply when registering custom ONNX operator based on custom nGraph operations.
|
||||
This example shows how to register custom ONNX operator based on `Operation` presented in [this tutorial](AddingNGraphOps.md), which is used in [TemplateExtension](Extension.md).
|
||||
@snippet extension.cpp extension:ctor
|
||||
@snippet template_extension/extension.cpp extension:ctor
|
||||
|
||||
Here, the `register_operator` function is called in Extension's constructor, which makes sure that it is called before InferenceEngine::Core::ReadNetwork (since InferenceEngine::Core::AddExtension must be called before a model with custom operator is read).
|
||||
|
||||
The example below demonstrates how to unregister operator from Extension's destructor:
|
||||
@snippet extension.cpp extension:dtor
|
||||
@snippet template_extension/extension.cpp extension:dtor
|
||||
Note that it is mandatory to unregister custom ONNX operator if it is defined in dynamic shared library.
|
||||
|
||||
## Requirements for building with CMake
|
||||
|
||||
@@ -5,11 +5,11 @@ All extension libraries should be inherited from this interface.
|
||||
|
||||
Based on that, declaration of an extension class can look as follows:
|
||||
|
||||
@snippet extension.hpp extension:header
|
||||
@snippet template_extension/extension.hpp extension:header
|
||||
|
||||
The extension library should contain and export the method InferenceEngine::CreateExtension, which creates an `Extension` class:
|
||||
|
||||
@snippet extension.cpp extension:CreateExtension
|
||||
@snippet template_extension/extension.cpp extension:CreateExtension
|
||||
|
||||
Also, an `Extension` object should implement the following methods:
|
||||
|
||||
@@ -17,7 +17,7 @@ Also, an `Extension` object should implement the following methods:
|
||||
|
||||
* InferenceEngine::IExtension::GetVersion returns information about version of the library
|
||||
|
||||
@snippet extension.cpp extension:GetVersion
|
||||
@snippet template_extension/extension.cpp extension:GetVersion
|
||||
|
||||
Implement the InferenceEngine::IExtension::getOpSets method if the extension contains custom layers.
|
||||
Read the [guide about custom operations](AddingNGraphOps.md) for more information.
|
||||
|
||||
@@ -1,16 +1,16 @@
|
||||
# How to Implement Custom GPU Layers {#openvino_docs_IE_DG_Extensibility_DG_GPU_Kernel}
|
||||
# How to Implement Custom GPU Operations {#openvino_docs_IE_DG_Extensibility_DG_GPU_Kernel}
|
||||
|
||||
The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the layer.
|
||||
The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the operation.
|
||||
|
||||
There are two options of using custom layer configuration file:
|
||||
There are two options of using custom operation configuration file:
|
||||
|
||||
* Include a section with your kernels into the global automatically-loaded `cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file, which is hosted in the `<INSTALL_DIR>/deployment_tools/inference_engine/bin/intel64/{Debug/Release}` folder
|
||||
* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom layers to the plugin:
|
||||
* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_Kernel.cpp part0
|
||||
@snippet snippets/GPU_Kernel.cpp part0
|
||||
|
||||
All Inference Engine samples, except trivial `hello_classification`,
|
||||
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom layers for the classification sample, run the command below:
|
||||
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
|
||||
```sh
|
||||
$ ./classification_sample -m <path_to_model>/bvlc_alexnet_fp16.xml -i ./validation_set/daily/227x227/apron.bmp -d GPU
|
||||
-c <absolute_path_to_config>/custom_layer_example.xml
|
||||
@@ -19,7 +19,7 @@ $ ./classification_sample -m <path_to_model>/bvlc_alexnet_fp16.xml -i ./validati
|
||||
## Configuration File Format <a name="config-file-format"></a>
|
||||
|
||||
The configuration file is expected to follow the `.xml` file structure
|
||||
with a node of the type `CustomLayer` for every custom layer you provide.
|
||||
with a node of the type `CustomLayer` for every custom operation you provide.
|
||||
|
||||
The definitions described in the sections below use the following notations:
|
||||
|
||||
@@ -32,14 +32,13 @@ Notation | Description
|
||||
|
||||
### CustomLayer Node and Sub-node Structure
|
||||
|
||||
`CustomLayer` node contains the entire configuration for a single custom
|
||||
layer.
|
||||
`CustomLayer` node contains the entire configuration for a single custom operation.
|
||||
|
||||
| Attribute Name |\# | Description |
|
||||
|-----|-----|-----|
|
||||
| `name` | (1) | The name of the layer type to be used. This name should be identical to the type used in the IR.|
|
||||
| `type` | (1) | Must be `SimpleGPU`. |
|
||||
| `version` | (1) | Must be `1`. |
|
||||
| `name` | (1) | The name of the operation type to be used. This name should be identical to the type used in the IR.|
|
||||
| `type` | (1) | Must be `SimpleGPU`. |
|
||||
| `version` | (1) | Must be `1`. |
|
||||
|
||||
**Sub-nodes**: `Kernel` (1), `Buffers` (1), `CompilerOptions` (0+),
|
||||
`WorkSizes` (0/1)
|
||||
@@ -69,9 +68,9 @@ the sources during compilation (JIT).
|
||||
| Attribute Name | \# | Description |
|
||||
|------|-------|------|
|
||||
| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well (taken as a string). |
|
||||
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
|
||||
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
|
||||
| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. |
|
||||
| `default` | (0/1) | The default value to be used if the specified parameters is missing from the layer in the IR. |
|
||||
| `default` | (0/1) | The default value to be used if the specified parameters is missing from the operation in the IR. |
|
||||
|
||||
**Sub-nodes:** None
|
||||
|
||||
@@ -92,7 +91,7 @@ weights or biases).
|
||||
|
||||
| Attribute Name | \# | Description |
|
||||
|----|-----|------|
|
||||
| `name` | (1) | Name of a blob attached to a layer in the IR |
|
||||
| `name` | (1) | Name of a blob attached to an operation in the IR |
|
||||
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to |
|
||||
|
||||
**Sub-nodes**: None
|
||||
@@ -105,7 +104,7 @@ weights or biases).
|
||||
|------|-------|-------|
|
||||
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
|
||||
| `type` | (1) | `input` or `output` |
|
||||
| `port-index` | (1) | 0-based index in the layer’s input/output ports in the IR |
|
||||
| `port-index` | (1) | 0-based index in the operation's input/output ports in the IR |
|
||||
| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in all lowercase). Default value: `BFYX` |
|
||||
|
||||
### CompilerOptions Node and Sub-node Structure
|
||||
@@ -178,7 +177,7 @@ For an example, see [Example Kernel](#example-kernel).
|
||||
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array |
|
||||
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element (bypassing the lower padding) |
|
||||
All `<TENSOR>` values are automatically defined for every tensor
|
||||
bound to this layer (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown
|
||||
bound to this operation (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown
|
||||
in the following example:
|
||||
|
||||
```sh
|
||||
@@ -227,7 +226,7 @@ floating-point, and integer kernel parameters. To get the dump, add the
|
||||
following line to your code that configures the GPU plugin to output the
|
||||
custom kernels:
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_Kernel.cpp part1
|
||||
@snippet snippets/GPU_Kernel.cpp part1
|
||||
|
||||
When the Inference Engine compiles the kernels for the specific network,
|
||||
it also outputs the resulting code for the custom kernels. In the
|
||||
|
||||
@@ -2,19 +2,22 @@
|
||||
|
||||
Inference Engine Extensibility API allows to add support of custom operations to the Inference Engine.
|
||||
Extension should contain operation sets with custom operations and execution kernels for custom operations.
|
||||
Physically, an extension library can be represented as a dynamic library exporting the single `CreateExtension` function that allows to create a new extension instance.
|
||||
Physically, an extension library can be represented as a dynamic library exporting the single `CreateExtension` function
|
||||
that allows to create a new extension instance.
|
||||
|
||||
Extensibility library can be loaded to the InferenceEngine::Core object using the InferenceEngine::Core::AddExtension method.
|
||||
Extensibility library can be loaded to the `InferenceEngine::Core` object using the
|
||||
`InferenceEngine::Core::AddExtension` method.
|
||||
|
||||
## Inference Engine Extension Library
|
||||
|
||||
Inference Engine Extension dynamic library contains several main components:
|
||||
Inference Engine Extension dynamic library contains several components:
|
||||
|
||||
* [Extension class](Extension.md):
|
||||
* [Extension Library](Extension.md):
|
||||
- Contains custom operation sets
|
||||
- Provides CPU implementations for custom operations
|
||||
* [Custom operations](Intro.md):
|
||||
- Allows to use InferenceEngine::Core::ReadNetwork to read Intermediate Representation (IR) with unsupported operations
|
||||
* [Custom nGraph Operation](AddingNGraphOps.md):
|
||||
- Allows to use `InferenceEngine::Core::ReadNetwork` to read Intermediate Representation (IR) with unsupported
|
||||
operations
|
||||
- Allows to create `ngraph::Function` with unsupported operations
|
||||
- Provides shape inference mechanism for custom operations
|
||||
|
||||
@@ -26,13 +29,13 @@ at `<dldt source tree>/docs/template_extension`.
|
||||
|
||||
The Inference Engine workflow involves the creation of custom kernels and either custom or existing operations.
|
||||
|
||||
An _Operation_ is a Network building block implemented in the training framework, for example, `Convolution` in Caffe*.
|
||||
An _Operation_ is a network building block implemented in the training framework, for example, `Convolution` in Caffe*.
|
||||
A _Kernel_ is defined as the corresponding implementation in the Inference Engine.
|
||||
|
||||
Refer to the [Custom Layers in the Model Optimizer](../../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) section for details on how
|
||||
mapping between framework layers and Inference Engine kernels is registered.
|
||||
Refer to the [Model Optimizer Extensibility](../../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
for details on how a mapping between framework operations and Inference Engine kernels is registered.
|
||||
|
||||
In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework.
|
||||
In short, you can plug your own kernel implementations into the Inference Engine and map them to the operations in the original framework.
|
||||
|
||||
The following pages describe how to integrate custom _kernels_ into the Inference Engine:
|
||||
|
||||
|
||||
@@ -30,7 +30,7 @@ File with tuned data is the result of this step.
|
||||
|
||||
The example below shows how to set and use the key files:
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_Kernels_Tuning.cpp part0
|
||||
@snippet snippets/GPU_Kernels_Tuning.cpp part0
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -72,7 +72,7 @@ Glossary of terms used in the Inference Engine
|
||||
| <code>InferenceEngineProfileInfo</code> | Represents basic inference profiling information per layer |
|
||||
| Inference Engine | A C++ library with a set of classes that you can use in your application to infer input data (images) and get the result |
|
||||
| Inference Engine API | The basic default API for all supported devices, which allows you to load a model from Intermediate Representation, set input and output formats and execute the model on various devices |
|
||||
| Inference Engine <code>Core<code> | Inference Engine Core is a software component that manages inference on certain Intel(R) hardware devices: CPU, GPU, MYRIAD, GNA, etc. |
|
||||
| Inference Engine <code>Core</code> | Inference Engine Core is a software component that manages inference on certain Intel(R) hardware devices: CPU, GPU, MYRIAD, GNA, etc. |
|
||||
| Layer catalog or Operations specification | A list of supported layers or operations and its parameters. Sets of supported layers are different for different plugins, please check the documentation on plugins to verify if the Inference Engine supports certain layer on the dedicated hardware |
|
||||
| <code>Layout</code> | Image data layout refers to the representation of images batch. Layout shows a sequence of 4D or 5D tensor data in memory. A typical NCHW format represents pixel in horizontal direction, rows by vertical dimension, planes by channel and images into batch |
|
||||
| <code>OutputsDataMap</code> | Structure which contains information about output precisions and layouts |
|
||||
|
||||
@@ -23,7 +23,7 @@ The `InferenceEngine::ExecutableNetwork` class is also extended to support the Q
|
||||
|
||||
### GetAvailableDevices
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI0.cpp part0
|
||||
@snippet snippets/InferenceEngine_QueryAPI0.cpp part0
|
||||
|
||||
The function returns list of available devices, for example:
|
||||
```
|
||||
@@ -32,7 +32,8 @@ MYRIAD.1.4-ma2480
|
||||
FPGA.0
|
||||
FPGA.1
|
||||
CPU
|
||||
GPU
|
||||
GPU.0
|
||||
GPU.1
|
||||
...
|
||||
```
|
||||
|
||||
@@ -46,7 +47,7 @@ Each device name can then be passed to:
|
||||
|
||||
The code below demonstrates how to understand whether `HETERO` device dumps `.dot` files with split graphs during the split stage:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI1.cpp part1
|
||||
@snippet snippets/InferenceEngine_QueryAPI1.cpp part1
|
||||
|
||||
For documentation about common configuration keys, refer to `ie_plugin_config.hpp`. Device specific configuration keys can be found in corresponding plugin folders.
|
||||
|
||||
@@ -54,7 +55,7 @@ For documentation about common configuration keys, refer to `ie_plugin_config.hp
|
||||
|
||||
* To extract device properties such as available device, device name, supported configuration keys, and others, use the `InferenceEngine::Core::GetMetric` method:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI2.cpp part2
|
||||
@snippet snippets/InferenceEngine_QueryAPI2.cpp part2
|
||||
|
||||
A returned value looks as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`.
|
||||
|
||||
@@ -66,17 +67,17 @@ A returned value looks as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`.
|
||||
|
||||
The method is used to get executable network specific metric such as `METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)`:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI3.cpp part3
|
||||
@snippet snippets/InferenceEngine_QueryAPI3.cpp part3
|
||||
|
||||
Or the current temperature of `MYRIAD` device:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI4.cpp part4
|
||||
@snippet snippets/InferenceEngine_QueryAPI4.cpp part4
|
||||
|
||||
### GetConfig()
|
||||
|
||||
The method is used to get information about configuration values the executable network has been created with:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI5.cpp part5
|
||||
@snippet snippets/InferenceEngine_QueryAPI5.cpp part5
|
||||
|
||||
### SetConfig()
|
||||
|
||||
|
||||
@@ -29,20 +29,20 @@ Integration process includes the following steps:
|
||||
|
||||
1) **Create Inference Engine Core** to manage available devices and read network objects:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part0
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part0
|
||||
|
||||
2) **Read a model IR** created by the Model Optimizer (.xml is supported format):
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part1
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part1
|
||||
|
||||
**Or read the model from ONNX format** (.onnx and .prototxt are supported formats). You can find more information about the ONNX format support in the document [ONNX format support in the OpenVINO™](./ONNX_Support.md).
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part2
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part2
|
||||
|
||||
3) **Configure input and output**. Request input and output information using `InferenceEngine::CNNNetwork::getInputsInfo()`, and `InferenceEngine::CNNNetwork::getOutputsInfo()`
|
||||
methods:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part3
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part3
|
||||
|
||||
Optionally, set the number format (precision) and memory layout for inputs and outputs. Refer to the
|
||||
[Supported configurations](supported_plugins/Supported_Devices.md) chapter to choose the relevant configuration.
|
||||
@@ -67,7 +67,7 @@ methods:
|
||||
|
||||
You can use the following code snippet to configure input and output:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part4
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part4
|
||||
|
||||
> **NOTE**: NV12 input color format pre-processing differs from other color conversions. In case of NV12,
|
||||
> Inference Engine expects two separate image planes (Y and UV). You must use a specific
|
||||
@@ -91,31 +91,31 @@ methods:
|
||||
|
||||
4) **Load the model** to the device using `InferenceEngine::Core::LoadNetwork()`:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part5
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part5
|
||||
|
||||
It creates an executable network from a network object. The executable network is associated with a single hardware device.
|
||||
It is possible to create as many networks as needed and to use them simultaneously (up to the limitation of the hardware resources).
|
||||
The third parameter is a configuration for the plugin. It is a map of pairs: (parameter name, parameter value). Choose a device from the
|
||||
[Supported devices](supported_plugins/Supported_Devices.md) page for more details about supported configuration parameters.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part6
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part6
|
||||
|
||||
5) **Create an infer request**:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part7
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part7
|
||||
|
||||
6) **Prepare input**. You can use one of the following options to prepare input:
|
||||
* **Optimal way for a single network.** Get blobs allocated by an infer request using `InferenceEngine::InferRequest::GetBlob()`
|
||||
and feed an image and the input data to the blobs. In this case, input data must be aligned (resized manually) with a
|
||||
given blob size and have a correct color format.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part8
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part8
|
||||
|
||||
* **Optimal way for a cascade of networks (output of one network is input for another).** Get output blob from the first
|
||||
request using `InferenceEngine::InferRequest::GetBlob()` and set it as input for the second request using
|
||||
`InferenceEngine::InferRequest::SetBlob()`.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part9
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part9
|
||||
|
||||
* **Optimal way to handle ROI (a ROI object located inside of input of one network is input for another).** It is
|
||||
possible to re-use shared input by several networks. You do not need to allocate separate input blob for a network if
|
||||
@@ -126,7 +126,7 @@ methods:
|
||||
ROI without allocation of new memory using `InferenceEngine::make_shared_blob()` with passing of
|
||||
`InferenceEngine::Blob::Ptr` and `InferenceEngine::ROI` as parameters.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part10
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part10
|
||||
|
||||
Make sure that shared input is kept valid during execution of each network. Otherwise, ROI blob may be corrupted if the
|
||||
original input blob (that ROI is cropped from) has already been rewritten.
|
||||
@@ -134,7 +134,7 @@ methods:
|
||||
* Allocate input blobs of the appropriate types and sizes, feed an image and the input data to the blobs, and call
|
||||
`InferenceEngine::InferRequest::SetBlob()` to set these blobs for an infer request:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part11
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part11
|
||||
|
||||
A blob can be filled before and after `SetBlob()`.
|
||||
|
||||
@@ -157,11 +157,11 @@ methods:
|
||||
7) **Do inference** by calling the `InferenceEngine::InferRequest::StartAsync` and `InferenceEngine::InferRequest::Wait`
|
||||
methods for asynchronous request:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part12
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part12
|
||||
|
||||
or by calling the `InferenceEngine::InferRequest::Infer` method for synchronous request:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part13
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part13
|
||||
|
||||
`StartAsync` returns immediately and starts inference without blocking the main thread, while `Infer` blocks
|
||||
the main thread and returns when inference is completed.
|
||||
@@ -185,7 +185,7 @@ exception.
|
||||
Note that casting `Blob` to `TBlob` via `std::dynamic_pointer_cast` is not a recommended way;
|
||||
it is better to access data via the `buffer()` and `as()` methods as follows:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part14
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part14
|
||||
|
||||
## Build Your Application
|
||||
|
||||
|
||||
@@ -116,13 +116,10 @@ For Intel® Distribution of OpenVINO™ toolkit, the Inference Engine package co
|
||||
[sample console applications](Samples_Overview.md) demonstrating how you can use
|
||||
the Inference Engine in your applications.
|
||||
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md">Inference Engine Build Instructions</a>.
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode">Inference Engine Build Instructions</a>.
|
||||
## See Also
|
||||
- [Inference Engine Samples](Samples_Overview.md)
|
||||
- [Intel® Deep Learning Deployment Toolkit Web Page](https://software.intel.com/en-us/computer-vision-sdk)
|
||||
|
||||
|
||||
[scheme]: img/workflow_steps.png
|
||||
|
||||
#### Optimization Notice
|
||||
<sup>For complete information about compiler optimizations, see our [Optimization Notice](https://software.intel.com/en-us/articles/optimization-notice#opt-en).</sup>
|
||||
[scheme]: img/workflow_steps.png
|
||||
@@ -27,44 +27,44 @@ Common migration process includes the following steps:
|
||||
|
||||
1. Migrate from the `InferenceEngine::InferencePlugin` initialization:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part0
|
||||
@snippet snippets/Migration_CoreAPI.cpp part0
|
||||
|
||||
to the `InferenceEngine::Core` class initialization:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part1
|
||||
@snippet snippets/Migration_CoreAPI.cpp part1
|
||||
|
||||
2. Instead of using `InferenceEngine::CNNNetReader` to read IR:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part2
|
||||
@snippet snippets/Migration_CoreAPI.cpp part2
|
||||
|
||||
read networks using the Core class:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part3
|
||||
@snippet snippets/Migration_CoreAPI.cpp part3
|
||||
|
||||
The Core class also allows reading models from the ONNX format (more information is [here](./ONNX_Support.md)):
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part4
|
||||
@snippet snippets/Migration_CoreAPI.cpp part4
|
||||
|
||||
3. Instead of adding CPU device extensions to the plugin:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part5
|
||||
@snippet snippets/Migration_CoreAPI.cpp part5
|
||||
|
||||
add extensions to CPU device using the Core class:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part6
|
||||
@snippet snippets/Migration_CoreAPI.cpp part6
|
||||
|
||||
4. Instead of setting configuration keys to a particular plugin, set (key, value) pairs via `InferenceEngine::Core::SetConfig`
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part7
|
||||
@snippet snippets/Migration_CoreAPI.cpp part7
|
||||
|
||||
> **NOTE**: If `deviceName` is omitted as the last argument, configuration is set for all Inference Engine devices.
|
||||
|
||||
5. Migrate from loading the network to a particular plugin:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part8
|
||||
@snippet snippets/Migration_CoreAPI.cpp part8
|
||||
|
||||
to `InferenceEngine::Core::LoadNetwork` to a particular device:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part9
|
||||
@snippet snippets/Migration_CoreAPI.cpp part9
|
||||
|
||||
After you have an instance of `InferenceEngine::ExecutableNetwork`, all other steps are as usual.
|
||||
|
||||
@@ -18,7 +18,7 @@ Two categories of API functions:
|
||||
To list all supported ONNX ops in a specific version and domain, use the `get_supported_operators`
|
||||
as shown in the example below:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial0.cpp part0
|
||||
@snippet snippets/OnnxImporterTutorial0.cpp part0
|
||||
|
||||
The above code produces a list of all the supported operators for the `version` and `domain` you specified and outputs a list similar to this:
|
||||
```cpp
|
||||
@@ -30,7 +30,7 @@ Xor
|
||||
|
||||
To determine whether a specific ONNX operator in a particular version and domain is supported by the importer, use the `is_operator_supported` function as shown in the example below:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial1.cpp part1
|
||||
@snippet snippets/OnnxImporterTutorial1.cpp part1
|
||||
|
||||
## Import ONNX Model
|
||||
|
||||
@@ -55,13 +55,13 @@ As it was shown in [Build a Model with nGraph Library](../nGraph_DG/build_functi
|
||||
|
||||
The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the stream as an input:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial2.cpp part2
|
||||
@snippet snippets/OnnxImporterTutorial2.cpp part2
|
||||
|
||||
### <a name="path">Filepath as Input</a>
|
||||
|
||||
The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the filepath as an input:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial3.cpp part3
|
||||
@snippet snippets/OnnxImporterTutorial3.cpp part3
|
||||
|
||||
[onnx_header]: https://github.com/NervanaSystems/ngraph/blob/master/src/ngraph/frontend/onnx_import/onnx.hpp
|
||||
[onnx_model_zoo]: https://github.com/onnx/models
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
# Optimization Notice {#openvino_docs_IE_DG_Optimization_notice}
|
||||
|
||||

|
||||
@@ -35,13 +35,15 @@ Inference Engine sample applications include the following:
|
||||
- [Object Detection for SSD C Sample](../../inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md)
|
||||
- [Object Detection for SSD Python* Sample](../../inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md)
|
||||
|
||||
> **NOTE**: All samples support input paths containing only ASCII characters, except the Hello Classification Sample, that supports Unicode.
|
||||
|
||||
## Media Files Available for Samples
|
||||
|
||||
To run the sample applications, you can use images and videos from the media files collection available at https://github.com/intel-iot-devkit/sample-videos.
|
||||
|
||||
## Samples that Support Pre-Trained Models
|
||||
|
||||
You can download the [pre-trained models](@ref omz_models_intel_index) using the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/).
|
||||
To run the sample, you can use [public](@ref omz_models_public_index) or [Intel's](@ref omz_models_intel_index) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader_README).
|
||||
|
||||
## Build the Sample Applications
|
||||
|
||||
@@ -53,7 +55,7 @@ The officially supported Linux* build environment is the following:
|
||||
* GCC* 7.5.0 (for Ubuntu* 18.04) or GCC* 4.8.5 (for CentOS* 7.6)
|
||||
* CMake* version 3.10 or higher
|
||||
|
||||
> **NOTE**: For building samples from the open-source version of OpenVINO™ toolkit, see the [build instructions on GitHub](https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md).
|
||||
> **NOTE**: For building samples from the open-source version of OpenVINO™ toolkit, see the [build instructions on GitHub](https://github.com/openvinotoolkit/openvino/wiki/BuildingCode).
|
||||
|
||||
To build the C or C++ sample applications for Linux, go to the `<INSTALL_DIR>/inference_engine/samples/c` or `<INSTALL_DIR>/inference_engine/samples/cpp` directory, respectively, and run the `build_samples.sh` script:
|
||||
```sh
|
||||
|
||||
@@ -1,6 +1,36 @@
|
||||
Using Shape Inference {#openvino_docs_IE_DG_ShapeInference}
|
||||
==========================================
|
||||
|
||||
OpenVINO™ provides the following methods for runtime model reshaping:
|
||||
|
||||
* **Set a new input shape** with the `InferenceEngine::CNNNetwork::reshape` method.<br>
|
||||
The `InferenceEngine::CNNNetwork::reshape` method updates input shapes and propagates them down to the outputs of the model through all intermediate layers.
|
||||
|
||||
> **NOTES**:
|
||||
> - Starting with the 2021.1 release, the Model Optimizer converts topologies keeping shape-calculating sub-graphs by default, which enables correct shape propagation during reshaping in most cases.
|
||||
> - Older versions of IRs are not guaranteed to reshape successfully. Please regenerate them with the Model Optimizer of the latest version of OpenVINO™.<br>
|
||||
> - If an ONNX model does not have a fully defined input shape and the model was imported with the ONNX importer, reshape the model before loading it to the plugin.
|
||||
|
||||
* **Set a new batch dimension value** with the `InferenceEngine::CNNNetwork::setBatchSize` method.<br>
|
||||
The meaning of a model batch may vary depending on the model design.
|
||||
This method does not deduce batch placement for inputs from the model architecture.
|
||||
It assumes that the batch is placed at the zero index in the shape for all inputs and uses the `InferenceEngine::CNNNetwork::reshape` method to propagate updated shapes through the model.
|
||||
|
||||
The method transforms the model before a new shape propagation to relax a hard-coded batch dimension in the model, if any.
|
||||
|
||||
Use `InferenceEngine::CNNNetwork::reshape` instead of `InferenceEngine::CNNNetwork::setBatchSize` to set new input shapes for the model in case the model has:
|
||||
* Multiple inputs with different zero-index dimension meanings
|
||||
* Input without a batch dimension
|
||||
* 0D, 1D, or 3D shape
|
||||
|
||||
The `InferenceEngine::CNNNetwork::setBatchSize` method is a high-level API method that wraps the `InferenceEngine::CNNNetwork::reshape` method call and works for trivial models from the batch placement standpoint.
|
||||
Use `InferenceEngine::CNNNetwork::reshape` for other models.
|
||||
|
||||
Using the `InferenceEngine::CNNNetwork::setBatchSize` method for models with a non-zero index batch placement or for models with inputs that do not have a batch dimension may lead to undefined behaviour.
|
||||
|
||||
You can change input shapes multiple times using the `InferenceEngine::CNNNetwork::reshape` and `InferenceEngine::CNNNetwork::setBatchSize` methods in any order.
|
||||
If a model has a hard-coded batch dimension, use `InferenceEngine::CNNNetwork::setBatchSize` first to change the batch, then call `InferenceEngine::CNNNetwork::reshape` to update other dimensions, if needed.
|
||||
|
||||
Inference Engine takes three kinds of a model description as an input, which are converted into an `InferenceEngine::CNNNetwork` object:
|
||||
1. [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) through `InferenceEngine::Core::ReadNetwork`
|
||||
2. [ONNX model](../IE_DG/OnnxImporterTutorial.md) through `InferenceEngine::Core::ReadNetwork`
|
||||
@@ -23,33 +53,7 @@ for (const auto & parameter : parameters) {
|
||||
|
||||
To feed input data of a shape that is different from the model input shape, reshape the model first.
|
||||
|
||||
OpenVINO™ provides the following methods for runtime model reshaping:
|
||||
|
||||
* **Set a new input shape** with the `InferenceEngine::CNNNetwork::reshape` method.<br>
|
||||
The `InferenceEngine::CNNNetwork::reshape` method updates input shapes and propagates them down to the outputs of the model through all intermediate layers.
|
||||
You can reshape a model multiple times like in this application scheme:
|
||||
```
|
||||
ReadNetwork -> reshape(input_1_shape) -> LoadNetwork -> infer(input_1)
|
||||
\
|
||||
-> reshape(input_2_shape) -> LoadNetwork -> infer(input_2)
|
||||
```
|
||||
> **NOTES**:
|
||||
> - Starting with the 2021.1 release, the Model Optimizer converts topologies keeping shape-calculating sub-graphs by default, which enables correct shape propagation during reshaping.
|
||||
> - Older versions of IRs are not guaranteed to reshape successfully. Please regenerate them with the Model Optimizer of the latest version of OpenVINO™.<br>
|
||||
> - If an ONNX model does not have a fully defined input shape and the model was imported with the ONNX importer, reshape the model before loading it to the plugin.
|
||||
* **Set a new batch dimension value** with the `InferenceEngine::CNNNetwork::setBatchSize` method.<br>
|
||||
The meaning of a model batch may vary depending on the model design.
|
||||
The `InferenceEngine::CNNNetwork::setBatchSize` method deduces the index of a batch dimension based only on the input rank.
|
||||
This method does not work for models with a non-zero index batch placement or models with inputs without a batch dimension.
|
||||
The batch-setting algorithm does not involve the shape inference mechanism.
|
||||
Batch of input and output shapes for all layers is set to a new batch value without layer validation.
|
||||
It may cause both positive and negative side effects.
|
||||
Due to the limitations described above, the current method is not recommended for use.
|
||||
If you need to set a new batch size for the model, use the `CNNNetwork::reshape` method instead.
|
||||
|
||||
Do not use runtime reshaping methods simultaneously, especially do not call the `CNNNetwork::reshape` method after you use `InferenceEngine::CNNNetwork::setBatchSize`.
|
||||
The `InferenceEngine::CNNNetwork::setBatchSize` method causes irreversible conversion of the internal model representation into the legacy model representation.
|
||||
The method does not use nGraph for shape inference which leads to reduced reshape opportunities and may affect the performance of the model.
|
||||
Once the input shape of `InferenceEngine::CNNNetwork` is set, call the `InferenceEngine::Core::LoadNetwork` method to get an `InferenceEngine::ExecutableNetwork` object for inference with updated shapes.
|
||||
|
||||
There are other approaches to reshape the model during the stage of <a href="_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html#when_to_specify_input_shapes">IR generation</a> or [nGraph::Function creation](../nGraph_DG/build_function.md).
|
||||
|
||||
@@ -62,8 +66,8 @@ Shape collision during shape propagation may be a sign that a new shape does not
|
||||
Changing the model input shape may result in intermediate operations shape collision.
|
||||
|
||||
Examples of such operations:
|
||||
- <a href="_docs_MO_DG_prepare_model_convert_model_IR_V10_opset1.html#Reshape">`Reshape` operation</a> with a hard-coded output shape value
|
||||
- <a href="_docs_MO_DG_prepare_model_convert_model_IR_V10_opset1.html#MatMul">`MatMul` operation</a> with the `Const` second input cannot be resized by spatial dimensions due to operation semantics
|
||||
- [`Reshape` operation](../ops/shape/Reshape_1.md) with a hard-coded output shape value
|
||||
- [`MatMul` operation](../ops/matrix/MatMul_1.md) with the `Const` second input cannot be resized by spatial dimensions due to operation semantics
|
||||
|
||||
Model structure and logic should not change significantly after model reshaping.
|
||||
- The Global Pooling operation is commonly used to reduce output feature map of classification models output.
|
||||
@@ -94,7 +98,7 @@ The algorithm for resizing network is the following:
|
||||
|
||||
Here is a code example:
|
||||
|
||||
@snippet openvino/docs/snippets/ShapeInference.cpp part0
|
||||
@snippet snippets/ShapeInference.cpp part0
|
||||
|
||||
Shape Inference feature is used in [Smart classroom sample](@ref omz_demos_smart_classroom_demo_README).
|
||||
|
||||
|
||||
3
docs/IE_DG/img/applying_low_latency.png
Executable file
3
docs/IE_DG/img/applying_low_latency.png
Executable file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f4f6e9d35869fa2c414e58914aaec1607eb7d4768b69c0cbcce5d5fa3ceddba3
|
||||
size 56444
|
||||
3
docs/IE_DG/img/low_latency_limitation_1.png
Executable file
3
docs/IE_DG/img/low_latency_limitation_1.png
Executable file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:28f4e7ee50785e9c571725942e67c899d08e87af3802f6bea4721c64bfdb2bac
|
||||
size 21722
|
||||
3
docs/IE_DG/img/low_latency_limitation_2.png
Executable file
3
docs/IE_DG/img/low_latency_limitation_2.png
Executable file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:0923af3acfb69dd0b88a5edf097e60c2655828b643d8e328561b13b0196c0850
|
||||
size 47997
|
||||
3
docs/IE_DG/img/state_network_example.png
Normal file
3
docs/IE_DG/img/state_network_example.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:9976341ca931f3ab4e4fbccea26844b738adb27b091149a4c6231eda841ab867
|
||||
size 144541
|
||||
@@ -7,26 +7,26 @@ Inference Engine is a set of C++ libraries providing a common API to deliver inf
|
||||
|
||||
For Intel® Distribution of OpenVINO™ toolkit, Inference Engine binaries are delivered within release packages.
|
||||
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md">Inference Engine Build Instructions</a>.
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode">Inference Engine Build Instructions</a>.
|
||||
|
||||
To learn about how to use the Inference Engine API for your application, see the [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) documentation.
|
||||
|
||||
For complete API Reference, see the [API Reference](usergroup29.html) section.
|
||||
For complete API Reference, see the [Inference Engine API References](./api_references.html) section.
|
||||
|
||||
Inference Engine uses a plugin architecture. Inference Engine plugin is a software component that contains complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, etc. Each plugin implements the unified API and provides additional hardware-specific APIs.
|
||||
|
||||
Modules in the Inference Engine component
|
||||
---------------------------------------
|
||||
-----------------------------------------
|
||||
|
||||
### Core Inference Engine Libraries ###
|
||||
|
||||
Your application must link to the core Inference Engine libraries:
|
||||
* Linux* OS:
|
||||
- `libinference_engine.so`, which depends on `libinference_engine_transformations.so` and `libngraph.so`
|
||||
- `libinference_engine_legacy.so`, which depends on `libtbb.so`
|
||||
- `libinference_engine.so`, which depends on `libinference_engine_transformations.so`, `libtbb.so`, `libtbbmalloc.so` and `libngraph.so`
|
||||
* Windows* OS:
|
||||
- `inference_engine.dll`, which depends on `inference_engine_transformations.dll` and `ngraph.dll`
|
||||
- `inference_engine_legacy.dll`, which depends on `tbb.dll`
|
||||
- `inference_engine.dll`, which depends on `inference_engine_transformations.dll`, `tbb.dll`, `tbbmalloc.dll` and `ngraph.dll`
|
||||
* macOS*:
|
||||
- `libinference_engine.dylib`, which depends on `libinference_engine_transformations.dylib`, `libtbb.dylib`, `libtbbmalloc.dylib` and `libngraph.dylib`
|
||||
|
||||
The required C++ header files are located in the `include` directory.
|
||||
|
||||
@@ -49,26 +49,26 @@ Starting from 2020.4 release, Inference Engine introduced a concept of `CNNNetwo
|
||||
|
||||
For each supported target device, Inference Engine provides a plugin — a DLL/shared library that contains complete implementation for inference on this particular device. The following plugins are available:
|
||||
|
||||
| Plugin | Device Type |
|
||||
| ------------- | ------------- |
|
||||
|CPU| Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE |
|
||||
|GPU| Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics
|
||||
|MYRIAD| Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X|
|
||||
|GNA| Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor
|
||||
|HETERO|Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers|
|
||||
|MULTI| Simultaneous inference of the same network on several devices in parallel|
|
||||
| Plugin | Device Type |
|
||||
| ------- | ----------------------------- |
|
||||
|CPU | Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE |
|
||||
|GPU | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics |
|
||||
|MYRIAD | Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X |
|
||||
|GNA | Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor |
|
||||
|HETERO | Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers)|
|
||||
|MULTI | Simultaneous inference of the same network on several devices in parallel|
|
||||
|
||||
The table below shows the plugin libraries and additional dependencies for Linux and Windows platforms.
|
||||
The table below shows the plugin libraries and additional dependencies for Linux, Windows and macOS platforms.
|
||||
|
||||
| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows |
|
||||
|--------|------------------------|-------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------|
|
||||
| CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` |
|
||||
| GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| MYRIAD | `libmyriadPlugin.so` | `libusb.so`, `libinference_engine_lp_transformations.so` | `myriadPlugin.dll` | `usb.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so`, `libinference_engine_lp_transformations.so`| `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| GNA | `libGNAPlugin.so` | `libgna.so`, `libinference_engine_lp_transformations.so` | `GNAPlugin.dll` | `gna.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| HETERO | `libHeteroPlugin.so` | Same as for selected plugins | `HeteroPlugin.dll` | Same as for selected plugins |
|
||||
| MULTI | `libMultiDevicePlugin.so` | Same as for selected plugins | `MultiDevicePlugin.dll` | Same as for selected plugins |
|
||||
| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows | Library name for macOS | Dependency libraries for macOS |
|
||||
|--------|-----------------------------|-------------------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------|------------------------------|---------------------------------------------|
|
||||
| CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` | `libMKLDNNPlugin.dylib` | `inference_engine_lp_transformations.dylib` |
|
||||
| GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` | Is not supported | - |
|
||||
| MYRIAD | `libmyriadPlugin.so` | `libusb.so`, | `myriadPlugin.dll` | `usb.dll` | `libmyriadPlugin.dylib` | `libusb.dylib` |
|
||||
| HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so` | `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll` | Is not supported | - |
|
||||
| GNA | `libGNAPlugin.so` | `libgna.so`, | `GNAPlugin.dll` | `gna.dll` | Is not supported | - |
|
||||
| HETERO | `libHeteroPlugin.so` | Same as for selected plugins | `HeteroPlugin.dll` | Same as for selected plugins | `libHeteroPlugin.dylib` | Same as for selected plugins |
|
||||
| MULTI | `libMultiDevicePlugin.so` | Same as for selected plugins | `MultiDevicePlugin.dll` | Same as for selected plugins | `libMultiDevicePlugin.dylib` | Same as for selected plugins |
|
||||
|
||||
> **NOTE**: All plugin libraries also depend on core Inference Engine libraries.
|
||||
|
||||
@@ -76,15 +76,16 @@ Make sure those libraries are in your computer's path or in the place you pointe
|
||||
|
||||
* Linux: `LD_LIBRARY_PATH`
|
||||
* Windows: `PATH`
|
||||
* macOS: `DYLD_LIBRARY_PATH`
|
||||
|
||||
On Linux, use the script `bin/setupvars.sh` to set the environment variables.
|
||||
On Linux and macOS, use the script `bin/setupvars.sh` to set the environment variables.
|
||||
|
||||
On Windows, run the `bin\setupvars.bat` batch file to set the environment variables.
|
||||
|
||||
To learn more about supported devices and corresponding plugins, see the [Supported Devices](supported_plugins/Supported_Devices.md) chapter.
|
||||
|
||||
Common Workflow for Using the Inference Engine API
|
||||
---------------------------
|
||||
--------------------------------------------------
|
||||
The common workflow contains the following steps:
|
||||
|
||||
1. **Create Inference Engine Core object** - Create an `InferenceEngine::Core` object to work with different devices, all device plugins are managed internally by the `Core` object. Register extensions with custom nGraph operations (`InferenceEngine::Core::AddExtension`).
|
||||
|
||||
275
docs/IE_DG/network_state_intro.md
Normal file
275
docs/IE_DG/network_state_intro.md
Normal file
@@ -0,0 +1,275 @@
|
||||
Introduction to OpenVINO state API {#openvino_docs_IE_DG_network_state_intro}
|
||||
==============================
|
||||
|
||||
This section describes how to work with stateful networks in OpenVINO toolkit, specifically:
|
||||
* How stateful networks are represented in IR and nGraph
|
||||
* How operations with state can be done
|
||||
|
||||
The section additionally provides small examples of stateful network and code to infer it.
|
||||
|
||||
## What is a stateful network
|
||||
|
||||
Several use cases require processing of data sequences. When length of a sequence is known and small enough,
|
||||
we can process it with RNN-like networks that contain a cycle inside. But in some cases, like online speech recognition or time series
|
||||
forecasting, the length of the data sequence is unknown. Then the data can be divided into small portions and processed step-by-step. But the dependency
|
||||
between data portions should be addressed. For that, networks save some data between inferences - state. When one dependent sequence is over,
|
||||
state should be reset to initial value and new sequence can be started.
|
||||
|
||||
Several frameworks have a special API for states in networks. For example, Keras has a special option for RNNs, `stateful`, that turns on saving state
|
||||
between inferences. Kaldi contains special specifier `Offset` to define time offset in a network.
|
||||
|
||||
OpenVINO also contains special API to simplify work with networks with states. State is automatically saved between inferences,
|
||||
and there is a way to reset state when needed. You can also read state or set it to some new value between inferences.
|
||||
|
||||
## OpenVINO state representation
|
||||
|
||||
OpenVINO contains special abstraction variable to represent state in a network. There are two operations to work with state:
|
||||
* `Assign` to save value in state
|
||||
* `ReadValue` to read value saved on previous iteration
|
||||
|
||||
You can find more details on these operations in [ReadValue specification](../ops/infrastructure/ReadValue_3.md) and
|
||||
[Assign specification](../ops/infrastructure/Assign_3.md).
|
||||
|
||||
## Examples of representation of a network with states
|
||||
|
||||
To get a model with states ready for inference, you can convert a model from another framework to IR with Model Optimizer or create an nGraph function
|
||||
(details can be found in [Build nGraph Function section](../nGraph_DG/build_function.md)).
|
||||
Let's represent the following graph in both forms:
|
||||
![state_network_example]
|
||||
|
||||
### Example of IR with state
|
||||
|
||||
The `bin` file for this graph should contain float 0 in binary form. Content of `xml` is the following.
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" ?>
|
||||
<net name="summator" version="10">
|
||||
<layers>
|
||||
<layer id="0" name="init_value" type="Const" version="opset5">
|
||||
<data element_type="f32" offset="0" shape="1,1" size="4"/>
|
||||
<output>
|
||||
<port id="1" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="1" name="read" type="ReadValue" version="opset5">
|
||||
<data variable_id="id"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="1" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="2" name="input" type="Parameter" version="opset5">
|
||||
<data element_type="f32" shape="1,1"/>
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="3" name="add_sum" type="Add" version="opset5">
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="4" name="save" type="Assign" version="opset5">
|
||||
<data variable_id="id"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
<layer id="10" name="add" type="Add" version="opset5">
|
||||
<data axis="1"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="5" name="output/sink_port_0" type="Result" version="opset5">
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="1" to-layer="1" to-port="0"/>
|
||||
<edge from-layer="2" from-port="0" to-layer="3" to-port="1"/>
|
||||
<edge from-layer="1" from-port="1" to-layer="3" to-port="0"/>
|
||||
<edge from-layer="3" from-port="2" to-layer="4" to-port="0"/>
|
||||
<edge from-layer="3" from-port="2" to-layer="10" to-port="0"/>
|
||||
<edge from-layer="1" from-port="1" to-layer="10" to-port="1"/>
|
||||
<edge from-layer="10" from-port="2" to-layer="5" to-port="0"/>
|
||||
</edges>
|
||||
<meta_data>
|
||||
<MO_version value="unknown version"/>
|
||||
<cli_parameters>
|
||||
</cli_parameters>
|
||||
</meta_data>
|
||||
</net>
|
||||
```
|
||||
|
||||
### Example of creating model nGraph API
|
||||
|
||||
```cpp
|
||||
auto arg = make_shared<op::Parameter>(element::f32, Shape{1, 1});
|
||||
auto init_const = op::Constant::create(element::f32, Shape{1, 1}, {0});
|
||||
auto read = make_shared<op::ReadValue>(init_const, "v0");
|
||||
std::vector<shared_ptr<Node>> args = {arg, read};
|
||||
auto add = make_shared<op::Add>(arg, read);
|
||||
auto assign = make_shared<op::Assign>(add, "v0");
|
||||
auto add2 = make_shared<op::Add>(add, read);
|
||||
auto res = make_shared<op::Result>(add2);
|
||||
|
||||
auto f = make_shared<Function>(ResultVector({res}), ParameterVector({arg}), SinkVector({assign}));
|
||||
```
|
||||
|
||||
In this example, `SinkVector` is used to create `ngraph::Function`. For a network with states, in addition to inputs and outputs, `Assign` nodes should also point to the `Function`
|
||||
to avoid deleting it during graph transformations. You can do it with the constructor, as shown in the example, or with the special method `add_sinks(const SinkVector& sinks)`. Also you can delete
|
||||
sink from `ngraph::Function` after deleting the node from graph with the `delete_sink()` method.
|
||||
|
||||
## OpenVINO state API
|
||||
|
||||
Inference Engine has the `InferRequest::QueryState` method to get the list of states from a network and `IVariableState` interface to operate with states. Below you can find brief description of methods and the workable example of how to use this interface.
|
||||
A brief description of these methods is below, and the next section contains a small workable example of how this interface can be used.
|
||||
|
||||
* `std::string GetName() const`
|
||||
returns the name (variable_id) of the corresponding Variable
|
||||
* `void Reset()`
|
||||
resets the state to its default value
|
||||
* `void SetState(Blob::Ptr newState)`
|
||||
sets a new value for the state
|
||||
* `Blob::CPtr GetState() const`
|
||||
returns the current value of the state
|
||||
|
||||
## Example of stateful network inference
|
||||
|
||||
Let's take an IR from the previous section example. The example below demonstrates inference of two independent sequences of data. State should be reset between these sequences.
|
||||
|
||||
One infer request and one thread
|
||||
will be used in this example. Using several threads is possible if you have several independent sequences. Then each sequence can be processed in its own infer
|
||||
request. Inference of one sequence in several infer requests is not recommended. In one infer request state will be saved automatically between inferences, but
|
||||
if the first step is done in one infer request and the second in another, state should be set in new infer request manually (using `IVariableState::SetState` method).
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_network_with_state_infer.cpp part1
|
||||
|
||||
You can find more powerful examples demonstrating how to work with networks with states in speech sample and demo.
|
||||
Descriptions can be found in [Samples Overview](./Samples_Overview.md)
|
||||
|
||||
[state_network_example]: ./img/state_network_example.png
|
||||
|
||||
|
||||
## LowLatency transformation
|
||||
|
||||
If the original framework does not have a special API for working with states, after importing the model, OpenVINO representation will not contain Assign/ReadValue layers. For example, if the original ONNX model contains RNN operations, IR will contain TensorIterator operations and the values will be obtained only after the execution of whole TensorIterator primitive, intermediate values from each iteration will not be available. To be able to work with these intermediate values of each iteration and receive them with a low latency after each infer request, a special LowLatency transformation was introduced.
|
||||
|
||||
LowLatency transformation changes the structure of the network containing [TensorIterator](../ops/infrastructure/TensorIterator_1.md) by adding the ability to work with state, inserting Assign/ReadValue layers as it is shown in the picture below.
|
||||
|
||||

|
||||
|
||||
### Steps to apply LowLatency transformation
|
||||
|
||||
1. Get CNNNetwork. Any way is acceptable:
|
||||
|
||||
* [from IR or ONNX model](Integrate_with_customer_application_new_API.md#integration-steps)
|
||||
* [from nGraph Function](../nGraph_DG/build_function.md)
|
||||
|
||||
2. [Reshape](ShapeInference.md) the CNNNetwork if necessary
|
||||
**Necessary case:** the sequence_lengths dimension of the input > 1, which means the TensorIterator layer will have number_iterations > 1. We should reshape the inputs of the network to set the sequence_dimension exactly to 1.
|
||||
```cpp
|
||||
|
||||
// Network before reshape: Parameter (name: X, shape: [2 (sequence_lengths), 1, 16]) -> TensorIterator (num_iteration = 2, axis = 0) -> ...
|
||||
|
||||
cnnNetwork.reshape({{"X", {1, 1, 16}}});
|
||||
|
||||
// Network after reshape: Parameter (name: X, shape: [1 (sequence_lengths), 1, 16]) -> TensorIterator (num_iteration = 1, axis = 0) -> ...
|
||||
|
||||
```
|
||||
|
||||
3. Apply LowLatency transformation
|
||||
```cpp
|
||||
#include "ie_transformations.hpp"
|
||||
|
||||
...
|
||||
|
||||
InferenceEngine::LowLatency(cnnNetwork);
|
||||
```
|
||||
**State naming rule:** a name of state is a concatenation of names: original TensorIterator operation, Parameter of the body, and additional suffix "variable_" + id (0-base indexing, new indexing for each TensorIterator), for example:
|
||||
```
|
||||
tensor_iterator_name = "TI_name"
|
||||
body_parameter_name = "param_name"
|
||||
|
||||
state_name = "TI_name/param_name/variable_0"
|
||||
```
|
||||
4. [Use state API](#openvino-state-api)
|
||||
|
||||
|
||||
### Known limitations
|
||||
1. Parameters are directly connected to States (ReadValues).
|
||||
|
||||
Removing Parameters from `ngraph::Function` is not possible.
|
||||
|
||||

|
||||
|
||||
**Current solution:** replace Parameter with Constant (freeze) with the value [0, 0, 0 … 0] via [ModelOptimizer CLI](../MO_DG/prepare_model/convert_model/Converting_Model_General.md) `--input` or `--freeze_placeholder_with_value`.
|
||||
|
||||
2. Non-reshapable network.
|
||||
|
||||
Value of shapes is hard-coded somewhere in the network.
|
||||
|
||||

|
||||
|
||||
**Current solution:** trim non-reshapable layers via [ModelOptimizer CLI](../MO_DG/prepare_model/convert_model/Converting_Model_General.md) `--input`, `--output` or via nGraph.
|
||||
|
||||
```cpp
|
||||
// nGraph example:
|
||||
auto func = cnnNetwork.getFunction();
|
||||
auto new_const = std::make_shared<ngraph::opset5::Constant>(); // type, shape, value
|
||||
for (const auto& node : func->get_ops()) {
|
||||
if (node->get_friendly_name() == "name_of_non_reshapable_const") {
|
||||
auto bad_const = std::dynamic_pointer_cast<ngraph::opset5::Constant>(node);
|
||||
ngraph::replace_node(bad_const, new_const); // replace constant
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -33,7 +33,7 @@ a temporary memory block for model decryption, and use
|
||||
For more information, see the `InferenceEngine::Core` Class
|
||||
Reference Documentation.
|
||||
|
||||
@snippet openvino/docs/snippets/protecting_model_guide.cpp part0
|
||||
@snippet snippets/protecting_model_guide.cpp part0
|
||||
|
||||
Hardware-based protection, such as Intel® Software Guard Extensions
|
||||
(Intel® SGX), can be utilized to protect decryption operation secrets and
|
||||
@@ -47,7 +47,7 @@ Currently there are no possibility to read external weights from memory for ONNX
|
||||
The `ReadNetwork(const std::string& model, const Blob::CPtr& weights)` function
|
||||
should be called with `weights` passed as an empty `Blob`.
|
||||
|
||||
@snippet openvino/docs/snippets/protecting_model_guide.cpp part1
|
||||
@snippet snippets/protecting_model_guide.cpp part1
|
||||
|
||||
[deploy_encrypted_model]: img/deploy_encrypted_model.png
|
||||
|
||||
@@ -57,7 +57,6 @@ should be called with `weights` passed as an empty `Blob`.
|
||||
- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org)
|
||||
- Model Optimizer Developer Guide: [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
|
||||
- Inference Engine Developer Guide: [Inference Engine Developer Guide](Deep_Learning_Inference_Engine_DevGuide.md)
|
||||
- For more information on Sample Applications, see the [Inference Engine Samples Overview](Samples_Overview.html)
|
||||
- For more information on Sample Applications, see the [Inference Engine Samples Overview](Samples_Overview.md)
|
||||
- For information on a set of pre-trained models, see the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index)
|
||||
- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic)
|
||||
- For IoT Libraries and Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit).
|
||||
|
||||
@@ -1,9 +1,30 @@
|
||||
GPU Plugin {#openvino_docs_IE_DG_supported_plugins_CL_DNN}
|
||||
=======
|
||||
|
||||
The GPU plugin uses the Intel® Compute Library for Deep Neural Networks ([clDNN](https://01.org/cldnn)) to infer deep neural networks.
|
||||
clDNN is an open source performance library for Deep Learning (DL) applications intended for acceleration of Deep Learning Inference on Intel® Processor Graphics including Intel® HD Graphics and Intel® Iris® Graphics.
|
||||
For an in-depth description of clDNN, see: [clDNN sources](https://github.com/intel/clDNN) and [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics).
|
||||
The GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN) to infer deep neural networks.
|
||||
clDNN is an open source performance library for Deep Learning (DL) applications intended for acceleration of Deep Learning Inference on Intel® Processor Graphics including Intel® HD Graphics, Intel® Iris® Graphics, Intel® Iris® Xe Graphics, and Intel® Iris® Xe MAX graphics.
|
||||
For an in-depth description of clDNN, see [Inference Engine source files](https://github.com/openvinotoolkit/openvino/tree/master/inference-engine/src/cldnn_engine) and [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics).
|
||||
|
||||
## Device Naming Convention
|
||||
* Devices are enumerated as "GPU.X" where `X={0, 1, 2,...}`. Only Intel® GPU devices are considered.
|
||||
* If the system has an integrated GPU, it always has id=0 ("GPU.0").
|
||||
* Other GPUs have undefined order that depends on the GPU driver.
|
||||
* "GPU" is an alias for "GPU.0"
|
||||
* If the system doesn't have an integrated GPU, then devices are enumerated starting from 0.
|
||||
|
||||
For demonstration purposes, see the [Hello Query Device C++ Sample](../../../inference-engine/samples/hello_query_device/README.md) that can print out the list of available devices with associated indices. Below is an example output (truncated to the device names only):
|
||||
|
||||
```sh
|
||||
./hello_query_device
|
||||
Available devices:
|
||||
Device: CPU
|
||||
...
|
||||
Device: GPU.0
|
||||
...
|
||||
Device: GPU.1
|
||||
...
|
||||
Device: HDDL
|
||||
```
|
||||
|
||||
## Optimizations
|
||||
|
||||
@@ -92,7 +113,7 @@ When specifying key values as raw strings (that is, when using Python API), omit
|
||||
| `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)<br> Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. |
|
||||
| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `"<dump_dir>"` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) |
|
||||
| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `"<dump_dir>"` | `""` | Final optimized clDNN OpenCL sources dump output directory |
|
||||
| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).<br>This option is can be used to decrease GPU stall time by providing more effective load from several streams. Increasing the number of streams usually is more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_CLDNN_PLUGIN_THROTTLE` option value (see above). If your target system has relatively weak CPU, keep throttling low. <br>The default value is 1, which implies latency-oriented behaviour.<br>`KEY_GPU_THROUGHPUT_AUTO` creates bare minimum of streams to improve the performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams). <br> A positive integer value creates the requested number of streams. |
|
||||
| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).<br>This option is can be used to decrease GPU stall time by providing more effective load from several streams. Increasing the number of streams usually is more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_CLDNN_PLUGIN_THROTTLE` option value (see above). If your target system has relatively weak CPU, keep throttling low. <br>The default value is 1, which implies latency-oriented behavior.<br>`KEY_GPU_THROUGHPUT_AUTO` creates bare minimum of streams to improve the performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams). <br> A positive integer value creates the requested number of streams. |
|
||||
| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES` / `NO` | `NO` | Forces async requests (also from different executable networks) to execute serially.|
|
||||
|
||||
## Note on Debug Capabilities of the GPU Plugin
|
||||
|
||||
@@ -2,98 +2,98 @@
|
||||
|
||||
## Introducing the GNA Plugin
|
||||
|
||||
Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.
|
||||
Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.
|
||||
|
||||
Intel® GNA is not intended to replace classic inference devices such as
|
||||
CPU, graphics processing unit (GPU), or vision processing unit (VPU) . It is designed for offloading
|
||||
Intel® GNA is not intended to replace classic inference devices such as
|
||||
CPU, graphics processing unit (GPU), or vision processing unit (VPU). It is designed for offloading
|
||||
continuous inference workloads including but not limited to noise reduction or speech recognition
|
||||
to save power and free CPU resources.
|
||||
|
||||
The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU.
|
||||
The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU.
|
||||
|
||||
## Devices with Intel® GNA
|
||||
## Devices with Intel® GNA
|
||||
|
||||
Devices with Intel® GNA support:
|
||||
Devices with Intel® GNA support:
|
||||
|
||||
* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html)
|
||||
* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html)
|
||||
|
||||
* [Amazon Alexa* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice)
|
||||
* [Amazon Alexa\* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice)
|
||||
|
||||
* [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html):
|
||||
- Intel® Pentium® Silver J5005 Processor
|
||||
- Intel® Pentium® Silver N5000 Processor
|
||||
- Intel® Celeron® J4005 Processor
|
||||
- Intel® Celeron® J4105 Processor
|
||||
- Intel® Celeron® Processor N4100
|
||||
- Intel® Celeron® Processor N4000
|
||||
* [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html):
|
||||
- Intel® Pentium® Silver J5005 Processor
|
||||
- Intel® Pentium® Silver N5000 Processor
|
||||
- Intel® Celeron® J4005 Processor
|
||||
- Intel® Celeron® J4105 Processor
|
||||
- Intel® Celeron® Processor N4100
|
||||
- Intel® Celeron® Processor N4000
|
||||
|
||||
* [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html):
|
||||
Intel® Core™ i3-8121U Processor
|
||||
* [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html):
|
||||
Intel® Core™ i3-8121U Processor
|
||||
|
||||
* [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html):
|
||||
- Intel® Core™ i7-1065G7 Processor
|
||||
- Intel® Core™ i7-1060G7 Processor
|
||||
- Intel® Core™ i5-1035G4 Processor
|
||||
- Intel® Core™ i5-1035G7 Processor
|
||||
- Intel® Core™ i5-1035G1 Processor
|
||||
- Intel® Core™ i5-1030G7 Processor
|
||||
- Intel® Core™ i5-1030G4 Processor
|
||||
- Intel® Core™ i3-1005G1 Processor
|
||||
- Intel® Core™ i3-1000G1 Processor
|
||||
- Intel® Core™ i3-1000G4 Processor
|
||||
* [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html):
|
||||
- Intel® Core™ i7-1065G7 Processor
|
||||
- Intel® Core™ i7-1060G7 Processor
|
||||
- Intel® Core™ i5-1035G4 Processor
|
||||
- Intel® Core™ i5-1035G7 Processor
|
||||
- Intel® Core™ i5-1035G1 Processor
|
||||
- Intel® Core™ i5-1030G7 Processor
|
||||
- Intel® Core™ i5-1030G4 Processor
|
||||
- Intel® Core™ i3-1005G1 Processor
|
||||
- Intel® Core™ i3-1000G1 Processor
|
||||
- Intel® Core™ i3-1000G4 Processor
|
||||
|
||||
* All [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html).
|
||||
* All [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html).
|
||||
|
||||
> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only.
|
||||
> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only.
|
||||
|
||||
## Drivers and Dependencies
|
||||
|
||||
Intel® GNA hardware requires a driver to be installed on the system.
|
||||
Intel® GNA hardware requires a driver to be installed on the system.
|
||||
|
||||
* Linux\* OS:
|
||||
[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.0+)](https://download.01.org/opencv/drivers/gna/)
|
||||
[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.0+)](https://download.01.org/opencv/drivers/gna/)
|
||||
|
||||
* Windows\* OS:
|
||||
Intel® GNA driver for Windows is available through Windows Update\*
|
||||
Intel® GNA driver for Windows is available through Windows Update\*
|
||||
|
||||
## Models and Layers Limitations
|
||||
|
||||
Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations.
|
||||
For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted for the GNA Plugin, because the plugin does not fully support
|
||||
2D convolutions.
|
||||
Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations.
|
||||
For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted
|
||||
for the GNA Plugin, because the plugin does not fully support 2D convolutions.
|
||||
|
||||
For the list of supported layers, see the **GNA** column of the **Supported Layers** section in [Supported Devices](Supported_Devices.md).
|
||||
|
||||
The list of supported layers can be found
|
||||
[here](Supported_Devices.md) (see the GNA column of Supported Layers section).
|
||||
Limitations include:
|
||||
|
||||
- Only 1D convolutions are natively supported in the models converted from:
|
||||
- [Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) framework;
|
||||
- [TensorFlow](../../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) framework; note that for TensorFlow models, the option `--disable_nhwc_to_nchw` must be used when running the Model Optimizer.
|
||||
- The number of output channels for convolutions must be a multiple of 4
|
||||
- Permute layer support is limited to the cases where no data reordering is needed, or when reordering is happening for 2 dimensions, at least one of which is not greater than 8
|
||||
- [Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) framework
|
||||
- [TensorFlow](../../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) framework. For TensorFlow models, use the `--disable_nhwc_to_nchw` option when running the Model Optimizer.
|
||||
- The number of output channels for convolutions must be a multiple of 4.
|
||||
- Permute layer support is limited to the cases where no data reordering is needed or when reordering is happening for two dimensions, at least one of which is not greater than 8.
|
||||
|
||||
#### Experimental Support for 2D Convolutions
|
||||
|
||||
The Intel® GNA hardware natively supports only 1D convolution.
|
||||
The Intel® GNA hardware natively supports only 1D convolution.
|
||||
|
||||
However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. Such a transformation is performed by the GNA Plugin for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts a `NHWC` input and produces `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, it may be necessary to insert `Permute` layers before or after convolutions.
|
||||
However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. GNA Plugin performs such a transformation for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts an `NHWC` input and produces an `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, you may need to insert `Permute` layers before or after convolutions.
|
||||
|
||||
For example, the Kaldi model optimizer inserts such a permute after convolution for the [rm_cnn4a network](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi/rm_cnn4a_smbr/). This `Permute` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result.
|
||||
For example, the Kaldi model optimizer inserts such a permute after convolution for the [rm_cnn4a network](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi/rm_cnn4a_smbr/). This `Permute` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result.
|
||||
|
||||
## Operation Precision
|
||||
|
||||
Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations, so compared to 32-bit floating point (`FP32`) results – for example, calculated on CPU using Inference Engine [CPU Plugin](CPU.md) – outputs calculated using reduced integer precision are different from the scores calculated using floating point.
|
||||
Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations. Outputs calculated using a reduced integer precision are different from the scores calculated using the floating point format, for example, `FP32` outputs calculated on CPU using the Inference Engine [CPU Plugin](CPU.md).
|
||||
|
||||
Unlike other plugins supporting low-precision execution, the GNA plugin calculates quantization factors at the model loading time, so a model can run without calibration.
|
||||
Unlike other plugins supporting low-precision execution, the GNA plugin calculates quantization factors at the model loading time, so you can run a model without calibration.
|
||||
|
||||
## <a name="execution-models">Execution Modes</a>
|
||||
## <a name="execution-modes">Execution Modes</a>
|
||||
|
||||
| Mode | Description |
|
||||
| :---------------------------------| :---------------------------------------------------------|
|
||||
| `GNA_AUTO` | Uses Intel® GNA if available, otherwise uses software execution mode on CPU. |
|
||||
| `GNA_HW` | Uses Intel® GNA if available, otherwise raises an error. |
|
||||
| `GNA_SW` | *Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode. |
|
||||
| `GNA_SW_EXACT` | Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode. |
|
||||
| `GNA_AUTO` | Uses Intel® GNA if available, otherwise uses software execution mode on CPU. |
|
||||
| `GNA_HW` | Uses Intel® GNA if available, otherwise raises an error. |
|
||||
| `GNA_SW` | *Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode. |
|
||||
| `GNA_SW_EXACT` | Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode. |
|
||||
| `GNA_SW_FP32` | Executes the GNA-compiled graph on CPU but substitutes parameters and calculations from low precision to floating point (`FP32`). |
|
||||
|
||||
## Supported Configuration Parameters
|
||||
@@ -101,42 +101,42 @@ Unlike other plugins supporting low-precision execution, the GNA plugin calculat
|
||||
The plugin supports the configuration parameters listed below.
|
||||
The parameters are passed as `std::map<std::string, std::string>` on `InferenceEngine::Core::LoadNetwork` or `InferenceEngine::SetConfig`.
|
||||
|
||||
The parameter `KEY_GNA_DEVICE_MODE` can also be changed at run time using `InferenceEngine::ExecutableNetwork::SetConfig` (for any values excluding `GNA_SW_FP32`). This allows switching the
|
||||
You can change the `KEY_GNA_DEVICE_MODE` parameter at run time using `InferenceEngine::ExecutableNetwork::SetConfig`, which works for any value excluding `GNA_SW_FP32`. This enables you to switch the
|
||||
execution between software emulation mode and hardware emulation mode after the model is loaded.
|
||||
|
||||
The parameter names below correspond to their usage through API keys, such as `GNAConfigParams::KEY_GNA_DEVICE_MODE` or `PluginConfigParams::KEY_PERF_COUNT`.
|
||||
When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix.
|
||||
When specifying key values as raw strings, that is, when using Python API, omit the `KEY_` prefix.
|
||||
|
||||
| Parameter Name | Parameter Values | Default Value | Description |
|
||||
| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------|
|
||||
| `KEY_GNA_COMPACT_MODE` | `YES`/`NO` | `NO` | Reuse I/O buffers to save space (makes debugging harder) |
|
||||
| `KEY_GNA_SCALE_FACTOR` | `FP32` number | 1.0 | Scale factor to use for input quantization |
|
||||
| `KEY_GNA_DEVICE_MODE` | `GNA_AUTO`/`GNA_HW`/`GNA_SW_EXACT`/`GNA_SW_FP32` | `GNA_AUTO` | One of the modes described in <a href="#execution-modes">Execution Modes</a> |
|
||||
| `KEY_GNA_FIRMWARE_MODEL_IMAGE` | `std::string` | `""` | Name for embedded model binary dump file |
|
||||
| `KEY_GNA_PRECISION` | `I16`/`I8` | `I16` | Hint to GNA plugin: preferred integer weight resolution for quantization |
|
||||
| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Turn on performance counters reporting |
|
||||
| `KEY_GNA_LIB_N_THREADS` | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes
|
||||
| `KEY_GNA_COMPACT_MODE` | `YES`/`NO` | `YES` | Enables I/O buffers reuse to save space. Makes debugging harder. |
|
||||
| `KEY_GNA_SCALE_FACTOR` | `FP32` number | 1.0 | Sets the scale factor to use for input quantization. |
|
||||
| `KEY_GNA_DEVICE_MODE` | `GNA_AUTO`/`GNA_HW`/`GNA_SW_EXACT`/`GNA_SW_FP32` | `GNA_AUTO` | One of the modes described in <a href="#execution-modes">Execution Modes</a> |
|
||||
| `KEY_GNA_FIRMWARE_MODEL_IMAGE` | `std::string` | `""` | Sets the name for the embedded model binary dump file. |
|
||||
| `KEY_GNA_PRECISION` | `I16`/`I8` | `I16` | Sets the preferred integer weight resolution for quantization. |
|
||||
| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Turns on performance counters reporting. |
|
||||
| `KEY_GNA_LIB_N_THREADS` | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes.
|
||||
|
||||
## How to Interpret Performance Counters
|
||||
|
||||
As a result of collecting performance counters using `InferenceEngine::InferRequest::GetPerformanceCounts`, you can find various performance data about execution on GNA.
|
||||
The returned map stores a counter description as a key; the counter value is stored in the `realTime_uSec` field of the `InferenceEngineProfileInfo` structure. The current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information. The API allows you to retrieve counter units in cycles, but they can be converted to seconds as follows:
|
||||
The returned map stores a counter description as a key, and a counter value in the `realTime_uSec` field of the `InferenceEngineProfileInfo` structure. The current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information. The API enables you to retrieve counter units in cycles, which you can convert to seconds as follows:
|
||||
|
||||
```
|
||||
seconds = cycles / frequency
|
||||
```
|
||||
|
||||
Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor.
|
||||
Processor | Frequency of Intel® GNA
|
||||
Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor.
|
||||
Processor | Frequency of Intel® GNA
|
||||
---|---
|
||||
Intel® Ice Lake processors| 400MHz
|
||||
Intel® Core™ i3-8121U processor| 400MHz
|
||||
Intel® Gemini Lake processors | 200MHz
|
||||
Intel® Ice Lake processors| 400MHz
|
||||
Intel® Core™ i3-8121U processor| 400MHz
|
||||
Intel® Gemini Lake processors | 200MHz
|
||||
|
||||
Performance counters provided for the time being:
|
||||
|
||||
* Scoring request performance results
|
||||
* Number of total cycles spent on scoring in hardware (including compute and memory stall cycles)
|
||||
* Number of total cycles spent on scoring in hardware including compute and memory stall cycles
|
||||
* Number of stall cycles spent in hardware
|
||||
|
||||
## Multithreading Support in GNA Plugin
|
||||
@@ -151,40 +151,40 @@ The GNA plugin supports the following configuration parameters for multithreadin
|
||||
|
||||
## Network Batch Size
|
||||
|
||||
Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one
|
||||
Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one
|
||||
input blob using `InferenceEngine::ICNNNetwork::setBatchSize`. Increasing batch size only improves efficiency of `Fully Connected` layers.
|
||||
|
||||
> **NOTE**: For networks with `Convolutional`, `LSTM`, or `Memory` layers, the only supported batch size is 1.
|
||||
|
||||
## Compatibility with Heterogeneous Plugin
|
||||
|
||||
Heterogeneous plugin was tested with the Intel® GNA as a primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin (for example, Softmax), use the Heterogeneous plugin with the `HETERO:GNA,CPU` configuration. For the list of supported networks, see the [Supported Frameworks](#supported-frameworks).
|
||||
Heterogeneous plugin was tested with the Intel® GNA as a primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin, such as Softmax, use the Heterogeneous plugin with the `HETERO:GNA,CPU` configuration.
|
||||
|
||||
> **NOTE:** Due to limitation of the Intel® GNA backend library, heterogeneous support is limited to cases where in the resulted sliced graph, only one subgraph is scheduled to run on GNA\_HW or GNA\_SW devices.
|
||||
> **NOTE:** Due to a limitation of the Intel® GNA backend library, heterogeneous support is limited to cases where, in the resulting sliced graph, only one subgraph is scheduled to run on GNA\_HW or GNA\_SW devices.
|
||||
|
||||
## Recovery from interruption by high-priority Windows audio processes\*
|
||||
## Recovery from Interruption by High-Priority Windows Audio Processes\*
|
||||
|
||||
As noted in the introduction, GNA is designed for real-time workloads such as noise reduction.
|
||||
GNA is designed for real-time workloads such as noise reduction.
|
||||
For such workloads, processing should be time constrained, otherwise extra delays may cause undesired effects such as
|
||||
audio "glitches". To make sure that processing can satisfy real time requirements, the GNA driver provides a QoS
|
||||
(Quality of Service) mechanism which interrupts requests that might cause high-priority Windows audio processes to miss
|
||||
schedule, thereby causing long running GNA tasks to terminate early.
|
||||
*audio glitches*. To make sure that processing can satisfy real-time requirements, the GNA driver provides a Quality of Service
|
||||
(QoS) mechanism, which interrupts requests that might cause high-priority Windows audio processes to miss
|
||||
the schedule, thereby causing long running GNA tasks to terminate early.
|
||||
|
||||
Applications should be prepared for this situation.
|
||||
If an inference (in `GNA_HW` mode) cannot be executed because of such an interruption, then `InferRequest::Wait()` will return status code
|
||||
`StatusCode::INFER_NOT_STARTED` (note that it will be changed to a more meaningful status code in future releases).
|
||||
If an inference in the `GNA_HW` mode cannot be executed because of such an interruption, then `InferRequest::Wait()` returns status code
|
||||
`StatusCode::INFER_NOT_STARTED`. In future releases, it will be changed to a more meaningful status code.
|
||||
|
||||
Any application working with GNA must properly react if it receives this code. Various strategies are possible.
|
||||
One of the options is to immediately switch to GNA SW emulation mode:
|
||||
Any application working with GNA must properly react to this code.
|
||||
One of the strategies to adapt an application:
|
||||
|
||||
1. Immediately switch to the GNA_SW emulation mode:
|
||||
```cpp
|
||||
std::map<std::string, Parameter> newConfig;
|
||||
newConfig[GNAConfigParams::KEY_GNA_DEVICE_MODE] = Parameter("GNA_SW_EXACT");
|
||||
executableNet.SetConfig(newConfig);
|
||||
|
||||
```
|
||||
|
||||
then resubmit and switch back to GNA_HW after some time hoping that the competing application has finished.
|
||||
2. Resubmit and switch back to GNA_HW expecting that the competing application has finished.
|
||||
|
||||
## See Also
|
||||
|
||||
|
||||
@@ -102,15 +102,15 @@ Refer to the sections below to see pseudo-code of usage examples.
|
||||
|
||||
This example uses the OpenCL context obtained from an executable network object.
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_RemoteBlob_API0.cpp part0
|
||||
@snippet snippets/GPU_RemoteBlob_API0.cpp part0
|
||||
|
||||
### Running GPU Plugin Inference within User-Supplied Shared Context
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_RemoteBlob_API1.cpp part1
|
||||
@snippet snippets/GPU_RemoteBlob_API1.cpp part1
|
||||
|
||||
### Direct Consuming of the NV12 VAAPI Video Decoder Surface on Linux
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_RemoteBlob_API2.cpp part2
|
||||
@snippet snippets/GPU_RemoteBlob_API2.cpp part2
|
||||
|
||||
## See Also
|
||||
|
||||
|
||||
@@ -21,7 +21,7 @@ For the "Supported Networks", please reference to [MYRIAD Plugin](MYRIAD.md)
|
||||
See VPU common configuration parameters for the [VPU Plugins](VPU.md).
|
||||
When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix.
|
||||
|
||||
In addition to common parameters for Myriad plugin and HDDL plugin, HDDL plugin accepts the following options:
|
||||
In addition to common parameters for MYRIAD plugin and HDDL plugin, HDDL plugin accepts the following options:
|
||||
|
||||
| Parameter Name | Parameter Values | Default | Description |
|
||||
| :--- | :--- | :--- | :--- |
|
||||
|
||||
@@ -28,17 +28,17 @@ Default fallback policy decides which layer goes to which device automatically a
|
||||
|
||||
Another way to annotate a network is to set affinity manually using <code>ngraph::Node::get_rt_info</code> with key `"affinity"`:
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO0.cpp part0
|
||||
@snippet snippets/HETERO0.cpp part0
|
||||
|
||||
The fallback policy does not work if even one layer has an initialized affinity. The recommended sequence is to apply the automatic affinity settings first and then adjust them manually.
|
||||
|
||||
> **NOTE**: If you set affinity manually, be careful: at the current moment, Inference Engine plugins do not support constant (`Constant`->`Result`) and empty (`Parameter`->`Result`) networks. Please avoid such subgraphs when you set affinity manually.
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO1.cpp part1
|
||||
@snippet snippets/HETERO1.cpp part1
|
||||
|
||||
If you rely on the default affinity distribution, you can avoid calling <code>InferenceEngine::Core::QueryNetwork</code> and just call <code>InferenceEngine::Core::LoadNetwork</code> instead:
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO2.cpp part2
|
||||
@snippet snippets/HETERO2.cpp part2
|
||||
|
||||
> **NOTE**: `InferenceEngine::Core::QueryNetwork` does not depend on affinities set by a user, but queries for layer support based on device capabilities.
|
||||
|
||||
@@ -74,7 +74,7 @@ Heterogeneous plugin can generate two files:
|
||||
* `hetero_affinity_<network name>.dot` - annotation of affinities per layer. This file is written to the disk only if default fallback policy was executed
|
||||
* `hetero_subgraphs_<network name>.dot` - annotation of affinities per graph. This file is written to the disk during execution of <code>ICNNNetwork::LoadNetwork()</code> for heterogeneous plugin
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO3.cpp part3
|
||||
@snippet snippets/HETERO3.cpp part3
|
||||
|
||||
You can use GraphViz* utility or converters to `.png` formats. On Ubuntu* operating system, you can use the following utilities:
|
||||
* `sudo apt-get install xdot`
|
||||
|
||||
@@ -32,11 +32,11 @@ You can use name of the configuration directly as a string, or use MultiDeviceCo
|
||||
|
||||
Basically, there are three ways to specify the devices to be used by the "MULTI":
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI0.cpp part0
|
||||
@snippet snippets/MULTI0.cpp part0
|
||||
|
||||
Notice that the priorities of the devices can be changed in real-time for the executable network:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI1.cpp part1
|
||||
@snippet snippets/MULTI1.cpp part1
|
||||
|
||||
Finally, there is a way to specify number of requests that the multi-device will internally keep for each device.
|
||||
Say your original app was running 4 cameras with 4 inference requests. Now you would probably want to share these 4 requests between the 2 devices used in the MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(2)" and use the same 4 requests in your app. However, such an explicit configuration is not performance-portable and hence not recommended. Instead, the better way is to configure the individual devices and query the resulting number of requests to be used at the application level (see [Configuring the Individual Devices and Creating the Multi-Device On Top](#configuring-the-individual-devices-and-creating-the-multi-device-on-top)).
|
||||
@@ -47,15 +47,17 @@ Inference Engine now features a dedicated API to enumerate devices and their cap
|
||||
```sh
|
||||
./hello_query_device
|
||||
Available devices:
|
||||
Device: CPU
|
||||
Device: CPU
|
||||
...
|
||||
Device: GPU
|
||||
Device: GPU.0
|
||||
...
|
||||
Device: HDDL
|
||||
Device: GPU.1
|
||||
...
|
||||
Device: HDDL
|
||||
```
|
||||
Simple programmatic way to enumerate the devices and use with the multi-device is as follows:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI2.cpp part2
|
||||
@snippet snippets/MULTI2.cpp part2
|
||||
|
||||
Beyond trivial "CPU", "GPU", "HDDL" and so on, when multiple instances of a device are available the names are more qualified.
|
||||
For example this is how two Intel® Movidius™ Myriad™ X sticks are listed with the hello_query_sample:
|
||||
@@ -68,13 +70,13 @@ For example this is how two Intel® Movidius™ Myriad™ X sticks are listed wi
|
||||
So the explicit configuration to use both would be "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480".
|
||||
Accordingly, the code that loops over all available devices of "MYRIAD" type only is below:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI3.cpp part3
|
||||
@snippet snippets/MULTI3.cpp part3
|
||||
|
||||
|
||||
## Configuring the Individual Devices and Creating the Multi-Device On Top
|
||||
As discussed in the first section, you shall configure each individual device as usual and then just create the "MULTI" device on top:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI4.cpp part4
|
||||
@snippet snippets/MULTI4.cpp part4
|
||||
|
||||
Alternatively, you can combine all the individual device settings into single config and load that, allowing the multi-device plugin to parse and apply that to the right devices. See code example in the next section.
|
||||
|
||||
@@ -84,17 +86,24 @@ See section of the [Using the multi-device with OpenVINO samples and benchmarkin
|
||||
## Querying the Optimal Number of Inference Requests
|
||||
Notice that until R2 you had to calculate number of requests in your application for any device, e.g. you had to know that Intel® Vision Accelerator Design with Intel® Movidius™ VPUs required at least 32 inference requests to perform well. Now you can use the new GetMetric API to query the optimal number of requests. Similarly, when using the multi-device you don't need to sum over included devices yourself, you can query metric directly:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI5.cpp part5
|
||||
@snippet snippets/MULTI5.cpp part5
|
||||
|
||||
## Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance
|
||||
Notice that every OpenVINO sample that supports the "-d" (which stands for "device") command-line option transparently accepts the multi-device.
|
||||
The [Benchmark Application](../../../inference-engine/samples/benchmark_app/README.md) is the best reference to the optimal usage of the multi-device. As discussed multiple times earlier, you don't need to set up the number of requests, CPU streams, or threads, as the application provides optimal out-of-the-box performance.
|
||||
Below is example command-line to evaluate HDDL+GPU performance with that:
|
||||
```bash
|
||||
$ ./benchmark_app –d MULTI:HDDL,GPU –m <model> -i <input> -niter 1000
|
||||
|
||||
```sh
|
||||
./benchmark_app –d MULTI:HDDL,GPU –m <model> -i <input> -niter 1000
|
||||
```
|
||||
Notice that you can use the FP16 IR to work with multi-device (as CPU automatically upconverts it to the fp32) and rest of devices support it naturally.
|
||||
Also notice that no demos are (yet) fully optimized for the multi-device, by means of supporting the OPTIMAL_NUMBER_OF_INFER_REQUESTS metric, using the GPU streams/throttling, and so on.
|
||||
|
||||
## Video: MULTI Plugin
|
||||
[](https://www.youtube.com/watch?v=xbORYFEmrqU)
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/xbORYFEmrqU" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
## See Also
|
||||
* [Supported Devices](Supported_Devices.md)
|
||||
|
||||
|
||||
|
||||
@@ -71,6 +71,8 @@ In addition to common parameters, the MYRIAD plugin accepts the following option
|
||||
| `KEY_VPU_MYRIAD_FORCE_RESET` | `YES`/`NO` | `NO` | Enables force reset of all booted devices when new ExecutableNetwork is created.<br />This is a plugin scope option and must be used with the plugin's SetConfig method only.<br />See <a href="#MYRIAD_DEVICE_ALLOC">Device allocation</a> section for details. |
|
||||
| `KEY_VPU_PLATFORM` | empty string/`VPU_2450`/`VPU_2480` | empty string | **Deprecated** Use `KEY_VPU_MYRIAD_PLATFORM` instead. <br />If set, the plugin will use a device with specific platform to allocate a network. |
|
||||
| `KEY_VPU_FORCE_RESET` | `YES`/`NO` | `NO` | **Deprecated** Use `KEY_VPU_MYRIAD_FORCE_RESET` instead. <br />Enables force reset of all booted devices when new ExecutableNetwork is created.<br />This is a plugin scope option and must be used with the plugin's SetConfig method only.<br />See <a href="#MYRIAD_DEVICE_ALLOC">Device allocation</a> section for details. |
|
||||
| `KEY_VPU_MYRIAD_MOVIDIUS_DDR_TYPE` | `VPU_MYRIAD_DDR_AUTO`/ `VPU_MYRIAD_DDR_MICRON_2GB`/ `VPU_MYRIAD_DDR_SAMSUNG_2GB`/ `VPU_MYRIAD_DDR_HYNIX_2GB`/ `VPU_MYRIAD_DDR_MICRON_1GB` | `VPU_MYRIAD_DDR_AUTO` | This option allows setting DDR type for the MyriadX board. |
|
||||
|
||||
|
||||
## Device allocation <a name="MYRIAD_DEVICE_ALLOC"> </a>
|
||||
|
||||
|
||||
@@ -9,12 +9,12 @@ This chapter provides information on the Inference Engine plugins that enable in
|
||||
|
||||
## Known Layers Limitations
|
||||
|
||||
* `'ScaleShift'` layer is supported for zero value of `'broadcast'` attribute only.
|
||||
* `'CTCGreedyDecoder'` layer works with `'ctc_merge_repeated'` attribute equal 1.
|
||||
* `'DetectionOutput'` layer works with zero values of `'interpolate_orientation'` and `'num_orient_classes'` parameters only.
|
||||
* `'MVN'` layer uses fixed value for `'eps'` parameters (1e-9).
|
||||
* `'Normalize'` layer uses fixed value for `'eps'` parameters (1e-9) and is supported for zero value of `'across_spatial'` only.
|
||||
* `'Pad'` layer works only with 4D tensors.
|
||||
* `ScaleShift` layer is supported for zero value of `broadcast` attribute only.
|
||||
* `CTCGreedyDecoder` layer works with `ctc_merge_repeated` attribute equal 1.
|
||||
* `DetectionOutput` layer works with zero values of `interpolate_orientation` and `num_orient_classes` parameters only.
|
||||
* `MVN` layer uses fixed value for `eps` parameters (1e-9).
|
||||
* `Normalize` layer uses fixed value for `eps` parameters (1e-9) and is supported for zero value of `across_spatial` only.
|
||||
* `Pad` layer works only with 4D tensors.
|
||||
|
||||
## Optimizations
|
||||
|
||||
|
||||
@@ -844,11 +844,7 @@ EXCLUDE_SYMLINKS = NO
|
||||
# Note that the wildcards are matched against the file with absolute path, so to
|
||||
# exclude all test directories for example use the pattern */test/*
|
||||
|
||||
EXCLUDE_PATTERNS = cnn_network_ngraph_impl.hpp \
|
||||
ie_imemory_state_internal.hpp \
|
||||
ie_memory_state_internal.hpp \
|
||||
ie_memory_state_base.hpp \
|
||||
generic_ie.hpp \
|
||||
EXCLUDE_PATTERNS = generic_ie.hpp \
|
||||
function_name.hpp \
|
||||
macro_overload.hpp
|
||||
|
||||
|
||||
@@ -92,7 +92,7 @@ Returns a metric value for a metric with the name `name`. A metric is a static
|
||||
|
||||
@snippet src/template_executable_network.cpp executable_network:get_metric
|
||||
|
||||
The IE_SET_METRIC helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
|
||||
The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
|
||||
|
||||
### `GetConfig()`
|
||||
|
||||
|
||||
@@ -1,11 +1,11 @@
|
||||
# Representation of low-precision models
|
||||
# Representation of low-precision models {#lp_representation}
|
||||
The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance on interpretation rules for such models at runtime.
|
||||
Currently, there are two groups of optimization methods that can influence on the IR after applying them to the full-precision model:
|
||||
- **Sparsity**. It is represented by zeros inside the weights and this is up to the hardware plugin how to interpret these zeros (use weights as is or apply special compression algorithms and sparse arithmetic). No additional mask is provided with the model.
|
||||
- **Quantization**. The rest of this document is dedicated to the representation of quantized models.
|
||||
|
||||
## Representation of quantized models
|
||||
The OpenVINO Toolkit represents all the quantized models using the so-called FakeQuantize operation (see the description in [this document](../MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md)). This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
|
||||
The OpenVINO Toolkit represents all the quantized models using the so-called FakeQuantize operation (see the description in [this document](@ref openvino_docs_ops_quantization_FakeQuantize_1)). This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
|
||||
In order to be able to execute a particular DL operation in low-precision all its inputs should be quantized i.e. should have FakeQuantize between operation and data blobs. The figure below shows an example of quantized Convolution which contains two FakeQuantize nodes: one for weights and one for activations (bias is quantized using the same parameters).
|
||||
![quantized_convolution]
|
||||
<div align="center">Figure 1. Example of quantized Convolution operation.</div>
|
||||
|
||||
@@ -3,13 +3,13 @@
|
||||
One of the features of the Inference Engine is the support of quantized networks with different precisions: INT8, INT4, etc.
|
||||
However, it is up to the plugin to define what exact precisions are supported by the particular HW.
|
||||
All quantized networks which can be expressed in IR have a unified representation by means of *FakeQuantize* operation.
|
||||
For more details about low-precision model representation please refer to this [document](LowPrecisionModelRepresentation.md).
|
||||
For more details about low-precision model representation please refer to this [document](@ref lp_representation).
|
||||
|
||||
### Interpreting FakeQuantize at runtime
|
||||
During the model load each plugin can interpret quantization rules expressed in *FakeQuantize* operations:
|
||||
- Independently based on the definition of *FakeQuantize* operation.
|
||||
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
|
||||
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](../IE_DG/Int8Inference.md).
|
||||
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](@ref openvino_docs_IE_DG_Int8Inference).
|
||||
|
||||
Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
|
||||
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
|
||||
|
||||
@@ -17,8 +17,10 @@
|
||||
</tab>
|
||||
<!-- API References -->
|
||||
<tab type="usergroup" title="API REFERENCE">
|
||||
<!-- IE Developer Package -->
|
||||
<tab type="modules" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Plugin API -->
|
||||
<tab type="user" url="group__ie__dev__api.html" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Transformations API -->
|
||||
<tab type="user" url="group__ie__transformation__api.html" visible="yes" title="Inference Engine Transformations API Reference"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="MAIN OPENVINO™ DOCS" url="../index.html"/>
|
||||
</navindex>
|
||||
|
||||
@@ -4,9 +4,7 @@ This software and the related documents are Intel copyrighted materials, and you
|
||||
|
||||
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting [www.intel.com/design/literature.htm](https://www.intel.com/design/literature.htm).
|
||||
|
||||
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
|
||||
|
||||
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit [www.intel.com/benchmarks](https://www.intel.com/benchmarks).
|
||||
Performance varies by use, configuration and other factors. Learn more at [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex).
|
||||
|
||||
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
|
||||
|
||||
@@ -14,7 +12,7 @@ Your costs and results may vary.
|
||||
|
||||
Intel technologies may require enabled hardware, software or service activation.
|
||||
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. \*Other names and brands may be claimed as the property of others.
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. \*Other names and brands may be claimed as the property of others.
|
||||
|
||||
## OpenVINO™ Logo
|
||||
To build equity around the project, the OpenVINO logo was created for both Intel and community usage. The logo may only be used to represent the OpenVINO toolkit and offerings built using the OpenVINO toolkit.
|
||||
|
||||
@@ -12,51 +12,39 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi
|
||||
|
||||
* <code>.bin</code> - Contains the weights and biases binary data.
|
||||
|
||||
> **TIP**: You also can work with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench).
|
||||
> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is a platform built upon OpenVINO™ and provides a web-based graphical environment that enables you to optimize, fine-tune, analyze, visualize, and compare
|
||||
> performance of deep learning models on various Intel® architecture
|
||||
> configurations. In the DL Workbench, you can use most of OpenVINO™ toolkit components.
|
||||
> <br>
|
||||
> Proceed to an [easy installation from Docker](@ref workbench_docs_Workbench_DG_Install_from_Docker_Hub) to get started.
|
||||
|
||||
## What's New in the Model Optimizer in this Release?
|
||||
|
||||
* Common changes:
|
||||
* Implemented several optimization transformations to replace sub-graphs of operations with HSwish, Mish, Swish and SoftPlus operations.
|
||||
* Model Optimizer generates IR keeping shape-calculating sub-graphs **by default**. Previously, this behavior was triggered if the "--keep_shape_ops" command line parameter was provided. The key is ignored in this release and will be deleted in the next release. To trigger the legacy behavior to generate an IR for a fixed input shape (folding ShapeOf operations and shape-calculating sub-graphs to Constant), use the "--static_shape" command line parameter. Changing model input shape using the Inference Engine API in runtime may fail for such an IR.
|
||||
* Fixed Model Optimizer conversion issues resulted in non-reshapeable IR using the Inference Engine reshape API.
|
||||
* Enabled transformations to fix non-reshapeable patterns in the original networks:
|
||||
* Hardcoded Reshape
|
||||
* In Reshape(2D)->MatMul pattern
|
||||
* Reshape->Transpose->Reshape when the pattern can be fused to the ShuffleChannels or DepthToSpace operation
|
||||
* Hardcoded Interpolate
|
||||
* In Interpolate->Concat pattern
|
||||
* Added a dedicated requirements file for TensorFlow 2.X as well as the dedicated install prerequisites scripts.
|
||||
* Replaced the SparseToDense operation with ScatterNDUpdate-4.
|
||||
* Updated requirements for the numpy component to avoid compatibility issues with TensorFlow 1.x.
|
||||
* Improved reshape-ability of models with eltwise and CTCGreedyDecoder operations
|
||||
* ONNX*:
|
||||
* Enabled an ability to specify the model output **tensor** name using the "--output" command line parameter.
|
||||
* Added support for the following operations:
|
||||
* Acosh
|
||||
* Asinh
|
||||
* Atanh
|
||||
* DepthToSpace-11, 13
|
||||
* DequantizeLinear-10 (zero_point must be constant)
|
||||
* HardSigmoid-1,6
|
||||
* QuantizeLinear-10 (zero_point must be constant)
|
||||
* ReduceL1-11, 13
|
||||
* ReduceL2-11, 13
|
||||
* Resize-11, 13 (except mode="nearest" with 5D+ input, mode="tf_crop_and_resize", and attributes exclude_outside and extrapolation_value with non-zero values)
|
||||
* ScatterND-11, 13
|
||||
* SpaceToDepth-11, 13
|
||||
* Loop-11, 13
|
||||
* Round-11
|
||||
* GatherND-11, 12, 13
|
||||
* TensorFlow*:
|
||||
* Added support for the TensorFlow Object Detection API models with pre-processing block when mean/scale values are applied prior to resizing of the image. Previously only the case when mean/scale values are applied after the resize was supported.
|
||||
* Aligned FakeQuantized limits adjustment with TensorFlow approach
|
||||
* Added support for the following operations:
|
||||
* Acosh
|
||||
* Asinh
|
||||
* Atanh
|
||||
* CTCLoss
|
||||
* EuclideanNorm
|
||||
* ExtractImagePatches
|
||||
* FloorDiv
|
||||
* GatherND
|
||||
* Round
|
||||
* NonMaxSuppression
|
||||
* LogSoftmax
|
||||
* FakeQuantWithMinMaxVarsPerChannel
|
||||
* MXNet*:
|
||||
* Added support for the following operations:
|
||||
* Acosh
|
||||
* Asinh
|
||||
* Atanh
|
||||
* GatherND
|
||||
* Round
|
||||
* Kaldi*:
|
||||
* Fixed bug with ParallelComponent support. Now it is fully supported with no restrictions.
|
||||
* Added support for the following operations:
|
||||
* TdnnComponent
|
||||
|
||||
> **NOTE:**
|
||||
> [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019).
|
||||
@@ -77,7 +65,6 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi
|
||||
* [Converting DeepSpeech from TensorFlow](prepare_model/convert_model/tf_specific/Convert_DeepSpeech_From_Tensorflow.md)
|
||||
* [Converting Language Model on One Billion Word Benchmark from TensorFlow](prepare_model/convert_model/tf_specific/Convert_lm_1b_From_Tensorflow.md)
|
||||
* [Converting Neural Collaborative Filtering Model from TensorFlow*](prepare_model/convert_model/tf_specific/Convert_NCF_From_Tensorflow.md)
|
||||
|
||||
* [Converting TensorFlow* Object Detection API Models](prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md)
|
||||
* [Converting TensorFlow*-Slim Image Classification Model Library Models](prepare_model/convert_model/tf_specific/Convert_Slim_Library_Models.md)
|
||||
* [Converting CRNN Model from TensorFlow*](prepare_model/convert_model/tf_specific/Convert_CRNN_From_Tensorflow.md)
|
||||
@@ -90,19 +77,30 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi
|
||||
* [Model Optimizations Techniques](prepare_model/Model_Optimization_Techniques.md)
|
||||
* [Cutting parts of the model](prepare_model/convert_model/Cutting_Model.md)
|
||||
* [Sub-graph Replacement in Model Optimizer](prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md)
|
||||
* [(Deprecated) Case-Study: Converting SSD models created with the TensorFlow* Object Detection API](prepare_model/customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md)
|
||||
* [(Deprecated) Case-Study: Converting Faster R-CNN models created with the TensorFlow* Object Detection API](prepare_model/customize_model_optimizer/TensorFlow_Faster_RCNN_ObjectDetection_API.md)
|
||||
* [Supported Framework Layers](prepare_model/Supported_Frameworks_Layers.md)
|
||||
* [Intermediate Representation and Operation Sets](IR_and_opsets.md)
|
||||
* [Operations Specification](../ops/opset.md)
|
||||
* [Intermediate Representation suitable for INT8 inference](prepare_model/convert_model/IR_suitable_for_INT8_inference.md)
|
||||
|
||||
* [Custom Layers in Model Optimizer](prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Model Optimizer Extensibility](prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Extending Model Optimizer with New Primitives](prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md)
|
||||
* [Extending Model Optimizer with Caffe Python Layers](prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_Caffe_Python_Layers.md)
|
||||
* [Extending Model Optimizer with Custom MXNet* Operations](prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md)
|
||||
* [Legacy Mode for Caffe* Custom Layers](prepare_model/customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md)
|
||||
|
||||
* [Model Optimizer Frequently Asked Questions](prepare_model/Model_Optimizer_FAQ.md)
|
||||
|
||||
* [Known Issues](Known_Issues_Limitations.md)
|
||||
|
||||
**Typical Next Step:** [Preparing and Optimizing your Trained Model with Model Optimizer](prepare_model/Prepare_Trained_Model.md)
|
||||
|
||||
## Video: Model Optimizer Concept
|
||||
|
||||
[](https://www.youtube.com/watch?v=Kl1ptVb7aI8)
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/Kl1ptVb7aI8" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
## Video: Model Optimizer Basic Operation
|
||||
[](https://www.youtube.com/watch?v=BBt1rseDcy0)
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/BBt1rseDcy0" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
## Video: Choosing the Right Precision
|
||||
[](https://www.youtube.com/watch?v=RF8ypHyiKrY)
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/RF8ypHyiKrY" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
@@ -23,7 +23,7 @@ A summary of the steps for optimizing and deploying a model that was trained wit
|
||||
* **Object detection models:**
|
||||
* SSD300-VGG16, SSD500-VGG16
|
||||
* Faster-RCNN
|
||||
* RefineDet (Myriad plugin only)
|
||||
* RefineDet (MYRIAD plugin only)
|
||||
|
||||
* **Face detection models:**
|
||||
* VGG Face
|
||||
|
||||
@@ -114,6 +114,7 @@ Where `HEIGHT` and `WIDTH` are the input images height and width for which the m
|
||||
| Unet | [Repo](https://github.com/kkweon/UNet-in-Tensorflow) |
|
||||
| Keras-TCN | [Repo](https://github.com/philipperemy/keras-tcn) |
|
||||
| PRNet | [Repo](https://github.com/YadiraF/PRNet) |
|
||||
| YOLOv4 | [Repo](https://github.com/Ma-Dan/keras-yolo4) |
|
||||
|
||||
* YOLO topologies from DarkNet* can be converted using [instruction](tf_specific/Convert_YOLO_From_Tensorflow.md),
|
||||
* FaceNet topologies can be converted using [instruction](tf_specific/Convert_FaceNet_From_Tensorflow.md).
|
||||
@@ -279,7 +280,7 @@ python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_custom_operatio
|
||||
|
||||
* Launching the Model Optimizer for Inception V1 frozen model and use custom sub-graph replacement file `transform.json` for model conversion. For more information about this feature, refer to [Sub-Graph Replacement in the Model Optimizer](../customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
|
||||
```sh
|
||||
python3 mo_tf.py --input_model inception_v1.pb -b 1 --tensorflow_use_custom_operations_config transform.json
|
||||
python3 mo_tf.py --input_model inception_v1.pb -b 1 --transformations_config transform.json
|
||||
```
|
||||
|
||||
* Launching the Model Optimizer for Inception V1 frozen model and dump information about the graph to TensorBoard log dir `/tmp/log_dir`
|
||||
@@ -367,6 +368,10 @@ Refer to [Supported Framework Layers ](../Supported_Frameworks_Layers.md) for th
|
||||
|
||||
The Model Optimizer provides explanatory messages if it is unable to run to completion due to issues like typographical errors, incorrectly used options, or other issues. The message describes the potential cause of the problem and gives a link to the [Model Optimizer FAQ](../Model_Optimizer_FAQ.md). The FAQ has instructions on how to resolve most issues. The FAQ also includes links to relevant sections in the Model Optimizer Developer Guide to help you understand what went wrong.
|
||||
|
||||
## Video: Converting a TensorFlow Model
|
||||
[](https://www.youtube.com/watch?v=QW6532LtiTc)
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/QW6532LtiTc" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
|
||||
## Summary
|
||||
In this document, you learned:
|
||||
|
||||
@@ -38,5 +38,5 @@ Framework-specific parameters for:
|
||||
## See Also
|
||||
* [Configuring the Model Optimizer](../Config_Model_Optimizer.md)
|
||||
* [IR Notation Reference](../../IR_and_opsets.md)
|
||||
* [Custom Layers in Model Optimizer](../customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Model Cutting](Cutting_Model.md)
|
||||
* [Model Optimizer Extensibility](../customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Model Cutting](Cutting_Model.md)
|
||||
|
||||
@@ -9,7 +9,6 @@ The following examples are the situations when model cutting is useful or even r
|
||||
* model has pre- or post-processing parts that cannot be translated to existing Inference Engine layers.
|
||||
* model has a training part that is convenient to be kept in the model, but not used during inference.
|
||||
* model is too complex (contains lots of unsupported operations that cannot be easily implemented as custom layers), so the complete model cannot be converted in one shot.
|
||||
* model is one of the supported [SSD models](../customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md). In this case, you need to cut a post-processing part off.
|
||||
* problem with model conversion in the Model Optimizer or inference in the Inference Engine occurred. To localize the issue, limit the scope for conversion by iteratively searching for problematic places in the model.
|
||||
* single custom layer or a combination of custom layers is isolated for debugging purposes.
|
||||
|
||||
@@ -389,4 +388,4 @@ In this case, when `--input_shape` is specified and the node contains multiple i
|
||||
The correct command line is:
|
||||
```sh
|
||||
python3 mo.py --input_model=inception_v1.pb --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3]
|
||||
```
|
||||
```
|
||||
|
||||
@@ -1582,9 +1582,9 @@ OI, which means that Input changes the fastest, then Output.
|
||||
|
||||
**Mathematical Formulation**
|
||||
|
||||
\f[
|
||||
output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:]
|
||||
\f]
|
||||
\f[
|
||||
output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:]
|
||||
\f]
|
||||
|
||||
|
||||
**Inputs**
|
||||
@@ -5086,7 +5086,9 @@ t \in \left ( 0, \quad tiles \right )
|
||||
|
||||
The output tensor is populated by values computed in the following way:
|
||||
|
||||
output[i1, ..., i(axis-1), j, i(axis+1) ..., iN] = top_k(input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]), k, sort, mode)
|
||||
\f[
|
||||
output[i1, ..., i(axis-1), j, i(axis+1) ..., iN] = top_k(input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]), k, sort, mode)
|
||||
\f]
|
||||
|
||||
So for each slice `input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]` which represents 1D array, top_k value is computed individually. Sorting and minimum/maximum are controlled by `sort` and `mode` attributes.
|
||||
|
||||
|
||||
@@ -46,7 +46,7 @@ To generate the IR of the EfficientDet TensorFlow model, run:<br>
|
||||
```sh
|
||||
python3 $MO_ROOT/mo.py \
|
||||
--input_model savedmodeldir/efficientdet-d4_frozen.pb \
|
||||
--tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/automl_efficientdet.json \
|
||||
--transformations_config $MO_ROOT/extensions/front/tf/automl_efficientdet.json \
|
||||
--input_shape [1,$IMAGE_SIZE,$IMAGE_SIZE,3] \
|
||||
--reverse_input_channels
|
||||
```
|
||||
@@ -56,7 +56,7 @@ EfficientDet models were trained with different input image sizes. To determine
|
||||
dictionary in the [hparams_config.py](https://github.com/google/automl/blob/96e1fee/efficientdet/hparams_config.py#L304) file.
|
||||
The attribute `image_size` specifies the shape to be specified for the model conversion.
|
||||
|
||||
The `tensorflow_use_custom_operations_config` command line parameter specifies the configuration json file containing hints
|
||||
The `transformations_config` command line parameter specifies the configuration json file containing hints
|
||||
to the Model Optimizer on how to convert the model and trigger transformations implemented in the
|
||||
`$MO_ROOT/extensions/front/tf/AutomlEfficientDet.py`. The json file contains some parameters which must be changed if you
|
||||
train the model yourself and modified the `hparams_config` file or the parameters are different from the ones used for EfficientDet-D4.
|
||||
|
||||
@@ -45,6 +45,10 @@ python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weig
|
||||
```sh
|
||||
python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3-tiny.weights --tiny
|
||||
```
|
||||
At this step, you may receive a warning like `WARNING:tensorflow:Entity <...> could not be transformed and will be executed as-is.`. To workaround this issue, switch to gast 0.2.2 with the following command:
|
||||
```sh
|
||||
pip3 install --user gast==0.2.2
|
||||
```
|
||||
|
||||
If you have YOLOv3 weights trained for an input image with the size different from 416 (320, 608 or your own), please provide the `--size` key with the size of your image specified while running the converter. For example, run the following command for an image with size 608:
|
||||
```sh
|
||||
@@ -87,7 +91,7 @@ To generate the IR of the YOLOv3 TensorFlow model, run:<br>
|
||||
```sh
|
||||
python3 mo_tf.py
|
||||
--input_model /path/to/yolo_v3.pb
|
||||
--tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3.json
|
||||
--transformations_config $MO_ROOT/extensions/front/tf/yolo_v3.json
|
||||
--batch 1
|
||||
```
|
||||
|
||||
@@ -95,18 +99,18 @@ To generate the IR of the YOLOv3-tiny TensorFlow model, run:<br>
|
||||
```sh
|
||||
python3 mo_tf.py
|
||||
--input_model /path/to/yolo_v3_tiny.pb
|
||||
--tensorflow_use_custom_operations_config $MO_ROOT/extensions/front/tf/yolo_v3_tiny.json
|
||||
--transformations_config $MO_ROOT/extensions/front/tf/yolo_v3_tiny.json
|
||||
--batch 1
|
||||
```
|
||||
|
||||
where:
|
||||
|
||||
* `--batch` defines shape of model input. In the example, `--batch` is equal to 1, but you can also specify other integers larger than 1.
|
||||
* `--tensorflow_use_custom_operations_config` adds missing `Region` layers to the model. In the IR, the `Region` layer has name `RegionYolo`.
|
||||
* `--transformations_config` adds missing `Region` layers to the model. In the IR, the `Region` layer has name `RegionYolo`.
|
||||
|
||||
> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For more information about the parameter, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../Converting_Model_General.md).
|
||||
|
||||
OpenVINO™ toolkit provides a demo that uses YOLOv3 model. For more information, refer to [Object Detection YOLO* V3 Demo, Async API Performance Showcase](@ref omz_demos_object_detection_demo_yolov3_async_README).
|
||||
OpenVINO™ toolkit provides a demo that uses YOLOv3 model. For more information, refer to [Object Detection C++ Demo](@ref omz_demos_object_detection_demo_ssd_async_README).
|
||||
|
||||
## Convert YOLOv1 and YOLOv2 Models to the IR
|
||||
|
||||
@@ -163,14 +167,14 @@ python3 ./mo_tf.py
|
||||
--input_model <path_to_model>/<model_name>.pb \
|
||||
--batch 1 \
|
||||
--scale 255 \
|
||||
--tensorflow_use_custom_operations_config <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/<yolo_config>.json
|
||||
--transformations_config <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/<yolo_config>.json
|
||||
```
|
||||
where:
|
||||
|
||||
* `--batch` defines shape of model input. In the example, `--batch` is equal to 1, but you can also specify other integers larger than 1.
|
||||
* `--scale` specifies scale factor that input values will be divided by.
|
||||
The model was trained with input values in the range `[0,1]`. OpenVINO™ toolkit samples read input images as values in `[0,255]` range, so the scale 255 must be applied.
|
||||
* `--tensorflow_use_custom_operations_config` adds missing `Region` layers to the model. In the IR, the `Region` layer has name `RegionYolo`.
|
||||
* `--transformations_config` adds missing `Region` layers to the model. In the IR, the `Region` layer has name `RegionYolo`.
|
||||
For other applicable parameters, refer to [Convert Model from TensorFlow](../Convert_Model_From_TensorFlow.md).
|
||||
|
||||
> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For more information about the parameter, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../Converting_Model_General.md).
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,45 +1,41 @@
|
||||
# Extending the MXNet Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives}
|
||||
# Extending Model Optimizer for Custom MXNet* Operations {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives}
|
||||
|
||||
This section describes how you can create a Model Optimizer extension for a custom layer from your MXNet* model. It supplements the main document [Extending Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md) and provides a step-by-step procedure. To create an extension for a particular layer, perform the following steps:
|
||||
This section provides instructions on how to support a custom MXNet operation (or, as it is called in the MXNet documentation,
|
||||
an "operator" or "layer") which is not a part of the MXNet operation set. For example, the operator may be implemented using
|
||||
the following [guide](https://mxnet.apache.org/versions/1.7.0/api/faq/new_op.html).
|
||||
|
||||
This section describes a procedure on how to extract operator attributes in the Model Optimizer. The rest of the
|
||||
operation enabling pipeline and documentation on how to support MXNet operations from standard MXNet operation set is
|
||||
described in the main document [Customize_Model_Optimizer](Customize_Model_Optimizer.md).
|
||||
|
||||
## Writing Extractor for Custom MXNet Operation
|
||||
Custom MXNet operations have an attribute `op` (defining the type of the operation) equal to `Custom` and attribute
|
||||
`op_type` which is an operation type defined by a user. Implement an extractor class inherited from the
|
||||
`MXNetCustomFrontExtractorOp` class instead of `FrontExtractorOp` class used for standard framework operations in order
|
||||
to extract attributes for such kind of operations. The `op` class attribute value should be set to the `op_type` value
|
||||
so the extractor is triggered for this kind of operation.
|
||||
|
||||
Here is an example of the extractor for a custom operation registered with type (`op_type` value) equal to
|
||||
`MyCustomOp` having attribute `my_attribute` of the floating point type with default value `5.6`. In this sample we
|
||||
assume that we have already created the `CustomOp` class (inherited from `Op` class) for the Model Optimizer operation
|
||||
for this MXNet custom operation as described in the [Customize_Model_Optimizer](Customize_Model_Optimizer.md).
|
||||
|
||||
1. Create the file `custom_proposal_ext.py` in the folder `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/mxnet`
|
||||
If your MXNet layer has op `Custom`, create the `CustomProposalFrontExtractor` class inherited from `MXNetCustomFrontExtractorOp`:
|
||||
```py
|
||||
from mo.front.extractor import MXNetCustomFrontExtractorOp
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
Otherwise, for layers that are not standard MXNet layers, create the `ProposalFrontExtractor` class inherited from `FrontExtractorOp`:
|
||||
```py
|
||||
from mo.front.extractor import FrontExtractorOp
|
||||
class ProposalFrontExtractor(FrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
2. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
|
||||
```py
|
||||
from mo.front.extractor import MXNetCustomFrontExtractorOp
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
|
||||
op = '_contrib_Proposal'
|
||||
enabled = True
|
||||
```
|
||||
3. Register a mapping rule between the original model and the `PythonProposalOp` attributes by overriding the following function:
|
||||
```py
|
||||
from extension.ops.custom_op import CustomOp # implementation of the MO operation class
|
||||
from mo.front.mxnet.extractors.utils import get_mxnet_layer_attrs
|
||||
from mo.front.extractor import MXNetCustomFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
|
||||
op = '_contrib_Proposal'
|
||||
enabled = True
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp): # inherit from specific base class
|
||||
op = 'MyCustomOp' # the value corresponding to the `op_type` value of the MXNet operation
|
||||
enabled = True # the extractor is enabled
|
||||
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
attrs = get_mxnet_layer_attrs(node.symbol_dict)
|
||||
attrs = get_mxnet_layer_attrs(node.symbol_dict) # parse the attributes to a dictionary with string values
|
||||
node_attrs = {
|
||||
'feat_stride': attrs.float('feat_stride', 16)
|
||||
'my_attribute': attrs.float('my_attribute', 5.6)
|
||||
}
|
||||
|
||||
# update the attributes of the node
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, node_attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before
|
||||
return __class__.enabled
|
||||
```
|
||||
|
||||
CustomOp.update_node_stat(node, node_attrs) # update the attributes of the node
|
||||
return self.enabled
|
||||
```
|
||||
|
||||
@@ -0,0 +1,89 @@
|
||||
# Extending Model Optimizer with Caffe* Python Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers}
|
||||
|
||||
This section provides instruction on how to support a custom Caffe operation written only in Python. For example, the
|
||||
[Faster-R-CNN model](http://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0) implemented in
|
||||
Caffe contains a custom layer Proposal written in Python. The layer is described in the
|
||||
[Faster-R-CNN prototxt](https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt)
|
||||
the following way:
|
||||
```sh
|
||||
layer {
|
||||
name: 'proposal'
|
||||
type: 'Python'
|
||||
bottom: 'rpn_cls_prob_reshape'
|
||||
bottom: 'rpn_bbox_pred'
|
||||
bottom: 'im_info'
|
||||
top: 'rois'
|
||||
python_param {
|
||||
module: 'rpn.proposal_layer'
|
||||
layer: 'ProposalLayer'
|
||||
param_str: "'feat_stride': 16"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This section describes only a procedure on how to extract operator attributes in the Model Optimizer. The rest of the
|
||||
operation enabling pipeline and documentation on how to support other Caffe operations (written in C++) is described in
|
||||
the main document [Customize_Model_Optimizer](Customize_Model_Optimizer.md).
|
||||
|
||||
## Writing Extractor for Caffe Python Layer
|
||||
Custom Caffe Python layers have an attribute `type` (defining the type of the operation) equal to `Python` and two
|
||||
mandatory attributes `module` and `layer` in the `python_param` dictionary. The `module` defines the Python module name
|
||||
with the layer implementation, while the `layer` value is an operation type defined by a user. In order to extract
|
||||
attributes for such an operation it is necessary to implement extractor class inherited from the
|
||||
`CaffePythonFrontExtractorOp` class instead of `FrontExtractorOp` class used for standard framework layers. The `op`
|
||||
class attribute value should be set to the `module + "." + layer` value so the extractor is triggered for this kind of
|
||||
operation.
|
||||
|
||||
Here is a simplified example of the extractor for the custom operation Proposal from Faster-R-CNN model mentioned above.
|
||||
The full code with additional checks is provided in the
|
||||
`<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe/proposal_python_ext.py`. The sample code uses
|
||||
operation `ProposalOp` which corresponds to `Proposal` operation described in the [Available Operations Sets](../../../ops/opset.md)
|
||||
document. Refer to the source code below for a detailed explanation of the extractor.
|
||||
|
||||
```py
|
||||
from extensions.ops.proposal import ProposalOp
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
|
||||
|
||||
class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer' # module + "." + layer
|
||||
enabled = True # extractor is enabled
|
||||
|
||||
@staticmethod
|
||||
def extract_proposal_params(node, defaults):
|
||||
param = node.pb.python_param # get the protobuf message representation of the layer attributes
|
||||
# parse attributes from the layer protobuf message to a Python dictionary
|
||||
attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
|
||||
update_attrs = defaults
|
||||
|
||||
# the operation expects ratio and scale values to be called "ratio" and "scale" while Caffe uses different names
|
||||
if 'ratios' in attrs:
|
||||
attrs['ratio'] = attrs['ratios']
|
||||
del attrs['ratios']
|
||||
if 'scales' in attrs:
|
||||
attrs['scale'] = attrs['scales']
|
||||
del attrs['scales']
|
||||
|
||||
update_attrs.update(attrs)
|
||||
ProposalOp.update_node_stat(node, update_attrs) # update the node attributes
|
||||
|
||||
@classmethod
|
||||
def extract(cls, node):
|
||||
# define default values for the Proposal layer attributes
|
||||
defaults = {
|
||||
'feat_stride': 16,
|
||||
'base_size': 16,
|
||||
'min_size': 16,
|
||||
'ratio': [0.5, 1, 2],
|
||||
'scale': [8, 16, 32],
|
||||
'pre_nms_topn': 6000,
|
||||
'post_nms_topn': 300,
|
||||
'nms_thresh': 0.7
|
||||
}
|
||||
cls.extract_proposal_params(node, defaults)
|
||||
return cls.enabled
|
||||
```
|
||||
|
||||
## See Also
|
||||
* [Customize_Model_Optimizer](Customize_Model_Optimizer.md)
|
||||
* [Legacy Mode for Caffe* Custom Layers](Legacy_Mode_for_Caffe_Custom_Layers.md)
|
||||
@@ -1,476 +1,3 @@
|
||||
# Extending the Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives}
|
||||
# Extending Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives}
|
||||
|
||||
This section explains how to register a custom layer in the Model Optimizer, including how to register Proposal as a custom layer. This section also demonstrates how `Proposal` works as a custom layer.
|
||||
|
||||
Model Optimizer loads the model, goes through the topology, and tries to find each layer type in the list of known layers. If the Model Optimizer does not find a layer in that list, it looks for the layer in the list of custom layers. If the Model Optimizer fails to find the layer among the defined custom layers, it registers a Caffe\* fallback for the output shape inference. If the Model Optimizer does not find Caffe and cannot infer shapes, the Model Optimizer fails with an appropriate message.
|
||||
|
||||
You must know two things about custom layers with the Model Optimizer:
|
||||
|
||||
* How to map a subgraph in a FW model to a subgraph consisting of Inference Engine layers. For Caffe, the subgraph is a 1-to-1 mapping of a Caffe layer to an Inference Engine layer.
|
||||
* How to infer shapes for unknown subgraphs. This can be either for a step in which the internal representation consists of framework-specific layers, or for a step in which the internal representation consists of Inference Engine layers.
|
||||
|
||||
You also have the option of a framework fallback for unknown subgraphs, for when the original framework is used for inference of output shapes of operations. The example below demonstrates the case in which the framework is not available or should not be used.
|
||||
|
||||
## Preparing an Example Topology
|
||||
|
||||
> **NOTE**: Skip this section if you have a topology with a layer that is not known to the Model Optimizer.
|
||||
|
||||
The information in this section prepares a Caffe\* model with the provided, deployment-ready `prototxt` for a
|
||||
well-known topology called
|
||||
[Faster-R-CNN prototxt](https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt)
|
||||
to demonstrate the workflow. To use this example, you must have
|
||||
[weights and biases](http://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0) for inference,
|
||||
because `prototxt` just describes the structure of the topology.
|
||||
|
||||
1. Download the `.caffemodel` and `.prototxt` files
|
||||
2. Run the Model Optimizer on the `.caffemodel` and `.prototxt` files:
|
||||
```shell
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
|
||||
```
|
||||
You will likely see the error message:
|
||||
```shell
|
||||
Error parsing text-format caffe.NetParameter: 196:16: Message type "caffe.DropoutParameter" has no field named "scale_train".
|
||||
```
|
||||
Whether you see the error depends on your Caffe version. For example, BVLC Caffe does not support the boolean parameter `scale_train` for the `dropout` layer. The error message does not matter, because the dropout layer is needed only for training, and the Model Optimizer removes it.
|
||||
3. To proceed, comment out these lines in `test.prototxt`:
|
||||
```sh
|
||||
...
|
||||
layer {
|
||||
name: "drop6"
|
||||
type: "Dropout"
|
||||
bottom: "fc6"
|
||||
top: "fc6"
|
||||
dropout_param {
|
||||
dropout_ratio: 0.5
|
||||
# scale_train: false # <-------------- comment out this line
|
||||
}
|
||||
}
|
||||
...
|
||||
layer {
|
||||
name: "drop7"
|
||||
type: "Dropout"
|
||||
bottom: "fc7"
|
||||
top: "fc7"
|
||||
dropout_param {
|
||||
dropout_ratio: 0.5
|
||||
# scale_train: false # <-------------- comment out this line
|
||||
}
|
||||
}
|
||||
...
|
||||
```
|
||||
4. Run the Model Optimizer on this model again:
|
||||
```shell
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
|
||||
```
|
||||
You get the model successfully converted to Intermediate Representation, and you can infer it with the Inference Engine.
|
||||
|
||||
However, the aim of this tutorial is to demonstrate the way of supporting custom layers not yet supported by the Model Optimizer.
|
||||
If you want to understand better how Model Optimizer works, remove the extension for layer `Proposal` and follow all steps of this tutorial.
|
||||
|
||||
5. Remove the extension for layer `Proposal`:
|
||||
```sh
|
||||
mkdir extensions/old
|
||||
mv extensions/front/caffe/proposal_python_ext.py extensions/old/proposal_python_ext_old.py
|
||||
mv extensions/ops/proposal_python_example.py extensions/old/proposal_python_example_old.py
|
||||
```
|
||||
6. Now you can run the Model Optimizer on this model once again:
|
||||
```sh
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
|
||||
```
|
||||
You will see the message:
|
||||
```shell
|
||||
[ ERROR ] Found custom layer proposal. Model Optimizer does not support this layer.
|
||||
Please, register it in CustomLayersMapping.xml or implement extension.
|
||||
For more information please refer to Model Optimizer FAQ, question #FAQ45.
|
||||
```
|
||||
This message means the Model Optimizer can load the model, but is unable to infer the shape and handle the custom layer properties.
|
||||
|
||||
## Registering a Custom Layer as a Model Optimizer Extension
|
||||
|
||||
In the following sections, you will learn how to make the Model Optimizer independent from Caffe\* when processing a
|
||||
model that has a custom layer. In this example, the custom layer is referred to as the Proposal layer.
|
||||
|
||||
Use this section to implement the mapping rules for the `Proposal` layer attributes and the output shape calculation. As part of these steps, you must first create a class for the `Proposal` layer and inherit it from general-purpose Op that defines the interface of every new custom layer.
|
||||
|
||||
In this section, it is important to understand the `Op` class and its function. The implementation of this class shows that it expects a graph and attributes to be passed when initializing. The graph and attributes are in `<INSTALL_DIR>/deployment_tools/model_optimizer/mo/ops/op.py`
|
||||
|
||||
`Op` keeps the attributes for each operation and contains logic for handling node creation for internal model representation. `Op` is responsible for dumping each particular operation to the `.xml` format for the Intermediate Representation. By inheriting from it, the technical items are complete and you concentrate on the specificity of this layer: the attributes it supports and the rules on computing its output shape.
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Create the file `python_proposal.py` in the directory `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops`:
|
||||
```python
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
pass
|
||||
```
|
||||
2. Define the name of the operation and make a stub constructor:
|
||||
```python
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
op = 'Proposal'
|
||||
def __init__(self, graph, attrs):
|
||||
super().__init__(graph)
|
||||
```
|
||||
3. Every `Op` must have three specific fields defined: `type`, `op`, and `infer`. In most cases, the `type` and `op` names are the same, and `infer` is defined as a function to compute the output shape. Reflect these fields in your constructor:
|
||||
```python
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
op = 'Proposal'
|
||||
def __init__(self, graph, attrs):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'infer': None
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
```
|
||||
According to the Intermediate Representation catalog, Proposal layer has the following attributes:
|
||||
|
||||
* `pre_nms_topn`
|
||||
* `post_nms_topn`
|
||||
* `nms_thresh`
|
||||
* `feat_stride`
|
||||
* `min_size`
|
||||
* `base_size`
|
||||
* `ratio`
|
||||
* `scale`
|
||||
4. In defining supported attribute names, it is best to use the same names as in the original models. The names are similar to parameters and have no connection with the model layer properties. For clarity, you can use the name `my_ratio` for `ratio`. Other than defining the list of supported parameters, you can define only the parameters that appear in the Intermediate Representation in the `backend_attrs` method.
|
||||
Define your attributes:
|
||||
```python
|
||||
class PythonProposalOp(Op):
|
||||
# ... constructor
|
||||
def supported_attrs(self):
|
||||
return [
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh',
|
||||
'feat_stride',
|
||||
'min_size',
|
||||
'base_size',
|
||||
'ratio',
|
||||
'scale'
|
||||
]
|
||||
```
|
||||
5. Model Optimizer now knows how to create the layer called Proposal when it is in the topology and what attributes this layer has. However, the Model Optimizer does not know how to calculate the output shape of this operation. Define a rule to calculate the output shape:
|
||||
```python
|
||||
import numpy as np
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
def __init__(self, graph, attrs):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'infer': PythonProposalOp.calculate_output_shape
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
# ... supported attrs
|
||||
@staticmethod
|
||||
def calculate_output_shape(node: Node):
|
||||
node.out_node().shape = (1, 1, 1, 1) # any Proposal now has always the same output
|
||||
```
|
||||
6. According to the Intermediate Representation catalog, Proposal layer has the following output calculation formula, where shape dynamically depends on the `post_nms_topn` parameter.
|
||||
Implement the output calculation formula in Python\*:
|
||||
```python
|
||||
import numpy as np
|
||||
class PythonProposalOp(Op):
|
||||
# ... static fields
|
||||
# ... constructor
|
||||
# ... supported attrs
|
||||
@staticmethod
|
||||
def calculate_output_shape(node: Node):
|
||||
input_shape = node.in_node(0).shape
|
||||
out_shape = np.array([0, 0], dtype=np.int64)
|
||||
# rois blob: holds R regions of interest, each is a 5 - tuple
|
||||
# (n, x1, y1, x2, y2) specifying an image batch index n and a
|
||||
# rectangle(x1, y1, x2, y2)
|
||||
out_shape[0] = input_shape[0] * node.post_nms_topn
|
||||
out_shape[1] = 5
|
||||
node.out_node(0).shape = out_shape
|
||||
```
|
||||
The node does not contain this parameter because it should be initialized in the constructor, like the other parameters. The Inference Engine contains the implementation of a Caffe\*-like Proposal layer and works well with the default values from `caffe.proto`:
|
||||
```
|
||||
// Message that stores parameters used by ProposalLayer message ProposalParameter { optional uint32 feat_stride = 1 [default = 16]; optional uint32 base_size = 2 [default = 16]; optional uint32 min_size = 3 [default = 16]; repeated float ratio = 4; repeated float scale = 5; optional uint32 pre_nms_topn = 6 [default = 6000]; optional uint32 post_nms_topn = 7 [default = 300]; optional float nms_thresh = 8 [default = 0.7]; }
|
||||
```
|
||||
7. Change the constructor as follows:
|
||||
```python
|
||||
class PythonProposalOp(Op):
|
||||
# ... static fields
|
||||
def __init__(self, graph, attrs):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'feat_stride': 16,
|
||||
'base_size': 16,
|
||||
'min_size': 16,
|
||||
'ratio': [0.5, 1, 2],
|
||||
'scale': [8, 16, 32],
|
||||
'pre_nms_topn': 6000,
|
||||
'post_nms_topn': 300,
|
||||
'nms_thresh': 0.7,
|
||||
'infer': PythonProposalOp.calculate_output_shape
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
# ... supported attrs
|
||||
# ... calculate output shape
|
||||
|
||||
```
|
||||
|
||||
It is mandatory to call two functions right after the implementation of that class:
|
||||
|
||||
```
|
||||
class ProposalPythonOp(Op):
|
||||
...
|
||||
|
||||
register_caffe_python_extractor(ProposalPythonOp, 'rpn.proposal_layer.ProposalLayer')
|
||||
Op.excluded_classes.append(ProposalPythonOp)
|
||||
```
|
||||
|
||||
Note that the first call <code>register_caffe_python_extractor(ProposalPythonOp, 'rpn.proposal_layer.ProposalLayer')</code> registers the extension of the layer in the Model Optimizer that will be found by a specific name (it is mandatory to join module name and layer name): <code>'rpn.proposal_layer.ProposalLayer'</code>.
|
||||
|
||||
The second call prevents the Model Optimizer from using this extension as if it is an extension for a layer with type `Proposal`. Otherwise, this layer can be chosen as an implementation of extension that can lead to potential issues.
|
||||
|
||||
**Summary**
|
||||
|
||||
In this section you implemented support for a custom layer with type `Python` that is `Proposal` layer in the topology. You learned how to calculate output shape of this layer.
|
||||
|
||||
The values of attributes are hardcoded, and in the next section you will learn how to extract these values from original framework model (Caffe model in this case).
|
||||
|
||||
## Registering Rules to Pass Extension Layer Properties from a Caffe\* Model to the Intermediate Representation
|
||||
|
||||
Model Optimizer now knows how to set the shape of the `PythonProposalOp` operation, but it is incorrect to initialize attributes with the same values for every operation. Instead, the values should be extracted from the original topology. Model Optimizer does not know how to map the custom layer properties to the `PythonProposalOp`. For this, you must register the `FrontExtractorOp` instance.
|
||||
|
||||
> **NOTE**: This step is required only if the layer requires parameters from the original model.
|
||||
|
||||
1. Remove call functions `register_caffe_python_extractor` and `Op.excluded_classes.append` from the file with `op`, because you will implement extracted attributes from prototxt by yourself.
|
||||
There are multiple types of layers in Caffe: for example, `Convolution` and `Pooling`. Also, there is a specific type for custom Python\* layers called `Python`. Therefore, it is necessary to distinguish between those 'usual' types of layers and custom ones. If you want to implement extensions for a layer with type different to `Python`, you need to inherit your class of operation (for example, `ProposalFrontExtractor`) from `FrontExtractorOp`. Otherwise, inherit your class of operation from `CaffePythonFrontExtractorOp`.
|
||||
2. Create a file `python_proposal_ext.py` in the folder `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe`
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
For other layers types, inherit from `FrontExtractorOp`:
|
||||
```py
|
||||
from mo.front.extractor import FrontExtractorOp
|
||||
class ProposalFrontExtractor(FrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
You will implement extractor for layer with type `Python`, however, the steps are generally the same for layers with other types.
|
||||
3. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
```
|
||||
4. Register a mapping rule between the original model and the `PythonProposalOp` attributes by overriding the following function:
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
proto_layer = node.pb
|
||||
param = proto_layer.python_param # each layer has a specific parameter, take a look at caffe.proto
|
||||
python_params = str(param.param_str) # for Python layers, all params are in param_str
|
||||
attrs = {
|
||||
'feat_stride': int(python_params.split(':')[-1])
|
||||
}
|
||||
# update the attributes of the node
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before
|
||||
return __class__.enabled
|
||||
```
|
||||
> **NOTE:** if you implement extension for layer with type different to `Python`, change the following line: <code>Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs)</code> to this line: <code>Op.get_op_class_by_name(__class__.op).update_node_stat(node, mapping_rule)</code>.
|
||||
You have successfully extracted the parameter `feat_stride` from `prototxt`, assuming it is the only parameter in this layer.
|
||||
5. To increase the implementation flexibility:
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
param = node.pb.python_param
|
||||
attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs)
|
||||
return ProposalPythonFrontExtractor.enabled
|
||||
```
|
||||
|
||||
You can successfully convert the model. Open the `.xml` file and view your code:
|
||||
```xml
|
||||
...
|
||||
<layer id="42" name="proposal" precision="FP32" type="Python">
|
||||
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="[0.5, 1, 2]" scale="[8, 16, 32]"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>18</dim>
|
||||
<dim>15</dim>
|
||||
<dim>15</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>36</dim>
|
||||
<dim>15</dim>
|
||||
<dim>15</dim>
|
||||
</port>
|
||||
<port id="2">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="3">
|
||||
<dim>300</dim>
|
||||
<dim>5</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
...
|
||||
```
|
||||
|
||||
Look at the output shape of the custom layer you implemented. The shape was calculated according to the rules specified in `PythonProposalOp`. The `ratio` and `scale` properties have the value `[0.5, 1, 2]` and `[8, 16, 32]`. They have square brackets because they are originally a repeated parameter. You converted the parameter to a list in `PythonProposalOp`. Model Optimizer cast the value to a string. According to Python\* rules, a list has a string representation of opening and closing square brackets and values joined by commas.
|
||||
|
||||
This is not a valid notation for the Intermediate Representation specification, because repeated parameters must be separated by a comma but without the brackets. Therefore, you must override the Model Optimizer default behavior regarding how it handles those parameters during the Intermediate Representation emitting stage, after the optimizations are complete. To do so, implement `backend_attrs()` in the `PythonProposalOp` class:
|
||||
```python
|
||||
class PythonProposalOp(Op):
|
||||
... other methods
|
||||
def backend_attrs(self) -> list:
|
||||
"""
|
||||
Gets list of attributes that should appear in resulting IR
|
||||
Returns:
|
||||
list of attributes names or list of tuples (name of attribute, pre-processing rule)
|
||||
"""
|
||||
return [
|
||||
( # a tuple per attribute
|
||||
'ratio', # name of attribute
|
||||
# pre-processing rule in a form of lambda
|
||||
# lambda takes a PythonProposalOp node with all defined properties
|
||||
# it translates [1,2,3] -> "1,2,3"
|
||||
lambda node: ','.join(map(str, node['ratio']))
|
||||
),
|
||||
(
|
||||
'scale',
|
||||
lambda node: ','.join(map(str, node['scale']))
|
||||
),
|
||||
'feat_stride',
|
||||
'base_size',
|
||||
'min_size',
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh'
|
||||
]
|
||||
```
|
||||
The model can now be successfully converted.
|
||||
|
||||
Open the `.xml` file. `ratio` and `scale` have the expected correct values `0.5,1,2` and `8,16,32`:
|
||||
```xml
|
||||
...
|
||||
|
||||
<layer id="33" name="proposal" precision="FP32" type="Python">
|
||||
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="0.5,1,2" scale="8,16,32"/>
|
||||
<input>
|
||||
...
|
||||
</input>
|
||||
<output>
|
||||
...
|
||||
</output>
|
||||
</layer>
|
||||
|
||||
...
|
||||
```
|
||||
|
||||
> **NOTE**: Model Optimizer supports the Faster-R-CNN topology. Run the following command for the same Intermediate Representation:
|
||||
|
||||
```sh
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt --extensions <INSTALL_DIR>/deployment_tools/inference-engine/samples/object_detection_sample/fasterrcnn_extensions
|
||||
```
|
||||
|
||||
**Summary**
|
||||
|
||||
In this section you learned how to:
|
||||
|
||||
1. Create a framework-independent extension implementation of the Intermediate Representation custom layer with unified logic for calculating output shapes, specified set of attributes
|
||||
2. Use the Framework-Specific property extractor to map original model custom layer properties to the expected properties of the Framework-Independent extension
|
||||
3. Manipulate the custom layer properties representation in the resulting Intermediate Representation
|
||||
|
||||
Files used in this section:
|
||||
|
||||
* `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops/python_proposal.py`:
|
||||
|
||||
```py
|
||||
import networkx as nx
|
||||
import numpy as np
|
||||
from mo.front.extractor import attr_getter
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.op import Op
|
||||
|
||||
class ProposalOp(Op):
|
||||
op = 'Proposal'
|
||||
|
||||
def __init__(self, graph: nx.MultiDiGraph, attrs: dict):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'post_nms_topn': 300, # default in caffe-shared
|
||||
'infer': ProposalOp.proposal_infer
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
|
||||
def supported_attrs(self):
|
||||
return [
|
||||
'feat_stride',
|
||||
'base_size',
|
||||
'min_size',
|
||||
'ratio',
|
||||
'scale',
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh'
|
||||
]
|
||||
|
||||
def backend_attrs(self):
|
||||
return [
|
||||
'feat_stride',
|
||||
'base_size',
|
||||
'min_size',
|
||||
('ratio', lambda node: attr_getter(node, 'ratio')),
|
||||
('scale', lambda node: attr_getter(node, 'scale')),
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh',
|
||||
]
|
||||
|
||||
@staticmethod
|
||||
def proposal_infer(node: Node):
|
||||
input_shape = node.in_node(0).shape
|
||||
out_shape = np.array([0, 0], dtype=np.int64)
|
||||
# rois blob: holds R regions of interest, each is a 5 - tuple
|
||||
# (n, x1, y1, x2, y2) specifying an image batch index n and a
|
||||
# rectangle(x1, y1, x2, y2)
|
||||
out_shape[0] = input_shape[0] * node.post_nms_topn
|
||||
out_shape[1] = 5
|
||||
node.out_node(0).shape = out_shape
|
||||
```
|
||||
* `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe/python_proposal_ext.py`:
|
||||
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
|
||||
class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
param = node.pb.python_param
|
||||
attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs)
|
||||
return ProposalPythonFrontExtractor.enabled
|
||||
```
|
||||
This page is deprecated. Please, refer to [Model Optimizer Extensibility](Customize_Model_Optimizer.md) page for more information.
|
||||
|
||||
@@ -1,10 +1,23 @@
|
||||
# Legacy Mode for Caffe* Custom Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Legacy_Mode_for_Caffe_Custom_Layers}
|
||||
|
||||
> **NOTE**: This functionality is deprecated and will be removed in future releases.
|
||||
> **NOTE**: This functionality is deprecated and will be removed in the future releases.
|
||||
|
||||
Model Optimizer can register custom layers in a way that the output shape is calculated by the Caffe\* framework installed on your system. This chapter covers this option.
|
||||
Model Optimizer can register custom layers in a way that the output shape is calculated by the Caffe\* framework
|
||||
installed on your system. This approach has several limitations:
|
||||
|
||||
> **NOTE**: Caffe Python\* API has an issue when layer name does not correspond to the name of its top. The fix was implemented on [BVLC Caffe\*](https://github.com/BVLC/caffe/commit/35a7b87ad87457291dfc79bf8a7e7cf7ef278cbb). The Caffe framework on your computer must contain this fix. Otherwise, Caffe framework can unexpectedly fail during the fallback procedure.
|
||||
* If your layer output shape depends on dynamic parameters, input data or previous layers parameters, calculation of
|
||||
output shape of the layer via Caffe can be incorrect. For example, `SimplerNMS` is filtering out bounding boxes that do
|
||||
not satisfy the condition. Internally, Caffe fallback forwards the whole net without any meaningful data - just some
|
||||
noise. It is natural to get only one bounding box (0,0,0,0) instead of expected number (for example, 15). There is an
|
||||
option to patch Caffe accordingly; however, it makes the success of Intermediate Representation generation dependent on the patched
|
||||
Caffe on the particular machine. To keep the solution independent from Caffe, we recommend using the extensions mechanism
|
||||
for such layers described in the [Model Optimizer Extensibility](Customize_Model_Optimizer.md).
|
||||
* It is not possible to produce Intermediate Representation on a machine that does not have Caffe installed.
|
||||
|
||||
> **NOTE**: Caffe Python\* API has an issue when layer name does not correspond to the name of its top. The fix was
|
||||
> implemented on [BVLC Caffe\*](https://github.com/BVLC/caffe/commit/35a7b87ad87457291dfc79bf8a7e7cf7ef278cbb). The
|
||||
> Caffe framework on your computer must contain this fix. Otherwise, Caffe framework can unexpectedly fail during the
|
||||
> fallback procedure.
|
||||
|
||||
> **NOTE**: The Caffe fallback feature was validated against [this GitHub revision](https://github.com/BVLC/caffe/tree/99466224dac86ddb86296b1e727794fb836bd80f). You may have issues with forks or later Caffe framework versions.
|
||||
|
||||
@@ -25,7 +38,8 @@ Where:
|
||||
|
||||
**Example**:
|
||||
|
||||
1. `Proposal` layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in the `proposal_param` property of the layer:
|
||||
1. `Proposal` layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in
|
||||
the `proposal_param` property of the layer:
|
||||
```shell
|
||||
\<CustomLayer NativeType="Proposal" hasParam ="true" protoParamName = "proposal_param"/\>
|
||||
```
|
||||
@@ -34,16 +48,6 @@ Where:
|
||||
\<CustomLayer NativeType="CustomLayer" hasParam ="false"/\>
|
||||
```
|
||||
|
||||
For this feature, you need an appropriate version of Caffe installed on the computer on which you run the Model Optimizer.
|
||||
|
||||
## Constraints of Using the Caffe Fallback
|
||||
|
||||
Several layers in the Caffe\* framework can have shapes that dynamically depend on the input data, not only on the parameters of the layer and the layers that precede it. For example, `SimplerNMS` is filtering out bounding boxes that do not satisfy the condition. Internally, Caffe fallback forwards the whole net without any meaningful data - just some noise. It is natural to get only one bounding box (0,0,0,0) instead of the expected number (for example, 15). There is an option to patch Caffe accordingly; however, it makes the success of Intermediate Representation generation dependent on the patched Caffe on the particular machine. To keep the solution independent from Caffe, we recommend using the extensions mechanism for such layers.
|
||||
|
||||
Known cases like `Proposal`, `DetectionOutput`, `SimplerNMS` are implemented as extensions and can be used out of the box.
|
||||
|
||||
A detailed description of supported layers is in the [Operations Specification](../../../ops/opset.md) document.
|
||||
|
||||
## Building Caffe\*
|
||||
|
||||
1. Build Caffe\* with Python\* 3.5:
|
||||
@@ -68,4 +72,4 @@ python3
|
||||
import caffe
|
||||
```
|
||||
|
||||
If Caffe was installed correctly, the `caffe` module is imported without errors.
|
||||
If Caffe was installed correctly, the `caffe` module is imported without errors.
|
||||
|
||||
@@ -1,363 +1,4 @@
|
||||
# Sub-Graph Replacement in the Model Optimizer {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer}
|
||||
|
||||
Several reasons exist for why the Model Optimizer could not generate an Intermediate Representation for a model. However, in some cases, the Intermediate Representation could be generated after providing certain hints to the tool. The examples of hints below are mostly related to TensorFlow\*, but potentially could be actual for models created in any framework:
|
||||
|
||||
* Topology contains an operation (or a sub-graph of operations) not known to the Model Optimizer, but this operation (sub-graph) could be expressed as a combination of known operations. A hint would be a description of this combination to the tool.
|
||||
* Sub-graph of operations in the topology expresses a single layer known to Inference Engine.
|
||||
* TensorFlow and Inference Engine use different layouts of tensors, NHWC and NCHW respectively. If some tensor in NHWC layout is flattened (for example, all the dimensions are squashed into single dim), it is not possible to convert it to NCHW layout required for Inference Engine, so Model Optimizer cannot produce correct Intermediate Representation.
|
||||
|
||||
The detailed solutions for the examples above are given later, the next subsection shows what is common in all three examples.
|
||||
|
||||
## Sub-graph Replacement
|
||||
|
||||
In these cases, the sub-graph (or a single node) of initial graph is replaced with a new sub-graph (single node). The sub-graph replacement consists of the following steps:
|
||||
|
||||
1. Identify an existing sub-graph for replacement
|
||||
|
||||
2. Generate a new sub-graph
|
||||
|
||||
3. Connect a new sub-graph to the graph (create input/output edges to the new sub-graph)
|
||||
|
||||
4. Create output edges out of a new sub-graph to the graph
|
||||
|
||||
5. Do something with the original sub-graph (for example, remove it)
|
||||
|
||||
Model Optimizer provides several ways to perform most of the sub-graph replacement steps. The next subsections describe these methods.
|
||||
|
||||
## Replace a Single Operation with a Sub-graph of Operations
|
||||
|
||||
For example, there is an operation `SquaredDifference` in TensorFlow which calculates \f$(a - b)^2\f$, where \f$a\f$ and \f$b\f$ are input tensors. Inference Engine does not support such operation. However, `SquaredDifference` could be expressed using two `Power` operations and one `Eltwise Add`. The `Power` operation calculates \f$scale * (a ^ {power}) + shift\f$, where \f$a\f$ is a tensor and \f$scale\f$, \f$power\f$ and \f$shift\f$ are float values. The first `Power` operation negates the value of tensor \f$b\f$. The second one is used to square the result of \f$a + (- b)\f$ which is calculated using the `Eltwise Add` operation applied to tensor \f$a\f$ and tensor \f$-b\f$.
|
||||
|
||||
Given that, we can replace all `SquaredDifference` operations in the initial model with two `Power` and one `Eltwise` operations. The replacer is implemented in the following file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/SquaredDifference.py`.
|
||||
```python
|
||||
import networkx as nx
|
||||
from mo.front.common.replacement import FrontReplacementOp
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.eltwise import Eltwise
|
||||
from mo.ops.power import Power
|
||||
class SquaredDifference(FrontReplacementOp):
|
||||
"""
|
||||
Example class illustrating how to implement replacement of a single op in the front-end of the MO pipeline.
|
||||
This class replaces a single op SquaredDifference by a sub-graph consisting of 3 lower-level ops.
|
||||
"""
|
||||
op = "SquaredDifference"
|
||||
enabled = True
|
||||
def replace_op(self, graph: nx.MultiDiGraph, node: Node):
|
||||
negate = Power(graph, dict(scale=-1, name=node.name + '/negate_'))
|
||||
add = Eltwise(graph, dict(operation='sum', name=node.name + '/add_'))
|
||||
squared = Power(graph, dict(power=2, name=node.name + '/squared_'))
|
||||
out_node = squared.create_node([add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])])
|
||||
        # Replace edge from out port 0 of the matched node with an edge from node out_node.id with port 0.
|
||||
# The "explicit" version of the return value is: [(out_node.id, 0)])
|
||||
return [out_node.id]
|
||||
```
|
||||
Model Optimizer internal representation of the graph uses the networkx module.
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Line 1: Imports this module.
|
||||
|
||||
* Line 3: Imports class `FrontReplacementOp` that is used to replace operation of particular type with a new sub-graph. This class performs the first step of the sub-graph replacement (identifies an existing sub-graph for replacement). It is important to mention that the replacement happens before shape inference and creation of data nodes representing tensors with values. At this stage of model conversion pipeline, all nodes in the graph are operation nodes or nodes of type `Const` that produce tensor with fixed value embedded into the node.
|
||||
|
||||
* Line 4: Imports class `Node` representing a single node in the computation graph.
|
||||
|
||||
* Lines 5 - 6: Import classes representing operations `Power` and `Eltwise`. These classes are inherited from base class `mo.ops.Op` that represents operation and stores its attributes.
|
||||
|
||||
* Line 9: Defines class `SquaredDifference` inherited from `FrontReplacementOp`. This is a replacer class that is automatically registered and executed by Model Optimizer. Since the class is located in the common (not framework) specific directory `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front`, it is used for replacement for all supported frameworks.
|
||||
|
||||
* Line 15: Defines the class variable `op` that stores the name of the operation to be replaced. In this case, it is `SquaredDifference`.
|
||||
|
||||
* Line 16: Defines class variable `enabled` that controls whether the replacer is enabled or not. The only function that should be implemented in the class is `replace_op`. It gets graph to operate on and an instance of node of desired operation (`SquaredDifference` in this case). This function performs step two and three of the sub-graph replacement (generates a new sub-graph to replace with and connects a new sub-graph to the graph).
|
||||
|
||||
* Lines 19 - 21: Create instances of operations classes with required attributes.
|
||||
|
||||
* Line 23: Creates a sub-graph from the operations defined above. The `create_node` method of the `Op` class generates `Node` from the `Op` and uses single mandatory argument - the list of input nodes (represented as instances of `Node` class) to create input edges to the node being generated. Inputs of the `SquaredDifference` node are retrieved using `node.in_node(0)` and `node.in_node(1)` method calls. The `Eltwise Add` node gets first input as initial first input of `SquaredDifference` node, the second input of `add` is the result of negation of the second input of `SquaredDifference` node: `[add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]`. Then the result of `Add` node is squared. `out_node` node performs this calculation.
|
||||
|
||||
The `replace_op` function returns a list of node names used to create output edges of the sub-graph to connect it with the rest of the graph. Each element of the list describes mapping between old output edge of the matched node and new sub-graph node and output edge index. The i-th element of the list corresponds to the i-th output tensor of the matched node. In this case, `SquaredDifference` produces single tensor through output port 0, so the returned list contains single element. In general, each element is a tuple, where the first element is the name of a new node producing required tensor and the second is the output port for that tensor. If the output port is 0, it is possible to use shortcut - just the name of the node instead of a tuple. Line 26 uses this shortcut. The returned value is used to create the new sub-graph output edges (step 4 of the sub-graph replacement).
|
||||
|
||||
Default implementation of the `FrontReplacementOp` class removes matched node and all its input/output edges (step 5 of the sub-graph replacement).
|
||||
|
||||
Another example of such kind of replacement is in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/Sub.py` class where all instances of `Sub` operations are replaced with two operations: `Power` to negate the second argument and the `Eltwise` to perform elementwise add.
|
||||
|
||||
## Replace Sub-graph of Operations with a New Sub-graph of Operations
|
||||
|
||||
The previous example considered the situation when one single node of a specific type is replaced. When replacing a sub-graph of operations, it is necessary to tell Model Optimizer how to identify this sub-graph. There are three ways to achieve that:
|
||||
|
||||
* Use graph isomorphism pattern of the networkx module
|
||||
|
||||
* Use nodes name pattern to identify `scope` (according to TensorFlow terminology) to be replaced
|
||||
|
||||
* Use sets of `start` and `end` node names to match all nodes "between" them
|
||||
|
||||
The next sections explain each option using real examples.
|
||||
|
||||
### Replace Sub-graph of Operations Using Graph Isomorphism Pattern <a name="replace-using-isomorphism-pattern"></a>
|
||||
|
||||
networkx Python\* module provides methods to find graph isomorphic to the given one using nodes and edges match: for example, `networkx.algorithms.isomorphism.categorical_node_match`, `networkx.algorithms.isomorphism.categorical_multiedge_match`. Model Optimizer uses these methods and provides simple API to use that feature.
|
||||
|
||||
For example, the Caffe\* has layer called [Mean-Variance Normalization (MVN)](http://caffe.berkeleyvision.org/tutorial/layers/mvn.html), which is also supported by the Inference Engine. This layer is implemented with low-level operations in TensorFlow: `Mean`, `StopGradient`, `SquaredDifference`, `Squeeze` and `FusedBatchNorm`. Model Optimizer should replace sub-graph with these operations with a single Inference Engine layer of type `MVN`.
|
||||
|
||||
The file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/mvn.py` performs such a replacement. The first part of the file is:
|
||||
```python
|
||||
class MVN(FrontReplacementSubgraph):
|
||||
enabled = True
|
||||
def pattern(self):
|
||||
log.debug('Enabled MVN replacement')
|
||||
return dict(
|
||||
nodes=[
|
||||
('mean', dict(op='Mean')),
|
||||
('stop_grad', dict(op='StopGradient')),
|
||||
('sqdiff', dict(op='SquaredDifference')),
|
||||
('variance', dict(op='Mean')),
|
||||
('squeeze_mean', dict(op='Squeeze')),
|
||||
('squeeze_variance', dict(op='Squeeze')),
|
||||
('fbn', dict(op='FusedBatchNorm')),
|
||||
],
|
||||
edges=[
|
||||
('mean', 'stop_grad', {'in': 0}),
|
||||
('stop_grad', 'sqdiff', {'in': 1}),
|
||||
('sqdiff', 'variance', {'in': 0}),
|
||||
('mean', 'squeeze_mean', {'in': 0}),
|
||||
('variance', 'squeeze_variance', {'in': 0}),
|
||||
('squeeze_mean', 'fbn', {'in': 3}),
|
||||
('squeeze_variance', 'fbn', {'in': 4}),
|
||||
],
|
||||
node_attrs=['op'],
|
||||
edge_attrs=['in'])
|
||||
```
|
||||
**Key lines**:
|
||||
|
||||
* Line 1: Defines class `MVN` inherited from class `FrontReplacementSubgraph` that performs sub-graph replacement using sub-graph isomorphism pattern.
|
||||
|
||||
* Line 3: Sets class variable `enabled` to value True meaning that this replacer is enabled.
|
||||
|
||||
* The function `pattern` defines the sub-graph constraints to be matched. It returns a dictionary with four keys:
|
||||
|
||||
* the `nodes` defines a list of nodes to be matched. Each element in the list is a tuple. The first element is the alias name assigned for the matched node, the second element is a dictionary with desired attributes of the node.
|
||||
|
||||
* the `edges` defines a list of edges to be matched. Each element in the list is a tuple. The first and the second elements are the start and end edge nodes alias names respectively. The third element is a dictionary with desired edge attributes.
|
||||
|
||||
* the `node_attrs` contains the names of nodes attributes to use during sub-graph isomorphism search.
|
||||
|
||||
* the `edge_attrs` contains the names of edges attributes to use during sub-graph isomorphism search.
|
||||
|
||||
The sub-graph is matched if all provided constraints are satisfied. If at least one node with desired attributes is missing or at least one defined edge is absent, the sub-graph is not matched.
|
||||
* Line 9: Adds a constraint that the sub-graph should contain a node with attribute `op` with value `Mean`. The matched node gets an alias name `mean`. In the same way, line 10 adds a constraint for node `StopGradient`; the matched node gets an alias name `stop_grad`.
|
||||
|
||||
* Line 18: Defines an edge from the node with alias name `mean` to the node with alias name `stop_grad` having attribute `in` equal to 0. This means that the output of node `mean` is connected to the node `stop_grad` as a first input (Model Optimizer uses zero-based indexing, which is why `in` is 0). Another example of defining edge constraints is in line 25, where the edge from `squeeze_mean` is connected to the `fbn` node as the fourth input.
|
||||
|
||||
* Lines 26 - 27: Specify a list of attributes to be checked. In fact, these lists are just list of all keys in the dictionaries for node and edge attributes.
|
||||
|
||||
Now when the Model Optimizer knows how to find sub-graph (step 1 of the sub-graph replacement), it is necessary to implement function that will perform actual sub-graph replacement (step 2 and 3). The code for this function is:
|
||||
```python
|
||||
def replace_sub_graph(self, graph: nx.MultiDiGraph, match: dict):
|
||||
fbn = match['fbn']
|
||||
input = fbn.in_node(0)
|
||||
log.debug('Found potential MVN pattern after {} with name {}'.format(input.op, input.name))
|
||||
if input.id != match['mean'].in_node(0).id or input.id != match['sqdiff'].in_node(0).id:
|
||||
return
|
||||
log.debug('Confirmed MVN pattern after {} with name {}'.format(input.op, input.name))
|
||||
MVN = Op.get_op_class_by_name('MVN')
|
||||
mvn = MVN(graph, dict(
|
||||
name=fbn.name + '/MVN_',
|
||||
eps=fbn.eps,
|
||||
required_reduction_indices=[1,2] if fbn.data_format == b'NHWC' else [2,3]
|
||||
))
|
||||
mvn.attrs['old_infer'] = mvn.attrs['infer']
|
||||
mvn.attrs['infer'] = __class__.infer
|
||||
mul = Eltwise(graph, dict(operation='mul', name=fbn.name + '/Mul_'))
|
||||
add = Eltwise(graph, dict(operation='sum', name=fbn.name + '/Add_'))
|
||||
input_gamma = fbn.in_node(1)
|
||||
input_beta = fbn.in_node(2)
|
||||
mean_reduction = match['mean'].in_node(1)
|
||||
variance_reduction = match['mean'].in_node(1)
|
||||
new_subgraph = add.create_node([
|
||||
mul.create_node([
|
||||
mvn.create_node([input, mean_reduction, variance_reduction]),
|
||||
input_gamma
|
||||
]),
|
||||
input_beta
|
||||
])
|
||||
replace_node(fbn, new_subgraph)
|
||||
```
|
||||
The function accepts two arguments - the graph and the dictionary `match`. The keys in the dictionary are the alias names of matched nodes (defined in the `nodes` list in the function `pattern`) and the values are the matched node of the graph (the instance of Node object).
|
||||
|
||||
The function generates new sub-graph with node of type `MVN` and two nodes of the type `Eltwise` calculating sum and product. There is nothing interesting in how the graph is generated and mathematics behind that, so attention will be put to two aspects of this function.
|
||||
|
||||
The first one is the call to function `replace_node` in line 36. `FusedBatchNorm` node is replaced with the output node of the generated sub-graph: all input edges of the `FusedBatchNorm` node are re-connected to the `new_subgraph` node, all consumers of the `FusedBatchNorm` node are updated to get inputs from the `new_subgraph` node. This action connects newly generated sub-graph with an existing graph (step 4 of the sub-graph replacement).
|
||||
|
||||
The second one is that the default implementation of the inference function for `MVN` operation is overwritten. In line 16, the default implementation of the inference function for `MVN` is saved to attribute `old_infer`. In line 17, the new inference function is saved to the instance of the `MVN` operation class. The new inference function code looks the following way:
|
||||
```python
|
||||
@staticmethod
|
||||
def infer(node: Node):
|
||||
if not(node.in_node(1).has_valid('value') and node.in_node(2).has_valid('value')):
|
||||
log.warning('Reduction indices for mean and variance for MVN node {} are not constants'.format(node.name))
|
||||
return
|
||||
if not(all(node.in_node(1).value == node.required_reduction_indices) and
|
||||
all(node.in_node(2).value == node.required_reduction_indices)):
|
||||
log.warning('Reduction indices for mean {} and variance {} do not match required ones {}'.format(
|
||||
node.in_node(1).value,
|
||||
node.in_node(2).value,
|
||||
node.required_reduction_indices
|
||||
))
|
||||
return
|
||||
node.graph.remove_edge(node.in_node(1).id, node.id)
|
||||
node.graph.remove_edge(node.in_node(2).id, node.id)
|
||||
node.old_infer(node)
|
||||
```
|
||||
The `infer` function is needed to infer value of the node (if it is possible) and to infer shapes of the output tensors of the node (mandatory). The custom `infer` function performs additional checks that describe limitations of the `MVN` layer implementation in the Inference Engine. For example, reduction indices for mean and variance must be constants (line 10), while in TensorFlow they could be computed during model inference. In addition, the function removes two edges from the graph (lines 17 and 18) because all required information is already stored in the `MVN` node attributes. This is due to different `MVN` layer implementation in Inference Engine and TensorFlow\*: `mean` and `variance` are attributes of the node in Inference Engine while in TensorFlow they are input tensors. Edges are not removed in the `replace_sub_graph` function, because these edges are used in the `infer` function (lines 7-12).
|
||||
|
||||
The last action in the `infer` method (line 19) is to call default infer function for the `MVN`, which is saved in the attribute `old_infer` of the node to infer output tensors shapes.
|
||||
|
||||
On step 5 of the sub-graph replacement, the six matched nodes are automatically removed during the dead code elimination pass that is performed after applying the defined custom sub-graph replacements. The six matched nodes are no longer connected to the inputs of the network after replacing node `fbn` with the newly created sub-graph node. Since they are not marked as output nodes (using the `--output` command line parameter), they can be removed.
|
||||
|
||||
The replacement works for all sub-graph isomorphism instances found in the network.
|
||||
|
||||
### Replace Sub-graph of Operations Using Nodes Name Pattern
|
||||
|
||||
TensorFlow uses a mechanism of scope to group related operation nodes. It is a good practice to put nodes performing particular task into the scope. This approach divides a graph into logical blocks that are easier to review in TensorBoard\*. The `scope`, in fact, just defines a common prefix for the node names in the scope.
|
||||
|
||||
For example, Inception topologies contain several types of so-called "Inception blocks". Some of them are exactly equal to each other, but located in different places of the network. For example, Inception V4 from `tensorflow.contrib.slim` module has inception blocks `Mixed_5b`, `Mixed_5c` and `Mixed_5d` with exactly the same nodes with the same attributes.
|
||||
|
||||
Now consider situation when someone implemented these Inception blocks extremely efficiently using single Inference Engine custom layer called `InceptionBlock` and would like to replace these blocks with instances of the layer to decrease inference time. Model Optimizer provides mechanism to replace sub-graph of operations defined by the regular expressions for the node names prefixes (scope). In this particular case, some of the patterns are: `.*InceptionV4/Mixed_5b`, `.*InceptionV4/Mixed_5c` and `.*InceptionV4/Mixed_5d`. Each pattern starts with `.*`, because a prefix `InceptionV4` is added to all nodes names during a model freeze.
|
||||
|
||||
The sub-graph replacement using nodes name pattern is a bit trickier than replacements of single operation and networkx isomorphism pattern described above. You should do the following additional steps in comparison with previously described replacements:
|
||||
|
||||
1. Prepare configuration file template defining node names patterns and information about custom layer attributes.
|
||||
|
||||
2. Run Model Optimizer with command line parameter to add information about input and output nodes of the specified sub-graphs.
|
||||
|
||||
Consider the following possible configuration file for the Inception Block replacer:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"custom_attributes": {
|
||||
"attr1_key": "attr1_value",
|
||||
"attr2_key": 123456
|
||||
},
|
||||
"id": "InceptionBlockReplacer",
|
||||
"op": "InceptionBlock",
|
||||
"instances": [
|
||||
".*InceptionV4/Mixed_5b",
|
||||
".*InceptionV4/Mixed_5c",
|
||||
".*InceptionV4/Mixed_5d"
|
||||
],
|
||||
"match_kind": "scope"
|
||||
}
|
||||
]
|
||||
```
|
||||
The `.json` file contains list of dictionaries. Each dictionary defines one replacement. Each replacement is defined with several keys:
|
||||
|
||||
* `id` (mandatory) is a unique identifier of the replacer. It is used in the Python\* code that implements sub-graph replacement to link the class and the replacement description from the configuration file.
|
||||
|
||||
* `match_kind` (mandatory) is a string that specifies what matching algorithm is used. Currently supported `scope` and `points`. In this example, the first one is considered. The `points` match kind is described below.
|
||||
|
||||
* `instances` (mandatory) specifies instances of the sub-graph to be matched. It contains a list of node names prefixes patterns for the match kind `scope`.
|
||||
|
||||
* `custom_attributes` (optional) is a dictionary with static attributes of the layer to be dumped to Inference Engine Intermediate Representation `.xml` file.
|
||||
|
||||
* `op` (optional) is used only if the sub-graph replacement Python code is not needed, because the sub-graph should be replaced with a single node of type `op`. If this attribute is not set, it is necessary to implement Python code with sub-graph generation code. Both options are considered in this example.
|
||||
|
||||
When the configuration file is ready, run the Model Optimizer with regular command line parameters pointing to the file with model and input shapes (if necessary) and additional parameter `--tensorflow_custom_operations_config_update` pointing to the generated configuration file. If the file is correct, Model Optimizer adds two keys to the `InceptionBlockReplacer` dictionary: `inputs` and `outputs` with the following content:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"id": "InceptionBlockReplacer",
|
||||
...
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "Branch_2/Conv2d_0a_1x1/Conv2D$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "Branch_3/AvgPool_0a_3x3/AvgPool$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "Branch_1/Conv2d_0a_1x1/Conv2D$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "Branch_0/Conv2d_0a_1x1/Conv2D$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
],
|
||||
"outputs": [
|
||||
{
|
||||
"node": "concat$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
The value for key `inputs` is a list of lists describing input tensors of the sub-graph. Each element of the top-level list corresponds to one unique input tensor of the sub-graph. Each internal list describes a list of nodes consuming this tensor and port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input nodes names to uniquely identify them in each instance of the sub-graph defined by the `instances`. Denote these nodes as input nodes of the sub-graph.
|
||||
|
||||
In the InceptionV4 topology, the `InceptionV4/Mixed_5b` block has four input tensors from outside of the sub-graph, but all of them are produced by the node `InceptionV4/Mixed_5a/concat`. Therefore, the top-level list of the `inputs` contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by `InceptionV4/Mixed_5a/concat` node. In this case, all four input nodes consume input tensor into port 0.
|
||||
|
||||
The order of items in the internal list describing nodes does not matter, but the order of elements in the top-level list is important. This order defines the order in which the Model Optimizer attaches input tensors to a new generated node if the sub-graph is replaced with a single node. The i-th input node of the sub-graph is obtained using call `match.single_input_node(i)` in the sub-graph replacer code. More information about API is given below. If you need to change the order of input tensors, you can edit the configuration file in the text-editor.
|
||||
|
||||
The value for the key `outputs` is a list describing nodes of the sub-graph producing tensor that goes outside of the sub-graph or does not have child nodes. Denote these nodes as output nodes of the sub-graph. The order of elements in the list is important. The i-th element of the list describes the i-th output tensor of the sub-graph, which could be obtained using call `match.output_node(i)`. The order of elements can be manually changed in the configuration file. Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node.
|
||||
|
||||
Now, when the meaning of the `inputs` and `outputs` attributes is clear, return back to the replacer implementation. The replacer `InceptionBlockReplacer` contains attribute `op` with the value `InceptionBlock`, which means that the identified sub-graph should be replaced with a single layer of type `InceptionBlock`. This layer is not known to the Model Optimizer, so it is necessary to define it. See [Extending the Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md). You must create file `extension/ops/InceptionBlock.py` with the following content:
|
||||
```python
|
||||
import numpy as np
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.op import Op
|
||||
class InceptionBlock(Op):
|
||||
op = "InceptionBlock"
|
||||
enabled = True
|
||||
def __init__(self, graph, attrs):
|
||||
super().__init__(graph, attrs, {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
})
|
||||
```
|
||||
The shape inference function is not defined. In this case, Model Optimizer uses TensorFlow fallback to calculate shapes of the sub-graph output tensors.
|
||||
|
||||
Run the Model Optimizer with the regular command line parameters, path to the model file and input shape (if necessary), and the parameter `--tensorflow_use_custom_operations_config` and point to the created configuration file. Model Optimizer generates Intermediate Representation `.xml` file with three sequential layers of type `InceptionBlock` like in the following example:
|
||||
```xml
|
||||
<layer id="1658" name="InceptionBlock1877" precision="FP32" type="InceptionBlock">
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>384</dim>
|
||||
<dim>35</dim>
|
||||
<dim>35</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>384</dim>
|
||||
<dim>35</dim>
|
||||
<dim>35</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
```
|
||||
The implementation of the sub-graph replacement by scope with a single layer is complete. The next subsection explains
|
||||
how Model Optimizer replaces sub-graph identified by start/end nodes (`points`) with another sub-graph.
|
||||
|
||||
### <a name="sub_graph_replacement_using_points"></a> Replace Sub-graph of Operations Using Points
|
||||
In this scenario, the user defines the sub-graph for the matching algorithm via a set of "start" and "end" nodes.
|
||||
Given the set, the Model Optimizer performs the following steps:
|
||||
1. Starts graph traversal from every _start_ node following the direction of the graph edges.
|
||||
The search stops in _end_ nodes or in case of nodes without further children. All visited nodes are added to the matched sub-graph.
|
||||
2. Starts another graph traversal from each non-start node of the sub-graph, i.e. every node except nodes from "start" set.
|
||||
In this step the edges are traversed in the opposite edge direction. All newly visited nodes are added to the
|
||||
matched sub-graph. This step is needed to add nodes required for calculation values of internal nodes of the
|
||||
matched sub-graph.
|
||||
3. Checks that all "end" nodes were reached from the "start" nodes. If not, it exits with an error.
|
||||
4. Checks that there are no "Placeholder" operations among the added nodes. If this check fails, some side branch of
|
||||
the sub-graph (added in step 2) depends on inputs of the network. Such configuration is not correct so exit with error.
|
||||
|
||||
This algorithm finds all nodes "between" start and end nodes. Also nodes needed for calculation of non-input nodes of the
|
||||
matched sub-graph produce _constant_ values because they do not depend on input of the network.
|
||||
**This sub-graph match has a limitation that each start node must have only one input**. Therefore, it is not possible
|
||||
to specify, for example, convolution node as input because it has two inputs: data tensor and tensor with weights.
|
||||
|
||||
For example of replacement with points, please refer to the case-study of the
|
||||
[conversion for the SSD models, created with TensorFlow Object Detection API](TensorFlow_SSD_ObjectDetection_API.md).
|
||||
The document has been deprecated. Refer to the [Model Optimizer Extensibility](Customize_Model_Optimizer.md)
|
||||
for the up-to-date documentation.
|
||||
|
||||
@@ -1,449 +0,0 @@
|
||||
# Converting Faster R-CNN models, created with TensorFlow Object Detection API {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_Faster_RCNN_ObjectDetection_API}
|
||||
|
||||
This is a deprecated page. Please, consider reading [this](../convert_model/tf_specific/Convert_Object_Detection_API_Models.md) page describing new approach to convert Object Detection API models giving closer to TensorFlow inference results.
|
||||
|
||||
## Converting models created with TensorFlow Object Detection API version equal or higher than 1.6.0
|
||||
This chapter describes how to convert selected Faster R-CNN models from the TensorFlow Object Detection API zoo version equal or higher than 1.6.0. The full list of supported models is provided in the table below. Note that currently only batch size 1 is supported. The only Inference Engine plugin supporting inference of these topologies is the CPU plugin.
|
||||
|
||||
The Faster R-CNN models contain several building blocks similar to building blocks from SSD models so it is highly recommended to read chapter about [enabling TensorFlow Object Detection API SSD models](TensorFlow_SSD_ObjectDetection_API.md) first. Detailed information about Faster R-CNN topologies is provided [here](https://arxiv.org/abs/1506.01497).
|
||||
|
||||
The TensorFlow network consists of a number of big blocks grouped by scope:
|
||||
|
||||
* `Preprocessor` performs scaling/resizing of the image and converts input data to [0, 1] interval. Has two outputs: the first one is modified input image and the second one is a constant tensor with shape (batch_size, 3) and values (resized_image_height, resized_image_width, 3).
|
||||
|
||||
* `FirstStageFeatureExtractor` is a backbone feature extractor.
|
||||
|
||||
* `FirstStageBoxPredictor` calculates boxes and classes predictions.
|
||||
|
||||
* `GridAnchorGenerator` generates anchors coordinates.
|
||||
|
||||
* `ClipToWindow` crops anchors to the resized image size.
|
||||
|
||||
* `Decode` decodes coordinates of boxes using anchors and data from the `FirstStageBoxPredictor`.
|
||||
|
||||
* `BatchMultiClassNonMaxSuppression` performs non maximum suppression.
|
||||
|
||||
* `map` scales coordinates of boxes to [0, 1] interval by dividing coordinates by (resized_image_height, resized_image_width).
|
||||
|
||||
* `map_1` scales coordinates from [0, 1] interval to resized image sizes.
|
||||
|
||||
* `SecondStageFeatureExtractor` is a feature extractor for predicted Regions of interest (ROIs).
|
||||
|
||||
* `SecondStageBoxPredictor` refines box coordinates according to `SecondStageFeatureExtractor`.
|
||||
|
||||
* `SecondStagePostprocessor` is Detection Output layer performing final boxes predictions.
|
||||
|
||||
### Sub-graph replacements
|
||||
There are three sub-graph replacements defined in the `extensions/front/tf/legacy_faster_rcnn_support.json` used to convert these models:
|
||||
|
||||
* the first one replaces the `Preprocessor` block. The implementation of this replacer is in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py`
|
||||
|
||||
* the second one replaces a number of blocks in the graph including `GridAnchorGenerator`, `ClipToWindow`, `Decode`, `BatchMultiClassNonMaxSuppression`, `Tile`, `Tile_1` and `map` with Proposal and ROIPooling layers and some additional layers to pre-process input data
|
||||
|
||||
* the third one replaces `SecondStagePostprocessor` with a DetectionOutput layer.
|
||||
|
||||
The second replacer is defined using the following configuration that matches sub-graph by points:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_attributes": {
|
||||
"nms_threshold": 0.7,
|
||||
"feat_stride": 16,
|
||||
"max_proposals": 100,
|
||||
"anchor_base_size": 256,
|
||||
"anchor_scales": [0.25, 0.5, 1.0, 2.0],
|
||||
"anchor_aspect_ratios": [0.5, 1.0, 2.0],
|
||||
"roi_spatial_scale": 0.0625
|
||||
},
|
||||
"id": "TFObjectDetectionAPIFasterRCNNProposalAndROIPooling",
|
||||
"include_inputs_to_sub_graph": true,
|
||||
"include_outputs_to_sub_graph": true,
|
||||
"instances": {
|
||||
"end_points": [
|
||||
"CropAndResize",
|
||||
"map_1/TensorArrayStack/TensorArrayGatherV3",
|
||||
"map_1/while/strided_slice/Enter",
|
||||
"BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3"
|
||||
],
|
||||
"start_points": [
|
||||
"FirstStageBoxPredictor/concat",
|
||||
"FirstStageBoxPredictor/concat_1",
|
||||
"GridAnchorGenerator/Identity",
|
||||
"Shape",
|
||||
"CropAndResize"
|
||||
]
|
||||
},
|
||||
"match_kind": "points"
|
||||
}
|
||||
```
|
||||
|
||||
The `start_points` list contains the following nodes:
|
||||
|
||||
* `FirstStageBoxPredictor/concat` node produces box coordinates predictions.
|
||||
|
||||
* `FirstStageBoxPredictor/concat_1` node produces classes predictions which will be used for the ROIs
|
||||
|
||||
* `GridAnchorGenerator/Identity` node produces anchors coordinates.
|
||||
|
||||
* `Shape` and `CropAndResize` nodes are specified as inputs to correctly isolate the required sub-graph. Refer to the [chapter](Subgraph_Replacement_Model_Optimizer.md) for more information about replacements by points.
|
||||
|
||||
The `end_points` list contains the following nodes:
|
||||
|
||||
* `CropAndResize` is the node that performs ROI pooling operation.
|
||||
|
||||
* `map_1/TensorArrayStack/TensorArrayGatherV3`, `map_1/while/strided_slice/Enter` and `BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3` are specified to correctly isolate the sub-graph.
|
||||
|
||||
The `custom_attributes` dictionary contains attributes where most values are taken from the topology-specific configuration file `samples/configs/faster_rcnn_*.config` of the [TensorFlow Object Detection API repository](https://github.com/tensorflow/models/tree/master/research/object_detection):
|
||||
|
||||
* `nms_threshold` is the value of the `first_stage_nms_iou_threshold` parameter.
|
||||
|
||||
* `feat_stride` is the value of the `height_stride` and `width_stride` parameters. Inference Engine supports case when these two values are equal that is why the replacement configuration file contains just one parameter.
|
||||
|
||||
* `max_proposals` is the value of the `max_total_detections` parameter which is a maximum number of proposal boxes from the Proposal layer and detected boxes.
|
||||
|
||||
* `anchor_base_size` is the base size of the generated anchor. The 256 is the default value for this parameter and it is not specified in the configuration file.
|
||||
|
||||
* `anchor_scales` is the value of the `scales` attribute.
|
||||
|
||||
* `anchor_aspect_ratios` is the value of the `aspect_ratios` attribute.
|
||||
|
||||
* `roi_spatial_scale` is needed for the Inference Engine ROIPooling layer. It is the default value that is not actually used.
|
||||
|
||||
The identifier for this replacer is `TFObjectDetectionAPIFasterRCNNProposalAndROIPooling`. The Python implementation of this replacer is in the file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/FasterRCNNs.py`.
|
||||
|
||||
The first four functions of the replacer class are the following:
|
||||
|
||||
```python
|
||||
class TFObjectDetectionAPIFasterRCNNProposalAndROIPooling(FrontReplacementFromConfigFileSubGraph):
|
||||
"""
|
||||
This class replaces sub-graph of operations with Proposal and ROIPooling layers and additional layers transforming
|
||||
tensors from layout of TensorFlow to layout required by Inference Engine.
|
||||
Refer to comments inside the function for more information about performed actions.
|
||||
"""
|
||||
replacement_id = 'TFObjectDetectionAPIFasterRCNNProposalAndROIPooling'
|
||||
|
||||
def run_after(self):
|
||||
return [PreprocessorReplacement]
|
||||
|
||||
def run_before(self):
|
||||
return [SecondStagePostprocessorReplacement]
|
||||
|
||||
def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
|
||||
return {match.output_node(0)[0].id: new_sub_graph['roi_pooling_node'].id}
|
||||
|
||||
def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
|
||||
new_list = match.matched_nodes_names().copy()
|
||||
# do not remove nodes that produce box predictions and class predictions
|
||||
new_list.remove(match.single_input_node(0)[0].id)
|
||||
new_list.remove(match.single_input_node(1)[0].id)
|
||||
return new_list
|
||||
```
|
||||
|
||||
The function `run_after` returns list of Python classes inherited from one of the replacer classes (`FrontReplacementOp`, `FrontReplacementPattern`, `FrontReplacementFromConfigFileSubGraph` etc) those current sub-graph replacement class must be run after. In this case the replacer must be run after the `Preprocessor` is removed by the `PreprocessorReplacement` replacer. Similar way the `run_before` function is used to tell Model Optimizer to execute `SecondStagePostprocessorReplacement` before this replacer.
|
||||
|
||||
The `output_edges_match` function describes matching between the output nodes of the sub-graph before replacement and after. In this case the only needed output node of the sub-graph is the `CropAndResize` node which is identified with `match.output_node(0)[0]`. The new output node which is created in the `generate_sub_graph` function is identified with `new_sub_graph['roi_pooling_node']`.
|
||||
|
||||
The `nodes_to_remove` function takes the default list of nodes to be removed, which contains all matched nodes, and removes from it two input nodes which are identified with `match.single_input_node(0)[0]` and `match.single_input_node(1)[0]`. These nodes will be connected as inputs to new nodes being generated in the `generate_sub_graph` function so they should not be removed.
|
||||
|
||||
The code generating new sub-graph is the following:
|
||||
|
||||
```python
|
||||
def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
|
||||
log.debug('TFObjectDetectionAPIFasterRCNNProposal: matched_nodes = {}'.format(match.matched_nodes_names()))
|
||||
|
||||
config_attrs = match.custom_replacement_desc.custom_attributes
|
||||
nms_threshold = config_attrs['nms_threshold']
|
||||
feat_stride = config_attrs['feat_stride']
|
||||
max_proposals = config_attrs['max_proposals']
|
||||
anchor_base_size = config_attrs['anchor_base_size']
|
||||
roi_spatial_scale = config_attrs['roi_spatial_scale']
|
||||
proposal_ratios = config_attrs['anchor_aspect_ratios']
|
||||
proposal_scales = config_attrs['anchor_scales']
|
||||
anchors_count = len(proposal_ratios) * len(proposal_scales)
|
||||
```
|
||||
|
||||
These lines get parameters defined in the sub-graph replacement configuration file and calculate initial anchors count.
|
||||
|
||||
```python
|
||||
# get the ROIPool size from the CropAndResize which performs the same action
|
||||
if 'CropAndResize' not in graph.nodes():
|
||||
raise Error('Failed to find node with name "CropAndResize" in the topology. Probably this is not Faster'
|
||||
' RCNN topology or it is not supported')
|
||||
roi_pool_size = Node(graph, 'CropAndResize').in_node(3).value[0]
|
||||
```
|
||||
|
||||
The code above gets the ROI Pooling spatial output dimension size as a value from the fourth argument of the node with name `CropAndResize`.
|
||||
|
||||
```python
|
||||
# Convolution/matmul node that produces classes predictions
|
||||
# Permute result of the tensor with classes permissions so it will be in a correct layout for Softmax
|
||||
predictions_node = match.single_input_node(1)[0].in_node(0).in_node(0)
|
||||
permute_predictions_op = Permute(graph, {'order': np.array([0, 2, 3, 1])})
|
||||
permute_predictions_node = permute_predictions_op.create_node([], dict(name=predictions_node.name + '/Permute_'))
|
||||
insert_node_after(predictions_node, permute_predictions_node, 0)
|
||||
|
||||
reshape_classes_op = Reshape(graph, {'dim': np.array([0, -1, 2])})
|
||||
reshape_classes_node = reshape_classes_op.create_node([permute_predictions_node],
|
||||
dict(name='Reshape_FirstStageBoxPredictor_Class_'))
|
||||
update_attrs(reshape_classes_node, 'shape_attrs', 'dim')
|
||||
|
||||
softmax_conf_op = Softmax(graph, {'axis': 1})
|
||||
softmax_conf_node = softmax_conf_op.create_node([reshape_classes_node],
|
||||
dict(name='FirstStageBoxPredictor_SoftMax_Class_'))
|
||||
```
|
||||
|
||||
The output with class predictions from the `FirstStageBoxPredictor` is generated with a convolution operation. The convolution output data layout in TensorFlow is NHWC while Inference Engine uses NCHW layout. Model Optimizer by default converts the weights of TensorFlow convolutions to produce the output tensor in NCHW layout required by Inference Engine. The issue arises because the class predictions tensor is passed through the Softmax operation to produce class probabilities. The Inference Engine Softmax is performed over the fastest-changing dimension which is 'W' in Inference Engine. Thus, the softmax operation would be performed over the wrong dimension after conversion of the convolution node producing classes predictions. The solution is to add Permute and Reshape operations to prepare the input data for Softmax. The Reshape operation is required to make the size of the fastest-changing dimension equal to 2, because there are 2 classes being predicted: background and foreground.
|
||||
|
||||
Another issue is that the layout of elements in the predicted classes tensor is different between TensorFlow and Inference Engine Proposal layer requirements. In TensorFlow the tensor has the virtual layout [N, H, W, num_anchors, num_classes] while the Inference Engine Proposal layer requires the virtual layout [N, num_classes, num_anchors, H, W]. Thus, it is necessary to reshape, permute and then reshape again the output from the Softmax to the shape required by the Proposal layer:
|
||||
|
||||
```python
|
||||
reshape_softmax_op = Reshape(graph, {'dim': np.array([1, anchors_count, 2, -1])})
|
||||
reshape_softmax_node = reshape_softmax_op.create_node([softmax_conf_node], dict(name='Reshape_Softmax_Class_'))
|
||||
update_attrs(reshape_softmax_node, 'shape_attrs', 'dim')
|
||||
|
||||
permute_reshape_softmax_op = Permute(graph, {'order': np.array([0, 1, 3, 2])})
|
||||
permute_reshape_softmax_node = permute_reshape_softmax_op.create_node([reshape_softmax_node],
|
||||
dict(name='Permute_'))
|
||||
|
||||
# implement custom reshape infer function because we need to know the input convolution node output dimension
|
||||
# sizes but we can know it only after partial infer
|
||||
reshape_permute_op = Reshape(graph, {'dim': np.ones([4]), 'anchors_count': anchors_count,
|
||||
'conv_node': predictions_node})
|
||||
reshape_permute_op.attrs['old_infer'] = reshape_permute_op.attrs['infer']
|
||||
reshape_permute_op.attrs['infer'] = __class__.classes_probabilities_reshape_shape_infer
|
||||
reshape_permute_node = reshape_permute_op.create_node([permute_reshape_softmax_node],
|
||||
dict(name='Reshape_Permute_Class_'))
|
||||
update_attrs(reshape_permute_node, 'shape_attrs', 'dim')
|
||||
```
|
||||
|
||||
The Proposal layer has 3 inputs: classes probabilities, boxes predictions and the input shape of the image. The first two tensors are ready so it is necessary to create the Const operation that produces the desired third input tensor.
|
||||
|
||||
```python
|
||||
# create constant input with the image height, width and scale H and scale W (if present) required for Proposal
|
||||
const_value = np.array([[input_height, input_width, 1]], dtype=np.float32)
|
||||
const_op = Const(graph, dict(value=const_value, shape=const_value.shape))
|
||||
const_node = const_op.create_node([], dict(name='Proposal_const_image_size_'))
|
||||
```
|
||||
|
||||
Now add the Proposal layer:
|
||||
|
||||
```python
|
||||
|
||||
proposal_op = ProposalOp(graph, dict(min_size=10, framework='tensorflow', box_coordinate_scale=10,
|
||||
box_size_scale=5, post_nms_topn=max_proposals, feat_stride=feat_stride,
|
||||
ratio=proposal_ratios, scale=proposal_scales, base_size=anchor_base_size,
|
||||
pre_nms_topn=2**31 - 1,
|
||||
nms_thresh=nms_threshold))
|
||||
proposal_node = proposal_op.create_node([reshape_permute_node,
|
||||
match.single_input_node(0)[0].in_node(0).in_node(0),
|
||||
const_node],
|
||||
dict(name=proposal_op.attrs['type'] + '_'))
|
||||
```
|
||||
|
||||
The box coordinates in TensorFlow are in the "YXYX" layout while Inference Engine uses the "XYXY" layout, so it is necessary to swap coordinates produced by the Proposal layer. It is implemented with the help of a convolution node with a special filter of size [5, 5]:
|
||||
|
||||
```python
|
||||
proposal_reshape_4d_op = Reshape(graph, {'dim': np.array([max_proposals, 1, 1, 5])})
|
||||
proposal_reshape_4d_node = proposal_reshape_4d_op.create_node([proposal_node], dict(name="reshape_4d_"))
|
||||
update_attrs(proposal_reshape_4d_node, 'shape_attrs', 'dim')
|
||||
|
||||
# create convolution node to swap X and Y coordinates in the proposals
|
||||
conv_filter_const_data = np.array(np.array([[1, 0, 0, 0, 0],
|
||||
[0, 0, 1, 0, 0],
|
||||
[0, 1, 0, 0, 0],
|
||||
[0, 0, 0, 0, 1],
|
||||
[0, 0, 0, 1, 0]],
|
||||
dtype=np.float32).reshape([1, 1, 5, 5]), dtype=np.float32)
|
||||
conv_filter_const_op = Const(graph, dict(value=conv_filter_const_data, spatial_dims=np.array([2, 3])))
|
||||
conv_filter_const_node = conv_filter_const_op.create_node([], dict(name="conv_weights"))
|
||||
|
||||
conv_op = Op(graph, {
|
||||
'op': 'Conv2D',
|
||||
'bias_addable': False,
|
||||
'spatial_dims': np.array([1, 2]),
|
||||
'channel_dims': np.array([3]),
|
||||
'batch_dims': np.array([0]),
|
||||
'pad': None,
|
||||
'pad_spatial_shape': None,
|
||||
'input_feature_channel': 2,
|
||||
'output_feature_channel': 2,
|
||||
'output_shape': [max_proposals, 1, 1, 5],
|
||||
'dilation': np.array([1, 1, 1, 1], dtype=np.int64),
|
||||
'stride': np.array([1, 1, 1, 1]),
|
||||
'type': 'Convolution',
|
||||
'group': None,
|
||||
'layout': 'NHWC',
|
||||
'infer': __class__.fake_conv_shape_infer})
|
||||
predictions_node = conv_op.create_node([proposal_reshape_4d_node, conv_filter_const_node], dict(name="conv_"))
|
||||
update_ie_fields(graph.node[predictions_node.id])
|
||||
|
||||
proposal_reshape_2d_op = Reshape(graph, {'dim': np.array([max_proposals, 5])})
|
||||
proposal_reshape_2d_node = proposal_reshape_2d_op.create_node([predictions_node], dict(name="reshape_2d_"))
|
||||
# set specific name for this Reshape operation so we can use it in the DetectionOutput replacer
|
||||
proposal_reshape_2d_node['name'] = 'swapped_proposals'
|
||||
```
|
||||
|
||||
The ROIPooling layer in TensorFlow is implemented with an operation called `CropAndResize` with bi-linear filtration. The Inference Engine implementation of the ROIPooling layer with bi-linear filtration requires input box coordinates to be scaled to the [0, 1] interval. Adding elementwise multiplication of box coordinates solves this issue:
|
||||
|
||||
```python
|
||||
# the TF implementation of Proposal with bi-linear filtration need proposals scaled by image size
|
||||
proposal_scale_const = np.array([1.0, 1 / input_height, 1 / input_width, 1 / input_height, 1 / input_width],
|
||||
dtype=np.float32)
|
||||
proposal_scale_const_op = Const(graph, dict(value=proposal_scale_const, shape=proposal_scale_const.shape))
|
||||
proposal_scale_const_node = proposal_scale_const_op.create_node([], dict(name='Proposal_scale_const_'))
|
||||
|
||||
scale_proposals_op = Eltwise(graph, {'operation': 'mul'})
|
||||
scale_proposals_node = scale_proposals_op.create_node([proposal_reshape_2d_node, proposal_scale_const_node],
|
||||
dict(name='scale_proposals_'))
|
||||
```
|
||||
|
||||
The last step is to create the ROIPooling node with 2 inputs: the identified feature maps from the `FirstStageFeatureExtractor` and the scaled output of the Proposal layer:
|
||||
|
||||
```python
|
||||
feature_extractor_output_nodes = scope_output_nodes(graph, 'FirstStageFeatureExtractor')
|
||||
if len(feature_extractor_output_nodes) != 1:
|
||||
raise Error("Failed to determine FirstStageFeatureExtractor output node to connect it to the ROIPooling."
|
||||
"Found the following nodes: {}".format([node.name for node in feature_extractor_output_nodes]))
|
||||
|
||||
roi_pooling_op = ROIPooling(graph, dict(method="bilinear", framework="tensorflow",
|
||||
pooled_h=roi_pool_size, pooled_w=roi_pool_size,
|
||||
spatial_scale=roi_spatial_scale))
|
||||
roi_pooling_node = roi_pooling_op.create_node([feature_extractor_output_nodes[0], scale_proposals_node],
|
||||
dict(name='ROI_Pooling_'))
|
||||
|
||||
return {'roi_pooling_node': roi_pooling_node}
|
||||
```
|
||||
|
||||
There are two additional methods implemented in the replacer class:
|
||||
|
||||
* The `fake_conv_shape_infer` is a simplified infer function for the convolution that permutes X and Y coordinates of the Proposal output, which avoids setting a lot of internal attributes required for proper shape inference.
|
||||
|
||||
* The `classes_probabilities_reshape_shape_infer` function is used to update the output dimension of the reshape operation. The output spatial dimensions depend on the convolution output spatial dimensions, thus they are not known until the shape inference pass which is performed after this sub-graph replacement class. So this custom infer function is called instead of the default Reshape shape inference function; it updates the required attribute `dim` of the node with the convolution output spatial dimensions, which are known at the time of calling this inference function, and then calls the default Reshape inference function.
|
||||
|
||||
```python
|
||||
@staticmethod
|
||||
def fake_conv_shape_infer(node: Node):
|
||||
node.out_node(0).shape = node.in_node(0).shape
|
||||
# call functions to update internal attributes required for correct IR generation
|
||||
mark_input_bins(node)
|
||||
assign_dims_to_weights(node.in_node(1), [0, 1], node.input_feature_channel, node.output_feature_channel, 4)
|
||||
|
||||
@staticmethod
|
||||
def classes_probabilities_reshape_shape_infer(node: Node):
|
||||
# now we can determine the reshape dimensions from Convolution node
|
||||
conv_node = node.conv_node
|
||||
conv_output_shape = conv_node.out_node().shape
|
||||
|
||||
# update desired shape of the Reshape node
|
||||
node.dim = np.array([0, conv_output_shape[1], conv_output_shape[2], node.anchors_count * 2])
|
||||
node.old_infer(node)
|
||||
```
|
||||
|
||||
The second replacer defined in the sub-graph replacement configuration file replaces the `SecondStagePostprocessor` block and is defined using scope:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_attributes": {
|
||||
"code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
|
||||
"confidence_threshold": 0.01,
|
||||
"keep_top_k": 300,
|
||||
"nms_threshold": 0.6,
|
||||
"pad_mode": "caffe.ResizeParameter.CONSTANT",
|
||||
"resize_mode": "caffe.ResizeParameter.WARP",
|
||||
"max_detections_per_class": 100,
|
||||
"num_classes": 90
|
||||
},
|
||||
"id": "SecondStagePostprocessorReplacement",
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "Reshape$",
|
||||
"port": 0
|
||||
}
|
||||
],
|
||||
[
|
||||
{
|
||||
"node": "Reshape_1$",
|
||||
"port": 0
|
||||
}
|
||||
],
|
||||
[
|
||||
{
|
||||
"node": "ExpandDims$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
],
|
||||
"instances": [
|
||||
".*SecondStagePostprocessor/"
|
||||
],
|
||||
"match_kind": "scope",
|
||||
"outputs": [
|
||||
{
|
||||
"node": "BatchMultiClassNonMaxSuppression/map/TensorArrayStack/TensorArrayGatherV3$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The replacement code is similar to the `SecondStagePostprocessor` replacement for the SSD topologies. There are two major differences:
|
||||
|
||||
* The tensor with bounding boxes doesn't contain locations for class 0 (background class) but the Inference Engine Detection Output layer requires it. A Const node with some dummy values is created and concatenated with the tensor.
|
||||
|
||||
* The priors tensor is not constant like in SSDs so the bounding boxes tensor must be scaled with variances [0.1, 0.1, 0.2, 0.2].
|
||||
|
||||
The differences described above are resolved with the following code:
|
||||
|
||||
```python
|
||||
# TF produces locations tensor without boxes for background.
|
||||
# Inference Engine DetectionOutput layer requires background boxes so we generate them with some values
|
||||
# and concatenate with locations tensor
|
||||
fake_background_locs_blob = np.tile([[[1, 1, 2, 2]]], [max_detections_per_class, 1, 1])
|
||||
fake_background_locs_const_op = Const(graph, dict(value=fake_background_locs_blob,
|
||||
shape=fake_background_locs_blob.shape))
|
||||
fake_background_locs_const_node = fake_background_locs_const_op.create_node([])
|
||||
|
||||
reshape_loc_op = Reshape(graph, {'dim': np.array([max_detections_per_class, num_classes, 4])})
|
||||
reshape_loc_node = reshape_loc_op.create_node([match.single_input_node(0)[0].in_node(0)],
|
||||
dict(name='Reshape_loc_'))
|
||||
|
||||
concat_loc_op = Concat(graph, {'axis': 1})
|
||||
concat_loc_node = concat_loc_op.create_node([fake_background_locs_const_node, reshape_loc_node],
|
||||
dict(name='Concat_fake_loc_'))
|
||||
|
||||
# blob with variances
|
||||
variances_blob = np.array([0.1, 0.1, 0.2, 0.2])
|
||||
variances_const_op = Const(graph, dict(value=variances_blob, shape=variances_blob.shape))
|
||||
variances_const_node = variances_const_op.create_node([])
|
||||
|
||||
# reshape locations tensor to 2D so it could be passed to Eltwise which will be converted to ScaleShift
|
||||
reshape_loc_2d_op = Reshape(graph, {'dim': np.array([-1, 4])})
|
||||
reshape_loc_2d_node = reshape_loc_2d_op.create_node([concat_loc_node], dict(name='reshape_locs_2d_'))
|
||||
|
||||
# element-wise multiply locations with variances
|
||||
eltwise_locs_op = Eltwise(graph, {'operation': 'mul'})
|
||||
eltwise_locs_node = eltwise_locs_op.create_node([reshape_loc_2d_node, variances_const_node],
|
||||
dict(name='scale_locs_'))
|
||||
```
|
||||
|
||||
### Example of Model Optimizer Command-Line for TensorFlow's Faster R-CNNs
|
||||
The final command line to convert Faster R-CNNs from the TensorFlow* Object Detection Zoo is the following:
|
||||
|
||||
```sh
|
||||
./mo.py --input_model=<path_to_frozen.pb> --output=detection_boxes,detection_scores,num_detections --tensorflow_use_custom_operations_config extensions/front/tf/legacy_faster_rcnn_support.json
|
||||
```
|
||||
|
||||
Note that there are minor changes that should be made to the sub-graph replacement configuration file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/legacy_faster_rcnn_support.json` before converting a particular Faster R-CNN topology. Refer to the table below.
|
||||
|
||||
### Sub-Graph Replacement Configuration File Parameters to Convert Different Faster R-CNN Models
|
||||
|Model Name | Configuration File Changes|
|
||||
|:----|:----:|
|
||||
| faster_rcnn_inception_v2_coco | None
|
||||
| faster_rcnn_resnet50_coco | None
|
||||
| faster_rcnn_resnet50_lowproposals_coco | None
|
||||
| faster_rcnn_resnet101_coco | None
|
||||
| faster_rcnn_resnet101_lowproposals_coco | None
|
||||
| faster_rcnn_inception_resnet_v2_atrous_coco | "feat_stride: 8"
|
||||
| faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco| "feat_stride: 8"
|
||||
|
||||
@@ -1,339 +0,0 @@
|
||||
# (Deprecated) Case Study: Converting SSD Models Created with TensorFlow* Object Detection API {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_SSD_ObjectDetection_API}
|
||||
|
||||
This is a deprecated page. Please, consider reading [this](../convert_model/tf_specific/Convert_Object_Detection_API_Models.md) page describing new approach to convert Object Detection API models giving closer to TensorFlow inference results.
|
||||
|
||||
## Converting Models Created with TensorFlow Object Detection API Version prior 1.6.0
|
||||
|
||||
As explained in the [Sub-graph Replacement in Model Optimizer](Subgraph_Replacement_Model_Optimizer.md) section, there are multiple
|
||||
ways to setup the sub-graph matching. In this example we are focusing on the defining the sub-graph via a set of
|
||||
"start" and "end" nodes.
|
||||
The result of matching is two buckets of nodes:
|
||||
* Nodes "between" start and end nodes.
|
||||
* Nodes connected to the first list, but just on the constant path (e.g. these nodes are not connected to the inputs of the entire graph).
|
||||
|
||||
Let's look closer to the SSD models from the TensorFlow* detection model
|
||||
<a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">zoo</a>:
|
||||
[SSD MobileNet](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) and
|
||||
[SSD InceptionV2](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz).
|
||||
|
||||
* Nodes "between" start and end nodes
|
||||
* Nodes connected to the first list, but just on the constant path (for example, these nodes are not connected to the inputs of the entire graph). Let's look closer to the SSD models from the TensorFlow\* detection model <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">zoo</a> : [SSD MobileNet](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) and [SSD InceptionV2](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz).
|
||||
|
||||
A distinct layer of any SSD topology is the `DetectionOutput` layer. This layer is implemented with a dozens of primitive operations in TensorFlow, while in Inference Engine, it is one [layer](../../../ops/opset.md). Thus, to convert a SSD model from the TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implement the `DetectionOutput` layer with a single well-known `DetectionOutput` node.
|
||||
|
||||
The Inference Engine `DetectionOutput` layer consumes three tensors in the following order:
|
||||
|
||||
1. Tensor with locations of bounding boxes
|
||||
2. Tensor with confidences for each bounding box
|
||||
3. Tensor with prior boxes (anchors in TensorFlow terminology)
|
||||
|
||||
`DetectionOutput` layer produces one tensor with seven numbers for each actual detection. There are more output tensors in the TensorFlow Object Detection API, but the values in them are consistent with the Inference Engine ones.
|
||||
|
||||
The difference with [other examples](Subgraph_Replacement_Model_Optimizer.md) is that here the `DetectionOutput` sub-graph is replaced with a new sub-graph (not a single layer).
|
||||
|
||||
Look at sub-graph replacement configuration file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/legacy_ssd_support.json` that is used to enable two models listed above:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"custom_attributes": {
|
||||
"code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
|
||||
"confidence_threshold": 0.01,
|
||||
"keep_top_k": 200,
|
||||
"nms_threshold": 0.45,
|
||||
"pad_mode": "caffe.ResizeParameter.CONSTANT",
|
||||
"resize_mode": "caffe.ResizeParameter.WARP"
|
||||
},
|
||||
"id": "TFObjectDetectionAPIDetectionOutput",
|
||||
"include_inputs_to_sub_graph": true,
|
||||
"include_outputs_to_sub_graph": true,
|
||||
"instances": {
|
||||
"end_points": [
|
||||
"detection_boxes",
|
||||
"detection_scores",
|
||||
"num_detections"
|
||||
],
|
||||
"start_points": [
|
||||
"Postprocessor/Shape",
|
||||
"Postprocessor/Slice",
|
||||
"Postprocessor/ExpandDims",
|
||||
"Postprocessor/Reshape_1"
|
||||
]
|
||||
},
|
||||
"match_kind": "points"
|
||||
},
|
||||
{
|
||||
"custom_attributes": {
|
||||
},
|
||||
"id": "PreprocessorReplacement",
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "map/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
|
||||
"port": 2
|
||||
}
|
||||
]
|
||||
],
|
||||
"instances": [
|
||||
".*Preprocessor/"
|
||||
],
|
||||
"match_kind": "scope",
|
||||
"outputs": [
|
||||
{
|
||||
"node": "sub$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Lines 3-10 define static attributes that will be saved to the Intermediate Representation `.xml` file for `DetectionOutput` layer.
|
||||
|
||||
* Lines 12 and 13 define values for attributes that should be always set to "true" for this release of the Model Optimizer. These two attributes are specific for sub-graph match by points only.
|
||||
|
||||
* Lines 14-26 define one instance of the sub-graph to be matched. It is an important difference between sub-graph matching by scope and by points. Several instances could be specified for matching by scope, but matching with points allows specifying just one instance. So the full node names (not regular expressions like in the case of matching with scope) are specified in the `instances` dictionary.
|
||||
|
||||
The second sub-graph replacer with identifier `PreprocessorReplacement` is used to remove the `Preprocessing` block from the graph. The replacer removes all nodes from this scope except nodes performing mean value subtraction and scaling (if applicable). Implementation of the replacer is in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py` file.
|
||||
|
||||
Now let's analyze the structure of the topologies generated with the Object Detection API. There are several blocks in the graph performing particular task:
|
||||
|
||||
* `Preprocessor` block resizes, scales and subtracts mean values from the input image.
|
||||
|
||||
* `FeatureExtractor` block is a [MobileNet](https://arxiv.org/abs/1704.04861) or other backbone to extract features.
|
||||
|
||||
* `MultipleGridAnchorGenerator` block creates initial bounding boxes locations (anchors).
|
||||
|
||||
* `Postprocessor` block acts as a `DetectionOutput` layer. So we need to replace `Postprocessor` block with `DetectionOutput` layer. It is necessary to add all input nodes of the `Postprocessor` scope to the list `start_points`. Consider inputs of each of these nodes:
|
||||
|
||||
* `Postprocessor/Shape` consumes tensor with locations.
|
||||
* `Postprocessor/Slice` consumes tensor with confidences.
|
||||
* `Postprocessor/ExpandDims` consumes tensor with prior boxes.
|
||||
* `Postprocessor/Reshape_1` consumes tensor with locations similarly to the `Postprocessor/Shape` node. Despite the fact that the last node `Postprocessor/Reshape_1` gets the same tensor as node `Postprocessor/Shape`, it must be explicitly put to the list.
|
||||
|
||||
Object Detection API `Postprocessor` block generates output nodes: `detection_boxes`, `detection_scores`, `num_detections`, `detection_classes`.
|
||||
|
||||
Now consider the implementation of the sub-graph replacer, available in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/SSDs.py`. The file is rather big, so only some code snippets are used:
|
||||
```python
|
||||
class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
|
||||
replacement_id = 'TFObjectDetectionAPIDetectionOutput'
|
||||
```
|
||||
|
||||
These lines define the new `PostprocessorReplacement` class inherited from `FrontReplacementFromConfigFileSubGraph`. `FrontReplacementFromConfigFileSubGraph` is designed to replace sub-graph of operations described in the configuration file. There are methods to override for implementing custom replacement logic that we need:
|
||||
|
||||
* `generate_sub_graph` performs new sub-graph generation and returns a dictionary where the key is an alias name for the node and the value is a Node object. The dictionary has the same format as the parameter `match` in the `replace_sub_graph` method in the example with <a href="Subgraph_Replacement_Model_Optimizer.html#replace-using-isomorphism-pattern">networkx sub-graph isomorphism pattern</a>. This dictionary is passed as an argument to the next three methods, so it should contain entries for the nodes that the functions need.
|
||||
|
||||
* `input_edges_match` specifies mapping between input edges to sub-graph before replacement and after replacement. The key of the dictionary is a tuple specifying input tensor of the sub-graph before replacement: sub-graph input node name and input port number for this node. The value for this key is also a tuple specifying the node where this tensor should be attached during replacement: the node name (or alias name of the node) and the input port for this node. If the port number is zero, the parameter could be omitted so the key or value is just a node name (alias). Default implementation of the method returns an empty dictionary, so Model Optimizer does not create new edges.
|
||||
|
||||
* `output_edges_match` returns mapping between old output edges of the matched nodes and new sub-graph node and output edge index. The format is similar to the dictionary returned in the `input_edges_match` method. The only difference is that instead of specifying input port numbers for the nodes it is necessary to specify output port number. Of course, this mapping is needed for the output nodes only. Default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
|
||||
|
||||
* `nodes_to_remove` specifies list of nodes that Model Optimizer should remove after sub-graph replacement. Default implementation of the method removes all sub-graph nodes.
|
||||
|
||||
Review of the replacer code, considering details of the `DetectionOutput` layer implementation in the Inference Engine. There are several constraints to the input tensors of the `DetectionOutput` layer:
|
||||
|
||||
* The tensor with locations must be of shape `[#‍batch, #‍prior_boxes * 4]` or `[#‍batch, #‍prior_boxes * 5]` depending on shared locations between different batches or not.
|
||||
* The tensor with confidences must be of shape `[#‍batch, #‍prior_boxes * #‍classes]` and confidences values are in range [0, 1], that is passed through `softmax` layer.
|
||||
* The tensor with prior boxes must be of shape `[#‍batch, 2, #‍prior_boxes * 4]`. Inference Engine expects that it contains variance values which TensorFlow Object Detection API does not add.
|
||||
|
||||
To enable these models, add `Reshape` operations for locations and confidences tensors and update the values for the prior boxes to include the variance constants (they are not there in TensorFlow Object Detection API).
|
||||
|
||||
Look at the `generate_sub_graph` method:
|
||||
```python
|
||||
def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
|
||||
log.debug('PostprocessorReplacement.generate_sub_graph')
|
||||
log.debug('matched_nodes = {}'.format(match.matched_nodes_names()))
|
||||
# softmax to be applied to the confidence
|
||||
softmax_conf_op = Softmax(graph, {'axis': 2, 'nchw_layout': True})
|
||||
softmax_conf_node = softmax_conf_op.add_node(dict(name='DetectionOutput_SoftMax_conf_'))
|
||||
# Inference Engine DetectionOutput layer consumes flattened tensors
|
||||
# reshape operation to flatten locations tensor
|
||||
reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])})
|
||||
reshape_loc_node = reshape_loc_op.add_node(dict(name='DetectionOutput_Reshape_loc_'))
|
||||
# Inference Engine DetectionOutput layer consumes flattened tensors
|
||||
# reshape operation to flatten confidence tensor
|
||||
reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])})
|
||||
reshape_conf_node = reshape_conf_op.add_node(dict(name='DetectionOutput_Reshape_conf_'))
|
||||
# create Node object from Op class
|
||||
detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes)
|
||||
detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
|
||||
detection_output_op.attrs['infer'] = __class__.do_infer
|
||||
detection_output_node = detection_output_op.add_node(dict(name=detection_output_op.attrs['type'] + '_'))
|
||||
# create internal edges of the sub-graph. In this case we add edges to connect input port 0 and 1 of the
|
||||
# detection output with output of reshape of locations and reshape of confidence
|
||||
create_edge(softmax_conf_node, reshape_conf_node, 0, 0)
|
||||
create_edge(reshape_loc_node, detection_output_node, 0, 0)
|
||||
create_edge(reshape_conf_node, detection_output_node, 0, 1)
|
||||
return {'detection_output_node': detection_output_node, 'reshape_conf_node': softmax_conf_node,
|
||||
'reshape_loc_node': reshape_loc_node}
|
||||
```
|
||||
The method has two inputs: the graph to operate on and the instance of `SubgraphMatch` object, which describes matched sub-graph. The latter class has several useful methods to get particular input/output node of the sub-graph by input/output index or by node name pattern. Examples of these methods usage are given below.
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Lines 6 and 7 create new instance of operation of type `Softmax` and graph Node object corresponding to that operation.
|
||||
|
||||
* Lines 11-12 and 16-17 create new instance of operation of type `Reshape` to reshape locations and confidences tensors correspondingly.
|
||||
|
||||
* Lines 20-23 create new instance of operation `DetectionOutput` and graph Node object corresponding to that operation.
|
||||
|
||||
* Lines 27-29 connect `softmax` node with `reshape` node and connect two reshaped locations and confidences tensors with `DetectionOutput` node.
|
||||
|
||||
* Lines 30-31 define dictionary with aliases for detection output node, reshape locations and confidences nodes. These aliases are used in the `input_edges_match` and `output_edges_match` methods.
|
||||
|
||||
The `input_edges_match` method is the following:
|
||||
```python
|
||||
def input_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
|
||||
locs_consumer_node, locs_consumer_node_port = match.input_nodes(0)[0]
|
||||
conf_consumer_node, conf_consumer_node_port = match.input_nodes(1)[0]
|
||||
priors_consumer_node, priors_consumer_node_port = match.input_nodes(2)[0]
|
||||
# create matching nodes for locations and confidence tensors using simple scheme "old_node_name: new_node_name"
|
||||
# which in fact means "(old_node_name, 0): (new_node_name, 0)", while first '0' means old_port and the second
|
||||
# zero defines 'new_port'.
|
||||
return {locs_consumer_node.id: new_sub_graph['reshape_loc_node'].id,
|
||||
conf_consumer_node.id: new_sub_graph['reshape_conf_node'].id,
|
||||
priors_consumer_node.id: (new_sub_graph['detection_output_node'].id, 2),
|
||||
}
|
||||
```
|
||||
The method has three parameters: input `graph`, `match` object describing matched sub-graph and `new_sub_graph` dictionary with alias names returned from the `generate_sub_graph` method.
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Lines 2-4 initialize Node objects and input ports for the nodes where the input tensors for the sub-graph are consumed. The method `match.input_nodes(ind)` returns list of tuples where the first element is a Node object and the second is the input port for this node which consumes the ind-th input tensor of the sub-graph. `input_points` list in the configuration file defines the order of input tensors to the sub-graph. For example, the `locs_consumer_node` object of type Node is a node that consumes tensor with locations in the port with number `locs_consumer_node_port`.
|
||||
|
||||
* Lines 8-11 define dictionary with the mapping of tensors as described above. Note that the attribute `id` of the Node object contains the name of the node in the graph.
|
||||
|
||||
The `output_edges_match` method is the following:
|
||||
```python
|
||||
def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
|
||||
# the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so we need to create only
|
||||
# one output edge match
|
||||
return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}
|
||||
```
|
||||
|
||||
The method has the same three parameters as the `input_edges_match` method. The returned dictionary contains mapping just for one tensor initially produced by the first output node of the sub-graph (which is `detection_boxes` according to the configuration file) to a single output tensor of the created `DetectionOutput` node. In fact, it is possible to use any output node of the initial sub-graph in mapping, because the sub-graph output nodes are the output nodes of the whole graph (their output is not consumed by any other nodes).
|
||||
|
||||
Now, the Model Optimizer knows how to replace the sub-graph. The last step to enable the model is to cut-off some parts of the graph not needed during inference.
|
||||
|
||||
It is necessary to remove the `Preprocessor` block where the image is resized. Inference Engine does not support dynamic input shapes, so the Model Optimizer must freeze the input image size, and thus, resizing of the image is not necessary. This is achieved by the replacer `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py`, which is executed automatically.
|
||||
|
||||
There are several `Switch` operations in the `Postprocessor` block without output edges. For example:
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_t
|
||||
```
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_f
|
||||
```
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_t
|
||||
```
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_f
|
||||
```
|
||||
|
||||
Model Optimizer marks these nodes as output nodes of the topology. Some parts of the `Postprocessor` blocks are not removed during sub-graph replacement because of that. In order to fix this issue, it is necessary to specify output nodes of the graph manually using the `--output` command line parameter.
|
||||
|
||||
### Example Model Optimizer Command-Line for TensorFlow\* SSD
|
||||
|
||||
The final command line to convert SSDs from the TensorFlow Object Detection API Zoo is:
|
||||
```shell
|
||||
./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/legacy_ssd_support.json --output="detection_boxes,detection_scores,num_detections"
|
||||
```
|
||||
|
||||
## Converting MobileNet V2 model created with TensorFlow Object Detection API <a name="convert_mobilenet_v2"></a>
|
||||
The [MobileNet V2 model](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) differs from the previous version, so converting the model requires a new sub-graph replacement configuration file and new command line parameters. The major differences are:
|
||||
|
||||
* The `Preprocessor` block has two outputs: the pre-processed image and the pre-processed image size.
|
||||
* The `Postprocessor` block has one more input (in comparison with models created with TensorFlow Object Detection API
|
||||
version 1.6 or lower): the pre-processed image size.
|
||||
* Some node names have been changed in the `Postprocessor` block.
|
||||
|
||||
The updated sub-graph replacement configuration file `extensions/front/tf/ssd_v2_support.json` reflecting these changes
|
||||
is the following:
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"custom_attributes": {
|
||||
"code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
|
||||
"confidence_threshold": 0.01,
|
||||
"keep_top_k": 200,
|
||||
"nms_threshold": 0.6,
|
||||
"pad_mode": "caffe.ResizeParameter.CONSTANT",
|
||||
"resize_mode": "caffe.ResizeParameter.WARP"
|
||||
},
|
||||
"id": "TFObjectDetectionAPIDetectionOutput",
|
||||
"include_inputs_to_sub_graph": true,
|
||||
"include_outputs_to_sub_graph": true,
|
||||
"instances": {
|
||||
"end_points": [
|
||||
"detection_boxes",
|
||||
"detection_scores",
|
||||
"num_detections"
|
||||
],
|
||||
"start_points": [
|
||||
"Postprocessor/Shape",
|
||||
"Postprocessor/scale_logits",
|
||||
"Postprocessor/ExpandDims",
|
||||
"Postprocessor/Reshape_1",
|
||||
"Postprocessor/ToFloat"
|
||||
]
|
||||
},
|
||||
"match_kind": "points"
|
||||
},
|
||||
{
|
||||
"custom_attributes": {
|
||||
},
|
||||
"id": "PreprocessorReplacement",
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "map/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
|
||||
"port": 2
|
||||
}
|
||||
]
|
||||
],
|
||||
"instances": [
|
||||
".*Preprocessor/"
|
||||
],
|
||||
"match_kind": "scope",
|
||||
"outputs": [
|
||||
{
|
||||
"node": "sub$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### Example of Model Optimizer Command-Line for TensorFlow SSD MobileNet V2
|
||||
The final command line to convert MobileNet SSD V2 from the TensorFlow Object Detection Zoo is the following:
|
||||
|
||||
```sh
|
||||
./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json --output="detection_boxes,detection_scores,num_detections"
|
||||
```
|
||||
@@ -1,3 +0,0 @@
|
||||
# Optimization Notice {#openvino_docs_Optimization_notice}
|
||||
|
||||

|
||||
@@ -19,133 +19,181 @@ Measuring inference performance involves many variables and is extremely use-cas
|
||||
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-datalabels"></script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/chartjs-plugin-annotation/0.5.7/chartjs-plugin-annotation.min.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-barchart-background@1.3.0/build/Plugin.Barchart.Background.min.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-deferred@1"></script>
|
||||
<!-- download this file and place on your server (or include the styles inline) -->
|
||||
<link rel="stylesheet" href="ovgraphs.css" type="text/css">
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
\htmlonly
|
||||
<script src="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-1-096.js" id="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-1-096"></script>
|
||||
<script src="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-2-185.js" id="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="deeplabv3-tf-ov-2021-1-096.js" id="deeplabv3-tf-ov-2021-1-096"></script>
|
||||
<script src="deeplabv3-tf-ov-2021-2-185.js" id="deeplabv3-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="densenet-121-tf-ov-2021-1-096.js" id="densenet-121-tf-ov-2021-1-096"></script>
|
||||
<script src="densenet-121-tf-ov-2021-2-185.js" id="densenet-121-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="faster-rcnn-resnet50-coco-tf-ov-2021-1-096.js" id="faster-rcnn-resnet50-coco-tf-ov-2021-1-096"></script>
|
||||
<script src="faster-rcnn-resnet50-coco-tf-ov-2021-2-185.js" id="faster-rcnn-resnet50-coco-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="googlenet-v1-tf-ov-2021-1-096.js" id="googlenet-v1-tf-ov-2021-1-096"></script>
|
||||
<script src="googlenet-v1-tf-ov-2021-2-185.js" id="googlenet-v1-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="inception-v3-tf-ov-2021-1-096.js" id="inception-v3-tf-ov-2021-1-096"></script>
|
||||
<script src="inception-v3-tf-ov-2021-2-185.js" id="inception-v3-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="mobilenet-ssd-cf-ov-2021-1-096.js" id="mobilenet-ssd-cf-ov-2021-1-096"></script>
|
||||
<script src="mobilenet-ssd-cf-ov-2021-2-185.js" id="mobilenet-ssd-cf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="mobilenet-v1-1-0-224-tf-ov-2021-1-096.js" id="mobilenet-v1-1-0-224-tf-ov-2021-1-096"></script>
|
||||
<script src="mobilenet-v1-1-0-224-tf-ov-2021-2-185.js" id="mobilenet-v1-1-0-224-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="mobilenet-v2-pytorch-ov-2021-1-096.js" id="mobilenet-v2-pytorch-ov-2021-1-096"></script>
|
||||
<script src="mobilenet-v2-pytorch-ov-2021-2-185.js" id="mobilenet-v2-pytorch-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="resnet-18-pytorch-ov-2021-1-096.js" id="resnet-18-pytorch-ov-2021-1-096"></script>
|
||||
<script src="resnet-18-pytorch-ov-2021-2-185.js" id="resnet-18-pytorch-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="resnet-50-tf-ov-2021-1-096.js" id="resnet-50-tf-ov-2021-1-096"></script>
|
||||
<script src="resnet-50-tf-ov-2021-2-185.js" id="resnet-50-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
\htmlonly
|
||||
<script src="se-resnext-50-cf-ov-2021-1-096.js" id="se-resnext-50-cf-ov-2021-1-096"></script>
|
||||
<script src="se-resnext-50-cf-ov-2021-2-185.js" id="se-resnext-50-cf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="squeezenet1-1-cf-ov-2021-1-096.js" id="squeezenet1-1-cf-ov-2021-1-096"></script>
|
||||
<script src="squeezenet1-1-cf-ov-2021-2-185.js" id="squeezenet1-1-cf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
\htmlonly
|
||||
<script src="ssd300-cf-ov-2021-1-096.js" id="ssd300-cf-ov-2021-1-096"></script>
|
||||
<script src="ssd300-cf-ov-2021-2-185.js" id="ssd300-cf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="yolo-v3-tf-ov-2021-1-096.js" id="yolo-v3-tf-ov-2021-1-096"></script>
|
||||
<script src="yolo-v3-tf-ov-2021-2-185.js" id="yolo-v3-tf-ov-2021-2-185"></script>
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
## Platform Configurations
|
||||
|
||||
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2021.1.
|
||||
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2021.2.
|
||||
|
||||
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of September 25, 2020 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.
|
||||
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of December 9, 2020 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
|
||||
|
||||
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, see [Performance Benchmark Test Disclosure](https://www.intel.com/content/www/us/en/benchmarks/benchmark.html).
|
||||
Performance varies by use, configuration and other factors. Learn more at [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex).
|
||||
|
||||
Your costs and results may vary.
|
||||
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
|
||||
|
||||
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. [Notice Revision #2010804](https://software.intel.com/articles/optimization-notice).
|
||||
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
|
||||
|
||||
Testing by Intel done on: see test date for each HW platform below.
|
||||
|
||||
**CPU Inference Engines**
|
||||
|
||||
| | Intel® Xeon® E-2124G | Intel® Xeon® Silver 4216R | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 |
|
||||
| ------------------------------- | ----------------------| ---------------------------- | ---------------------------- | ---------------------------- |
|
||||
| Motherboard | ASUS* WS C246 PRO | Intel® Server Board S2600STB | Intel® Server Board S2600STB | Intel® Server Board S2600STB |
|
||||
| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® Silver 4216R CPU @ 2.20GHz | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz |
|
||||
| Hyper Threading | OFF | ON | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz| 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | Intel Corporation | Intel Corporation | Intel Corporation |
|
||||
| BIOS Version | 0904 | SE5C620.86B.02.01.<br>0009.092820190230 | SE5C620.86B.02.01.<br>0009.092820190230 | SE5C620.86B.02.01.<br>0009.092820190230 |
|
||||
| BIOS Release | April 12, 2019 | September 28, 2019 | September 28, 2019 | September 28, 2019 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>change power policy <br>to "performance", <br>save & exit | Select optimized default settings, <br>change power policy to "performance", <br>save & exit | Select optimized default settings, <br>change power policy to "performance", <br>save & exit |
|
||||
| Batch size | 1 | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 4 | 32 | 32 | 52 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 | September 25, 2020 | September 25, 2020 |
|
||||
| Power dissipation, TDP in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) |
|
||||
| | Intel® Xeon® E-2124G | Intel® Xeon® W1290P | Intel® Xeon® Silver 4216R |
|
||||
| ------------------------------- | ---------------------- | --------------------------- | ---------------------------- |
|
||||
| Motherboard | ASUS* WS C246 PRO | ASUS* WS W480-ACE | Intel® Server Board S2600STB |
|
||||
| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® W-1290P CPU @ 3.70GHz | Intel® Xeon® Silver 4216R CPU @ 2.20GHz |
|
||||
| Hyper Threading | OFF | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 @ 2666MHz |12 x 32 GB DDR4 2666MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc. | Intel Corporation |
|
||||
| BIOS Version | 0904 | 607 | SE5C620.86B.02.01.<br>0009.092820190230 |
|
||||
| BIOS Release | April 12, 2019 | May 29, 2020 | September 28, 2019 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>change power policy <br>to "performance", <br>save & exit |
|
||||
| Batch size | 1 | 1 | 1
|
||||
| Precision | INT8 | INT8 | INT8
|
||||
| Number of concurrent inference requests | 4 | 5 | 32
|
||||
| Test Date | December 9, 2020 | December 9, 2020 | December 9, 2020
|
||||
| Power dissipation, TDP in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [539](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) |[1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) |
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
| | Intel® Core™ i5-8500 | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | 11th Gen Intel® Core™ i5-1145G7E |
|
||||
| -------------------- | ---------------------------------- | ----------------------------------- |--------------------------------------|-----------------------------------|
|
||||
| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | Intel Corporation<br>internal/Reference Validation Platform |
|
||||
| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | 11th Gen Intel® Core™ i5-1145G7E @ 2.60GHz |
|
||||
| Hyper Threading | OFF | ON | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | 2 x 8 GB DDR4 3200MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.0.0-23-generic | 5.0.0-23-generic | 5.8.0-05-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | Intel Corporation |
|
||||
| BIOS Version | 2401 | F11 | 505 | TGLIFUI1.R00.3243.A04.2006302148 |
|
||||
| BIOS Release | July 12, 2019 | March 13, 2019 | December 17, 2019 | June 30, 2020 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>set OS type to "other", <br>save & exit | Default Settings | Default Settings |
|
||||
| Batch size | 1 | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 3 | 4 | 24 | 4 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 | September 25, 2020 | September 25, 2020 |
|
||||
| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1) | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [28](https://ark.intel.com/content/www/us/en/ark/products/208081/intel-core-i5-1145g7e-processor-8m-cache-up-to-4-10-ghz.html) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [309](https://mysamples.intel.com/SAM_U_Product/ProductDetail.aspx?InputMMID=99A3D1&RequestID=0&ProductID=1213750) |
|
||||
| | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 |
|
||||
| ------------------------------- | ---------------------------- | ---------------------------- |
|
||||
| Motherboard | Intel® Server Board S2600STB | Intel® Server Board S2600STB |
|
||||
| CPU | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz |
|
||||
| Hyper Threading | ON | ON |
|
||||
| Turbo Setting | ON | ON |
|
||||
| Memory | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | Intel Corporation | Intel Corporation |
|
||||
| BIOS Version | SE5C620.86B.02.01.<br>0009.092820190230 | SE5C620.86B.02.01.<br>0009.092820190230 |
|
||||
| BIOS Release | September 28, 2019 | September 28, 2019 |
|
||||
| BIOS Settings | Select optimized default settings, <br>change power policy to "performance", <br>save & exit | Select optimized default settings, <br>change power policy to "performance", <br>save & exit |
|
||||
| Batch size | 1 | 1 |
|
||||
| Precision | INT8 | INT8 |
|
||||
| Number of concurrent inference requests |32 | 52 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) |
|
||||
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
| | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | Intel® Core™ i9-10900TE<br>(iEi Flex BX210AI)| 11th Gen Intel® Core™ i7-1185G7 |
|
||||
| -------------------- | ----------------------------------- |--------------------------------------| ---------------------------------------------|---------------------------------|
|
||||
| Motherboard | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | iEi / B595 | Intel Corporation<br>internal/Reference<br>Validation Platform |
|
||||
| CPU | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | Intel® Core™ i9-10900TE CPU @ 1.80GHz | 11th Gen Intel® Core™ i7-1185G7 @ 3.00GHz |
|
||||
| Hyper Threading | ON | ON | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON | ON |
|
||||
| Memory | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | 2 x 8 GB DDR4 @ 2400MHz | 2 x 8 GB DDR4 3200MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.8.0-05-generic | 5.8.0-05-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | Intel Corporation |
|
||||
| BIOS Version | F11 | 505 | Z667AR10 | TGLSFWI1.R00.3425.<br>A00.2010162309 |
|
||||
| BIOS Release | March 13, 2019 | December 17, 2019 | July 15, 2020 | October 16, 2020 |
|
||||
| BIOS Settings | Select optimized default settings, <br>set OS type to "other", <br>save & exit | Default Settings | Default Settings | Default Settings |
|
||||
| Batch size | 1 | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests |4 | 24 | 5 | 4 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [35](https://ark.intel.com/content/www/us/en/ark/products/203901/intel-core-i9-10900te-processor-20m-cache-up-to-4-60-ghz.html) | [28](https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i7-1185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [444](https://ark.intel.com/content/www/us/en/ark/products/203901/intel-core-i9-10900te-processor-20m-cache-up-to-4-60-ghz.html) | [426](https://ark.intel.com/content/www/us/en/ark/products/208664/intel-core-i7-1185g7-processor-12m-cache-up-to-4-80-ghz-with-ipu.html#tab-blade-1-0-0) |
|
||||
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
| | Intel® Core™ i5-8500 | Intel® Core™ i5-10500TE | Intel® Core™ i5-10500TE<br>(iEi Flex-BX210AI)|
|
||||
| -------------------- | ---------------------------------- | ----------------------------------- |-------------------------------------- |
|
||||
| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z490 AORUS PRO AX | iEi / B595 |
|
||||
| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i5-10500TE CPU @ 2.30GHz | Intel® Core™ i5-10500TE CPU @ 2.30GHz |
|
||||
| Hyper Threading | OFF | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 @ 2666MHz | 1 x 8 GB DDR4 @ 2400MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* |
|
||||
| BIOS Version | 2401 | F3 | Z667AR10 |
|
||||
| BIOS Release | July 12, 2019 | March 25, 2020 | July 17, 2020 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>set OS type to "other", <br>save & exit | Default Settings |
|
||||
| Batch size | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 3 | 4 | 4 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1)| [35](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | [35](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [195](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | [195](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) |
|
||||
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
@@ -165,7 +213,7 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
| Batch size | 1 | 1 |
|
||||
| Precision | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 4 | 4 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [9.5](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [65](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html#tab-blade-1-0-1)|
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) |
|
||||
|
||||
@@ -173,7 +221,7 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
|
||||
**Accelerator Inference Engines**
|
||||
|
||||
| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design<br>with Intel® Movidius™ VPUs (Uzel* UI-AR8) |
|
||||
| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design<br>with Intel® Movidius™ VPUs (Mustang-V100-MX8) |
|
||||
| --------------------------------------- | ------------------------------------- | ------------------------------------- |
|
||||
| VPU | 1 X Intel® Movidius™ Myriad™ X MA2485 | 8 X Intel® Movidius™ Myriad™ X MA2485 |
|
||||
| Connection | USB 2.0/3.0 | PCIe X4 |
|
||||
@@ -181,7 +229,7 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
| Precision | FP16 | FP16 |
|
||||
| Number of concurrent inference requests | 4 | 32 |
|
||||
| Power dissipation, TDP in Watt | 2.5 | [30](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) |
|
||||
| CPU Price, USD<br>Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from September 29, 2020) | [768](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) (from May 15, 2020) |
|
||||
| CPU Price, USD<br>Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from December 9, 2020) | [214](https://www.arrow.com/en/products/mustang-v100-mx8-r10/iei-technology?gclid=Cj0KCQiA5bz-BRD-ARIsABjT4ng1v1apmxz3BVCPA-tdIsOwbEjTtqnmp_rQJGMfJ6Q2xTq6ADtf9OYaAhMUEALw_wcB) (from December 9, 2020) |
|
||||
| Host Computer | Intel® Core™ i7 | Intel® Core™ i5 |
|
||||
| Motherboard | ASUS* Z370-A II | Uzelinfo* / US-E1300 |
|
||||
| CPU | Intel® Core™ i7-8700 CPU @ 3.20GHz | Intel® Core™ i5-6600 CPU @ 3.30GHz |
|
||||
@@ -193,9 +241,9 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* |
|
||||
| BIOS Version | 411 | 5.12 |
|
||||
| BIOS Release | September 21, 2018 | September 21, 2018 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 |
|
||||
|
||||
Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvinotoolkit.org/resources/benchmark_files/system_configurations_2021.1.html)
|
||||
Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvinotoolkit.org/resources/benchmark_files/system_configurations_2021.2.html)
|
||||
|
||||
\htmlonly
|
||||
<style>
|
||||
@@ -206,7 +254,7 @@ Please follow this link for more detailed configuration descriptions: [Configura
|
||||
<div class="opt-notice-wrapper">
|
||||
<p class="opt-notice">
|
||||
\endhtmlonly
|
||||
For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md).
|
||||
Results may vary. For workloads and configurations visit: [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex) and [Legal Information](../Legal_Information.md).
|
||||
\htmlonly
|
||||
</p>
|
||||
</div>
|
||||
|
||||
@@ -49,7 +49,7 @@ Intel partners with various vendors all over the world. Visit the [Intel® AI: I
|
||||
We published a set of guidelines and recommendations to optimize your models available in an [introductory](../IE_DG/Intro_to_Performance.md) guide and an [advanced](../optimization_guide/dldt_optimization_guide.md) guide. For further support, please join the conversation in the [Community Forum](https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit).
|
||||
|
||||
#### 9. Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support?
|
||||
The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPU to process the data faster and thus offers better throughput on any converted model agnostic of the intrinsically supported low-precision optimizations within Intel® hardware. Please refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](./performance_int8_vs_fp32.html) for comparison on boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.
|
||||
The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPU to process the data faster and thus offers better throughput on any converted model agnostic of the intrinsically supported low-precision optimizations within Intel® hardware. Please refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](performance_int8_vs_fp32.md) for comparison on boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.
|
||||
|
||||
#### 10. Previous releases included benchmarks on googlenet-v1-CF (Caffe). Why is there no longer benchmarks on this neural network model?
|
||||
We replaced googlenet-v1-CF to resnet-18-pytorch due to changes in developer usage. The public model resnet-18 is used by many developers as an Image Classification model. This pre-optimized model was also trained on the ImageNet database, similar to googlenet-v1-CF. Both googlenet-v1-CF and resnet-18 will remain part of the Open Model Zoo. Developers are encouraged to utilize resnet-18-pytorch for Image Classification use cases.
|
||||
|
||||
@@ -9,40 +9,36 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<th>Intel® Core™ <br>i7-8700T</th>
|
||||
<th>Intel® Xeon® <br>Gold <br>5218T</th>
|
||||
<th>Intel® Xeon® <br>Platinum <br>8270</th>
|
||||
<th>Intel® Core™ <br>i7-1065G7</th>
|
||||
<th>Intel® Core™ <br>i5-1145G7E</th>
|
||||
<th>Intel® Core™ <br>i7-1185G7</th>
|
||||
</tr>
|
||||
<tr align="left">
|
||||
<th>OpenVINO <br>benchmark <br>model name</th>
|
||||
<th>Dataset</th>
|
||||
<th colspan="4" align="center">Throughput speed-up FP16-INT8 vs FP32</th>
|
||||
<th colspan="3" align="center">Throughput speed-up FP16-INT8 vs FP32</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>bert-large-<br>uncased-whole-word-<br>masking-squad-0001</td>
|
||||
<td>SQuAD</td>
|
||||
<td>1.6</td>
|
||||
<td>2.5</td>
|
||||
<td>2.7</td>
|
||||
<td>2.0</td>
|
||||
<td>N/A</td>
|
||||
<td>2.8</td>
|
||||
<td>2.6</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>brain-tumor-<br>segmentation-<br>0001-MXNET</td>
|
||||
<td>BraTS</td>
|
||||
<td>1.5</td>
|
||||
<td>1.7</td>
|
||||
<td>1.6</td>
|
||||
<td>1.9</td>
|
||||
<td>1.7</td>
|
||||
<td>1.8</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>deeplabv3-TF</td>
|
||||
<td>VOC 2012<br>Segmentation</td>
|
||||
<td>1.4</td>
|
||||
<td>1.5</td>
|
||||
<td>2.4</td>
|
||||
<td>2.6</td>
|
||||
<td>2.8</td>
|
||||
<td>2.9</td>
|
||||
<td>3.1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>densenet-121-TF</td>
|
||||
@@ -50,7 +46,6 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>1.6</td>
|
||||
<td>3.2</td>
|
||||
<td>3.2</td>
|
||||
<td>3.0</td>
|
||||
<td>3.2</td>
|
||||
</tr>
|
||||
<tr>
|
||||
@@ -59,17 +54,15 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>2.0</td>
|
||||
<td>3.6</td>
|
||||
<td>3.5</td>
|
||||
<td>3.2</td>
|
||||
<td>3.5</td>
|
||||
<td>3.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>faster_rcnn_<br>resnet50_coco-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>1.7</td>
|
||||
<td>3.5</td>
|
||||
<td>3.4</td>
|
||||
<td>3.6</td>
|
||||
<td>3.6</td>
|
||||
<td>3.4</td>
|
||||
<td>3.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>googlenet-v1-TF</td>
|
||||
@@ -78,7 +71,6 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>3.6</td>
|
||||
<td>3.7</td>
|
||||
<td>3.5</td>
|
||||
<td>3.6</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>inception-v3-TF</td>
|
||||
@@ -86,43 +78,38 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>1.8</td>
|
||||
<td>3.8</td>
|
||||
<td>4.0</td>
|
||||
<td>3.7</td>
|
||||
<td>3.7</td>
|
||||
<td>3.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-<br>ssd-CF</td>
|
||||
<td>VOC2012</td>
|
||||
<td>1.5</td>
|
||||
<td>3.0</td>
|
||||
<td>3.3</td>
|
||||
<td>3.1</td>
|
||||
<td>3.3</td>
|
||||
<td>3.6</td>
|
||||
<td>3.1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v1-1.0-<br>224-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.5</td>
|
||||
<td>3.2</td>
|
||||
<td>3.9</td>
|
||||
<td>2.9</td>
|
||||
<td>3.2</td>
|
||||
<td>4.1</td>
|
||||
<td>3.1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-1.0-<br>224-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.3</td>
|
||||
<td>2.7</td>
|
||||
<td>3.8</td>
|
||||
<td>2.2</td>
|
||||
<td>4.3</td>
|
||||
<td>2.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-<br>pytorch</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.4</td>
|
||||
<td>2.6</td>
|
||||
<td>3.6</td>
|
||||
<td>2.3</td>
|
||||
<td>2.8</td>
|
||||
<td>4.6</td>
|
||||
<td>2.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
@@ -132,61 +119,54 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>3.7</td>
|
||||
<td>3.8</td>
|
||||
<td>3.6</td>
|
||||
<td>3.6</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>pytorch</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.8</td>
|
||||
<td>3.6</td>
|
||||
<td>3.8</td>
|
||||
<td>3.5</td>
|
||||
<td>3.6</td>
|
||||
<td>3.9</td>
|
||||
<td>3.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.8</td>
|
||||
<td>3.5</td>
|
||||
<td>3.8</td>
|
||||
<td>3.6</td>
|
||||
<td>3.9</td>
|
||||
<td>3.4</td>
|
||||
<td>4.0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>squeezenet1.1-<br>CF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.6</td>
|
||||
<td>2.9</td>
|
||||
<td>3.2</td>
|
||||
<td>3.0</td>
|
||||
<td>3.4</td>
|
||||
<td>3.2</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd_mobilenet_<br>v1_coco-tf</td>
|
||||
<td>VOC2012</td>
|
||||
<td>1.6</td>
|
||||
<td>3.0</td>
|
||||
<td>3.4</td>
|
||||
<td>3.1</td>
|
||||
<td>3.3</td>
|
||||
<td>3.7</td>
|
||||
<td>3.0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd300-CF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>1.8</td>
|
||||
<td>3.7</td>
|
||||
<td>3.6</td>
|
||||
<td>3.7</td>
|
||||
<td>3.8</td>
|
||||
<td>4.0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssdlite_<br>mobilenet_<br>v2-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>1.4</td>
|
||||
<td>2.3</td>
|
||||
<td>3.1</td>
|
||||
<td>2.4</td>
|
||||
<td>2.6</td>
|
||||
<td>3.9</td>
|
||||
<td>2.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>yolo_v3-TF</td>
|
||||
@@ -194,8 +174,7 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>1.8</td>
|
||||
<td>3.8</td>
|
||||
<td>3.9</td>
|
||||
<td>3.7</td>
|
||||
<td>3.8</td>
|
||||
<td>3.6</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
@@ -208,21 +187,14 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<th></th>
|
||||
<th>Intel® Core™ <br>i9-10920X CPU<br>@ 3.50GHZ (VNNI)</th>
|
||||
<th>Intel® Core™ <br>i9-9820X CPU<br>@ 3.30GHz (AVX512)</th>
|
||||
<th>Intel® Core™ <br>i7-8700 CPU<br>@ 3.20GHz (AVX2)</th>
|
||||
<th>Intel® Core™ <br>i7-6700 CPU<br>@ 4.0GHz (AVX2)</th>
|
||||
<th>Intel® Core™ <br>i7-1185G7 CPU<br>@ 4.0GHz (TGL VNNI)</th>
|
||||
</tr>
|
||||
<tr align="left">
|
||||
<th>OpenVINO Benchmark <br>Model Name</th>
|
||||
<th>Dataset</th>
|
||||
<th>Metric Name</th>
|
||||
<th colspan="3" align="center">Absolute Accuracy Drop, %</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>bert-large-<br>uncased-whole-word-<br>masking-squad-0001</td>
|
||||
<td>SQuAD</td>
|
||||
<td>F1</td>
|
||||
<td>0.65</td>
|
||||
<td>0.57</td>
|
||||
<td>0.83</td>
|
||||
<th colspan="4" align="center">Absolute Accuracy Drop, %</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>brain-tumor-<br>segmentation-<br>0001-MXNET</td>
|
||||
@@ -230,23 +202,26 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>Dice-index@ <br>Mean@ <br>Overall Tumor</td>
|
||||
<td>0.08</td>
|
||||
<td>0.08</td>
|
||||
<td>0.9</td>
|
||||
<td>0.08</td>
|
||||
<td>0.08</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>deeplabv3-TF</td>
|
||||
<td>VOC 2012<br>Segmentation</td>
|
||||
<td>mean_iou</td>
|
||||
<td>0.73</td>
|
||||
<td>1.10</td>
|
||||
<td>1.10</td>
|
||||
<td>0.73</td>
|
||||
<td>1.11</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>densenet-121-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.74</td>
|
||||
<td>0.74</td>
|
||||
<td>0.76</td>
|
||||
<td>0.73</td>
|
||||
<td>0.72</td>
|
||||
<td>0.72</td>
|
||||
<td>0.73</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>facenet-<br>20180408-<br>102900-TF</td>
|
||||
@@ -255,22 +230,25 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.02</td>
|
||||
<td>0.02</td>
|
||||
<td>0.02</td>
|
||||
<td>0.47</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>faster_rcnn_<br>resnet50_coco-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>coco_<br>precision</td>
|
||||
<td>0.21</td>
|
||||
<td>0.21</td>
|
||||
<td>0.20</td>
|
||||
<td>0.20</td>
|
||||
<td>0.21</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>googlenet-v1-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.03</td>
|
||||
<td>0.03</td>
|
||||
<td>0.01</td>
|
||||
<td>0.01</td>
|
||||
<td>0.03</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>inception-v3-TF</td>
|
||||
@@ -279,6 +257,7 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.03</td>
|
||||
<td>0.01</td>
|
||||
<td>0.01</td>
|
||||
<td>0.03</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-<br>ssd-CF</td>
|
||||
@@ -287,6 +266,7 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.35</td>
|
||||
<td>0.34</td>
|
||||
<td>0.34</td>
|
||||
<td>0.35</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v1-1.0-<br>224-TF</td>
|
||||
@@ -295,22 +275,25 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.27</td>
|
||||
<td>0.20</td>
|
||||
<td>0.20</td>
|
||||
<td>0.27</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-1.0-<br>224-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.45</td>
|
||||
<td>0.94</td>
|
||||
<td>0.94</td>
|
||||
<td>0.44</td>
|
||||
<td>0.92</td>
|
||||
<td>0.92</td>
|
||||
<td>0.44</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-<br>PYTORCH</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.35</td>
|
||||
<td>0.63</td>
|
||||
<td>0.63</td>
|
||||
<td>0.25</td>
|
||||
<td>7.42</td>
|
||||
<td>7.42</td>
|
||||
<td>0.25</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-18-<br>pytorch</td>
|
||||
@@ -319,6 +302,7 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.26</td>
|
||||
<td>0.25</td>
|
||||
<td>0.25</td>
|
||||
<td>0.26</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>PYTORCH</td>
|
||||
@@ -327,58 +311,65 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.18</td>
|
||||
<td>0.19</td>
|
||||
<td>0.19</td>
|
||||
<td>0.18</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.15</td>
|
||||
<td>0.11</td>
|
||||
<td>0.11</td>
|
||||
<td>0.15</td>
|
||||
<td>0.10</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>squeezenet1.1-<br>CF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.66</td>
|
||||
<td>0.66</td>
|
||||
<td>0.64</td>
|
||||
<td>0.64</td>
|
||||
<td>0.66</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd_mobilenet_<br>v1_coco-tf</td>
|
||||
<td>VOC2012</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.24</td>
|
||||
<td>0.24</td>
|
||||
<td>3.07</td>
|
||||
<td>3.07</td>
|
||||
<td>0.24</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd300-CF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.06</td>
|
||||
<td>0.06</td>
|
||||
<td>0.05</td>
|
||||
<td>0.05</td>
|
||||
<td>0.06</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssdlite_<br>mobilenet_<br>v2-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.14</td>
|
||||
<td>0.43</td>
|
||||
<td>0.43</td>
|
||||
<td>0.14</td>
|
||||
<td>0.47</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>yolo_v3-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.20</td>
|
||||
<td>0.20</td>
|
||||
<td>0.36</td>
|
||||
<td>0.12</td>
|
||||
<td>0.35</td>
|
||||
<td>0.35</td>
|
||||
<td>0.12</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||

|
||||

|
||||
|
||||
\htmlonly
|
||||
<style>
|
||||
|
||||
@@ -2,16 +2,17 @@
|
||||
|
||||
EXCLUDE_SYMBOLS = INFERENCE_ENGINE_C_API_EXTERN \
|
||||
INFERENCE_ENGINE_C_API \
|
||||
INFERENCE_ENGINE_C_API_CALLBACK \
|
||||
IE_NODISCARD
|
||||
|
||||
PREDEFINED = "__attribute__(x)=" \
|
||||
"__VA_ARGS__=" \
|
||||
"INFERENCE_ENGINE_C_API_EXTERN=" \
|
||||
"INFERENCE_ENGINE_C_API_CALLBACK=" \
|
||||
"INFERENCE_ENGINE_C_API=" \
|
||||
"IE_NODISCARD=" \
|
||||
"__cdecl=" \
|
||||
"__declspec(x)=" \
|
||||
"__GNUC__=" \
|
||||
"_WIN32"
|
||||
|
||||
FILE_PATTERNS = *.h
|
||||
|
||||
@@ -903,8 +903,8 @@ EXCLUDE_PATTERNS = */temp/* \
|
||||
# exclude all test directories use the pattern */test/*
|
||||
|
||||
EXCLUDE_SYMBOLS = InferenceEngine::details \
|
||||
InferenceEngine::gpu::details \
|
||||
PRECISION_NAME \
|
||||
TBLOB_TOP_RESULT \
|
||||
CASE \
|
||||
CASE2 \
|
||||
_CONFIG_KEY \
|
||||
@@ -929,24 +929,26 @@ EXCLUDE_SYMBOLS = InferenceEngine::details \
|
||||
INFERENCE_ENGINE_API_CPP \
|
||||
INFERENCE_ENGINE_API_CLASS \
|
||||
INFERENCE_ENGINE_DEPRECATED \
|
||||
INFERENCE_ENGINE_NN_BUILDER_API_CLASS \
|
||||
INFERENCE_ENGINE_NN_BUILDER_DEPRECATED \
|
||||
IE_SUPPRESS_DEPRECATED_START \
|
||||
IE_SUPPRESS_DEPRECATED_END \
|
||||
IE_SUPPRESS_DEPRECATED_START_WIN \
|
||||
IE_SUPPRESS_DEPRECATED_END_WIN \
|
||||
IE_SUPPRESS_DEPRECATED_END_WIN \
|
||||
INFERENCE_ENGINE_INTERNAL \
|
||||
INFERENCE_ENGINE_INTERNAL_CNNLAYER_CLASS \
|
||||
IE_DO_PRAGMA \
|
||||
REG_VALIDATOR_FOR
|
||||
parallel_* \
|
||||
for_* \
|
||||
splitter \
|
||||
InferenceEngine::parallel_* \
|
||||
NOMINMAX \
|
||||
TBB_PREVIEW_NUMA_SUPPORT \
|
||||
IE_THREAD_*
|
||||
|
||||
# The EXAMPLE_PATH tag can be used to specify one or more files or directories
|
||||
# that contain example code fragments that are included (see the \include
|
||||
# command).
|
||||
|
||||
EXAMPLE_PATH = template_extension \
|
||||
../inference-engine/samples
|
||||
EXAMPLE_PATH = "@CMAKE_CURRENT_SOURCE_DIR@"
|
||||
|
||||
# If the value of the EXAMPLE_PATH tag contains directories, you can use the
|
||||
# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and
|
||||
|
||||
@@ -37,17 +37,15 @@
|
||||
</tab>
|
||||
<tab type="user" title="Model Optimizations Techniques" url="@ref openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques"/>
|
||||
<tab type="user" title="Cutting off Parts of a Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model"/>
|
||||
<tab type="usergroup" title="Sub-graph Replacement in Model Optimizer" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer">
|
||||
<tab type="user" title="[DEPRECATED] Case-Study: Converting SSD models created with the TensorFlow* Object Detection API" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_SSD_ObjectDetection_API"/>
|
||||
<tab type="user" title="[DEPRECATED] Case-Study: Converting Faster R-CNN models created with the TensorFlow* Object Detection API" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_Faster_RCNN_ObjectDetection_API"/>
|
||||
</tab>
|
||||
<tab type="user" title="Sub-graph Replacement in Model Optimizer" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer"/>
|
||||
<tab type="user" title="Supported Framework Layers" url="@ref openvino_docs_MO_DG_prepare_model_Supported_Frameworks_Layers"/>
|
||||
<tab type="user" title="[DEPRECATED] IR Notation Reference" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_Legacy_IR_Layers_Catalog_Spec"/>
|
||||
<tab type="user" title="IR suitable for INT8 inference" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_IR_suitable_for_INT8_inference"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="Custom Layers in Model Optimizer" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer">
|
||||
<tab type="usergroup" title="Model Optimizer Extensibility" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer">
|
||||
<tab type="user" title="Extending Model Optimizer with New Primitives" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives"/>
|
||||
<tab type="user" title="Extending MXNet Model Optimizer with New Primitives" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives"/>
|
||||
<tab type="user" title="Extending Model Optimizer with Caffe* Python Layers" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers"/>
|
||||
<tab type="user" title="Extending Model Optimizer for Custom MXNet* Operations" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives"/>
|
||||
<tab type="user" title="Legacy Mode for Caffe* Custom Layers" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Legacy_Mode_for_Caffe_Custom_Layers"/>
|
||||
<tab type="user" title="[DEPRECATED] Offloading Sub-Graph Inference" url="https://docs.openvinotoolkit.org/2020.1/_docs_MO_DG_prepare_model_customize_model_optimizer_Offloading_Sub_Graph_Inference.html"/>
|
||||
</tab>
|
||||
@@ -57,10 +55,10 @@
|
||||
</tab>
|
||||
<!-- Model Downloader -->
|
||||
<tab id="model_downloader" type="user" title="Model Downloader" url="@ref omz_tools_downloader_README"/>
|
||||
<!-- Custom Layers Guide -->
|
||||
<tab type="usergroup" title="Custom Layers Guide" url="@ref openvino_docs_HOWTO_Custom_Layers_Guide"></tab>
|
||||
</tab>
|
||||
|
||||
<!-- Custom Operations Guide -->
|
||||
<tab type="usergroup" title="Custom Operations Guide" url="@ref openvino_docs_HOWTO_Custom_Layers_Guide"></tab>
|
||||
</tab>
|
||||
|
||||
<!-- Intermediate Representation and Operations Sets -->
|
||||
<tab id="intermediate_representaton_and_operations_sets" type="usergroup" title="Intermediate Representation and Operations Sets" url="@ref openvino_docs_MO_DG_IR_and_opsets">
|
||||
<tab type="usergroup" title="Available Operations Sets" url="@ref openvino_docs_ops_opset">
|
||||
@@ -81,6 +79,7 @@
|
||||
<tab type="user" title="Atan-1" url="@ref openvino_docs_ops_arithmetic_Atan_1"/>
|
||||
<tab type="user" title="Atanh-3" url="@ref openvino_docs_ops_arithmetic_Atanh_3"/>
|
||||
<tab type="user" title="AvgPool-1" url="@ref openvino_docs_ops_pooling_AvgPool_1"/>
|
||||
<tab type="user" title="BatchNormInference-1" url="@ref openvino_docs_ops_normalization_BatchNormInference_1"/>
|
||||
<tab type="user" title="BatchNormInference-5" url="@ref openvino_docs_ops_normalization_BatchNormInference_5"/>
|
||||
<tab type="user" title="BatchToSpace-2" url="@ref openvino_docs_ops_movement_BatchToSpace_2"/>
|
||||
<tab type="user" title="BinaryConvolution-1" url="@ref openvino_docs_ops_convolution_BinaryConvolution_1"/>
|
||||
@@ -261,6 +260,7 @@
|
||||
<tab type="usergroup" title="Utilities to Validate Your Converted Model" url="@ref openvino_inference_engine_tools_cross_check_tool_README">
|
||||
<tab type="user" title="Using Cross Check Tool for Per-Layer Comparison Between Plugins" url="@ref openvino_inference_engine_tools_cross_check_tool_README"/>
|
||||
</tab>
|
||||
<tab type="user" title="Introduction to OpenVINO state API" url="@ref openvino_docs_IE_DG_network_state_intro"/>
|
||||
<tab type="usergroup" title="Supported Devices" url="@ref openvino_docs_IE_DG_supported_plugins_Supported_Devices">
|
||||
<tab type="usergroup" title="GPU Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_CL_DNN">
|
||||
<tab type="user" title="RemoteBlob API of GPU Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_GPU_RemoteBlob_API"/>
|
||||
@@ -276,7 +276,6 @@
|
||||
<tab type="user" title="GNA Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_GNA"/>
|
||||
</tab>
|
||||
<tab type="user" title="Known Issues" url="@ref openvino_docs_IE_DG_Known_Issues_Limitations"/>
|
||||
<tab type="user" title="Optimization Notice" url="@ref openvino_docs_Optimization_notice"/>
|
||||
<tab type="user" title="Glossary" url="@ref openvino_docs_IE_DG_Glossary"/>
|
||||
</tab>
|
||||
|
||||
@@ -296,7 +295,7 @@
|
||||
|
||||
<!-- Compile Tool -->
|
||||
<tab type="user" title="Compile Tool" url="@ref openvino_inference_engine_tools_compile_tool_README"/>
|
||||
|
||||
|
||||
<!-- API References -->
|
||||
<tab id="api_references" type="usergroup" title="API References">
|
||||
<!-- IE C -->
|
||||
|
||||
@@ -9,7 +9,12 @@ GENERATE_TAGFILE = "@DOCS_BINARY_DIR@/ie_plugin_api.tag"
|
||||
EXTRACT_LOCAL_CLASSES = NO
|
||||
|
||||
INPUT = "@DOCS_BINARY_DIR@/docs/IE_PLUGIN_DG" \
|
||||
"@IE_SOURCE_DIR@/src/plugin_api"
|
||||
"@IE_SOURCE_DIR@/src/plugin_api" \
|
||||
"@IE_SOURCE_DIR@/src/transformations/include" \
|
||||
"@OpenVINO_MAIN_SOURCE_DIR@/openvino/itt/include/openvino"
|
||||
|
||||
|
||||
RECURSIVE = YES
|
||||
|
||||
FILE_PATTERNS = *.c \
|
||||
*.cpp \
|
||||
@@ -18,21 +23,20 @@ FILE_PATTERNS = *.c \
|
||||
*.hpp \
|
||||
*.md
|
||||
|
||||
EXCLUDE_PATTERNS = cnn_network_ngraph_impl.hpp \
|
||||
ie_imemory_state_internal.hpp \
|
||||
ie_memory_state_internal.hpp \
|
||||
ie_memory_state_base.hpp \
|
||||
convert_function_to_cnn_network.hpp \
|
||||
generic_ie.hpp
|
||||
EXCLUDE_PATTERNS = generic_ie.hpp
|
||||
|
||||
EXCLUDE_SYMBOLS =
|
||||
EXCLUDE_SYMBOLS = InferenceEngine::details
|
||||
|
||||
TAGFILES = @DOCS_BINARY_DIR@/ie_api.tag=.."
|
||||
|
||||
EXAMPLE_PATH = "@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/src" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/include" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/src/CMakeLists.txt" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/"
|
||||
CMakeLists.txt \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/examples"
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/CMakeLists.txt" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/transformations" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/shared_tests_instances/" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/snippets"
|
||||
"@IE_SOURCE_DIR@/tests/functional/plugin/shared/include" \
|
||||
|
||||
EXAMPLE_PATTERNS = *.cpp \
|
||||
*.hpp
|
||||
@@ -41,12 +45,17 @@ ENUM_VALUES_PER_LINE = 1
|
||||
|
||||
EXPAND_ONLY_PREDEF = YES
|
||||
|
||||
PREDEFINED = INFERENCE_ENGINE_API \
|
||||
INFERENCE_ENGINE_API_CPP \
|
||||
INFERENCE_ENGINE_API_CLASS \
|
||||
INFERENCE_ENGINE_DEPRECATED \
|
||||
IE_SUPPRESS_DEPRECATED_START \
|
||||
IE_SUPPRESS_DEPRECATED_END \
|
||||
IE_SUPPRESS_DEPRECATED_START_WIN \
|
||||
IE_SUPPRESS_DEPRECATED_END_WIN \
|
||||
IE_THREAD=IE_THREAD_TBB
|
||||
PREDEFINED = "INFERENCE_ENGINE_API=" \
|
||||
"INFERENCE_ENGINE_API_CPP=" \
|
||||
"INFERENCE_ENGINE_API_CLASS=" \
|
||||
"INFERENCE_ENGINE_DEPRECATED=" \
|
||||
"inference_engine_transformations_EXPORTS" \
|
||||
"TRANSFORMATIONS_API=" \
|
||||
"NGRAPH_HELPER_DLL_EXPORT=" \
|
||||
"NGRAPH_HELPER_DLL_IMPORT=" \
|
||||
"IE_SUPPRESS_DEPRECATED_START=" \
|
||||
"IE_SUPPRESS_DEPRECATED_END=" \
|
||||
"IE_SUPPRESS_DEPRECATED_START_WIN=" \
|
||||
"IE_SUPPRESS_DEPRECATED_END_WIN=" \
|
||||
"IE_THREAD=IE_THREAD_TBB" \
|
||||
"NGRAPH_RTTI_DECLARATION="
|
||||
|
||||
@@ -16,8 +16,10 @@
|
||||
</tab>
|
||||
<!-- API References -->
|
||||
<tab type="usergroup" title="API REFERENCE">
|
||||
<!-- IE Developer Package -->
|
||||
<tab type="modules" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Plugin API Reference -->
|
||||
<tab type="user" url="group__ie__dev__api.html" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Transformations API Reference -->
|
||||
<tab type="user" url="group__ie__transformation__api.html" visible="yes" title="Inference Engine Transformations API Reference"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="MAIN OPENVINO™ DOCS" url="../index.html"/>
|
||||
</navindex>
|
||||
|
||||
@@ -19,10 +19,7 @@
|
||||
<tab type="user" title="DL Streamer API Reference" url="https://openvinotoolkit.github.io/dlstreamer_gst/"/>
|
||||
<tab type="user" title="nGraph С++ API Reference" url="../ngraph_cpp_api/annotated.html"/>
|
||||
<!-- nGraph Python API Reference -->
|
||||
<tab type="files" visible="yes" title="nGraph Python API Reference">
|
||||
<tab type="filelist" visible="yes" title="nGraph Python API Reference" intro=""/>
|
||||
<tab type="globals" visible="yes" title="" intro=""/>
|
||||
</tab>
|
||||
<tab type="filelist" visible="yes" title="nGraph Python API Reference" intro=""/>
|
||||
</tab>
|
||||
<!-- Chinese docs -->
|
||||
<tab type="user" title="中文文件" url="https://docs.openvinotoolkit.org/cn/index.html"/>
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
<!-- GET STARTED category -->
|
||||
<tab type="usergroup" title="GET STARTED" url="index.html">
|
||||
<!-- Install Directly -->
|
||||
<tab type="usergroup" title="Install Directly" url=""><!--automatically generated-->
|
||||
<tab type="usergroup" title="Installation Guides" url=""><!--automatically generated-->
|
||||
<tab type="usergroup" title="Linux" url="@ref openvino_docs_install_guides_installing_openvino_linux">
|
||||
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Linux* OS" url="@ref openvino_docs_install_guides_installing_openvino_linux"/>
|
||||
<tab type="user" title="[DEPRECATED] Install Intel® Distribution of OpenVINO™ toolkit for Linux with FPGA Support" url="@ref openvino_docs_install_guides_installing_openvino_linux_fpga"/>
|
||||
@@ -19,17 +19,18 @@
|
||||
<tab type="user" title="Raspbian OS" url="@ref openvino_docs_install_guides_installing_openvino_raspbian"/>
|
||||
</tab>
|
||||
<!-- Install From Images and Repositories -->
|
||||
<tab type="usergroup" title="Install From Images and Repositories" url=""><!--automatically generated-->
|
||||
<tab type="usergroup" title="Install From Images and Repositories" url="@ref openvino_docs_install_guides_installing_openvino_images">
|
||||
<tab type="usergroup" title="Docker" url="@ref openvino_docs_install_guides_installing_openvino_docker_linux">
|
||||
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Linux* from a Docker* Image" url="@ref openvino_docs_install_guides_installing_openvino_docker_linux"/>
|
||||
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Windows* from a Docker* Image" url="@ref openvino_docs_install_guides_installing_openvino_docker_windows"/>
|
||||
</tab>
|
||||
<tab type="user" title="Docker with DL Workbench" url="./workbench_docs_Workbench_DG_Install_from_Docker_Hub.html"/><!-- Link to the original Workbench topic -->
|
||||
<tab type="user" title="APT" url="@ref openvino_docs_install_guides_installing_openvino_apt"/>
|
||||
<tab type="user" title="YUM" url="@ref openvino_docs_install_guides_installing_openvino_yum"/>
|
||||
<tab type="user" title="Anaconda Cloud" url="@ref openvino_docs_install_guides_installing_openvino_conda"/>
|
||||
<tab type="user" title="Yocto" url="@ref openvino_docs_install_guides_installing_openvino_yocto"/>
|
||||
<tab type="user" title="PyPI" url="@ref openvino_docs_install_guides_installing_openvino_pip"/>
|
||||
<tab type="user" title="Build from Source" url="https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md"/>
|
||||
<tab type="user" title="Build from Source" url="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode"/>
|
||||
</tab>
|
||||
<!-- Get Started Guides-->
|
||||
<tab type="usergroup" title="Get Started Guides" url=""><!--automatically generated-->
|
||||
@@ -37,7 +38,10 @@
|
||||
<tab type="user" title="Linux" url="@ref openvino_docs_get_started_get_started_linux"/>
|
||||
<tab type="user" title="Windows" url="@ref openvino_docs_get_started_get_started_windows"/>
|
||||
<tab type="user" title="macOS" url="@ref openvino_docs_get_started_get_started_macos"/>
|
||||
<tab type="user" title="Raspbian" url="@ref openvino_docs_get_started_get_started_raspbian"/>
|
||||
<tab type="user" title="Get Started with OpenVINO via DL Workbench" url="@ref openvino_docs_get_started_get_started_dl_workbench"/>
|
||||
<tab type="user" title="Legal Information" url="@ref openvino_docs_Legal_Information"/>
|
||||
<tab type="user" title="Introduction to DL Workbench" url="./openvino_docs_get_started_get_started_dl_workbench.html"/><!-- Link to the original Workbench topic -->
|
||||
</tab>
|
||||
<!-- Configuration for Hardware -->
|
||||
<tab type="usergroup" title="Configuration for Hardware" url=""><!--automatically generated-->
|
||||
@@ -56,6 +60,7 @@
|
||||
<tab type="user" title="Introduction" url="@ref openvino_docs_security_guide_introduction"/>
|
||||
<tab type="user" title="Using DL Workbench Securely" url="@ref openvino_docs_security_guide_workbench"/>
|
||||
<tab type="user" title="Using Encrypted Models" url="@ref openvino_docs_IE_DG_protecting_model_guide"/>
|
||||
<tab type="user" title="Security Add-on" url="@ref ovsa_get_started"/>
|
||||
</tab>
|
||||
</tab>
|
||||
|
||||
@@ -72,7 +77,7 @@
|
||||
<!-- Performance Benchmarks -->
|
||||
<tab type="usergroup" title="Performance Measures" url="@ref openvino_docs_performance_benchmarks">
|
||||
<tab type="user" title="Performance Information Frequently Asked Questions" url="@ref openvino_docs_performance_benchmarks_faq"/>
|
||||
<tab type="user" title="Download Performance Data Spreadsheet in MS Excel* Format" url="https://docs.openvinotoolkit.org/downloads/benchmark_files/OV-2021.1-Download-Excel.xlsx"/>
|
||||
<tab type="user" title="Download Performance Data Spreadsheet in MS Excel* Format" url="https://docs.openvinotoolkit.org/downloads/benchmark_files/OV-2021.2-Download-Excel.xlsx"/>
|
||||
<tab type="user" title="INT8 vs. FP32 Comparison on Select Networks and Platforms" url="@ref openvino_docs_performance_int8_vs_fp32"/>
|
||||
</tab>
|
||||
<tab type="user" title="Performance Optimization Guide" url="@ref openvino_docs_optimization_guide_dldt_optimization_guide"/>
|
||||
@@ -91,6 +96,19 @@
|
||||
<tab type="user" title="DL Streamer API Reference" url="https://openvinotoolkit.github.io/dlstreamer_gst/"/>
|
||||
<!-- DL Streamer Examples -->
|
||||
<tab type="usergroup" title="DL Streamer Examples" url="@ref gst_samples_README">
|
||||
</tab>
|
||||
<!-- G-API -->
|
||||
<tab type="usergroup" title="Graph API (G-API) Developer Guide" url="@ref openvino_docs_gapi_gapi_intro">
|
||||
<tab type="user" title="Introduction to G-API" url="@ref openvino_docs_gapi_gapi_intro"/>
|
||||
<tab type="user" title="G-API Kernel API" url="@ref openvino_docs_gapi_kernel_api"/>
|
||||
<tab type="user" title="Use Cases: Implementing a Face Beautification Algorithm" url="@ref openvino_docs_gapi_face_beautification"/>
|
||||
<tab type="user" title="Use Cases: Building a Face Analytics Pipeline" url="@ref openvino_docs_gapi_gapi_face_analytics_pipeline"/>
|
||||
<tab type="usergroup" title="API Reference" url="">
|
||||
<tab type="user" title="G-API Core functionality" url="https://docs.opencv.org/4.2.0/df/d1f/group__gapi__core.html"/>
|
||||
<tab type="user" title="G-API Image processing functionality" url="https://docs.opencv.org/4.2.0/d2/d00/group__gapi__imgproc.html"/>
|
||||
<tab type="user" title="G-API Drawing and composition functionality" url="https://docs.opencv.org/4.2.0/df/de4/group__gapi__draw.html"/>
|
||||
<tab type="user" title="G-API Framework" url="https://docs.opencv.org/4.2.0/d7/d0d/group__gapi.html"/>
|
||||
</tab>
|
||||
</tab>
|
||||
<!-- OpenVX -->
|
||||
<tab type="user" title="OpenVX Developer Guide" url="https://software.intel.com/en-us/openvino-ovx-guide"/>
|
||||
@@ -120,14 +138,15 @@
|
||||
<tab type="user" title="Hello Query Device C++ Sample" url="@ref openvino_inference_engine_samples_hello_query_device_README"/>
|
||||
<tab type="user" title="Hello Query Device Python* Sample" url="@ref openvino_inference_engine_ie_bridges_python_sample_hello_query_device_README"/>
|
||||
<tab type="user" title="nGraph Function C++ Sample" url="@ref openvino_inference_engine_samples_ngraph_function_creation_sample_README"/>
|
||||
<tab type="user" title="nGraph Function Python Sample" url="@ref openvino_inference_engine_ie_bridges_python_samples_ngraph_function_creation_sample_README"/>
|
||||
<tab type="user" title="Object Detection C++ Sample SSD" url="@ref openvino_inference_engine_samples_object_detection_sample_ssd_README"/>
|
||||
<tab type="user" title="Object Detection Python* Sample SSD" url="@ref openvino_inference_engine_ie_bridges_python_sample_object_detection_sample_ssd_README"/>
|
||||
<tab type="user" title="Object Detection C Sample SSD" url="@ref openvino_inference_engine_ie_bridges_c_samples_object_detection_sample_ssd_README"/>
|
||||
<tab type="user" title="Automatic Speech Recognition C++ Sample" url="@ref openvino_inference_engine_samples_speech_sample_README"/>
|
||||
<tab type="user" title="Neural Style Transfer C++ Sample" url="@ref openvino_inference_engine_samples_style_transfer_sample_README"/>
|
||||
<tab type="user" title="Neural Style Transfer Python* Sample" url="@ref openvino_inference_engine_ie_bridges_python_sample_style_transfer_sample_README"/>
|
||||
<tab type="user" title="Benchmark C++ App" url="@ref openvino_inference_engine_samples_benchmark_app_README"/>
|
||||
<tab type="user" title="Benchmark Python* App" url="@ref openvino_inference_engine_tools_benchmark_tool_README"/>
|
||||
<tab type="user" title="Benchmark C++ Tool" url="@ref openvino_inference_engine_samples_benchmark_app_README"/>
|
||||
<tab type="user" title="Benchmark Python* Tool" url="@ref openvino_inference_engine_tools_benchmark_tool_README"/>
|
||||
</tab>
|
||||
|
||||
<!-- DL Streamer Examples -->
|
||||
@@ -146,7 +165,8 @@
|
||||
<tab type="user" title="Benchmark Sample" url="@ref gst_samples_benchmark_README"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="Add-Ons" url="">
|
||||
<tab type="user" title="Model Server" url="@ref openvino_docs_ovms"/>
|
||||
<tab type="user" title="Model Server" url="@ref openvino_docs_ovms"/>
|
||||
<tab type="user" title="Security Add-on" url="./ovsa_get_started.html"/>
|
||||
</tab>
|
||||
</tab>
|
||||
|
||||
|
||||
435
docs/gapi/face_beautification.md
Normal file
435
docs/gapi/face_beautification.md
Normal file
@@ -0,0 +1,435 @@
|
||||
# Implementing a Face Beautification Algorithm {#openvino_docs_gapi_face_beautification}
|
||||
|
||||
## Introduction
|
||||
In this tutorial you will learn:
|
||||
|
||||
* Basics of a sample face beautification algorithm;
|
||||
* How to infer different networks inside a pipeline with G-API;
|
||||
* How to run a G-API pipeline on a video stream.
|
||||
|
||||
## Prerequisites
|
||||
This sample requires:
|
||||
|
||||
* PC with GNU/Linux* or Microsoft Windows* (Apple macOS* is supported but was not tested)
|
||||
* OpenCV 4.2 or higher built with [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) (building with [Intel® TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is a plus)
|
||||
* The following pre-trained models from the [Open Model Zoo](@ref omz_models_intel_index)
|
||||
* [face-detection-adas-0001](@ref omz_models_intel_face_detection_adas_0001_description_face_detection_adas_0001)
|
||||
* [facial-landmarks-35-adas-0002](@ref omz_models_intel_facial_landmarks_35_adas_0002_description_facial_landmarks_35_adas_0002)
|
||||
|
||||
To download the models from the Open Model Zoo, use the [Model Downloader](@ref omz_tools_downloader_README) tool.
|
||||
|
||||
## Face Beautification Algorithm
|
||||
We will implement a simple face beautification algorithm using a combination of modern Deep Learning techniques and traditional Computer Vision. The general idea behind the algorithm is to make face skin smoother while preserving face features like eyes or a mouth contrast. The algorithm identifies parts of the face using a DNN inference, applies different filters to the parts found, and then combines it into the final result using basic image arithmetics:
|
||||
|
||||

|
||||
|
||||
Briefly the algorithm is described as follows:
|
||||
|
||||
Briefly the algorithm is described as follows:
|
||||
- Input image \f$I\f$ is passed to unsharp mask and bilateral filters
|
||||
(\f$U\f$ and \f$L\f$ respectively);
|
||||
- Input image \f$I\f$ is passed to an SSD-based face detector;
|
||||
- SSD result (a \f$[1 \times 1 \times 200 \times 7]\f$ blob) is parsed and converted to an array of faces;
|
||||
- Every face is passed to a landmarks detector;
|
||||
- Based on landmarks found for every face, three image masks are generated:
|
||||
- A background mask \f$b\f$ -- indicating which areas from the original image to keep as-is;
|
||||
- A face part mask \f$p\f$ -- identifying regions to preserve (sharpen).
|
||||
- A face skin mask \f$s\f$ -- identifying regions to blur;
|
||||
- The final result \f$O\f$ is a composition of features above calculated as \f$O = b*I + p*U + s*L\f$.
|
||||
|
||||
Generating face element masks based on a limited set of features (just 35 per face, including all its parts) is not very trivial and is described in the sections below.
|
||||
|
||||
## Constructing a G-API Pipeline
|
||||
|
||||
### Declare Deep Learning Topologies
|
||||
This sample is using two DNN detectors. Every network takes one input and produces one output. In G-API, networks are defined with macro G_API_NET():
|
||||
```cpp
|
||||
G_API_NET(FaceDetector, <cv::GMat(cv::GMat)>, "face_detector");
|
||||
G_API_NET(LandmDetector, <cv::GMat(cv::GMat)>, "landm_detector");
|
||||
```
|
||||
To get more information, see Declaring Deep Learning topologies described in the "Face Analytics pipeline" tutorial.
|
||||
|
||||
### Describe the Processing Graph
|
||||
The code below generates a graph for the algorithm above:
|
||||
```cpp
|
||||
cv::GComputation pipeline([=]()
|
||||
{
|
||||
cv::GMat gimgIn; // input
|
||||
cv::GMat faceOut = cv::gapi::infer<custom::FaceDetector>(gimgIn);
|
||||
GArrayROI garRects = custom::GFacePostProc::on(faceOut, gimgIn, config::kConfThresh); // post-proc
|
||||
cv::GArray<cv::GMat> landmOut = cv::gapi::infer<custom::LandmDetector>(garRects, gimgIn);
|
||||
cv::GArray<Landmarks> garElems; // |
|
||||
cv::GArray<Contour> garJaws; // |output arrays
|
||||
std::tie(garElems, garJaws) = custom::GLandmPostProc::on(landmOut, garRects); // post-proc
|
||||
cv::GArray<Contour> garElsConts; // face elements
|
||||
cv::GArray<Contour> garFaceConts; // whole faces
|
||||
std::tie(garElsConts, garFaceConts) = custom::GGetContours::on(garElems, garJaws); // interpolation
|
||||
cv::GMat mskSharp = custom::GFillPolyGContours::on(gimgIn, garElsConts); // |
|
||||
cv::GMat mskSharpG = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize, // |
|
||||
config::kGSigma); // |
|
||||
cv::GMat mskBlur = custom::GFillPolyGContours::on(gimgIn, garFaceConts); // |
|
||||
cv::GMat mskBlurG = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize, // |
|
||||
config::kGSigma); // |draw masks
|
||||
// The first argument in mask() is Blur as we want to subtract from // |
|
||||
// BlurG the next step: // |
|
||||
cv::GMat mskBlurFinal = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG); // |
|
||||
cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG; // |
|
||||
cv::GMat mskFacesWhite = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY); // |
|
||||
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesWhite); // |
|
||||
cv::GMat gimgBilat = custom::GBilatFilter::on(gimgIn, config::kBSize,
|
||||
config::kBSigmaCol, config::kBSigmaSp);
|
||||
cv::GMat gimgSharp = custom::unsharpMask(gimgIn, config::kUnshSigma,
|
||||
config::kUnshStrength);
|
||||
// Applying the masks
|
||||
// Custom function mask3C() should be used instead of just gapi::mask()
|
||||
// as mask() provides CV_8UC1 source only (and we have CV_8U3C)
|
||||
cv::GMat gimgBilatMasked = custom::mask3C(gimgBilat, mskBlurFinal);
|
||||
cv::GMat gimgSharpMasked = custom::mask3C(gimgSharp, mskSharpG);
|
||||
cv::GMat gimgInMasked = custom::mask3C(gimgIn, mskNoFaces);
|
||||
cv::GMat gimgBeautif = gimgBilatMasked + gimgSharpMasked + gimgInMasked;
|
||||
return cv::GComputation(cv::GIn(gimgIn), cv::GOut(gimgBeautif,
|
||||
cv::gapi::copy(gimgIn),
|
||||
garFaceConts,
|
||||
garElsConts,
|
||||
garRects));
|
||||
});
|
||||
```
|
||||
The resulting graph is a mixture of G-API's standard operations, user-defined operations (namespace custom::), and DNN inference. The generic function `cv::gapi::infer<>()` allows you to trigger inference within the pipeline; networks to infer are specified as template parameters. The sample code is using two versions of `cv::gapi::infer<>()`:
|
||||
|
||||
* A frame-oriented one is used to detect faces on the input frame.
|
||||
* An ROI-list oriented one is used to run landmarks inference on a list of faces – this version produces an array of landmarks per every face.
|
||||
More on this in "Face Analytics pipeline" ([Building a GComputation](@ref gapi_ifd_gcomputation) section).
|
||||
|
||||
### Unsharp mask in G-API
|
||||
The unsharp mask \f$U\f$ for image \f$I\f$ is defined as:
|
||||
|
||||
\f[U = I - s * L(M(I)),\f]
|
||||
|
||||
where \f$M()\f$ is a median filter, \f$L()\f$ is the Laplace operator, and \f$s\f$ is a strength coefficient. While G-API doesn't provide this function out-of-the-box, it is expressed naturally with the existing G-API operations:
|
||||
|
||||
```cpp
|
||||
inline cv::GMat custom::unsharpMask(const cv::GMat &src,
|
||||
const int sigma,
|
||||
const float strength)
|
||||
{
|
||||
cv::GMat blurred = cv::gapi::medianBlur(src, sigma);
|
||||
cv::GMat laplacian = custom::GLaplacian::on(blurred, CV_8U);
|
||||
return (src - (laplacian * strength));
|
||||
}
|
||||
```
|
||||
Note that the code snipped above is a regular C++ function defined with G-API types. Users can write functions like this to simplify graph construction; when called, this function just puts the relevant nodes to the pipeline it is used in.
|
||||
|
||||
## Custom Operations
|
||||
The face beautification graph is using custom operations extensively. This chapter focuses on the most interesting kernels, refer to G-API Kernel API for general information on defining operations and implementing kernels in G-API.
|
||||
|
||||
### Face detector post-processing
|
||||
A face detector output is converted to an array of faces with the following kernel:
|
||||
|
||||
```cpp
|
||||
using VectorROI = std::vector<cv::Rect>;
|
||||
GAPI_OCV_KERNEL(GCPUFacePostProc, GFacePostProc)
|
||||
{
|
||||
static void run(const cv::Mat &inDetectResult,
|
||||
const cv::Mat &inFrame,
|
||||
const float faceConfThreshold,
|
||||
VectorROI &outFaces)
|
||||
{
|
||||
const int kObjectSize = 7;
|
||||
const int imgCols = inFrame.size().width;
|
||||
const int imgRows = inFrame.size().height;
|
||||
const cv::Rect borders({0, 0}, inFrame.size());
|
||||
outFaces.clear();
|
||||
const int numOfDetections = inDetectResult.size[2];
|
||||
const float *data = inDetectResult.ptr<float>();
|
||||
for (int i = 0; i < numOfDetections; i++)
|
||||
{
|
||||
const float faceId = data[i * kObjectSize + 0];
|
||||
if (faceId < 0.f) // indicates the end of detections
|
||||
{
|
||||
break;
|
||||
}
|
||||
const float faceConfidence = data[i * kObjectSize + 2];
|
||||
// We can cut detections by the `conf` field
|
||||
// to avoid mistakes of the detector.
|
||||
if (faceConfidence > faceConfThreshold)
|
||||
{
|
||||
const float left = data[i * kObjectSize + 3];
|
||||
const float top = data[i * kObjectSize + 4];
|
||||
const float right = data[i * kObjectSize + 5];
|
||||
const float bottom = data[i * kObjectSize + 6];
|
||||
// These are normalized coordinates and are between 0 and 1;
|
||||
// to get the real pixel coordinates we should multiply it by
|
||||
// the image sizes respectively to the directions:
|
||||
cv::Point tl(toIntRounded(left * imgCols),
|
||||
toIntRounded(top * imgRows));
|
||||
cv::Point br(toIntRounded(right * imgCols),
|
||||
toIntRounded(bottom * imgRows));
|
||||
outFaces.push_back(cv::Rect(tl, br) & borders);
|
||||
}
|
||||
}
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
### Facial Landmarks Post-Processing
|
||||
The algorithm infers locations of face elements (like the eyes, the mouth and the head contour itself) using a generic facial landmarks detector (details) from OpenVINO™ Open Model Zoo. However, the detected landmarks as-is are not enough to generate masks — this operation requires regions of interest on the face represented by closed contours, so some interpolation is applied to get them. This landmarks processing and interpolation is performed by the following kernel:
|
||||
```cpp
|
||||
GAPI_OCV_KERNEL(GCPUGetContours, GGetContours)
|
||||
{
|
||||
static void run(const std::vector<Landmarks> &vctPtsFaceElems, // 18 landmarks of the facial elements
|
||||
const std::vector<Contour> &vctCntJaw, // 17 landmarks of a jaw
|
||||
std::vector<Contour> &vctElemsContours,
|
||||
std::vector<Contour> &vctFaceContours)
|
||||
{
|
||||
size_t numFaces = vctCntJaw.size();
|
||||
CV_Assert(numFaces == vctPtsFaceElems.size());
|
||||
CV_Assert(vctElemsContours.size() == 0ul);
|
||||
CV_Assert(vctFaceContours.size() == 0ul);
|
||||
// vctFaceElemsContours will store all the face elements' contours found
|
||||
// in an input image, namely 4 elements (two eyes, nose, mouth) for every detected face:
|
||||
vctElemsContours.reserve(numFaces * 4);
|
||||
// vctFaceElemsContours will store all the faces' contours found in an input image:
|
||||
vctFaceContours.reserve(numFaces);
|
||||
Contour cntFace, cntLeftEye, cntRightEye, cntNose, cntMouth;
|
||||
cntNose.reserve(4);
|
||||
for (size_t i = 0ul; i < numFaces; i++)
|
||||
{
|
||||
// The face elements contours
|
||||
// A left eye:
|
||||
// Approximating the lower eye contour by half-ellipse (using eye points) and storing in cntLeftEye:
|
||||
cntLeftEye = getEyeEllipse(vctPtsFaceElems[i][1], vctPtsFaceElems[i][0]);
|
||||
// Pushing the left eyebrow clock-wise:
|
||||
cntLeftEye.insert(cntLeftEye.end(), {vctPtsFaceElems[i][12], vctPtsFaceElems[i][13],
|
||||
vctPtsFaceElems[i][14]});
|
||||
// A right eye:
|
||||
// Approximating the lower eye contour by half-ellipse (using eye points) and storing in vctRightEye:
|
||||
cntRightEye = getEyeEllipse(vctPtsFaceElems[i][2], vctPtsFaceElems[i][3]);
|
||||
// Pushing the right eyebrow clock-wise:
|
||||
cntRightEye.insert(cntRightEye.end(), {vctPtsFaceElems[i][15], vctPtsFaceElems[i][16],
|
||||
vctPtsFaceElems[i][17]});
|
||||
// A nose:
|
||||
// Storing the nose points clock-wise
|
||||
cntNose.clear();
|
||||
cntNose.insert(cntNose.end(), {vctPtsFaceElems[i][4], vctPtsFaceElems[i][7],
|
||||
vctPtsFaceElems[i][5], vctPtsFaceElems[i][6]});
|
||||
// A mouth:
|
||||
// Approximating the mouth contour by two half-ellipses (using mouth points) and storing in vctMouth:
|
||||
cntMouth = getPatchedEllipse(vctPtsFaceElems[i][8], vctPtsFaceElems[i][9],
|
||||
vctPtsFaceElems[i][10], vctPtsFaceElems[i][11]);
|
||||
// Storing all the elements in a vector:
|
||||
vctElemsContours.insert(vctElemsContours.end(), {cntLeftEye, cntRightEye, cntNose, cntMouth});
|
||||
// The face contour:
|
||||
// Approximating the forehead contour by half-ellipse (using jaw points) and storing in vctFace:
|
||||
cntFace = getForeheadEllipse(vctCntJaw[i][0], vctCntJaw[i][16], vctCntJaw[i][8]);
|
||||
// The ellipse is drawn clock-wise, but jaw contour points goes vice versa, so it's necessary to push
|
||||
// cntJaw from the end to the begin using a reverse iterator:
|
||||
std::copy(vctCntJaw[i].crbegin(), vctCntJaw[i].crend(), std::back_inserter(cntFace));
|
||||
// Storing the face contour in another vector:
|
||||
vctFaceContours.push_back(cntFace);
|
||||
}
|
||||
}
|
||||
};
|
||||
```
|
||||
The kernel takes two arrays of denormalized landmarks coordinates and returns an array of elements' closed contours and an array of faces' closed contours; in other words, outputs are, the first, an array of contours of image areas to be sharpened and, the second, another one to be smoothed.
|
||||
|
||||
Here and below `Contour` is a vector of points.
|
||||
|
||||
#### Get an Eye Contour
|
||||
Eye contours are estimated with the following function:
|
||||
```cpp
|
||||
inline int custom::getLineInclinationAngleDegrees(const cv::Point &ptLeft, const cv::Point &ptRight)
|
||||
{
|
||||
const cv::Point residual = ptRight - ptLeft;
|
||||
if (residual.y == 0 && residual.x == 0)
|
||||
return 0;
|
||||
else
|
||||
return toIntRounded(atan2(toDouble(residual.y), toDouble(residual.x)) * 180.0 / CV_PI);
|
||||
}
|
||||
inline Contour custom::getEyeEllipse(const cv::Point &ptLeft, const cv::Point &ptRight)
|
||||
{
|
||||
Contour cntEyeBottom;
|
||||
const cv::Point ptEyeCenter((ptRight + ptLeft) / 2);
|
||||
const int angle = getLineInclinationAngleDegrees(ptLeft, ptRight);
|
||||
const int axisX = toIntRounded(cv::norm(ptRight - ptLeft) / 2.0);
|
||||
// According to research, in average a Y axis of an eye is approximately
|
||||
// 1/3 of an X one.
|
||||
const int axisY = axisX / 3;
|
||||
// We need the lower part of an ellipse:
|
||||
static constexpr int kAngEyeStart = 0;
|
||||
static constexpr int kAngEyeEnd = 180;
|
||||
cv::ellipse2Poly(ptEyeCenter, cv::Size(axisX, axisY), angle, kAngEyeStart, kAngEyeEnd, config::kAngDelta,
|
||||
cntEyeBottom);
|
||||
return cntEyeBottom;
|
||||
}
|
||||
```
|
||||
Briefly, this function restores the bottom side of an eye by a half-ellipse based on two points in left and right eye corners. In fact, `cv::ellipse2Poly()` is used to approximate the eye region, and the function only defines ellipse parameters based on just two points:
|
||||
- The ellipse center and the \f$X\f$ half-axis calculated by two eye Points.
|
||||
- The \f$Y\f$ half-axis calculated according to the assumption that an average eye width is \f$1/3\f$ of its length.
|
||||
- The start and the end angles which are 0 and 180 (refer to `cv::ellipse()` documentation).
|
||||
- The angle delta: how much points to produce in the contour.
|
||||
- The inclination angle of the axes.
|
||||
|
||||
The use of the `atan2()` instead of just `atan()` in function `custom::getLineInclinationAngleDegrees()` is essential as it allows to return a negative value depending on the `x` and the `y` signs so we can get the right angle even in case of upside-down face arrangement (if we put the points in the right order, of course).
|
||||
|
||||
#### Get a Forehead Contour
|
||||
The function approximates the forehead contour:
|
||||
```cpp
|
||||
inline Contour custom::getForeheadEllipse(const cv::Point &ptJawLeft,
|
||||
const cv::Point &ptJawRight,
|
||||
const cv::Point &ptJawLower)
|
||||
{
|
||||
Contour cntForehead;
|
||||
// The point amid the top two points of a jaw:
|
||||
const cv::Point ptFaceCenter((ptJawLeft + ptJawRight) / 2);
|
||||
// This will be the center of the ellipse.
|
||||
// The angle between the jaw and the vertical:
|
||||
const int angFace = getLineInclinationAngleDegrees(ptJawLeft, ptJawRight);
|
||||
// This will be the inclination of the ellipse
|
||||
// Counting the half-axis of the ellipse:
|
||||
const double jawWidth = cv::norm(ptJawLeft - ptJawRight);
|
||||
// A forehead width equals the jaw width, and we need a half-axis:
|
||||
const int axisX = toIntRounded(jawWidth / 2.0);
|
||||
const double jawHeight = cv::norm(ptFaceCenter - ptJawLower);
|
||||
// According to research, in average a forehead is approximately 2/3 of
|
||||
// a jaw:
|
||||
const int axisY = toIntRounded(jawHeight * 2 / 3.0);
|
||||
// We need the upper part of an ellipse:
|
||||
static constexpr int kAngForeheadStart = 180;
|
||||
static constexpr int kAngForeheadEnd = 360;
|
||||
cv::ellipse2Poly(ptFaceCenter, cv::Size(axisX, axisY), angFace, kAngForeheadStart, kAngForeheadEnd,
|
||||
config::kAngDelta, cntForehead);
|
||||
return cntForehead;
|
||||
}
|
||||
```
|
||||
As we have only jaw points in our detected landmarks, we have to get a half-ellipse based on three points of a jaw: the leftmost, the rightmost and the lowest one. The jaw width is assumed to be equal to the forehead width and the latter is calculated using the left and the right points. Speaking of the \f$Y\f$ axis, we have no points to get it directly, and instead assume that the forehead height is about \f$2/3\f$ of the jaw height, which can be figured out from the face center (the middle between the left and right points) and the lowest jaw point.
|
||||
|
||||
### Draw Masks
|
||||
When we have all the contours needed, you are able to draw masks:
|
||||
|
||||
```cpp
|
||||
cv::GMat mskSharp = custom::GFillPolyGContours::on(gimgIn, garElsConts); // |
|
||||
cv::GMat mskSharpG = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize, // |
|
||||
config::kGSigma); // |
|
||||
cv::GMat mskBlur = custom::GFillPolyGContours::on(gimgIn, garFaceConts); // |
|
||||
cv::GMat mskBlurG = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize, // |
|
||||
config::kGSigma); // |draw masks
|
||||
// The first argument in mask() is Blur as we want to subtract from // |
|
||||
// BlurG the next step: // |
|
||||
cv::GMat mskBlurFinal = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG); // |
|
||||
cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG; // |
|
||||
cv::GMat mskFacesWhite = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY); // |
|
||||
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesWhite); // |
|
||||
```
|
||||
|
||||
The steps to get the masks are:
|
||||
* the "sharp" mask calculation:
|
||||
* fill the contours that should be sharpened;
|
||||
* blur that to get the "sharp" mask (`mskSharpG`);
|
||||
* the "bilateral" mask calculation:
|
||||
* fill all the face contours fully;
|
||||
* blur that;
|
||||
* subtract areas which intersect with the "sharp" mask --- and get the "bilateral" mask (`mskBlurFinal`);
|
||||
* the background mask calculation:
|
||||
* add two previous masks
|
||||
* set all non-zero pixels of the result as 255 (by `cv::gapi::threshold()`)
|
||||
* revert the output (by `cv::gapi::bitwise_not`) to get the background mask (`mskNoFaces`).
|
||||
|
||||
## Configuring and Running the Pipeline
|
||||
Once the graph is fully expressed, we can finally compile it and run on real data. G-API graph compilation is the stage where the G-API framework actually understands which kernels and networks to use. This configuration happens via G-API compilation arguments.
|
||||
|
||||
### DNN Parameters
|
||||
This sample is using OpenVINO™ Toolkit Inference Engine backend for DL inference, which is configured the following way:
|
||||
```cpp
|
||||
auto faceParams = cv::gapi::ie::Params<custom::FaceDetector>
|
||||
{
|
||||
/*std::string*/ faceXmlPath,
|
||||
/*std::string*/ faceBinPath,
|
||||
/*std::string*/ faceDevice
|
||||
};
|
||||
auto landmParams = cv::gapi::ie::Params<custom::LandmDetector>
|
||||
{
|
||||
/*std::string*/ landmXmlPath,
|
||||
/*std::string*/ landmBinPath,
|
||||
/*std::string*/ landmDevice
|
||||
};
|
||||
```
|
||||
Every `cv::gapi::ie::Params<>` object is related to the network specified in its template argument. We should pass there the network type we have defined in `G_API_NET()` in the early beginning of the tutorial.
|
||||
|
||||
Network parameters are then wrapped in `cv::gapi::NetworkPackage`:
|
||||
```cpp
|
||||
auto networks = cv::gapi::networks(faceParams, landmParams);
|
||||
```
|
||||
|
||||
More details in "Face Analytics Pipeline" ([Configuring the Pipeline](@ref gapi_ifd_configuration) section).
|
||||
|
||||
### Kernel Packages
|
||||
In this example we use a lot of custom kernels, in addition to that we use Fluid backend to optimize out memory for G-API's standard kernels where applicable. The resulting kernel package is formed like this:
|
||||
```cpp
|
||||
auto customKernels = cv::gapi::kernels<custom::GCPUBilateralFilter,
|
||||
custom::GCPULaplacian,
|
||||
custom::GCPUFillPolyGContours,
|
||||
custom::GCPUPolyLines,
|
||||
custom::GCPURectangle,
|
||||
custom::GCPUFacePostProc,
|
||||
custom::GCPULandmPostProc,
|
||||
custom::GCPUGetContours>();
|
||||
auto kernels = cv::gapi::combine(cv::gapi::core::fluid::kernels(),
|
||||
customKernels);
|
||||
```
|
||||
|
||||
### Compiling the Streaming Pipeline
|
||||
G-API optimizes execution for video streams when compiled in the "Streaming" mode.
|
||||
|
||||
```cpp
|
||||
cv::GStreamingCompiled stream = pipeline.compileStreaming(cv::compile_args(kernels, networks));
|
||||
```
|
||||
More on this in "Face Analytics Pipeline" ([Configuring the pipeline](@ref gapi_ifd_configuration) section).
|
||||
|
||||
### Running the streaming pipeline
|
||||
In order to run the G-API streaming pipeline, all we need is to specify the input video source, call `cv::GStreamingCompiled::start()`, and then fetch the pipeline processing results:
|
||||
```cpp
|
||||
if (parser.has("input"))
|
||||
{
|
||||
stream.setSource(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(parser.get<cv::String>("input")));
|
||||
}
|
||||
auto out_vector = cv::gout(imgBeautif, imgShow, vctFaceConts,
|
||||
vctElsConts, vctRects);
|
||||
stream.start();
|
||||
avg.start();
|
||||
while (stream.running())
|
||||
{
|
||||
if (!stream.try_pull(std::move(out_vector)))
|
||||
{
|
||||
// Use a try_pull() to obtain data.
|
||||
// If there's no data, let UI refresh (and handle keypress)
|
||||
if (cv::waitKey(1) >= 0) break;
|
||||
else continue;
|
||||
}
|
||||
frames++;
|
||||
// Drawing face boxes and landmarks if necessary:
|
||||
if (flgLandmarks == true)
|
||||
{
|
||||
cv::polylines(imgShow, vctFaceConts, config::kClosedLine,
|
||||
config::kClrYellow);
|
||||
cv::polylines(imgShow, vctElsConts, config::kClosedLine,
|
||||
config::kClrYellow);
|
||||
}
|
||||
if (flgBoxes == true)
|
||||
for (auto rect : vctRects)
|
||||
cv::rectangle(imgShow, rect, config::kClrGreen);
|
||||
cv::imshow(config::kWinInput, imgShow);
|
||||
cv::imshow(config::kWinFaceBeautification, imgBeautif);
|
||||
}
|
||||
```
|
||||
Once results are ready and can be pulled from the pipeline we display it on the screen and handle GUI events.
|
||||
|
||||
See [Running the pipeline](@ref gapi_ifd_running) section in the "Face Analytics Pipeline" tutorial for more details.
|
||||
|
||||
## Conclusion
|
||||
The tutorial has two goals: to show the use of brand new features of G-API introduced in OpenCV 4.2, and give a basic understanding on a sample face beautification algorithm.
|
||||
|
||||
The result of the algorithm application:
|
||||
|
||||

|
||||
|
||||
On the test machine (Intel® Core™ i7-8700) the G-API-optimized video pipeline outperforms its serial (non-pipelined) version by a factor of 2.7 – meaning that for such a non-trivial graph, the proper pipelining can bring almost 3x increase in performance.
|
||||
325
docs/gapi/gapi_face_analytics_pipeline.md
Normal file
325
docs/gapi/gapi_face_analytics_pipeline.md
Normal file
@@ -0,0 +1,325 @@
|
||||
# Building a Face Analytics Pipeline {#openvino_docs_gapi_gapi_face_analytics_pipeline}
|
||||
|
||||
## Overview
|
||||
In this tutorial you will learn:
|
||||
|
||||
* How to integrate Deep Learning inference in a G-API graph.
|
||||
* How to run a G-API graph on a video stream and obtain data from it.
|
||||
|
||||
## Prerequisites
|
||||
This sample requires:
|
||||
|
||||
* PC with GNU/Linux* or Microsoft Windows* (Apple macOS* is supported but was not tested)
|
||||
* OpenCV 4.2 or higher built with [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) (building with [Intel® TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is a plus)
|
||||
* The following pre-trained models from the [Open Model Zoo](@ref omz_models_intel_index):
|
||||
* [face-detection-adas-0001](@ref omz_models_intel_face_detection_adas_0001_description_face_detection_adas_0001)
|
||||
* [age-gender-recognition-retail-0013](@ref omz_models_intel_age_gender_recognition_retail_0013_description_age_gender_recognition_retail_0013)
|
||||
* [emotions-recognition-retail-0003](@ref omz_models_intel_emotions_recognition_retail_0003_description_emotions_recognition_retail_0003)
|
||||
|
||||
To download the models from the Open Model Zoo, use the [Model Downloader](@ref omz_tools_downloader_README) tool.
|
||||
|
||||
## Introduction: Why G-API
|
||||
Many computer vision algorithms run on a video stream rather than on individual images. Stream processing usually consists of multiple steps – like decode, preprocessing, detection, tracking, classification (on detected objects), and visualization – forming a *video processing pipeline*. Moreover, many of these steps of such a pipeline can run in parallel – modern platforms have different hardware blocks on the same chip like decoders and GPUs, and extra accelerators can be plugged in as extensions, like Intel® Movidius™ Neural Compute Stick for deep learning offload.
|
||||
|
||||
Given all this manifold of options and a variety in video analytics algorithms, managing such pipelines effectively quickly becomes a problem. For sure it can be done manually, but this approach doesn't scale: if a change is required in the algorithm (e.g. a new pipeline step is added), or if it is ported on a new platform with different capabilities, the whole pipeline needs to be re-optimized.
|
||||
|
||||
Starting with version 4.2, OpenCV offers a solution to this problem. OpenCV G-API now can manage Deep Learning inference (a cornerstone of any modern analytics pipeline) with a traditional Computer Vision as well as video capturing/decoding, all in a single pipeline. G-API takes care of pipelining itself – so if the algorithm or platform changes, the execution model adapts to it automatically.
|
||||
|
||||
## Pipeline Overview
|
||||
Our sample application is based on the [Interactive Face Detection](@ref omz_demos_interactive_face_detection_demo_README) demo from Open Model Zoo. A simplified pipeline consists of the following steps:
|
||||
|
||||
1. Image acquisition and decode
|
||||
2. Detection with preprocessing
|
||||
3. Classification with preprocessing for every detected object with two networks
|
||||
4. Visualization
|
||||
|
||||

|
||||
|
||||
## Construct a pipeline {#gapi_ifd_constructing}
|
||||
|
||||
Constructing a G-API graph for a video streaming case does not differ much from a [regular usage](https://docs.opencv.org/4.5.0/d0/d1e/gapi.html#gapi_example) of G-API -- it is still about defining graph *data* (with `cv::GMat`, `cv::GScalar`, and `cv::GArray`) and *operations* over it. Inference also becomes an operation in the graph, but is defined in a little bit different way.
|
||||
|
||||
### Declare Deep Learning topologies {#gapi_ifd_declaring_nets}
|
||||
|
||||
In contrast with traditional CV functions (see [core](https://docs.opencv.org/4.5.0/df/d1f/group__gapi__core.html) and [imgproc](https://docs.opencv.org/4.5.0/d2/d00/group__gapi__imgproc.html)) where G-API declares distinct operations for every function, inference in G-API is a single generic operation `cv::gapi::infer<>`. As usual, it is just an interface and it can be implemented in a number of ways under the hood. In OpenCV 4.2, only OpenVINO™ Inference Engine-based backend is available, and OpenCV's own DNN module-based backend is to come.
|
||||
|
||||
`cv::gapi::infer<>` is _parametrized_ by the details of a topology we are going to execute. Like operations, topologies in G-API are strongly typed and are defined with a special macro `G_API_NET()`:
|
||||
|
||||
```cpp
|
||||
// Face detector: takes one Mat, returns another Mat
|
||||
G_API_NET(Faces, <cv::GMat(cv::GMat)>, "face-detector");
|
||||
// Age/Gender recognition - takes one Mat, returns two:
|
||||
// one for Age and one for Gender. In G-API, multiple-return-value operations
|
||||
// are defined using std::tuple<>.
|
||||
using AGInfo = std::tuple<cv::GMat, cv::GMat>;
|
||||
G_API_NET(AgeGender, <AGInfo(cv::GMat)>, "age-gender-recoginition");
|
||||
// Emotion recognition - takes one Mat, returns another.
|
||||
G_API_NET(Emotions, <cv::GMat(cv::GMat)>, "emotions-recognition");
|
||||
```
|
||||
|
||||
Similar to how operations are defined with `G_API_OP()`, network description requires three parameters:
|
||||
1. A type name. Every defined topology is declared as a distinct C++ type which is used further in the program -- see below.
|
||||
2. A `std::function<>`-like API signature. G-API treats networks as regular "functions" which take and return data. Here network `Faces` (a detector) takes a `cv::GMat` and returns a `cv::GMat`, while network `AgeGender` is known to provide two outputs (age and gender blobs, respectively) -- so it has a `std::tuple<>` as a return type.
|
||||
3. A topology name -- can be any non-empty string, G-API is using these names to distinguish networks inside. Names should be unique in the scope of a single graph.
|
||||
|
||||
## Building a GComputation {#gapi_ifd_gcomputation}
|
||||
|
||||
Now the above pipeline is expressed in G-API like this:
|
||||
|
||||
```cpp
|
||||
cv::GComputation pp([]() {
|
||||
// Declare an empty GMat - the beginning of the pipeline.
|
||||
cv::GMat in;
|
||||
// Run face detection on the input frame. Result is a single GMat,
|
||||
// internally representing an 1x1x200x7 SSD output.
|
||||
// This is a single-patch version of infer:
|
||||
// - Inference is running on the whole input image;
|
||||
// - Image is converted and resized to the network's expected format
|
||||
// automatically.
|
||||
cv::GMat detections = cv::gapi::infer<custom::Faces>(in);
|
||||
// Parse SSD output to a list of ROI (rectangles) using
|
||||
// a custom kernel. Note: parsing SSD may become a "standard" kernel.
|
||||
cv::GArray<cv::Rect> faces = custom::PostProc::on(detections, in);
|
||||
// Now run Age/Gender model on every detected face. This model has two
|
||||
// outputs (for age and gender respectively).
|
||||
// A special ROI-list-oriented form of infer<>() is used here:
|
||||
// - First input argument is the list of rectangles to process,
|
||||
// - Second one is the image where to take ROI from;
|
||||
// - Crop/Resize/Layout conversion happens automatically for every image patch
|
||||
// from the list
|
||||
// - Inference results are also returned in form of list (GArray<>)
|
||||
    // - Since there're two outputs, infer<> returns two arrays (via std::tuple).
|
||||
cv::GArray<cv::GMat> ages;
|
||||
cv::GArray<cv::GMat> genders;
|
||||
std::tie(ages, genders) = cv::gapi::infer<custom::AgeGender>(faces, in);
|
||||
// Recognize emotions on every face.
|
||||
// ROI-list-oriented infer<>() is used here as well.
|
||||
    // Since custom::Emotions network produces a single output, only one
|
||||
// GArray<> is returned here.
|
||||
cv::GArray<cv::GMat> emotions = cv::gapi::infer<custom::Emotions>(faces, in);
|
||||
// Return the decoded frame as a result as well.
|
||||
// Input matrix can't be specified as output one, so use copy() here
|
||||
// (this copy will be optimized out in the future).
|
||||
cv::GMat frame = cv::gapi::copy(in);
|
||||
// Now specify the computation's boundaries - our pipeline consumes
|
||||
    // one image and produces five outputs.
|
||||
return cv::GComputation(cv::GIn(in),
|
||||
cv::GOut(frame, faces, ages, genders, emotions));
|
||||
});
|
||||
```
|
||||
|
||||
Every pipeline starts with declaring empty data objects – which act as inputs to the pipeline. Then we call a generic `cv::gapi::infer<>` specialized to Faces detection network. `cv::gapi::infer<>` inherits its signature from its template parameter – and in this case it expects one input cv::GMat and produces one output cv::GMat.
|
||||
|
||||
In this sample we use a pre-trained SSD-based network and its output needs to be parsed to an array of detections (object regions of interest, ROIs). It is done by a custom operation custom::PostProc, which returns an array of rectangles (of type `cv::GArray<cv::Rect>`) back to the pipeline. This operation also filters out results by a confidence threshold – and these details are hidden in the kernel itself. Still, at the moment of graph construction we operate with interfaces only and don't need actual kernels to express the pipeline – so the implementation of this post-processing will be listed later.
|
||||
|
||||
After detection result output is parsed to an array of objects, we can run classification on any of those. G-API doesn't support syntax for in-graph loops like `for_each()` yet, but instead `cv::gapi::infer<>` comes with a special list-oriented overload.
|
||||
|
||||
User can call `cv::gapi::infer<>` with a `cv::GArray` as the first argument, so then G-API assumes it needs to run the associated network on every rectangle from the given list of the given frame (second argument). Result of such operation is also a list – a cv::GArray of `cv::GMat`.
|
||||
|
||||
Since AgeGender network itself produces two outputs, its output type for a list-based version of `cv::gapi::infer` is a tuple of arrays. We use `std::tie()` to decompose this input into two distinct objects.
|
||||
|
||||
Emotions network produces a single output so its list-based inference's return type is `cv::GArray<cv::GMat>`.
|
||||
|
||||
## Configure the Pipeline {#gapi_ifd_configuration}
|
||||
|
||||
G-API strictly separates construction from configuration -- with the idea to keep algorithm code itself platform-neutral. In the above listings we only declared our operations and expressed the overall data flow, but didn't even mention that we use OpenVINO™. We only described *what* we do, but not *how* we do it. Keeping these two aspects clearly separated is the design goal for G-API.
|
||||
|
||||
Platform-specific details arise when the pipeline is *compiled* -- i.e. is turned from a declarative to an executable form. The way *how* to run stuff is specified via compilation arguments, and new inference/streaming features are no exception from this rule.
|
||||
|
||||
G-API is built on backends which implement interfaces (see [Architecture](https://docs.opencv.org/4.5.0/de/d4d/gapi_hld.html) and [Kernels](kernel_api.md) for details) -- thus `cv::gapi::infer<>` is a function which can be implemented by different backends. In OpenCV 4.2, only OpenVINO™ Inference Engine backend for inference is available. Every inference backend in G-API has to provide a special parameterizable structure to express *backend-specific* neural network parameters -- and in this case, it is `cv::gapi::ie::Params`:
|
||||
|
||||
```cpp
|
||||
auto det_net = cv::gapi::ie::Params<custom::Faces> {
|
||||
cmd.get<std::string>("fdm"), // read cmd args: path to topology IR
|
||||
cmd.get<std::string>("fdw"), // read cmd args: path to weights
|
||||
cmd.get<std::string>("fdd"), // read cmd args: device specifier
|
||||
};
|
||||
auto age_net = cv::gapi::ie::Params<custom::AgeGender> {
|
||||
cmd.get<std::string>("agem"), // read cmd args: path to topology IR
|
||||
cmd.get<std::string>("agew"), // read cmd args: path to weights
|
||||
cmd.get<std::string>("aged"), // read cmd args: device specifier
|
||||
}.cfgOutputLayers({ "age_conv3", "prob" });
|
||||
auto emo_net = cv::gapi::ie::Params<custom::Emotions> {
|
||||
cmd.get<std::string>("emom"), // read cmd args: path to topology IR
|
||||
cmd.get<std::string>("emow"), // read cmd args: path to weights
|
||||
cmd.get<std::string>("emod"), // read cmd args: device specifier
|
||||
};
|
||||
```
|
||||
|
||||
Here we define three parameter objects: `det_net`, `age_net`, and `emo_net`. Every object is a `cv::gapi::ie::Params` structure parametrization for each particular network we use. On a compilation stage, G-API automatically matches network parameters with their `cv::gapi::infer<>` calls in graph using this information.
|
||||
|
||||
Regardless of the topology, every parameter structure is constructed with three string arguments – specific to the OpenVINO™ Inference Engine:
|
||||
|
||||
* Path to the topology's intermediate representation (.xml file);
|
||||
* Path to the topology's model weights (.bin file);
|
||||
* Device where to run – "CPU", "GPU", and others – based on your OpenVINO™ Toolkit installation. These arguments are taken from the command-line parser.
|
||||
|
||||
Once networks are defined and custom kernels are implemented, the pipeline is compiled for streaming:
|
||||
```cpp
|
||||
// Form a kernel package (with a single OpenCV-based implementation of our
|
||||
// post-processing) and a network package (holding our three networks).
|
||||
auto kernels = cv::gapi::kernels<custom::OCVPostProc>();
|
||||
auto networks = cv::gapi::networks(det_net, age_net, emo_net);
|
||||
// Compile our pipeline and pass our kernels & networks as
|
||||
// parameters. This is the place where G-API learns which
|
||||
// networks & kernels we're actually operating with (the graph
|
||||
// description itself known nothing about that).
|
||||
auto cc = pp.compileStreaming(cv::compile_args(kernels, networks));
|
||||
```
|
||||
|
||||
`cv::GComputation::compileStreaming()` triggers a special video-oriented form of graph compilation where G-API is trying to optimize throughput. Result of this compilation is an object of special type `cv::GStreamingCompiled` – in contrast to a traditional callable `cv::GCompiled`, these objects are closer to media players in their semantics.
|
||||
|
||||
> **NOTE**: There is no need to pass metadata arguments describing the format of the input video stream in `cv::GComputation::compileStreaming()` – G-API figures automatically what are the formats of the input vector and adjusts the pipeline to these formats on-the-fly. User still can pass metadata there as with regular `cv::GComputation::compile()` in order to fix the pipeline to the specific input format.
|
||||
|
||||
## Running the Pipeline {#gapi_ifd_running}
|
||||
|
||||
Pipelining optimization is based on processing multiple input video frames simultaneously, running different steps of the pipeline in parallel. This is why it works best when the framework takes full control over the video stream.
|
||||
|
||||
The idea behind streaming API is that user specifies an *input source* to the pipeline and then G-API manages its execution automatically until the source ends or user interrupts the execution. G-API pulls new image data from the source and passes it to the pipeline for processing.
|
||||
|
||||
Streaming sources are represented by the interface `cv::gapi::wip::IStreamSource`. Objects implementing this interface may be passed to `GStreamingCompiled` as regular inputs via `cv::gin()` helper function. In OpenCV 4.2, only one streaming source is allowed per pipeline -- this requirement will be relaxed in the future.
|
||||
|
||||
OpenCV comes with a great class cv::VideoCapture and by default G-API ships with a stream source class based on it -- `cv::gapi::wip::GCaptureSource`. Users can implement their own
|
||||
streaming sources e.g. using [VAAPI](https://01.org/vaapi) or other Media or Networking APIs.
|
||||
|
||||
Sample application specifies the input source as follows:
|
||||
```cpp
|
||||
auto in_src = cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(input);
|
||||
cc.setSource(cv::gin(in_src));
|
||||
```
|
||||
|
||||
Please note that a GComputation may still have multiple inputs like `cv::GMat`, `cv::GScalar`, or `cv::GArray` objects. User can pass their respective host-side types (`cv::Mat`, `cv::Scalar`, `std::vector<>`) in the input vector as well, but in Streaming mode these objects will create "endless" constant streams. Mixing a real video source stream and a const data stream is allowed.
|
||||
|
||||
Running a pipeline is easy – just call `cv::GStreamingCompiled::start()` and fetch your data with blocking `cv::GStreamingCompiled::pull()` or non-blocking `cv::GStreamingCompiled::try_pull()`; repeat until the stream ends:
|
||||
|
||||
```cpp
|
||||
// After data source is specified, start the execution
|
||||
cc.start();
|
||||
// Declare data objects we will be receiving from the pipeline.
|
||||
cv::Mat frame; // The captured frame itself
|
||||
std::vector<cv::Rect> faces; // Array of detected faces
|
||||
std::vector<cv::Mat> out_ages; // Array of inferred ages (one blob per face)
|
||||
std::vector<cv::Mat> out_genders; // Array of inferred genders (one blob per face)
|
||||
std::vector<cv::Mat> out_emotions; // Array of classified emotions (one blob per face)
|
||||
// Implement different execution policies depending on the display option
|
||||
// for the best performance.
|
||||
while (cc.running()) {
|
||||
auto out_vector = cv::gout(frame, faces, out_ages, out_genders, out_emotions);
|
||||
if (no_show) {
|
||||
// This is purely a video processing. No need to balance
|
||||
// with UI rendering. Use a blocking pull() to obtain
|
||||
// data. Break the loop if the stream is over.
|
||||
if (!cc.pull(std::move(out_vector)))
|
||||
break;
|
||||
} else if (!cc.try_pull(std::move(out_vector))) {
|
||||
// Use a non-blocking try_pull() to obtain data.
|
||||
// If there's no data, let UI refresh (and handle keypress)
|
||||
if (cv::waitKey(1) >= 0) break;
|
||||
else continue;
|
||||
}
|
||||
// At this point we have data for sure (obtained in either
|
||||
// blocking or non-blocking way).
|
||||
frames++;
|
||||
labels::DrawResults(frame, faces, out_ages, out_genders, out_emotions);
|
||||
labels::DrawFPS(frame, frames, avg.fps(frames));
|
||||
if (!no_show) cv::imshow("Out", frame);
|
||||
}
|
||||
```
|
||||
|
||||
The above code may look complex but in fact it handles two modes – with and without graphical user interface (GUI):
|
||||
|
||||
* When a sample is running in a "headless" mode (`--pure` option is set), this code simply pulls data from the pipeline with the blocking `pull()` until it ends. This is the most performant mode of execution.
|
||||
* When results are also displayed on the screen, the Window System needs to take some time to refresh the window contents and handle GUI events. In this case, the demo pulls data with a non-blocking `try_pull()` until there is no more data available (but it does not mark end of the stream – just means new data is not ready yet), and only then displays the latest obtained result and refreshes the screen. Reducing the time spent in GUI with this trick increases the overall performance a little bit.
|
||||
|
||||
## Comparison with Serial Mode
|
||||
The sample can also run in a serial mode for reference and benchmarking purposes. In this case, a regular `cv::GComputation::compile()` is used and a regular single-frame `cv::GCompiled` object is produced; the pipelining optimization is not applied within G-API; it is the user responsibility to acquire image frames from `cv::VideoCapture` object and pass those to G-API.
|
||||
|
||||
```cpp
|
||||
cv::VideoCapture cap(input);
|
||||
cv::Mat in_frame, frame; // The captured frame itself
|
||||
std::vector<cv::Rect> faces; // Array of detected faces
|
||||
std::vector<cv::Mat> out_ages; // Array of inferred ages (one blob per face)
|
||||
std::vector<cv::Mat> out_genders; // Array of inferred genders (one blob per face)
|
||||
std::vector<cv::Mat> out_emotions; // Array of classified emotions (one blob per face)
|
||||
while (cap.read(in_frame)) {
|
||||
pp.apply(cv::gin(in_frame),
|
||||
cv::gout(frame, faces, out_ages, out_genders, out_emotions),
|
||||
cv::compile_args(kernels, networks));
|
||||
labels::DrawResults(frame, faces, out_ages, out_genders, out_emotions);
|
||||
frames++;
|
||||
if (frames == 1u) {
|
||||
// Start timer only after 1st frame processed -- compilation
|
||||
// happens on-the-fly here
|
||||
avg.start();
|
||||
} else {
|
||||
            // Measure & draw FPS for all other frames
|
||||
labels::DrawFPS(frame, frames, avg.fps(frames-1));
|
||||
}
|
||||
if (!no_show) {
|
||||
cv::imshow("Out", frame);
|
||||
if (cv::waitKey(1) >= 0) break;
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
On a test machine (Intel® Core™ i5-6600), with OpenCV built with [Intel® TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) support, detector network assigned to CPU, and classifiers to iGPU, the pipelined sample outperforms the serial one by a factor of 1.36x (thus adding +36% in overall throughput).
|
||||
|
||||
## Conclusion
|
||||
G-API introduces a technological way to build and optimize hybrid pipelines. Switching to a new execution model does not require changes in the algorithm code expressed with G-API – only the way how graph is triggered differs.
|
||||
|
||||
## Listing: Post-Processing Kernel
|
||||
G-API gives an easy way to plug custom code into the pipeline even if it is running in a streaming mode and processing tensor data. Inference results are represented by multi-dimensional `cv::Mat` objects so accessing those is as easy as with a regular DNN module.
|
||||
|
||||
The OpenCV-based SSD post-processing kernel is defined and implemented in this sample as follows:
|
||||
```cpp
|
||||
// SSD Post-processing function - this is not a network but a kernel.
|
||||
// The kernel body is declared separately, this is just an interface.
|
||||
// This operation takes two Mats (detections and the source image),
|
||||
// and returns a vector of ROI (filtered by a default threshold).
|
||||
// Threshold (or a class to select) may become a parameter, but since
|
||||
// this kernel is custom, it doesn't make a lot of sense.
|
||||
G_API_OP(PostProc, <cv::GArray<cv::Rect>(cv::GMat, cv::GMat)>, "custom.fd_postproc") {
|
||||
static cv::GArrayDesc outMeta(const cv::GMatDesc &, const cv::GMatDesc &) {
|
||||
// This function is required for G-API engine to figure out
|
||||
// what the output format is, given the input parameters.
|
||||
// Since the output is an array (with a specific type),
|
||||
// there's nothing to describe.
|
||||
return cv::empty_array_desc();
|
||||
}
|
||||
};
|
||||
// OpenCV-based implementation of the above kernel.
|
||||
GAPI_OCV_KERNEL(OCVPostProc, PostProc) {
|
||||
static void run(const cv::Mat &in_ssd_result,
|
||||
const cv::Mat &in_frame,
|
||||
std::vector<cv::Rect> &out_faces) {
|
||||
const int MAX_PROPOSALS = 200;
|
||||
const int OBJECT_SIZE = 7;
|
||||
const cv::Size upscale = in_frame.size();
|
||||
const cv::Rect surface({0,0}, upscale);
|
||||
out_faces.clear();
|
||||
const float *data = in_ssd_result.ptr<float>();
|
||||
for (int i = 0; i < MAX_PROPOSALS; i++) {
|
||||
const float image_id = data[i * OBJECT_SIZE + 0]; // batch id
|
||||
const float confidence = data[i * OBJECT_SIZE + 2];
|
||||
const float rc_left = data[i * OBJECT_SIZE + 3];
|
||||
const float rc_top = data[i * OBJECT_SIZE + 4];
|
||||
const float rc_right = data[i * OBJECT_SIZE + 5];
|
||||
const float rc_bottom = data[i * OBJECT_SIZE + 6];
|
||||
if (image_id < 0.f) { // indicates end of detections
|
||||
break;
|
||||
}
|
||||
if (confidence < 0.5f) { // a hard-coded snapshot
|
||||
continue;
|
||||
}
|
||||
// Convert floating-point coordinates to the absolute image
|
||||
// frame coordinates; clip by the source image boundaries.
|
||||
cv::Rect rc;
|
||||
rc.x = static_cast<int>(rc_left * upscale.width);
|
||||
rc.y = static_cast<int>(rc_top * upscale.height);
|
||||
rc.width = static_cast<int>(rc_right * upscale.width) - rc.x;
|
||||
rc.height = static_cast<int>(rc_bottom * upscale.height) - rc.y;
|
||||
out_faces.push_back(rc & surface);
|
||||
}
|
||||
}
|
||||
};
|
||||
```
|
||||
52
docs/gapi/gapi_intro.md
Normal file
52
docs/gapi/gapi_intro.md
Normal file
@@ -0,0 +1,52 @@
|
||||
# Introduction to OpenCV Graph API (G-API) {#openvino_docs_gapi_gapi_intro}
|
||||
OpenCV Graph API (G-API) is an OpenCV module targeted to make regular image and video processing fast and portable. G-API is a special module in OpenCV – in contrast with the majority of other main modules, this one acts as a framework rather than some specific CV algorithm.
|
||||
|
||||
G-API is positioned as a next level optimization enabler for computer vision, focusing not on particular CV functions but on the whole algorithm optimization.
|
||||
|
||||
G-API provides means to define CV operations, construct graphs (in form of expressions) using it, and finally implement and run the operations for a particular backend.
|
||||
|
||||
The idea behind G-API is that if an algorithm can be expressed in a special embedded language (currently in C++), the framework can catch its sense and apply a number of optimizations to the whole thing automatically. Particular optimizations are selected based on which [kernels](kernel_api.md) and [backends](https://docs.opencv.org/4.5.0/dc/d1c/group__gapi__std__backends.html) are involved in the graph compilation process, for example, the graph can be offloaded to GPU via the OpenCL backend, or optimized for memory consumption with the Fluid backend. Kernels, backends, and their settings are parameters to the graph compilation, so the graph itself does not depend on any platform-specific details and can be ported easily.
|
||||
|
||||
> **NOTE**: Graph API (G-API) was introduced in the most recent major OpenCV 4.0 release and now is being actively developed. The API is volatile at the moment and there may be minor but compatibility-breaking changes in the future.
|
||||
|
||||
## G-API Concepts
|
||||
|
||||
* *Graphs* are built by applying operations to data objects.
|
||||
* API itself has no "graphs", it is expression-based instead.
|
||||
* *Data objects* do not hold actual data, only capture dependencies.
|
||||
* *Operations* consume and produce data objects.
|
||||
* A graph is defined by specifying its boundaries with data objects:
|
||||
* What data objects are inputs to the graph?
|
||||
* What are its outputs?
|
||||
|
||||
The paragraphs below explain the G-API programming model and development workflow.
|
||||
|
||||
## Programming Model
|
||||
Building graphs is easy with G-API. In fact, there is no notion of graphs exposed in the API, so the user doesn’t need to operate in terms of “nodes” and “edges” — instead, graphs are constructed implicitly via expressions in a "functional" way. Expression-based graphs are built using two major concepts: *[operations](kernel_api.md)* and *[data objects](https://docs.opencv.org/4.2.0/db/df1/group__gapi__data__objects.html)*.
|
||||
|
||||
In G-API, every graph begins and ends with data objects; data objects are passed to operations which produce (“return”) their results — new data objects, which are then passed to other operations, and so on. You can declare your own operations; G-API does not distinguish user-defined operations from its own predefined ones in any way.
|
||||
|
||||
After the graph is defined, it needs to be compiled for execution. During the compilation, G-API figures out what the graph looks like, which kernels are available to run the operations in the graph, how to manage heterogeneity and to optimize the execution path. The result of graph compilation is a so-called “compiled” object. This object encapsulates the execution sequence for the graph inside and operates on real image data. You can set up the compilation process using various [compilation arguments](https://docs.opencv.org/4.5.0/dc/d1c/group__gapi__std__backends.html). Backends expose some of their options as these arguments; also, actual kernels and DL network settings are passed into the framework this way.
|
||||
|
||||
G-API supports graph compilation for two execution modes, *regular* and *streaming*, producing different types of compiled objects as the result.
|
||||
* <strong>Regular</strong> compiled objects are represented with class GCompiled, which follows functor-like semantics and has an overloaded operator(). When called for execution on the given input data, the GCompiled functor blocks the current thread and processes the data immediately — like a regular C++ function. By default, G-API tries to optimize the execution time for latency in this compilation mode.
|
||||
* Starting with OpenCV 4.2, G-API can also produce GStreamingCompiled objects that better fit the asynchronous pipelined execution model. This compilation mode is called **streaming mode**, and G-API tries to optimize the overall throughput by implementing the pipelining technique as described above. We will use both in our example.
|
||||
|
||||
The overall process for the regular case is summarized in the diagram below:
|
||||
|
||||

|
||||
|
||||
The graph is built with operations so having operations defined (**0**) is a basic prerequisite; a constructed expression graph (**1**) forms a `cv::GComputation` object; kernels (**2**) which implement operations are the basic requirement to the graph compilation (**3**); the actual execution (**4**) is handled by a `cv::GCompiled` object which takes input and produces output data.
|
||||
|
||||
## Development Workflow
|
||||
One of the ways to organize a G-API development workflow is presented in the diagram below:
|
||||
|
||||

|
||||
|
||||
Basically, it is a derivative from the programming model illustrated in the previous chapter. You start with an algorithm or a data flow in mind (**0**), mapping it to a graph model (**1**), then identifying what operations you need (**2**) to construct this graph. These operations may already exist in G-API or be missing, in the latter case we implement the missing ones as kernels (**3**). Then decide which execution model fits our case better, pass kernels and DL networks as arguments to the compilation process (**4**), and finally switch to the execution (**5**). The process is iterative, so if you want to change anything based on the execution results, get back to steps (**0**) or (**1**) (a dashed line).
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
188
docs/gapi/kernel_api.md
Normal file
188
docs/gapi/kernel_api.md
Normal file
@@ -0,0 +1,188 @@
|
||||
# Graph API Kernel API {#openvino_docs_gapi_kernel_api}
|
||||
|
||||
The core idea behind Graph API (G-API) is portability – a pipeline built with G-API must be portable (or at least able to be made portable). This means that either it works out-of-the-box when compiled for a new platform, or G-API provides the necessary tools to make it run there, with little-to-no changes in the algorithm itself.
|
||||
|
||||
This idea can be achieved by separating kernel interface from its implementation. Once a pipeline is built using kernel interfaces, it becomes implementation-neutral – the implementation details (i.e. which kernels to use) are passed on a separate stage (graph compilation).
|
||||
|
||||
Kernel-implementation hierarchy may look like:
|
||||

|
||||
|
||||
A pipeline itself then can be expressed only in terms of `A`, `B`, and so on, and choosing which implementation to use in execution becomes an external parameter.
|
||||
|
||||
## Define a Kernel
|
||||
G-API provides a macro to define a new kernel interface `G_TYPED_KERNEL()`:
|
||||
|
||||
```cpp
|
||||
#include <opencv2/gapi.hpp>
|
||||
G_TYPED_KERNEL(GFilter2D,
|
||||
<cv::GMat(cv::GMat,int,cv::Mat,cv::Point,double,int,cv::Scalar)>,
|
||||
"org.opencv.imgproc.filters.filter2D")
|
||||
{
|
||||
static cv::GMatDesc // outMeta's return value type
|
||||
outMeta(cv::GMatDesc in , // descriptor of input GMat
|
||||
int ddepth , // depth parameter
|
||||
cv::Mat /* coeffs */, // (unused)
|
||||
cv::Point /* anchor */, // (unused)
|
||||
double /* scale */, // (unused)
|
||||
int /* border */, // (unused)
|
||||
cv::Scalar /* bvalue */ ) // (unused)
|
||||
{
|
||||
return in.withDepth(ddepth);
|
||||
}
|
||||
};
|
||||
```
|
||||
|
||||
This macro is a shortcut to a new type definition. It takes three arguments to register a new type, and requires the type body to be present (see below). The macro arguments are:
|
||||
|
||||
* Kernel interface name -- Also serves as the name of the new type defined with this macro;
|
||||
* Kernel signature -- An `std::function<>`-like signature which defines API of the kernel;
|
||||
* Kernel's unique name -- Used to identify kernel when its type information is stripped within the system.
|
||||
* Kernel declaration may be seen as a function declaration -- in both cases a new entity must then be used according to the way it was defined.
|
||||
|
||||
Kernel signature defines kernel's usage syntax -- which parameters it takes during graph construction. Implementations can also use this signature to derive it into backend-specific callback signatures (see next chapter).
|
||||
|
||||
Kernel may accept values of any type, and G-API dynamic types are handled in a special way. All other types are opaque to G-API and passed to kernel in `outMeta()` or in execution callbacks as-is.
|
||||
|
||||
Kernel's return value can only be of G-API dynamic type – `cv::GMat`, `cv::GScalar`, or `cv::GArray<T>`. If an operation has more than one output, it should be wrapped into an `std::tuple<>` (which can contain only mentioned G-API types). Arbitrary-output-number operations are not supported.
|
||||
|
||||
Once a kernel is defined, it can be used in pipelines with special, G-API-supplied method `on()`. This method has the same signature as defined in kernel, so the following code is a perfectly legal construction:
|
||||
|
||||
```cpp
|
||||
cv::GMat in;
|
||||
cv::GMat out = GFilter2D::on(/* GMat */ in,
|
||||
/* int */ -1,
|
||||
/* Mat */ conv_kernel_mat,
|
||||
/* Point */ cv::Point(-1,-1),
|
||||
/* double */ 0.,
|
||||
/* int */ cv::BORDER_DEFAULT,
|
||||
/* Scalar */ cv::Scalar(0));
|
||||
```
|
||||
This example has some verbosity, though, so usually a kernel declaration comes with a C++ function wrapper ("factory method") which enables optional parameters, more compact syntax, Doxygen comments, etc.:
|
||||
|
||||
```cpp
|
||||
cv::GMat filter2D(cv::GMat in,
|
||||
int ddepth,
|
||||
cv::Mat k,
|
||||
cv::Point anchor = cv::Point(-1,-1),
|
||||
double scale = 0.,
|
||||
int border = cv::BORDER_DEFAULT,
|
||||
cv::Scalar bval = cv::Scalar(0))
|
||||
{
|
||||
return GFilter2D::on(in, ddepth, k, anchor, scale, border, bval);
|
||||
}
|
||||
```
|
||||
So now it can be used like:
|
||||
```cpp
|
||||
cv::GMat in;
|
||||
cv::GMat out = filter2D(in, -1, conv_kernel_mat);
|
||||
```
|
||||
|
||||
### Extra information
|
||||
In the current version, kernel declaration body (everything within the curly braces) must contain a static function `outMeta()`. This function establishes a functional dependency between operation's input and output metadata.
|
||||
|
||||
Metadata is an information about data kernel operates on. Since non-G-API types are opaque to G-API, G-API cares only about G* data descriptors (i.e. dimensions and format of `cv::GMat`, etc).
|
||||
|
||||
`outMeta()` is also an example of how kernel's signature can be transformed into a derived callback – note that in this example, outMeta() signature exactly follows the kernel signature (defined within the macro) but is different – where kernel expects `cv::GMat`, `outMeta()` takes and returns `cv::GMatDesc` (a G-API structure metadata for `cv::GMat`).
|
||||
|
||||
The point of `outMeta()` is to propagate metadata information within computation from inputs to outputs and infer metadata of internal (intermediate, temporary) data objects. This information is required for further pipeline optimizations, memory allocation, and other operations done by G-API framework during graph compilation.
|
||||
|
||||
## Implement a Kernel
|
||||
Once a kernel is declared, its interface can be used to implement versions of this kernel in different backends. This concept is naturally projected from object-oriented programming "Interface/Implementation" idiom: an interface can be implemented multiple times, and different implementations of a kernel should be substitutable with each other without breaking the algorithm (pipeline) logic (Liskov Substitution Principle).
|
||||
|
||||
Every backend defines its own way to implement a kernel interface. This way is regular, though – whatever the plugin is, its kernel implementation must be "derived" from a kernel interface type.
|
||||
|
||||
Kernel implementations are then organized into kernel packages. Kernel packages are passed to `cv::GComputation::compile()` as compile arguments, with some hints to G-API on how to select the proper kernels (see more on this in "Heterogeneity" [TBD]).
|
||||
|
||||
For example, the aforementioned Filter2D is implemented in "reference" CPU (OpenCV) plugin this way (NOTE – this is a simplified form with improper border handling):
|
||||
|
||||
```cpp
|
||||
#include <opencv2/gapi/cpu/gcpukernel.hpp> // GAPI_OCV_KERNEL()
|
||||
#include <opencv2/imgproc.hpp> // cv::filter2D()
|
||||
GAPI_OCV_KERNEL(GCPUFilter2D, GFilter2D)
|
||||
{
|
||||
static void
|
||||
run(const cv::Mat &in, // in - derived from GMat
|
||||
const int ddepth, // opaque (passed as-is)
|
||||
const cv::Mat &k, // opaque (passed as-is)
|
||||
const cv::Point &anchor, // opaque (passed as-is)
|
||||
const double delta, // opaque (passed as-is)
|
||||
const int border, // opaque (passed as-is)
|
||||
const cv::Scalar &, // opaque (passed as-is)
|
||||
cv::Mat &out) // out - derived from GMat (retval)
|
||||
{
|
||||
cv::filter2D(in, out, ddepth, k, anchor, delta, border);
|
||||
}
|
||||
};
|
||||
```
|
||||
Note how CPU (OpenCV) plugin has transformed the original kernel signature:
|
||||
|
||||
* Input `cv::GMat` has been substituted with `cv::Mat`, holding actual input data for the underlying OpenCV function call;
|
||||
* Output `cv::GMat` has been transformed into an extra output parameter, thus `GCPUFilter2D::run()` takes one argument more than the original kernel signature.
|
||||
|
||||
The basic intuition for the kernel developer here is not to care where those `cv::Mat` objects come from (instead of the original `cv::GMat`) – and to just follow the signature conventions defined by the plugin. G-API will call this method during execution and supply all the necessary information (and forward the original opaque data as-is).
|
||||
|
||||
## Compound Kernels
|
||||
Sometimes kernel is a single thing only on API level. It is convenient for users, but on a particular implementation side it would be better to have multiple kernels (a subgraph) doing the thing instead. An example is `goodFeaturesToTrack()` – while in OpenCV backend it may remain a single kernel, with Fluid it becomes compound – Fluid can handle Harris response calculation but can't do sparse non-maxima suppression and point extraction to an STL vector:
|
||||
|
||||
A compound kernel implementation can be defined using a generic macro `GAPI_COMPOUND_KERNEL()`:
|
||||
|
||||
```cpp
|
||||
#include <opencv2/gapi/gcompoundkernel.hpp> // GAPI_COMPOUND_KERNEL()
|
||||
using PointArray2f = cv::GArray<cv::Point2f>;
|
||||
G_TYPED_KERNEL(HarrisCorners,
|
||||
<PointArray2f(cv::GMat,int,double,double,int,double)>,
|
||||
"org.opencv.imgproc.harris_corner")
|
||||
{
|
||||
static cv::GArrayDesc outMeta(const cv::GMatDesc &,
|
||||
int,
|
||||
double,
|
||||
double,
|
||||
int,
|
||||
double)
|
||||
{
|
||||
// No special metadata for arrays in G-API (yet)
|
||||
return cv::empty_array_desc();
|
||||
}
|
||||
};
|
||||
// Define Fluid-backend-local kernels which form GoodFeatures
|
||||
G_TYPED_KERNEL(HarrisResponse,
|
||||
<cv::GMat(cv::GMat,double,int,double)>,
|
||||
"org.opencv.fluid.harris_response")
|
||||
{
|
||||
static cv::GMatDesc outMeta(const cv::GMatDesc &in,
|
||||
double,
|
||||
int,
|
||||
double)
|
||||
{
|
||||
return in.withType(CV_32F, 1);
|
||||
}
|
||||
};
|
||||
G_TYPED_KERNEL(ArrayNMS,
|
||||
<PointArray2f(cv::GMat,int,double)>,
|
||||
"org.opencv.cpu.nms_array")
|
||||
{
|
||||
static cv::GArrayDesc outMeta(const cv::GMatDesc &,
|
||||
int,
|
||||
double)
|
||||
{
|
||||
return cv::empty_array_desc();
|
||||
}
|
||||
};
|
||||
GAPI_COMPOUND_KERNEL(GFluidHarrisCorners, HarrisCorners)
|
||||
{
|
||||
static PointArray2f
|
||||
expand(cv::GMat in,
|
||||
int maxCorners,
|
||||
double quality,
|
||||
double minDist,
|
||||
int blockSize,
|
||||
double k)
|
||||
{
|
||||
cv::GMat response = HarrisResponse::on(in, quality, blockSize, k);
|
||||
return ArrayNMS::on(response, maxCorners, minDist);
|
||||
}
|
||||
};
|
||||
// Then implement HarrisResponse as Fluid kernel and NMSresponse
|
||||
// as a generic (OpenCV) kernel
|
||||
```
|
||||
It is important to distinguish a compound kernel from G-API high-order function, i.e. a C++ function which looks like a kernel but in fact generates a subgraph. The core difference is that a compound kernel is an *implementation detail* and a kernel implementation may be either compound or not (depending on backend capabilities), while a high-order function is a "macro" in terms of G-API and so cannot act as an interface which then needs to be implemented by a backend.
|
||||
3
docs/get_started/dl_workbench_img/DL_Workbench.jpg
Normal file
3
docs/get_started/dl_workbench_img/DL_Workbench.jpg
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:6675f4b68df7eaa3d6188ecc8b5d53be572cf9c92f53abac3bc6416e6b428d0c
|
||||
size 196146
|
||||
3
docs/get_started/dl_workbench_img/Get_Started_Page-b.png
Normal file
3
docs/get_started/dl_workbench_img/Get_Started_Page-b.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:539deb67a7d1c0e8b0c037f8e7488445be0895e8e717bed5cfec64131936870c
|
||||
size 198207
|
||||
3
docs/get_started/dl_workbench_img/convert_model.png
Normal file
3
docs/get_started/dl_workbench_img/convert_model.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:2925e58a71d684e23776e6ed55cc85d9085b3ba5e484720528aeac5fa59f9e3a
|
||||
size 55404
|
||||
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f4a52661c05977d878c614c4f8510935982ce8a0e120e05690307d7c95e4ab31
|
||||
size 73999
|
||||
3
docs/get_started/dl_workbench_img/dataset_loading.png
Normal file
3
docs/get_started/dl_workbench_img/dataset_loading.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:ddb0550f3f04c177ec116d6c41e6d3a2ac1fedea7121e10ad3836f84c86a5c78
|
||||
size 35278
|
||||
3
docs/get_started/dl_workbench_img/generate_dataset.png
Normal file
3
docs/get_started/dl_workbench_img/generate_dataset.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f1e329304ff3d586bb2b8e2442333ede085593f40b1567bd5250508d33d3b9f9
|
||||
size 32668
|
||||
3
docs/get_started/dl_workbench_img/import_model_01.png
Normal file
3
docs/get_started/dl_workbench_img/import_model_01.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:605515f25a746579d3622b7a274c7dece95e4fbfc6c1817f99431c1abf116070
|
||||
size 55409
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user