Merge remote-tracking branch 'upstream/master' into itikhono/ts/slice

Ivan 2023-03-20 19:47:18 +04:00
commit 981e8ad3c0
233 changed files with 8054 additions and 4569 deletions

View File

@ -177,10 +177,11 @@ function(ov_download_tbbbind_2_5)
if(WIN32 AND X86_64)
RESOLVE_DEPENDENCY(TBBBIND_2_5
ARCHIVE_WIN "tbbbind_2_5_static_win_v1.zip"
ARCHIVE_WIN "tbbbind_2_5_static_win_v2.zip"
TARGET_PATH "${TEMP}/tbbbind_2_5"
ENVIRONMENT "TBBBIND_2_5_ROOT"
SHA256 "a67afeea8cf194f97968c800dab5b5459972908295242e282045d6b8953573c1")
SHA256 "49ae93b13a13953842ff9ae8d01681b269b5b0bc205daf18619ea9a828c44bee"
USE_NEW_LOCATION TRUE)
elseif(LINUX AND X86_64)
RESOLVE_DEPENDENCY(TBBBIND_2_5
ARCHIVE_LIN "tbbbind_2_5_static_lin_v2.tgz"

View File

@ -10,15 +10,15 @@
openvino_docs_OV_UG_Running_on_multiple_devices
openvino_docs_OV_UG_Hetero_execution
openvino_docs_OV_UG_Automatic_Batching
@endsphinxdirective
OpenVINO Runtime offers multiple inference modes to allow optimum hardware utilization under different conditions. The most basic one is a single-device mode, which defines just one device responsible for the entire inference workload. It supports a range of Intel hardware by means of plugins embedded in the Runtime library, each set up to offer the best possible performance. For a complete list of supported devices and instructions on how to use them, refer to the [guide on inference devices](../OV_Runtime_UG/supported_plugins/Device_Plugins.md).
OpenVINO Runtime offers multiple inference modes to allow optimum hardware utilization under different conditions. The most basic one is a single-device mode, which defines just one device responsible for the entire inference workload. It supports a range of Intel hardware by means of plugins embedded in the Runtime library, each set up to offer the best possible performance. For a complete list of supported devices and instructions on how to use them, refer to the :doc:`guide on inference devices <openvino_docs_OV_UG_Working_with_devices>`.
The remaining modes assume certain levels of automation in selecting devices for inference. Using them in the deployed solution may potentially increase its performance and portability. The automated modes are:
* [Automatic Device Selection (AUTO)](../OV_Runtime_UG/auto_device_selection.md)
* [Multi-Device Execution (MULTI)](../OV_Runtime_UG/multi_device.md)
* [Heterogeneous Execution (HETERO)](../OV_Runtime_UG/hetero_execution.md)
* [Automatic Batching Execution (Auto-batching)](../OV_Runtime_UG/automatic_batching.md)
* :doc:`Automatic Device Selection (AUTO) <openvino_docs_OV_UG_supported_plugins_AUTO>`
* :doc:`Multi-Device Execution (MULTI) <openvino_docs_OV_UG_Running_on_multiple_devices>`
* :doc:`Heterogeneous Execution (HETERO) <openvino_docs_OV_UG_Hetero_execution>`
* :doc:`Automatic Batching Execution (Auto-batching) <openvino_docs_OV_UG_Automatic_Batching>`
@endsphinxdirective

View File

@ -16,7 +16,7 @@ OpenVINO Runtime Plugin API provides the base InferenceEngine::AsyncInferRequest
#### Class Fields
- `_inferRequest` - a reference to the [synchronous inference request](@ref openvino_docs_ie_plugin_dg_infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
- `_inferRequest` - a reference to the [synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request) implementation. Its methods are reused in the `AsyncInferRequest` constructor to define a device pipeline.
- `_waitExecutor` - a task executor that waits for a response from a device about completion of device tasks
> **NOTE**: If a plugin can work with several instances of a device, `_waitExecutor` must be device-specific. Otherwise, having a single task executor for several devices does not allow them to work in parallel.

View File

@ -1,4 +1,4 @@
# Build Plugin Using CMake {#openvino_docs_ie_plugin_dg_plugin_build}
# Build Plugin Using CMake {#openvino_docs_ov_plugin_dg_plugin_build}
OpenVINO build infrastructure provides the OpenVINO Developer Package for plugin development.

View File

@ -0,0 +1,89 @@
# Compiled Model {#openvino_docs_ov_plugin_dg_compiled_model}
ov::CompiledModel class functionality:
- Compile an ov::Model instance to a backend specific graph representation
- Create an arbitrary number of ov::InferRequest objects
- Hold some common resources shared between different instances of ov::InferRequest. For example:
- ov::ICompiledModel::m_task_executor task executor to implement asynchronous execution
- ov::ICompiledModel::m_callback_executor task executor to run an asynchronous inference request callback in a separate thread
CompiledModel Class
------------------------
OpenVINO Plugin API provides the interface ov::ICompiledModel which should be used as a base class for a compiled model. Based on that, a declaration of a compiled model class can look as follows:
@snippet src/compiled_model.hpp compiled_model:header
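The snippet above is pulled from the Template plugin sources at documentation build time. For orientation only, below is a hand-written sketch of what such a declaration may contain; the constructor signature and the exact override set are assumptions, so verify them against the real ov::ICompiledModel header and the Template plugin. The configuration is kept as a plain `ov::AnyMap` here for simplicity.

```cpp
#include <atomic>
#include <openvino/runtime/icompiled_model.hpp>
#include <openvino/runtime/isync_infer_request.hpp>

// Illustrative sketch only, not the actual Template plugin header.
class CompiledModel : public ov::ICompiledModel {
public:
    CompiledModel(const std::shared_ptr<ov::Model>& model,
                  const std::shared_ptr<const ov::IPlugin>& plugin,
                  const ov::AnyMap& cfg);  // assumed constructor arguments

    // Serializes the backend specific graph so Plugin::import_model can restore it
    void export_model(std::ostream& model_stream) const override;

    std::shared_ptr<const ov::Model> get_runtime_model() const override;
    void set_property(const ov::AnyMap& properties) override;
    ov::Any get_property(const std::string& name) const override;

protected:
    // The default create_infer_request() wraps this synchronous request into an asynchronous one
    std::shared_ptr<ov::ISyncInferRequest> create_sync_infer_request() const override;

private:
    void compile_model(const std::shared_ptr<ov::Model>& model);

    std::atomic<std::size_t> m_request_id{0};  // tags requests for ITT profiling
    ov::AnyMap m_cfg;                          // configuration the model was compiled with
    std::shared_ptr<ov::Model> m_model;        // transformed model used by the backend
    bool m_loaded_from_cache = false;          // true if the model was restored from the cache
};
```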
### Class Fields
The example class has several fields:
- `m_request_id` - Tracks a number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
- `m_cfg` - Defines the configuration the compiled model was compiled with.
- `m_model` - Keeps a reference to the transformed `ov::Model`, which is used in OpenVINO reference backend computations. Note that for other backends with a backend specific graph representation, `m_model` has a different type and represents the backend specific graph or just a set of computational kernels used to perform inference.
- `m_loaded_from_cache` - Indicates whether the model was loaded from the cache.
### CompiledModel Constructor
This constructor accepts a generic representation of a model as an ov::Model, which is compiled into a backend specific device graph:
@snippet src/compiled_model.cpp compiled_model:ctor
The implementation `compile_model()` is fully device-specific.
### compile_model()
The function accepts a const shared pointer to an `ov::Model` object and applies OpenVINO passes using the `transform_model()` function, which defines a plugin-specific conversion pipeline. To support low precision inference, the pipeline can include Low Precision Transformations. These transformations are usually hardware specific. You can find how to use and configure Low Precision Transformations in the [Low Precision Transformations](@ref openvino_docs_OV_UG_lpt) guide.
@snippet src/compiled_model.cpp compiled_model:compile_model
> **NOTE**: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
### export_model()
The implementation of the method should write all data to the `model_stream`, which is required to import a backend specific graph later in the `Plugin::import_model` method:
@snippet src/compiled_model.cpp compiled_model:export_model
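For context, here is a hedged user-side sketch of how the export/import pair is exercised through the public API; the `"TEMPLATE"` device name and the surrounding function are illustrative.

```cpp
#include <sstream>
#include <openvino/openvino.hpp>

void export_import_roundtrip(const std::shared_ptr<ov::Model>& model) {
    ov::Core core;
    auto compiled = core.compile_model(model, "TEMPLATE");

    // CompiledModel::export_model() writes the backend specific graph into the stream ...
    std::stringstream model_stream;
    compiled.export_model(model_stream);

    // ... and Plugin::import_model() later restores a compiled model from the same data
    auto restored = core.import_model(model_stream, "TEMPLATE");
}
```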
### create_sync_infer_request()
The method creates a synchronous inference request and returns it.
@snippet src/compiled_model.cpp compiled_model:create_sync_infer_request
While the public OpenVINO API has a single interface for inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
- [Synchronous inference request](@ref openvino_docs_ov_plugin_dg_infer_request), which defines pipeline stages and runs them synchronously in the `infer` method.
- [Asynchronous inference request](@ref openvino_docs_ie_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can have one or several stages:
- For single-stage pipelines, there is no need to define this method and create a class derived from ov::IAsyncInferRequest. In this case, the default implementation creates an ov::IAsyncInferRequest wrapping a synchronous inference request and runs it asynchronously in the `m_request_executor` executor.
- For pipelines with multiple stages, such as performing some preprocessing on host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device use and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with the preprocessing and postprocessing stages, giving better performance (see the sketch below).
> **IMPORTANT**: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
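Below is a hedged sketch of a two-stage pipeline scheduled on separate task executors inside an asynchronous request derived from ov::IAsyncInferRequest. The `m_pipeline` member and the stage methods (`infer_preprocess`, `start_pipeline`, `wait_pipeline`, `infer_postprocess`) follow the Template plugin naming and are assumptions for other plugins.

```cpp
// Illustrative constructor of a plugin-specific AsyncInferRequest (assumed class names).
AsyncInferRequest::AsyncInferRequest(const std::shared_ptr<InferRequest>& request,
                                     const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor,
                                     const std::shared_ptr<ov::threading::ITaskExecutor>& wait_executor,
                                     const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor)
    : ov::IAsyncInferRequest(request, task_executor, callback_executor) {
    // Stage 1 runs on the request executor: preprocessing plus starting the device pipeline.
    // Stage 2 runs on a dedicated "wait" executor, so device stages of several requests
    // can overlap with host-side pre/postprocessing of other requests.
    m_pipeline = {{task_executor,
                   [request] {
                       request->infer_preprocess();
                       request->start_pipeline();
                   }},
                  {wait_executor,
                   [request] {
                       request->wait_pipeline();
                       request->infer_postprocess();
                   }}};
}
```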
### create_infer_request()
The method creates an asynchronous inference request and returns it.
@snippet src/compiled_model.cpp compiled_model:create_infer_request
### get_property()
Returns the current value of the property with the name `name`. The method extracts configuration values the compiled model was compiled with.
@snippet src/compiled_model.cpp compiled_model:get_property
This function is the only way to get configuration values when a model is imported and compiled by other developers and tools.
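A short application-side sketch of how these values are typically read back; the `"TEMPLATE"` device name and the chosen property keys are illustrative.

```cpp
#include <openvino/openvino.hpp>

void query_compiled_model_properties(const std::shared_ptr<ov::Model>& model) {
    ov::Core core;
    auto compiled = core.compile_model(model, "TEMPLATE");

    // Both calls end up in the plugin's CompiledModel::get_property()
    auto nireq = compiled.get_property(ov::optimal_number_of_infer_requests);
    auto mode  = compiled.get_property(ov::hint::performance_mode);
}
```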
### set_property()
The method allows setting compiled model specific properties.
@snippet src/compiled_model.cpp compiled_model:set_property
### get_runtime_model()
The method returns the runtime model with backend specific information.
@snippet src/compiled_model.cpp compiled_model:get_runtime_model
The next step in plugin library implementation is the [Synchronous Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class.

View File

@ -1,90 +0,0 @@
# Executable Network {#openvino_docs_ie_plugin_dg_executable_network}
`ExecutableNetwork` class functionality:
- Compile an InferenceEngine::ICNNNetwork instance to a backend specific graph representation
- Create an arbitrary number of `InferRequest` objects
- Hold some common resources shared between different instances of `InferRequest`. For example:
- InferenceEngine::IExecutableNetworkInternal::_taskExecutor task executor to implement asynchronous execution
- InferenceEngine::IExecutableNetworkInternal::_callbackExecutor task executor to run an asynchronous inference request callback in a separate thread
`ExecutableNetwork` Class
------------------------
Inference Engine Plugin API provides the helper InferenceEngine::ExecutableNetworkThreadSafeDefault class recommended to use as a base class for an executable network. Based on that, a declaration of an executable network class can look as follows:
@snippet src/compiled_model.hpp executable_network:header
#### Class Fields
The example class has several fields:
- `_requestId` - Tracks a number of created inference requests, which is used to distinguish different inference requests during profiling via the Intel® Instrumentation and Tracing Technology (ITT) library.
- `_cfg` - Defines a configuration an executable network was compiled with.
- `_plugin` - Refers to a plugin instance.
- `_function` - Keeps a reference to transformed `ngraph::Function` which is used in ngraph reference backend computations. Note, in case of other backends with backend specific graph representation `_function` has different type and represents backend specific graph or just a set of computational kernels to perform an inference.
- `_inputIndex` - maps a name of input with its index among all network inputs.
- `_outputIndex` - maps a name of output with its index among all network outputs.
### `ExecutableNetwork` Constructor with `ICNNNetwork`
This constructor accepts a generic representation of a neural network as an InferenceEngine::ICNNNetwork reference and is compiled into a backend specific device graph:
@snippet src/compiled_model.cpp executable_network:ctor_cnnnetwork
The implementation `CompileNetwork` is fully device-specific.
### `CompileNetwork()`
The function accepts a const shared pointer to `ngraph::Function` object and performs the following steps:
1. Applies nGraph passes using `TransformNetwork` function, which defines plugin-specific conversion pipeline. To support low precision inference, the pipeline can include Low Precision Transformations. These transformations are usually hardware specific. You can find how to use and configure Low Precisions Transformations in [Low Precision Transformations](@ref openvino_docs_OV_UG_lpt) guide.
2. Maps the transformed graph to a backend specific graph representation (for example, to CPU plugin internal graph representation).
3. Allocates and fills memory for graph weights, backend specific memory handles and so on.
@snippet src/compiled_model.cpp executable_network:map_graph
> **NOTE**: After all these steps, the backend specific graph is ready to create inference requests and perform inference.
### `ExecutableNetwork` Constructor Importing from Stream
This constructor creates a backend specific graph by importing from a stream object:
> **NOTE**: The export of backend specific graph is done in the `Export` method, and data formats must be the same for both import and export.
### `Export()`
The implementation of the method should write all data to the `model` stream, which is required to import a backend specific graph later in the `Plugin::Import` method:
@snippet src/compiled_model.cpp executable_network:export
### `CreateInferRequest()`
The method creates an asynchronous inference request and returns it. While the public Inference Engine API has a single interface for inference request, which can be executed in synchronous and asynchronous modes, a plugin library implementation has two separate classes:
- [Synchronous inference request](@ref openvino_docs_ie_plugin_dg_infer_request), which defines pipeline stages and runs them synchronously in the `Infer` method.
- [Asynchronous inference request](@ref openvino_docs_ie_plugin_dg_async_infer_request), which is a wrapper for a synchronous inference request and can run a pipeline asynchronously. Depending on a device pipeline structure, it can has one or several stages:
- For single-stage pipelines, there is no need to define this method and create a class derived from InferenceEngine::AsyncInferRequestThreadSafeDefault. For single stage pipelines, a default implementation of this method creates InferenceEngine::AsyncInferRequestThreadSafeDefault wrapping a synchronous inference request and runs it asynchronously in the `_taskExecutor` executor.
- For pipelines with multiple stages, such as performing some preprocessing on host, uploading input data to a device, running inference on a device, or downloading and postprocessing output data, schedule stages on several task executors to achieve better device use and performance. You can do it by creating a sufficient number of inference requests running in parallel. In this case, device stages of different inference requests are overlapped with preprocessing and postprocessing stage giving better performance.
> **IMPORTANT**: It is up to you to decide how many task executors you need to optimally execute a device pipeline.
@snippet src/compiled_model.cpp executable_network:create_infer_request
### `GetMetric()`
Returns a metric value for a metric with the name `name`. A metric is a static type of information about an executable network. Examples of metrics:
- EXEC_NETWORK_METRIC_KEY(NETWORK_NAME) - name of an executable network
- EXEC_NETWORK_METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS) - heuristic to denote an optimal (or at least sub-optimal) number of inference requests needed to run asynchronously to use the current device fully
- Any other executable network metric specific for a particular device. Such metrics and possible values must be declared in a plugin configuration public header, for example, `template/config.hpp`
The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
### `GetConfig()`
Returns a current value for a configuration key with the name `name`. The method extracts configuration values an executable network is compiled with.
@snippet src/compiled_model.cpp executable_network:get_config
This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the [Compile tool](@ref openvino_inference_engine_tools_compile_tool_README).
The next step in plugin library implementation is the [Synchronous Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) class.

View File

@ -1,83 +1,84 @@
# Synchronous Inference Request {#openvino_docs_ie_plugin_dg_infer_request}
# Synchronous Inference Request {#openvino_docs_ov_plugin_dg_infer_request}
`InferRequest` class functionality:
- Allocate input and output blobs needed for a backend-dependent network inference.
- Allocate input and output tensors needed for a backend-dependent network inference.
- Define functions for inference process stages (for example, `preprocess`, `upload`, `infer`, `download`, `postprocess`). These functions can later be used to define an execution pipeline during [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) implementation.
- Call inference stages one by one synchronously.
`InferRequest` Class
InferRequest Class
------------------------
Inference Engine Plugin API provides the helper InferenceEngine::IInferRequestInternal class recommended
to use as a base class for a synchronous inference request implementation. Based of that, a declaration
OpenVINO Plugin API provides the interface ov::ISyncInferRequest which should be
used as a base class for a synchronous inference request implementation. Based on that, a declaration
of a synchronous request class can look as follows:
@snippet src/sync_infer_request.hpp infer_request:header
#### Class Fields
### Class Fields
The example class has several fields:
- `_executableNetwork` - reference to an executable network instance. From this reference, an inference request instance can take a task executor, use counter for a number of created inference requests, and so on.
- `_profilingTask` - array of the `std::array<InferenceEngine::ProfilingTask, numOfStages>` type. Defines names for pipeline stages. Used to profile an inference pipeline execution with the Intel® instrumentation and tracing technology (ITT).
- `_durations` - array of durations of each pipeline stage.
- `_networkInputBlobs` - input blob map.
- `_networkOutputBlobs` - output blob map.
- `_parameters` - `ngraph::Function` parameter operations.
- `_results` - `ngraph::Function` result operations.
- `m_profiling_task` - array of the `std::array<openvino::itt::handle_t, numOfStages>` type. Defines names for pipeline stages. Used to profile an inference pipeline execution with the Intel® instrumentation and tracing technology (ITT).
- `m_durations` - array of durations of each pipeline stage.
- backend specific fields:
- `_inputTensors` - inputs tensors which wrap `_networkInputBlobs` blobs. They are used as inputs to backend `_executable` computational graph.
- `_outputTensors` - output tensors which wrap `_networkOutputBlobs` blobs. They are used as outputs from backend `_executable` computational graph.
- `_executable` - an executable object / backend computational graph.
- `m_backend_input_tensors` - input backend tensors.
- `m_backend_output_tensors` - output backend tensors.
- `m_executable` - an executable object / backend computational graph.
### `InferRequest` Constructor
### InferRequest Constructor
The constructor initializes helper fields and calls methods which allocate blobs:
The constructor initializes helper fields and calls methods which allocate tensors:
@snippet src/sync_infer_request.cpp infer_request:ctor
> **NOTE**: Call InferenceEngine::CNNNetwork::getInputsInfo and InferenceEngine::CNNNetwork::getOutputsInfo to specify both layout and precision of blobs, which you can set with InferenceEngine::InferRequest::SetBlob and get with InferenceEngine::InferRequest::GetBlob. A plugin uses these hints to determine its internal layouts and precisions for input and output blobs if needed.
> **NOTE**: Use inputs/outputs information from the compiled model to understand shape and element type of tensors, which you can set with ov::InferRequest::set_tensor and get with ov::InferRequest::get_tensor. A plugin uses these hints to determine its internal layouts and element types for input and output tensors if needed.
### `~InferRequest` Destructor
### ~InferRequest Destructor
Decrements a number of created inference requests:
The destructor can contain plugin specific logic to finish and destroy the infer request.
@snippet src/sync_infer_request.cpp infer_request:dtor
### `InferImpl()`
### set_tensors_impl()
**Implementation details:** Base IInferRequestInternal class implements the public InferenceEngine::IInferRequestInternal::Infer method as following:
- Checks blobs set by users
- Calls the `InferImpl` method defined in a derived class to call actual pipeline stages synchronously
The method allows setting batched tensors if the plugin supports it.
@snippet src/sync_infer_request.cpp infer_request:infer_impl
@snippet src/sync_infer_request.cpp infer_request:set_tensors_impl
#### 1. `inferPreprocess`
### query_state()
Below is the code of the `inferPreprocess` method to demonstrate Inference Engine common preprocessing step handling:
The method returns variable states from the model.
@snippet src/sync_infer_request.cpp infer_request:query_state
### infer()
The method calls the actual pipeline stages synchronously. Inside the method, the plugin should check input/output tensors, move external tensors to the backend, and run the inference.
@snippet src/sync_infer_request.cpp infer_request:infer
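As a rough illustration of the flow described above (not the actual Template plugin source), the stages could be chained like this:

```cpp
// Illustrative only: chaining the stages documented in this section.
void InferRequest::infer() {
    infer_preprocess();   // check user tensors and convert them to backend tensors
    start_pipeline();     // run the backend executable synchronously
    infer_postprocess();  // convert backend outputs back to user tensors
}
```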
#### 1. infer_preprocess()
Below is the code of the `infer_preprocess()` method. The method checks user input/output tensors and demonstrates conversion from user tensor to backend specific representation:
@snippet src/sync_infer_request.cpp infer_request:infer_preprocess
**Details:**
* `InferImpl` must call the InferenceEngine::IInferRequestInternal::execDataPreprocessing function, which executes common Inference Engine preprocessing step (for example, applies resize or color conversion operations) if it is set by the user. The output dimensions, layout and precision matches the input information set via InferenceEngine::CNNNetwork::getInputsInfo.
* If `inputBlob` passed by user differs in terms of precisions from precision expected by plugin, `blobCopy` is performed which does actual precision conversion.
#### 2. start_pipeline()
#### 2. `startPipeline`
Executes a pipeline synchronously using `_executable` object:
Executes a pipeline synchronously using `m_executable` object:
@snippet src/sync_infer_request.cpp infer_request:start_pipeline
#### 3. `inferPostprocess`
#### 3. infer_postprocess()
Converts output blobs if precisions of backend output blobs and blobs passed by user are different:
Converts backend specific tensors to tensors passed by user:
@snippet src/sync_infer_request.cpp infer_request:infer_postprocess
### `GetPerformanceCounts()`
### get_profiling_info()
The method sets performance counters which were measured during pipeline stages execution:
The method returns the profiling info which was measured during pipeline stages execution:
@snippet src/sync_infer_request.cpp infer_request:get_performance_counts
@snippet src/sync_infer_request.cpp infer_request:get_profiling_info
The next step in the plugin library implementation is the [Asynchronous Inference Request](@ref openvino_docs_ie_plugin_dg_async_infer_request) class.

View File

@ -7,12 +7,12 @@
:caption: Converting and Preparing Models
:hidden:
Implement Plugin Functionality <openvino_docs_ie_plugin_dg_plugin>
Implement Executable Network Functionality <openvino_docs_ie_plugin_dg_executable_network>
Implement Synchronous Inference Request <openvino_docs_ie_plugin_dg_infer_request>
Implement Plugin Functionality <openvino_docs_ov_plugin_dg_plugin>
Implement Compiled Model Functionality <openvino_docs_ov_plugin_dg_compiled_model>
Implement Synchronous Inference Request <openvino_docs_ov_plugin_dg_infer_request>
Implement Asynchronous Inference Request <openvino_docs_ie_plugin_dg_async_infer_request>
openvino_docs_ie_plugin_dg_plugin_build
openvino_docs_ie_plugin_dg_plugin_testing
openvino_docs_ov_plugin_dg_plugin_build
openvino_docs_ov_plugin_dg_plugin_testing
openvino_docs_ie_plugin_detailed_guides
openvino_docs_ie_plugin_api_references
@ -27,23 +27,23 @@ OpenVINO Plugin Library
OpenVINO plugin dynamic library consists of several main components:
1. [Plugin class](@ref openvino_docs_ie_plugin_dg_plugin):
1. [Plugin class](@ref openvino_docs_ov_plugin_dg_plugin):
- Provides information about devices of a specific type.
- Can create an [compiled model](@ref openvino_docs_ie_plugin_dg_executable_network) instance which represents a Neural
- Can create a [compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) instance which represents a Neural
Network backend specific graph structure for a particular device, as opposed to the ov::Model
which is backend-independent.
- Can import an already compiled graph structure from an input stream to a
[compiled model](@ref openvino_docs_ie_plugin_dg_executable_network) object.
2. [Compiled Modek class](@ref openvino_docs_ie_plugin_dg_executable_network):
[compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) object.
2. [Compiled Model class](@ref openvino_docs_ov_plugin_dg_compiled_model):
- Is an execution configuration compiled for a particular device and takes into account its capabilities.
- Holds a reference to a particular device and a task executor for this device.
- Can create several instances of [Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request).
- Can create several instances of [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request).
- Can export an internal backend specific graph structure to an output stream.
3. [Inference Request class](@ref openvino_docs_ie_plugin_dg_infer_request):
3. [Inference Request class](@ref openvino_docs_ov_plugin_dg_infer_request):
- Runs an inference pipeline serially.
- Can extract performance counters for an inference pipeline execution profiling.
4. [Asynchronous Inference Request class](@ref openvino_docs_ie_plugin_dg_async_infer_request):
- Wraps the [Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) class and runs pipeline stages in parallel
- Wraps the [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) class and runs pipeline stages in parallel
on several task executors based on a device-specific pipeline structure.
> **NOTE**: This documentation is written based on the `Template` plugin, which demonstrates plugin
@ -55,8 +55,8 @@ at `<openvino source dir>/src/plugins/template`.
Detailed guides
-----------------------
* [Build](@ref openvino_docs_ie_plugin_dg_plugin_build) a plugin library using CMake
* Plugin and its components [testing](@ref openvino_docs_ie_plugin_dg_plugin_testing)
* [Build](@ref openvino_docs_ov_plugin_dg_plugin_build) a plugin library using CMake
* Plugin and its components [testing](@ref openvino_docs_ov_plugin_dg_plugin_testing)
* [Quantized networks](@ref openvino_docs_ie_plugin_dg_quantized_networks)
* [Low precision transformations](@ref openvino_docs_OV_UG_lpt) guide
* [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide

View File

@ -1,4 +1,4 @@
# Plugin {#openvino_docs_ie_plugin_dg_plugin}
# Plugin {#openvino_docs_ov_plugin_dg_plugin}
OpenVINO Plugin usually represents a wrapper around a backend. Backends can be:
- OpenCL-like backend (e.g. clDNN library) for GPU devices.
@ -8,7 +8,7 @@ OpenVINO Plugin usually represents a wrapper around a backend. Backends can be:
The responsibility of OpenVINO Plugin:
- Initializes a backend and throws an exception in the `Engine` constructor if the backend cannot be initialized.
- Provides information about devices enabled by a particular backend, e.g. how many devices, their properties and so on.
- Loads or imports [compiled model](@ref openvino_docs_ie_plugin_dg_executable_network) objects.
- Loads or imports [compiled model](@ref openvino_docs_ov_plugin_dg_compiled_model) objects.
In addition to the OpenVINO Public API, OpenVINO provides the Plugin API, which is a set of functions and helper classes that simplify new plugin development:
@ -16,7 +16,7 @@ In addition to the OpenVINO Public API, the OpenVINO provides the Plugin API, wh
- implementations in the `src/inference/src/dev/` directory
- symbols in the OpenVINO shared library
To build an OpenVINO plugin with the Plugin API, see the [OpenVINO Plugin Building](@ref openvino_docs_ie_plugin_dg_plugin_build) guide.
To build an OpenVINO plugin with the Plugin API, see the [OpenVINO Plugin Building](@ref openvino_docs_ov_plugin_dg_plugin_build) guide.
Plugin Class
------------------------
@ -39,7 +39,7 @@ The provided plugin class also has several fields:
As an example, a plugin configuration has the following value parameters:
- `device_id` - particular device ID to work with. Applicable if a plugin supports more than one `Template` device. In this case, some plugin methods, like `set_property`, `query_model`, and `compile_model`, must support the ov::device::id property.
- `perf_counts` - boolean value to identify whether to collect performance counters during [Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) execution.
- `perf_counts` - boolean value to identify whether to collect performance counters during [Inference Request](@ref openvino_docs_ov_plugin_dg_infer_request) execution.
- `streams_executor_config` - configuration of `ov::threading::IStreamsExecutor` to handle settings of multi-threaded context.
- `performance_mode` - configuration of `ov::hint::PerformanceMode` to set the performance mode.
@ -75,7 +75,7 @@ which holds a backend-dependent compiled model in an internal representation:
Before creating a `CompiledModel` instance via a constructor, a plugin may check whether the provided
ov::Model object is supported by the device, if needed.
Actual model compilation is done in the `CompiledModel` constructor. Refer to the [CompiledModel Implementation Guide](@ref openvino_docs_ie_plugin_dg_executable_network) for details.
Actual model compilation is done in the `CompiledModel` constructor. Refer to the [CompiledModel Implementation Guide](@ref openvino_docs_ov_plugin_dg_compiled_model) for details.
> **NOTE**: Actual configuration map used in `CompiledModel` is constructed as a base plugin
> configuration set via `Plugin::set_property`, where some values are overwritten with `config` passed to `Plugin::compile_model`.
@ -130,7 +130,7 @@ key value to the ov::Any and returns it.
### import_model()
The compiled model import mechanism allows importing a previously exported backend specific model and wrapping it
using an [CompiledModel](@ref openvino_docs_ie_plugin_dg_executable_network) object. This functionality is useful if
using a [CompiledModel](@ref openvino_docs_ov_plugin_dg_compiled_model) object. This functionality is useful if
backend specific model compilation takes significant time and/or cannot be done on a target host
device due to other reasons.
@ -167,4 +167,4 @@ OpenVINO plugin library must export only one function creating a plugin instance
@snippet template/src/plugin.cpp plugin:create_plugin_engine
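Since the snippet above is pulled from the Template plugin sources, here is a hedged sketch of what this export typically looks like; the version definition, the `CI_BUILD_NUMBER` macro, and the plugin class name are assumptions.

```cpp
#include <openvino/runtime/iplugin.hpp>

// Illustrative only: export the single plugin creation function via the Plugin API macro.
static const ov::Version version = {CI_BUILD_NUMBER, "openvino_template_plugin"};
OV_DEFINE_PLUGIN_CREATE_FUNCTION(Plugin, version)
```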
Next step in a plugin library implementation is the [CompiledModel](@ref openvino_docs_ie_plugin_dg_executable_network) class.
Next step in a plugin library implementation is the [CompiledModel](@ref openvino_docs_ov_plugin_dg_compiled_model) class.

View File

@ -1,10 +1,10 @@
# Plugin Testing {#openvino_docs_ie_plugin_dg_plugin_testing}
# Plugin Testing {#openvino_docs_ov_plugin_dg_plugin_testing}
OpenVINO tests infrastructure provides a predefined set of functional tests and utilities. They are used to verify a plugin using the OpenVINO public API.
All the tests are written in the [Google Test C++ framework](https://github.com/google/googletest).
OpenVINO Plugin tests are included in the `openvino::funcSharedTests` CMake target which is built within the OpenVINO repository
(see [Build Plugin Using CMake](@ref openvino_docs_ie_plugin_dg_plugin_build) guide). This library contains tests definitions (the tests bodies) which can be parametrized and instantiated in plugins depending on whether a plugin supports a particular feature, specific sets of parameters for test on supported operation set and so on.
(see [Build Plugin Using CMake](@ref openvino_docs_ov_plugin_dg_plugin_build) guide). This library contains test definitions (the test bodies), which can be parametrized and instantiated in plugins, depending on whether a plugin supports a particular feature, specific sets of parameters for tests on the supported operation set, and so on.
Test definitions are split into tests class declaration (see `src/tests/functional/plugin/shared/include`) and tests class implementation (see `src/tests/functional/plugin/shared/src`) and include the following scopes of plugin conformance tests:
@ -35,7 +35,7 @@ To use these tests for your own plugin development, link the `openvino::funcShar
> **NOTE**: A plugin may contain its own tests for use cases that are specific to hardware or need to be extensively tested.
To build test binaries together with other build artifacts, use the `make all` command. For details, see
[Build Plugin Using CMake*](@ref openvino_docs_ie_plugin_dg_plugin_build).
[Build Plugin Using CMake*](@ref openvino_docs_ov_plugin_dg_plugin_build).
### How to Extend OpenVINO Plugin Tests

View File

@ -1,16 +1,18 @@
# Model Caching Overview {#openvino_docs_OV_UG_Model_caching_overview}
As described in the [Integrate OpenVINO™ with Your Application](integrate_with_your_application.md), a common application flow consists of the following steps:
@sphinxdirective
As described in the :doc:`Integrate OpenVINO™ with Your Application <openvino_docs_OV_UG_Integrate_OV_with_your_application>`, a common application flow consists of the following steps:
1. **Create a Core object**: First step to manage available devices and read model objects
2. **Read the Intermediate Representation**: Read an Intermediate Representation file into an object of the `ov::Model`
2. **Read the Intermediate Representation**: Read an Intermediate Representation file into an object of the `ov::Model <classov_1_1Model.html#doxid-classov-1-1-model>`__
3. **Prepare inputs and outputs**: If needed, manipulate precision, memory layout, size or color format
4. **Set configuration**: Pass device-specific loading configurations to the device
5. **Compile and Load Network to device**: Use the `ov::Core::compile_model()` method with a specific device
5. **Compile and Load Network to device**: Use the `ov::Core::compile_model() <classov_1_1Core.html#doxid-classov-1-1-core-1a46555f0803e8c29524626be08e7f5c5a>`__ method with a specific device
6. **Set input data**: Specify input tensor
@ -18,14 +20,14 @@ As described in the [Integrate OpenVINO™ with Your Application](integrate_with
Step 5 can potentially perform several time-consuming device-specific optimizations and network compilations,
and such delays can lead to a bad user experience on application startup. To avoid this, some devices offer
import/export network capability, and it is possible to either use the [Compile tool](../../tools/compile_tool/README.md)
import/export network capability, and it is possible to either use the :doc:`Compile tool <openvino_inference_engine_tools_compile_tool_README>`
or enable model caching to export the compiled model automatically. Reusing a cached model can significantly reduce the model compile time.
### Set "cache_dir" config option to enable model caching
Set "cache_dir" config option to enable model caching
+++++++++++++++++++++++++++++++++++++++++++++++++++++
To enable model caching, the application must specify a folder to store cached blobs, which is done like this:
@sphinxdirective
.. tab:: C++
@ -39,23 +41,24 @@ To enable model caching, the application must specify a folder to store cached b
:language: python
:fragment: [ov:caching:part0]
@endsphinxdirective
With this code, if the device specified by `device_name` supports import/export model capability, a cached blob is automatically created inside the `/path/to/cache/dir` folder.
With this code, if the device specified by ``device_name`` supports import/export model capability, a cached blob is automatically created inside the ``/path/to/cache/dir`` folder.
If the device does not support import/export capability, cache is not created and no error is thrown.
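A compact C++ sketch of this flow, assuming a ``"GPU"`` device that supports import/export and a ``model.xml`` file on disk:

.. code-block:: cpp

   ov::Core core;
   core.set_property(ov::cache_dir("/path/to/cache/dir"));  // enable model caching
   auto model = core.read_model("model.xml");
   auto compiled = core.compile_model(model, "GPU");        // the first call also exports a cached blob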
Depending on your device, total time for compiling model on application startup can be significantly reduced.
Also note that the very first `compile_model` (when cache is not yet created) takes slightly longer time to "export" the compiled blob into a cache file:
Also note that the very first ``compile_model`` (when cache is not yet created) takes slightly longer time to "export" the compiled blob into a cache file:
![](../img/caching_enabled.svg)
### Even faster: use compile_model(modelPath)
.. image:: _static/images/caching_enabled.svg
Even faster: use compile_model(modelPath)
+++++++++++++++++++++++++++++++++++++++++
In some cases, applications do not need to customize inputs and outputs every time. Such applications always
call `model = core.read_model(...)`, then `core.compile_model(model, ..)` and it can be further optimized.
call ``model = core.read_model(...)``, then ``core.compile_model(model, ...)``, and this flow can be further optimized.
For these cases, there is a more convenient API to compile the model in a single call, skipping the read step:
@sphinxdirective
.. tab:: C++
@ -69,11 +72,9 @@ For these cases, there is a more convenient API to compile the model in a single
:language: python
:fragment: [ov:caching:part1]
@endsphinxdirective
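A hedged single-call sketch (``"GPU"`` and the model path are illustrative):

.. code-block:: cpp

   ov::Core core;
   core.set_property(ov::cache_dir("/path/to/cache/dir"));
   // Reads and compiles the model in one call; with caching enabled,
   // subsequent runs reuse the cached blob instead of recompiling.
   auto compiled = core.compile_model("model.xml", "GPU");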
With model caching enabled, total load time is even smaller, if `read_model` is optimized as well.
With model caching enabled, total load time is even smaller, if ``read_model`` is optimized as well.
@sphinxdirective
.. tab:: C++
@ -87,16 +88,15 @@ With model caching enabled, total load time is even smaller, if `read_model` is
:language: python
:fragment: [ov:caching:part2]
@endsphinxdirective
![](../img/caching_times.svg)
.. image:: _static/images/caching_times.svg
### Advanced Examples
Advanced Examples
++++++++++++++++++++
Not every device supports network import/export capability. For those that don't, enabling caching has no effect.
To check in advance if a particular device supports model caching, your application can use the following code:
@sphinxdirective
.. tab:: C++
@ -110,8 +110,9 @@ To check in advance if a particular device supports model caching, your applicat
:language: python
:fragment: [ov:caching:part3]
@endsphinxdirective
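An equivalent hand-written C++ check, assuming the device capability list is exposed via ``ov::device::capabilities``:

.. code-block:: cpp

   // requires <algorithm>; "core" and "device_name" come from the surrounding application code
   auto caps = core.get_property(device_name, ov::device::capabilities);
   bool caching_supported =
       std::find(caps.begin(), caps.end(), ov::device::capability::EXPORT_IMPORT) != caps.end();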
> **NOTE**: For GPU, model caching is currently implemented as a preview feature. Before it is fully supported, kernel caching can be used in the same manner: by setting the CACHE_DIR configuration key to a folder where the cache should be stored (see the [GPU plugin documentation](supported_plugins/GPU.md)).
> To activate the preview feature of model caching, set the OV_GPU_CACHE_MODEL environment variable to 1.
.. note::
For GPU, model caching is currently implemented as a preview feature. Before it is fully supported, kernel caching can be used in the same manner: by setting the CACHE_DIR configuration key to a folder where the cache should be stored (see the :doc:`GPU plugin documentation <openvino_docs_OV_UG_supported_plugins_GPU>`). To activate the preview feature of model caching, set the OV_GPU_CACHE_MODEL environment variable to 1.
@endsphinxdirective

View File

@ -2,111 +2,179 @@
OpenVINO™ Runtime Python API offers additional features and helpers to enhance user experience. The main goal of the Python API is to provide a user-friendly and simple, yet powerful, tool for Python users.
## Easier Model Compilation
Easier Model Compilation
########################
`CompiledModel` can be easily created with the helper method. It hides the creation of `Core` and applies `AUTO` inference mode by default.
``CompiledModel`` can be easily created with the helper method. It hides the creation of ``Core`` and applies ``AUTO`` inference mode by default.
@snippet docs/snippets/ov_python_exclusives.py auto_compilation
## Model/CompiledModel Inputs and Outputs
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [auto_compilation]
Besides functions aligned to C++ API, some of them have their Python counterparts or extensions. For example, `Model` and `CompiledModel` inputs/outputs can be accessed via properties.
@snippet docs/snippets/ov_python_exclusives.py properties_example
Model/CompiledModel Inputs and Outputs
######################################
Besides functions aligned to C++ API, some of them have their Python counterparts or extensions. For example, ``Model`` and ``CompiledModel`` inputs/outputs can be accessed via properties.
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [properties_example]
Refer to Python API documentation on which helper functions or properties are available for different classes.
## Working with Tensor
Working with Tensor
####################
Python API allows passing data as tensors. The `Tensor` object holds a copy of the data from the given array. The `dtype` of *numpy* arrays is converted to OpenVINO™ types automatically.
Python API allows passing data as tensors. The ``Tensor`` object holds a copy of the data from the given array. The ``dtype`` of *numpy* arrays is converted to OpenVINO™ types automatically.
@snippet docs/snippets/ov_python_exclusives.py tensor_basics
### Shared Memory Mode
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [tensor_basics]
`Tensor` objects can share the memory with *numpy* arrays. By specifying the `shared_memory` argument, the `Tensor` object does not copy data. Instead, it has access to the memory of the *numpy* array.
@snippet docs/snippets/ov_python_exclusives.py tensor_shared_mode
Shared Memory Mode
++++++++++++++++++
## Running Inference
``Tensor`` objects can share the memory with *numpy* arrays. By specifying the ``shared_memory`` argument, the ``Tensor`` object does not copy data. Instead, it has access to the memory of the *numpy* array.
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [tensor_shared_mode]
Running Inference
####################
Python API supports extra calling methods to synchronous and asynchronous modes for inference.
All infer methods allow users to pass data as popular *numpy* arrays, gathered in either Python dicts or lists.
@snippet docs/snippets/ov_python_exclusives.py passing_numpy_array
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [passing_numpy_array]
Results from inference can be obtained in various ways:
@snippet docs/snippets/ov_python_exclusives.py getting_results
### Synchronous Mode - Extended
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [getting_results]
Synchronous Mode - Extended
+++++++++++++++++++++++++++
Python API provides different synchronous calls to infer model, which block the application execution. Additionally, these calls return results of inference:
@snippet docs/snippets/ov_python_exclusives.py sync_infer
### AsyncInferQueue
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [sync_infer]
Asynchronous mode pipelines can be supported with a wrapper class called `AsyncInferQueue`. This class automatically spawns the pool of `InferRequest` objects (also called "jobs") and provides synchronization mechanisms to control the flow of the pipeline.
Each job is distinguishable by a unique `id`, which is in the range from 0 up to the number of jobs specified in the `AsyncInferQueue` constructor.
AsyncInferQueue
++++++++++++++++++++
The `start_async` function call is not required to be synchronized - it waits for any available job if the queue is busy/overloaded. Every `AsyncInferQueue` code block should end with the `wait_all` function which provides the "global" synchronization of all jobs in the pool and ensure that access to them is safe.
Asynchronous mode pipelines can be supported with a wrapper class called ``AsyncInferQueue``. This class automatically spawns the pool of ``InferRequest`` objects (also called "jobs") and provides synchronization mechanisms to control the flow of the pipeline.
@snippet docs/snippets/ov_python_exclusives.py asyncinferqueue
Each job is distinguishable by a unique ``id``, which is in the range from 0 up to the number of jobs specified in the ``AsyncInferQueue`` constructor.
#### Acquiring Results from Requests
The ``start_async`` function call is not required to be synchronized - it waits for any available job if the queue is busy/overloaded. Every ``AsyncInferQueue`` code block should end with the ``wait_all`` function, which provides the "global" synchronization of all jobs in the pool and ensures that access to them is safe.
After the call to `wait_all`, jobs and their data can be safely accessed. Acquiring a specific job with `[id]` will return the `InferRequest` object, which will result in seamless retrieval of the output data.
@snippet docs/snippets/ov_python_exclusives.py asyncinferqueue_access
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [asyncinferqueue]
#### Setting Callbacks
Another feature of `AsyncInferQueue` is the ability to set callbacks. When callback is set, any job that ends inference calls upon the Python function. The callback function must have two arguments: one is the request that calls the callback, which provides the `InferRequest` API; the other is called "userdata", which provides the possibility of passing runtime values. Those values can be of any Python type and later used within the callback function.
Acquiring Results from Requests
-------------------------------
The callback of `AsyncInferQueue` is uniform for every job. When executed, GIL is acquired to ensure safety of data manipulation inside the function.
After the call to ``wait_all``, jobs and their data can be safely accessed. Acquiring a specific job with ``[id]`` will return the ``InferRequest`` object, which will result in seamless retrieval of the output data.
@snippet docs/snippets/ov_python_exclusives.py asyncinferqueue_set_callback
### Working with u1, u4 and i4 Element Types
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [asyncinferqueue_access]
Setting Callbacks
--------------------
Another feature of ``AsyncInferQueue`` is the ability to set callbacks. When a callback is set, any job that finishes inference calls the Python function. The callback function must have two arguments: one is the request that calls the callback, which provides the ``InferRequest`` API; the other is called "userdata", which makes it possible to pass runtime values. Those values can be of any Python type and can later be used within the callback function.
The callback of ``AsyncInferQueue`` is uniform for every job. When executed, GIL is acquired to ensure safety of data manipulation inside the function.
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [asyncinferqueue_set_callback]
Working with u1, u4 and i4 Element Types
++++++++++++++++++++++++++++++++++++++++
Since OpenVINO™ supports low precision element types, there are a few ways to handle them in Python.
To create an input tensor with such element types, you may need to pack your data in the new *numpy* array, with which the byte size matches the original input size:
@snippet docs/snippets/ov_python_exclusives.py packing_data
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [packing_data]
To extract low precision values from a tensor into the *numpy* array, you can use the following helper:
@snippet docs/snippets/ov_python_exclusives.py unpacking
### Release of GIL
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [unpacking]
Release of GIL
++++++++++++++++++++
Some functions in the Python API release the Global Interpreter Lock (GIL) while running work-intensive code. This can help you achieve more parallelism in your application, using Python threads. For more information about the GIL, refer to the Python documentation.
@snippet docs/snippets/ov_python_exclusives.py releasing_gil
> **NOTE**: While GIL is released, functions can still modify and/or operate on Python objects in C++. Hence, there is no reference counting. You should pay attention to thread safety in case sharing of these objects with another thread occurs. It might affect code only if multiple threads are spawned in Python.
.. doxygensnippet:: docs/snippets/ov_python_exclusives.py
:language: cpp
:fragment: [releasing_gil]
#### List of Functions that Release the GIL
- openvino.runtime.AsyncInferQueue.start_async
- openvino.runtime.AsyncInferQueue.is_ready
- openvino.runtime.AsyncInferQueue.wait_all
- openvino.runtime.AsyncInferQueue.get_idle_request_id
- openvino.runtime.CompiledModel.create_infer_request
- openvino.runtime.CompiledModel.infer_new_request
- openvino.runtime.CompiledModel.__call__
- openvino.runtime.CompiledModel.export
- openvino.runtime.CompiledModel.get_runtime_model
- openvino.runtime.Core.compile_model
- openvino.runtime.Core.read_model
- openvino.runtime.Core.import_model
- openvino.runtime.Core.query_model
- openvino.runtime.Core.get_available_devices
- openvino.runtime.InferRequest.infer
- openvino.runtime.InferRequest.start_async
- openvino.runtime.InferRequest.wait
- openvino.runtime.InferRequest.wait_for
- openvino.runtime.InferRequest.get_profiling_info
- openvino.runtime.InferRequest.query_state
- openvino.runtime.Model.reshape
- openvino.preprocess.PrePostProcessor.build
.. note:: While GIL is released, functions can still modify and/or operate on Python objects in C++. Hence, there is no reference counting. You should pay attention to thread safety in case sharing of these objects with another thread occurs. It might affect code only if multiple threads are spawned in Python.
List of Functions that Release the GIL
--------------------------------------
* openvino.runtime.AsyncInferQueue.start_async
* openvino.runtime.AsyncInferQueue.is_ready
* openvino.runtime.AsyncInferQueue.wait_all
* openvino.runtime.AsyncInferQueue.get_idle_request_id
* openvino.runtime.CompiledModel.create_infer_request
* openvino.runtime.CompiledModel.infer_new_request
* openvino.runtime.CompiledModel.__call__
* openvino.runtime.CompiledModel.export
* openvino.runtime.CompiledModel.get_runtime_model
* openvino.runtime.Core.compile_model
* openvino.runtime.Core.read_model
* openvino.runtime.Core.import_model
* openvino.runtime.Core.query_model
* openvino.runtime.Core.get_available_devices
* openvino.runtime.InferRequest.infer
* openvino.runtime.InferRequest.start_async
* openvino.runtime.InferRequest.wait
* openvino.runtime.InferRequest.wait_for
* openvino.runtime.InferRequest.get_profiling_info
* openvino.runtime.InferRequest.query_state
* openvino.runtime.Model.reshape
* openvino.preprocess.PrePostProcessor.build

View File

@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ccc7704d2a27f7491729767443f3d2bdd0ccc930f16fde631a7f9c67d158297a
size 71369

View File

@ -90,7 +90,7 @@ Steps to Apply LowLatency2
}
4. Use state API. See the :ref:`OpenVINO state API <openvino-state-api>` and the `Example of stateful network inference <example-of-stateful-network-inference>` sections.
4. Use state API. See the :ref:`OpenVINO state API <openvino-state-api>` and the :ref:`Example of stateful network inference <example-of-stateful-model-inference>` sections.
Known Limitations
####################

View File

@ -72,7 +72,7 @@ Steps to Apply LowLatency
}
4. Use state API. See the :ref:`OpenVINO state API <openvino-state-api>` and the :ref:`Example of stateful network inference <example-of-stateful-network-inference>` sections.
4. Use state API. See the :ref:`OpenVINO state API <openvino-state-api>` and the :ref:`Example of stateful network inference <example-of-stateful-model-inference>` sections.
Known Limitations for the LowLatency
####################################

View File

@ -1,4 +1,4 @@
# Stateful models {#openvino_docs_OV_UG_network_state_intro}
# Stateful models {#openvino_docs_OV_UG_model_state_intro}
@sphinxdirective
@ -11,22 +11,22 @@
Several use cases require processing of data sequences. When length of a sequence is known and small enough,
it can be processed with RNN like networks that contain a cycle inside. However, in some cases (e.g., online speech recognition of time series
it can be processed with RNN-like models that contain a cycle inside. However, in some cases (e.g., online speech recognition or time series
forecasting), the length of the data sequence is unknown. Then, data can be divided into small portions and processed step-by-step. The dependency
between data portions should be addressed. For that, networks save some data between inferences - a state. When one dependent sequence is over,
between data portions should be addressed. For that, models save some data between inferences - a state. When one dependent sequence is over,
a state should be reset to initial value and a new sequence can be started.
Several frameworks have special APIs for states in networks. For example, Keras has ``stateful`` - a special option for RNNs, that turns on saving a state between inferences. Kaldi contains special ``Offset`` specifier to define time offset in a network.
Several frameworks have special APIs for states in models. For example, Keras has ``stateful`` - a special option for RNNs that turns on saving a state between inferences. Kaldi contains a special ``Offset`` specifier to define a time offset in a model.
OpenVINO also contains a special API to simplify work with networks with states. A state is automatically saved between inferences,
OpenVINO also contains a special API to simplify work with models with states. A state is automatically saved between inferences,
and there is a way to reset a state when needed. A state can also be read or set to some new value between inferences.
OpenVINO State Representation
#############################
OpenVINO contains the ``Variable``, a special abstraction to represent a state in a network. There are two operations: :doc:`Assign <openvino_docs_ops_infrastructure_Assign_3>` - to save a value in a state and :doc:`ReadValue <openvino_docs_ops_infrastructure_ReadValue_3>` - to read a value saved on previous iteration.
OpenVINO contains the ``Variable``, a special abstraction to represent a state in a model. There are two operations: :doc:`Assign <openvino_docs_ops_infrastructure_Assign_3>` - to save a value in a state and :doc:`ReadValue <openvino_docs_ops_infrastructure_ReadValue_3>` - to read a value saved on previous iteration.
To get a model with states ready for inference, convert a model from another framework to OpenVINO IR with Model Optimizer or create an nGraph function.
To get a model with states ready for inference, convert a model from another framework to OpenVINO IR with Model Optimizer or create an OpenVINO model.
(For more information, refer to the :doc:`Build OpenVINO Model section <openvino_docs_OV_UG_Model_Representation>`).
Below is the graph in both forms:
@ -47,7 +47,7 @@ The ``bin`` file for this graph should contain ``float 0`` in binary form. The c
.. code-block:: xml
<?xml version="1.0" ?>
<net name="summator" version="10">
<net name="summator" version="11">
<layers>
<layer id="0" name="init_value" type="Const" version="opset6">
<data element_type="f32" offset="0" shape="1,1" size="4"/>
@ -154,65 +154,44 @@ The ``bin`` file for this graph should contain ``float 0`` in binary form. The c
</net>
Example of Creating Model nGraph API
++++++++++++++++++++++++++++++++++++
Example of Creating Model OpenVINO API
++++++++++++++++++++++++++++++++++++++++
In the following example, the ``SinkVector`` is used to create the `ngraph::Function <classngraph.html#doxid-classngraph-1a14d7fe7c605267b52c145579e12d2a5f>`__. For a network with states, except inputs and outputs, the ``Assign`` nodes should also point to the ``Function`` to avoid deleting it during graph transformations. Use the constructor to do it, as shown in the example, or with the special ``add_sinks(const SinkVector& sinks)`` method. After deleting the node from the graph with the ``delete_sink()`` method, a sink can be deleted from ``ngraph::Function``.
.. code-block:: cpp
#include <ngraph/opsets/opset6.hpp>
#include <ngraph/op/util/variable.hpp>
// ...
auto arg = make_shared<ngraph::opset6::Parameter>(element::f32, Shape{1, 1});
auto init_const = ngraph::opset6::Constant::create(element::f32, Shape{1, 1}, {0});
// The ReadValue/Assign operations must be used in pairs in the network.
// For each such a pair, its own variable object must be created.
const std::string variable_name("variable0");
auto variable = std::make_shared<ngraph::Variable>(VariableInfo{PartialShape::dynamic(), element::dynamic, variable_name});
// Creating ngraph::function
auto read = make_shared<ngraph::opset6::ReadValue>(init_const, variable);
std::vector<shared_ptr<ngraph::Node>> args = {arg, read};
auto add = make_shared<ngraph::opset6::Add>(arg, read);
auto assign = make_shared<ngraph::opset6::Assign>(add, variable);
auto add2 = make_shared<ngraph::opset6::Add>(add, read);
auto res = make_shared<ngraph::opset6::Result>(add2);
auto f = make_shared<Function>(ResultVector({res}), ParameterVector({arg}), SinkVector({assign}));
In the following example, the ``SinkVector`` is used to create the ``ov::Model``. For a model with states, in addition to inputs and outputs, the ``Assign`` nodes should also be passed to the ``Model`` to prevent them from being removed during graph transformations. Use the constructor to do it, as shown in the example, or the special ``add_sinks(const SinkVector& sinks)`` method. A sink can later be removed from ``ov::Model`` with the ``delete_sink()`` method, which also deletes the node from the graph.
.. doxygensnippet:: docs/snippets/ov_model_with_state_infer.cpp
:language: cpp
:fragment: [model_create]
.. _openvino-state-api:
OpenVINO State API
####################
Inference Engine has the ``InferRequest::QueryState`` method to get the list of states from a network and ``IVariableState`` interface to operate with states. Below is a brief description of methods and the example of how to use this interface.
OpenVINO has the ``ov::InferRequest::query_state`` method to get the list of states from a model, and the ``ov::IVariableState`` interface to operate with states. Below is a brief description of the methods and an example of how to use this interface.
* ``std::string GetName() const`` - returns the name (variable_id) of a corresponding Variable.
* ``void Reset()`` - resets a state to a default value.
* ``void SetState(Blob::Ptr newState)`` - sets a new value for a state.
* ``Blob::CPtr GetState() const`` - returns current value of state.
* ``std::string get_name() const`` - returns the name (``variable_id``) of the corresponding variable.
* ``void reset()`` - resets the state to its default value.
* ``void set_state(const ov::Tensor& state)`` - sets a new value for the state.
* ``const ov::Tensor& get_state() const`` - returns the current value of the state.
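As an illustration only, the sketch below shows how these methods fit together; the model path and the printing are assumptions rather than part of the official samples, and ``model.xml`` is expected to be any IR containing ``ReadValue``/``Assign`` pairs:

.. code-block:: cpp

   #include <openvino/openvino.hpp>
   #include <iostream>

   int main() {
       ov::Core core;
       ov::CompiledModel compiled = core.compile_model("model.xml", "CPU");  // placeholder path
       ov::InferRequest request = compiled.create_infer_request();

       // ... fill inputs and call request.infer() for the first chunk of a sequence ...

       for (auto&& state : request.query_state()) {
           std::cout << state.get_name() << std::endl;    // variable_id of the state
           ov::Tensor value = state.get_state();          // value saved by the last inference
           state.reset();                                 // start a new independent sequence
       }
       return 0;
   }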
.. _example-of-stateful-network-inference:
.. _example-of-stateful-model-inference:
Example of Stateful Network Inference
Example of Stateful Model Inference
#####################################
Based on the IR from the previous section, the example below demonstrates inference of two independent sequences of data. A state should be reset between these sequences.
One infer request and one thread will be used in this example. Using several threads is possible if there are several independent sequences. Then, each sequence can be processed in its own infer request. Inference of one sequence in several infer requests is not recommended. In one infer request, a state will be saved automatically between inferences, but if the first step is done in one infer request and the second in another, a state should be set in a new infer request manually (using the ``IVariableState::SetState`` method).
One infer request and one thread will be used in this example. Using several threads is possible if there are several independent sequences; then, each sequence can be processed in its own infer request. Inference of one sequence in several infer requests is not recommended: within one infer request, a state is saved automatically between inferences, but if the first step is done in one infer request and the second in another, the state has to be set in the new infer request manually (using the ``ov::IVariableState::set_state`` method, as sketched below).
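The sketch below is a hypothetical helper (not part of the sample) that copies every variable value from one request into another; both requests are assumed to be created from the same compiled model, so their state lists match:

.. code-block:: cpp

   #include <openvino/openvino.hpp>

   // Continue in `to` a sequence started in `from` by copying all saved states.
   void transfer_states(ov::InferRequest& from, ov::InferRequest& to) {
       auto src = from.query_state();
       auto dst = to.query_state();
       for (size_t i = 0; i < src.size() && i < dst.size(); ++i) {
           dst[i].set_state(src[i].get_state());
       }
   }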
.. doxygensnippet:: docs/snippets/InferenceEngine_network_with_state_infer.cpp
.. doxygensnippet:: docs/snippets/ov_model_with_state_infer.cpp
:language: cpp
:fragment: [part1]
For more elaborate examples demonstrating how to work with networks with states,
For more elaborate examples demonstrating how to work with models with states,
refer to the speech sample and a demo in the :doc:`Samples Overview <openvino_docs_OV_UG_Samples_Overview>`.
LowLatency Transformations
View File
@ -13,23 +13,22 @@
openvino_docs_OV_UG_Working_with_devices
openvino_docs_OV_UG_ShapeInference
openvino_docs_OV_UG_DynamicShapes
openvino_docs_OV_UG_network_state_intro
@endsphinxdirective
openvino_docs_OV_UG_model_state_intro
OpenVINO Runtime is a set of C++ libraries with C and Python bindings providing a common API to deliver inference solutions on the platform of your choice. Use the OpenVINO Runtime API to read an Intermediate Representation (IR), TensorFlow, ONNX, or PaddlePaddle model and execute it on preferred devices.
OpenVINO Runtime uses a plugin architecture. Its plugins are software components that contain complete implementation for inference on a particular Intel® hardware device: CPU, GPU, GNA, etc. Each plugin implements the unified API and provides additional hardware-specific APIs for configuring devices and for API interoperability between OpenVINO Runtime and the underlying plugin backend.
The scheme below illustrates the typical workflow for deploying a trained deep learning model:
<!-- TODO: need to update the picture below with PDPD files -->
![](img/BASIC_FLOW_IE_C.svg)
The scheme below illustrates the typical workflow for deploying a trained deep learning model:
## Video
.. image:: _static/images/BASIC_FLOW_IE_C.svg
Video
####################
@sphinxdirective
.. list-table::
@ -39,5 +38,5 @@ The scheme below illustrates the typical workflow for deploying a trained deep l
src="https://www.youtube.com/embed/e6R13V8nbak">
</iframe>
* - **OpenVINO Runtime Concept**. Duration: 3:43
@endsphinxdirective
View File
@ -1,47 +1,56 @@
# High-level Performance Hints {#openvino_docs_OV_UG_Performance_Hints}
Even though all [supported devices](supported_plugins/Device_Plugins.md) in OpenVINO™ offer low-level performance settings, utilizing them is not recommended outside of very few cases.
The preferred way to configure performance in OpenVINO Runtime is using performance hints. This is a future-proof solution fully compatible with the [automatic device selection inference mode](./auto_device_selection.md) and designed with *portability* in mind.
@sphinxdirective
Even though all :doc:`supported devices <openvino_docs_OV_UG_Working_with_devices>` in OpenVINO™ offer low-level performance settings, utilizing them is not recommended outside of very few cases.
The preferred way to configure performance in OpenVINO Runtime is using performance hints. This is a future-proof solution fully compatible with the :doc:`automatic device selection inference mode <openvino_docs_OV_UG_supported_plugins_AUTO>` and designed with *portability* in mind.
The hints also set the direction of the configuration in the right order. Instead of mapping the application needs to the low-level performance settings, and keeping an associated application logic to configure each possible device separately, the hints express a target scenario with a single config key and let the *device* configure itself in response.
Previously, a certain level of automatic configuration was the result of the *default* values of the parameters. For example, the number of CPU streams was deduced from the number of CPU cores, when `ov::streams::AUTO` (`CPU_THROUGHPUT_AUTO` in the pre-API 2.0 terminology) was set. However, the resulting number of streams did not account for actual compute requirements of the model to be inferred.
Previously, a certain level of automatic configuration was the result of the *default* values of the parameters. For example, the number of CPU streams was deduced from the number of CPU cores, when `ov::streams::AUTO <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gaddb29425af71fbb6ad3379c59342ff0e>`__ (``CPU_THROUGHPUT_AUTO`` in the pre-API 2.0 terminology) was set. However, the resulting number of streams did not account for actual compute requirements of the model to be inferred.
The hints, in contrast, respect the actual model, so the parameters for optimal throughput are calculated for each model individually (based on its compute versus memory bandwidth requirements and capabilities of the device).
## Performance Hints: Latency and Throughput
As discussed in the [Optimization Guide](../optimization_guide/dldt_deployment_optimization_guide.md) there are a few different metrics associated with inference speed.
Performance Hints: Latency and Throughput
#########################################
As discussed in the :doc:`Optimization Guide <openvino_docs_deployment_optimization_guide_dldt_optimization_guide>` there are a few different metrics associated with inference speed.
Throughput and latency are some of the most widely used metrics that measure the overall performance of an application.
Therefore, in order to ease the configuration of the device, OpenVINO offers two dedicated hints, namely `ov::hint::PerformanceMode::THROUGHPUT` and `ov::hint::PerformanceMode::LATENCY`.
A special `ov::hint::PerformanceMode::UNDEFINED` hint acts the same as specifying no hint.
Therefore, in order to ease the configuration of the device, OpenVINO offers two dedicated hints, namely `ov::hint::PerformanceMode::THROUGHPUT <enumov_1_1hint_1_1PerformanceMode.html#doxid-group-ov-runtime-cpp-prop-api-1gga032aa530efa40760b79af14913d48d73a50f9b1f40c078d242af7ec323ace44b3>`__ and `ov::hint::PerformanceMode::LATENCY <enumov_1_1hint_1_1PerformanceMode.html#doxid-group-ov-runtime-cpp-prop-api-1gga032aa530efa40760b79af14913d48d73a501069dd75f76384ba18f133fdce99c2>`__.
A special `ov::hint::PerformanceMode::UNDEFINED <enumov_1_1hint_1_1PerformanceMode.html#doxid-group-ov-runtime-cpp-prop-api-1gga032aa530efa40760b79af14913d48d73a0db45d2a4141101bdfe48e3314cfbca3>`__ hint acts the same as specifying no hint.
For more information on conducting performance measurements with the `benchmark_app`, refer to the last section in this document.
For more information on conducting performance measurements with the ``benchmark_app``, refer to the last section in this document.
Keep in mind that a typical model may take significantly more time to load with the `ov::hint::PerformanceMode::THROUGHPUT` and consume much more memory, compared to the `ov::hint::PerformanceMode::LATENCY`.
Keep in mind that a typical model may take significantly more time to load with the ``ov::hint::PerformanceMode::THROUGHPUT`` and consume much more memory, compared to the ``ov::hint::PerformanceMode::LATENCY``.
Performance Hints: How It Works
###############################
## Performance Hints: How It Works
Internally, every device "translates" the value of the hint to the actual performance settings.
For example, the `ov::hint::PerformanceMode::THROUGHPUT` selects the number of CPU or GPU streams.
Additionally, the optimal batch size is selected for the GPU and the [automatic batching](../OV_Runtime_UG/automatic_batching.md) is applied whenever possible. To check whether the device supports it, refer to the [devices/features support matrix](./supported_plugins/Device_Plugins.md) article.
For example, the ``ov::hint::PerformanceMode::THROUGHPUT`` selects the number of CPU or GPU streams.
Additionally, the optimal batch size is selected for the GPU and the :doc:`automatic batching <openvino_docs_OV_UG_Automatic_Batching>` is applied whenever possible. To check whether the device supports it, refer to the :doc:`devices/features support matrix <openvino_docs_OV_UG_Working_with_devices>` article.
The resulting (device-specific) settings can be queried back from the instance of the `ov:Compiled_Model`.
Be aware that the `benchmark_app` outputs the actual settings for the `THROUGHPUT` hint. See the example of the output below:
The resulting (device-specific) settings can be queried back from the instance of the ``ov::CompiledModel`` (a query sketch follows the output below).
Be aware that the ``benchmark_app`` outputs the actual settings for the ``THROUGHPUT`` hint. See the example of the output below:
```
$benchmark_app -hint tput -d CPU -m 'path to your favorite model'
...
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ] { PERFORMANCE_HINT , THROUGHPUT }
...
[ INFO ] { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ] { NUM_STREAMS , 4 }
...
```
.. code-block:: sh
$benchmark_app -hint tput -d CPU -m 'path to your favorite model'
...
[Step 8/11] Setting optimal runtime parameters
[ INFO ] Device: CPU
[ INFO ] { PERFORMANCE_HINT , THROUGHPUT }
...
[ INFO ] { OPTIMAL_NUMBER_OF_INFER_REQUESTS , 4 }
[ INFO ] { NUM_STREAMS , 4 }
...
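The same values can also be queried programmatically from the compiled model. The sketch below is illustrative only (the model path is a placeholder, and the exact set of reported properties depends on the device):

.. code-block:: cpp

   #include <openvino/openvino.hpp>
   #include <iostream>

   int main() {
       ov::Core core;
       ov::CompiledModel compiled = core.compile_model(
           "model.xml", "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

       // Settings the device resolved the hint to:
       uint32_t nireq = compiled.get_property(ov::optimal_number_of_infer_requests);
       std::cout << "OPTIMAL_NUMBER_OF_INFER_REQUESTS: " << nireq << std::endl;
       std::cout << "NUM_STREAMS: "
                 << compiled.get_property("NUM_STREAMS").as<std::string>() << std::endl;
       return 0;
   }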
Using the Performance Hints: Basic API
######################################
In the example code snippet below, ``ov::hint::PerformanceMode::THROUGHPUT`` is specified for the ``ov::hint::performance_mode`` property for ``compile_model``:
## Using the Performance Hints: Basic API
In the example code snippet below, `ov::hint::PerformanceMode::THROUGHPUT` is specified for the `ov::hint::performance_mode` property for `compile_model`:
@sphinxdirective
.. tab:: C++
@ -55,12 +64,13 @@ In the example code snippet below, `ov::hint::PerformanceMode::THROUGHPUT` is sp
:language: python
:fragment: [compile_model]
@endsphinxdirective
## Additional (Optional) Hints from the App
For an application that processes 4 video streams, the most future-proof way to communicate the limitation of the parallel slack is to equip the performance hint with the optional `ov::hint::num_requests` configuration key set to 4.
As mentioned earlier, this will limit the batch size for the GPU and the number of inference streams for the CPU. Thus, each device uses the `ov::hint::num_requests` while converting the hint to the actual device configuration options:
@sphinxdirective
Additional (Optional) Hints from the App
########################################
For an application that processes 4 video streams, the most future-proof way to communicate the limitation of the parallel slack is to equip the performance hint with the optional ``ov::hint::num_requests`` configuration key set to 4.
As mentioned earlier, this will limit the batch size for the GPU and the number of inference streams for the CPU. Thus, each device uses the ``ov::hint::num_requests`` while converting the hint to the actual device configuration options:
.. tab:: C++
@ -74,11 +84,12 @@ As mentioned earlier, this will limit the batch size for the GPU and the number
:language: python
:fragment: [hint_num_requests]
@endsphinxdirective
## Optimal Number of Inference Requests
The hints are used on the presumption that the application queries `ov::optimal_number_of_infer_requests` to create and run the returned number of requests simultaneously:
@sphinxdirective
Optimal Number of Inference Requests
####################################
The hints are used on the presumption that the application queries ``ov::optimal_number_of_infer_requests`` to create and run the returned number of requests simultaneously:
.. tab:: C++
@ -92,21 +103,24 @@ The hints are used on the presumption that the application queries `ov::optimal_
:language: python
:fragment: [query_optimal_num_requests]
@endsphinxdirective
While an application is free to create more requests if needed (for example to support asynchronous inputs population) **it is very important to at least run the `ov::optimal_number_of_infer_requests` of the inference requests in parallel**. It is recommended for efficiency, or device utilization, reasons.
While an application is free to create more requests if needed (for example to support asynchronous inputs population), **it is very important to run at least the** ``ov::optimal_number_of_infer_requests`` **inference requests in parallel**. It is recommended for efficiency (device utilization) reasons.
Keep in mind that `ov::hint::PerformanceMode::LATENCY` does not necessarily imply using single inference request. For example, multi-socket CPUs can deliver as many requests at the same minimal latency as the number of NUMA nodes in the system.
To make your application fully scalable, make sure to query the `ov::optimal_number_of_infer_requests` directly.
Keep in mind that ``ov::hint::PerformanceMode::LATENCY`` does not necessarily imply using a single inference request. For example, multi-socket CPUs can deliver as many requests at the same minimal latency as the number of NUMA nodes in the system.
To make your application fully scalable, make sure to query the ``ov::optimal_number_of_infer_requests`` directly.
Prefer Async API
################
The API of the inference requests offers Sync and Async execution. The ``ov::InferRequest::infer()`` is inherently synchronous and simple to operate (as it serializes the execution flow in the current application thread). The Async "splits" the ``infer()`` into ``ov::InferRequest::start_async()`` and ``ov::InferRequest::wait()`` (or callbacks). For more information, refer to the :doc:`API examples <openvino_docs_OV_UG_Infer_request>`.
Although the Synchronous API can be somewhat easier to start with, it is recommended to use the Asynchronous (callbacks-based) API in the production code. It is the most general and scalable way to implement the flow control for any possible number of requests (and thus both latency and throughput scenarios).
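A rough sketch of the callback-based flow is shown below; the model path, input filling, and completion handling are placeholders rather than a complete application:

.. code-block:: cpp

   #include <openvino/openvino.hpp>
   #include <vector>

   int main() {
       ov::Core core;
       ov::CompiledModel compiled = core.compile_model(
           "model.xml", "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
       const uint32_t nireq = compiled.get_property(ov::optimal_number_of_infer_requests);

       std::vector<ov::InferRequest> requests;
       for (uint32_t i = 0; i < nireq; ++i) {
           requests.push_back(compiled.create_infer_request());
           requests.back().set_callback([](std::exception_ptr error) {
               if (error) return;  // handle the failure here
               // read outputs and/or submit the next portion of work here
           });
       }
       for (auto& request : requests) {
           // ... fill inputs ...
           request.start_async();  // returns immediately, does not block the application thread
       }
       for (auto& request : requests) {
           request.wait();         // or rely entirely on the callbacks
       }
       return 0;
   }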
Combining the Hints and Individual Low-Level Settings
#####################################################
## Prefer Async API
The API of the inference requests offers Sync and Async execution. The `ov::InferRequest::infer()` is inherently synchronous and simple to operate (as it serializes the execution flow in the current application thread). The Async "splits" the `infer()` into `ov::InferRequest::start_async()` and `ov::InferRequest::wait()` (or callbacks). For more information, refer to the [API examples](../OV_Runtime_UG/ov_infer_request.md).
Although the Synchronous API can be somewhat easier to start with, it is recommended to use the Asynchronous (callbacks-based) API in the production code. It is the most general and scalable way to implement the flow control for any possible number of requests (and thus both latency and throughput scenarios).
## Combining the Hints and Individual Low-Level Settings
While sacrificing the portability to some extent, it is possible to combine the hints with individual device-specific settings.
For example, use `ov::hint::PerformanceMode::THROUGHPUT` to prepare a general configuration and override any of its specific values:
@sphinxdirective
For example, use ``ov::hint::PerformanceMode::THROUGHPUT`` to prepare a general configuration and override any of its specific values (an additional sketch follows the snippet below):
.. tab:: C++
@ -121,15 +135,22 @@ For example, use `ov::hint::PerformanceMode::THROUGHPUT` to prepare a general co
:fragment: [hint_plus_low_level]
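As an additional sketch of such a combination (the model path and the stream count of ``4`` are arbitrary placeholders):

.. code-block:: cpp

   #include <openvino/openvino.hpp>

   int main() {
       ov::Core core;
       // The THROUGHPUT hint prepares a general configuration; the explicit
       // ov::num_streams value then overrides the stream count chosen by the hint.
       ov::CompiledModel compiled = core.compile_model(
           "model.xml", "CPU",
           ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT),
           ov::num_streams(4));
       return 0;
   }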
Testing Performance of the Hints with the Benchmark_App
#######################################################
The ``benchmark_app``, which exists in both :doc:`C++ <openvino_inference_engine_samples_benchmark_app_README>` and :doc:`Python <openvino_inference_engine_tools_benchmark_tool_README>` versions, is the best way to evaluate the functionality of the performance hints for a particular device:
* benchmark_app **-hint tput** -d 'device' -m 'path to your model'
* benchmark_app **-hint latency** -d 'device' -m 'path to your model'
Disabling the hints to emulate the pre-hints era (highly recommended before trying individual low-level settings, such as the number of streams as shown below, threads, etc.):
* benchmark_app **-hint none -nstreams 1** -d 'device' -m 'path to your model'
Additional Resources
####################
* :doc:`Supported Devices <openvino_docs_OV_UG_Working_with_devices>`
@endsphinxdirective
## Testing Performance of the Hints with the Benchmark_App
The `benchmark_app`, that exists in both [C++](../../samples/cpp/benchmark_app/README.md) and [Python](../../tools/benchmark_tool/README.md) versions, is the best way to evaluate the functionality of the performance hints for a particular device:
- benchmark_app **-hint tput** -d 'device' -m 'path to your model'
- benchmark_app **-hint latency** -d 'device' -m 'path to your model'
- Disabling the hints to emulate the pre-hints era (highly recommended before trying the individual low-level settings, such as the number of streams as below, threads, etc):
- - benchmark_app **-hint none -nstreams 1** -d 'device' -m 'path to your model'
### Additional Resources
* [Supported Devices](./supported_plugins/Supported_Devices.md)
View File
@ -279,7 +279,7 @@ Stateful Models
The CPU plugin supports stateful models without any limitations.
For details, see :doc:`stateful models guide <openvino_docs_OV_UG_network_state_intro>`.
For details, see :doc:`stateful models guide <openvino_docs_OV_UG_model_state_intro>`.
Supported Properties
###########################################################
@ -398,8 +398,7 @@ weights are loaded from DDR/L3 cache in the packed format this significantly dec
and as a consequence improve inference performance.
To use this feature, the user is provided with property ``sparse_weights_decompression_rate``, which can take
values from the interval \[0.5, 1\] (values from \[0, 0.5\] are not supported in current implementation,
see limitations below). ``sparse_weights_decompression_rate`` defines sparse rate threashold: only operations
values from the interval \[0, 1\]. ``sparse_weights_decompression_rate`` defines the sparse rate threshold: only operations
with a higher sparse rate will be executed using the ``sparse weights decompression`` feature. The default value is ``1``,
which means the option is disabled.
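For reference, enabling the feature from C++ might look like the sketch below. It assumes the property is exposed as ``ov::intel_cpu::sparse_weights_decompression_rate`` (check the CPU plugin properties header for the exact name) and uses an arbitrary ``0.8`` threshold with a placeholder model path:

.. code-block:: cpp

   #include <openvino/openvino.hpp>
   #include <openvino/runtime/intel_cpu/properties.hpp>

   int main() {
       ov::Core core;
       auto model = core.read_model("model.xml");  // placeholder path
       // Assumed property name: operations with a sparse rate above 0.8 use the feature.
       ov::CompiledModel compiled = core.compile_model(
           model, "CPU", ov::intel_cpu::sparse_weights_decompression_rate(0.8f));
       return 0;
   }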
View File
@ -48,7 +48,7 @@ The table below demonstrates support of key features by OpenVINO device plugins.
:doc:`Dynamic shapes <openvino_docs_OV_UG_DynamicShapes>` Yes Partial No No
:doc:`Import/Export <openvino_inference_engine_tools_compile_tool_README>` Yes No Yes No
:doc:`Preprocessing acceleration <openvino_docs_OV_UG_Preprocessing_Overview>` Yes Yes No Partial
:doc:`Stateful models <openvino_docs_OV_UG_network_state_intro>` Yes No Yes No
:doc:`Stateful models <openvino_docs_OV_UG_model_state_intro>` Yes No Yes No
:doc:`Extensibility <openvino_docs_Extensibility_UG_Intro>` Yes Yes No No
========================================================================================= =============== =============== =============== ========================
View File
@ -209,7 +209,7 @@ To compile a model, use either :doc:`compile Tool <openvino_inference_engine_too
Stateful Models
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
GNA plugin natively supports stateful models. For more details on such models, refer to the :doc:`Stateful models <openvino_docs_OV_UG_network_state_intro>`.
GNA plugin natively supports stateful models. For more details on such models, refer to the :doc:`Stateful models <openvino_docs_OV_UG_model_state_intro>`.
.. note::
View File
@ -271,6 +271,20 @@ div.highlight {
color: #fefefe;
}
.chart-wrap {
display: grid;
grid-template-columns: minmax(0, 1fr) 4fr;
padding-left: 15px;
padding-right: 15px;
}
.graph-item {
display: flex;
flex-direction: column;
flex: 1;
min-width: 0;
}
.graph-chart-title-header {
font-size: 1.4rem;
line-height: 2rem;
@ -293,16 +307,21 @@ div.highlight {
padding: 12px 0;
}
.chart-column-header-container {
.chart-graphs-container {
padding-top: 8px;
display: flex;
flex-direction: row;
width: 100%;
min-width: 0;
}
.chart-column-title {
min-width: 20%;
flex-grow: 0 1;
white-space: nowrap;
display: flex;
flex-direction: row;
align-items: flex-start;
}
.chart-column-title .icon {
@ -328,17 +347,21 @@ div.highlight {
}
.chart-labels-container {
width: 18%;
padding-top: 8px;
}
.chart-labels-container .title {
text-align: right;
.chart-labels-item {
width: 100%;
}
.chart-labels-item .title {
text-overflow: ellipsis;
overflow: hidden;
white-space: nowrap;
display: block;
font-size: .8rem;
line-height: 3.42rem;
line-height: 55px;
height: 55px;
color: gray;
}
@ -545,6 +568,7 @@ div.highlight {
.graph-row {
display: flex;
flex-direction: column;
padding-top: 10px;
padding-bottom: 20px;
}
@ -554,7 +578,12 @@ div.highlight {
}
.graph-row-column {
width: 20%;
width: 100%;
}
.graph-legend-container {
display: flex;
flex-direction: column;
}
@media screen and (max-width:768px) {
@ -566,6 +595,12 @@ div.highlight {
}
@media screen and (max-width: 530px) {
.modal-content {
width: 100vw;
height: 100vh;
max-height: 100%;
}
.buttons-nav {
margin-top: 0.125rem;
margin-bottom: 0.125rem;
@ -867,4 +902,4 @@ table#model-accuracy-and-perf-int8-fp32-table td.data {
#performance-information-frequently-asked-questions section table {
display: none;
padding-left: 30px;
}
}
View File
@ -72,7 +72,6 @@
<div class="modal-line-divider"></div>
<div class="modal-footer-content">
<div class="modal-disclaimer-box"></div>
<button class="close-btn">Close</button>
</div>
</div>
</div>
</div>
View File
@ -338,6 +338,13 @@ class Graph {
}
}
class ChartDisplay {
constructor(mode, numberOfCharts) {
this.mode = mode;
this.numberOfChartsInRow = numberOfCharts;
}
}
$(document).ready(function () {
$('.ov-toolkit-benchmark-results').on('click', showModal);
@ -357,13 +364,13 @@ $(document).ready(function () {
$('.graph-chart-title-header').on('click', (event) => {
var parent = event.target.parentElement;
if ($(parent).children('.chart-wrap.container,.empty-chart-container').is(":visible")) {
$(parent).children('.chart-wrap.container,.empty-chart-container').hide();
if ($(parent).children('.chart-wrap,.empty-chart-container').is(":visible")) {
$(parent).children('.chart-wrap,.empty-chart-container').hide();
$(parent).children('.chevron-right-btn').show();
$(parent).children('.chevron-down-btn').hide();
$
} else {
$(parent).children('.chart-wrap.container,.empty-chart-container').show();
$(parent).children('.chart-wrap,.empty-chart-container').show();
$(parent).children('.chevron-down-btn').show();
$(parent).children('.chevron-right-btn').hide();
}
@ -649,11 +656,93 @@ $(document).ready(function () {
});
}
function getChartOptions(title) {
// =================== HTMLLEGEND =========================
const getOrCreateLegendList = (chart, id) => {
const legendContainer = document.getElementById(id);
let listContainer = legendContainer.querySelector('ul');
if (!listContainer) {
listContainer = document.createElement('ul');
listContainer.style.display = 'flex';
listContainer.style.flexDirection = 'column';
listContainer.style.margin = 0;
listContainer.style.padding = 0;
listContainer.style.paddingLeft = '10px';
legendContainer.appendChild(listContainer);
}
return listContainer;
};
const htmlLegendPlugin = {
id: 'htmlLegend',
afterUpdate(chart, args, options) {
const ul = getOrCreateLegendList(chart, chart.options.plugins.htmlLegend.containerID);
// Remove old legend items
while (ul.firstChild) {
ul.firstChild.remove();
}
// Reuse the built-in legendItems generator
const items = chart.legend.legendItems;
items.forEach(item => {
const li = document.createElement('li');
li.style.alignItems = 'center';
li.style.display = 'flex';
li.style.flexDirection = 'row';
li.style.marginLeft = '10px';
li.onclick = () => {
const {type} = chart.config;
if (type === 'pie' || type === 'doughnut') {
// Pie and doughnut charts only have a single dataset and visibility is per item
chart.toggleDataVisibility(item.index);
} else {
chart.setDatasetVisibility(item.datasetIndex, !chart.isDatasetVisible(item.datasetIndex));
}
chart.update();
};
// Color box
const boxSpan = document.createElement('span');
boxSpan.style.background = item.fillStyle;
boxSpan.style.borderColor = item.strokeStyle;
boxSpan.style.borderWidth = item.lineWidth + 'px';
boxSpan.style.display = 'inline-block';
boxSpan.style.height = '12px';
boxSpan.style.marginRight = '10px';
boxSpan.style.width = '30px';
// Text
const textContainer = document.createElement('p');
textContainer.style.color = item.fontColor;
textContainer.style.margin = 0;
textContainer.style.padding = 0;
// textContainer.style.fontFamily = 'Roboto';
textContainer.style.fontSize = '0.8rem';
textContainer.style.textDecoration = item.hidden ? 'line-through' : '';
const text = document.createTextNode(item.text);
textContainer.appendChild(text);
li.appendChild(boxSpan);
li.appendChild(textContainer);
ul.appendChild(li);
});
}
};
// ====================================================
function getChartOptions(title, containerId) {
return {
responsive: true,
maintainAspectRatio: false,
legend: { display: true, position: 'bottom' },
legend: {display: false},
title: {
display: false,
text: title
@ -672,17 +761,9 @@ $(document).ready(function () {
}]
},
plugins: {
datalabels: {
color: "#4A4A4A",
anchor: "end",
align: "end",
clamp: false,
offset: 0,
display: true,
font: {
size: 8,
family: 'Roboto'
}
htmlLegend: {
// ID of the container to put the legend in
containerID: containerId,
}
}
}
@ -708,6 +789,8 @@ $(document).ready(function () {
$('.chart-placeholder').empty();
$('.modal-disclaimer-box').empty();
const display = new ChartDisplay(getChartsDisplayMode(kpis.length), kpis.length);
networkModels.forEach((networkModel) => {
var chartName = networkModel;
var chartSlug = chartName.replace(')', '').replace(' (', '-');
@ -716,13 +799,13 @@ $(document).ready(function () {
var chevronDown = '<span class="chevron-down-btn"></span>';
var chevronRight = '<span style="display:none" class="chevron-right-btn"></span>';
$(chevronRight).hide();
var chartContainerHeader = $('<span class="graph-chart-title">' + networkModel + '</span>' + chevronDown + chevronRight);
var chartContainerHeader = $(chevronDown + chevronRight + '<span class="graph-chart-title">' + networkModel + '</span>');
chartContainerHeader.addClass('graph-chart-title-header');
chartContainer.prepend(chartContainerHeader);
chartContainer.attr('id', 'ov-chart-container-' + chartSlug);
chartContainer.addClass('chart-container');
chartContainer.addClass('container');
var filteredNetworkModels = Filter.FilterByNetworkModel(graph.data, [networkModel]);
var filteredIeTypes = Filter.FilterByIeType(filteredNetworkModels, ietype);
@ -730,7 +813,7 @@ $(document).ready(function () {
$('.chart-placeholder').append(chartContainer);
if (filteredGraphData.length > 0) {
createChartWithNewData(filteredGraphData, chartContainer, kpis, ietype, precisions);
createChartWithNewData(filteredGraphData, chartContainer, kpis, ietype, precisions, display);
} else {
createEmptyChartContainer(chartContainer);
}
@ -740,19 +823,20 @@ $(document).ready(function () {
if (chartDisclaimers[kpi])
$('.modal-disclaimer-box').append($('<p>').text(chartDisclaimers[kpi]))
}
$(window).off('resize');
$(window).resize(() => resetChartsDisplay(display));
};
function createEmptyChartContainer(chartContainer) {
chartContainer.append($('<div>').addClass('empty-chart-container').text('No data for this configuration.'));
}
// this function should take the final data set and turn it into graphs
// params: GraphData, unused, chartContainer
function createChartWithNewData(model, chartContainer, kpis, ietype, precisions) {
function createChartWithNewData(model, chartContainer, kpis, ietype, precisions, display) {
var chartWrap = $('<div>');
chartWrap.addClass('chart-wrap');
chartWrap.addClass('container');
chartContainer.append(chartWrap);
var labels = Graph.getPlatformNames(model);
@ -771,12 +855,20 @@ $(document).ready(function () {
return config;
});
// get the client platform labels and create labels for all the graphs
var labelsContainer = $('<div>');
labelsContainer.addClass('chart-labels-container');
chartWrap.append(labelsContainer);
// get the kpi title's and create headers for the graphs
var chartColumnHeaderContainer = $('<div>');
chartColumnHeaderContainer.addClass('chart-column-header-container');
chartColumnHeaderContainer.append($('<div class="chart-column-title"></div>'));
graphConfigs.forEach((graphConfig) => {
var chartGraphsContainer = $('<div>');
chartGraphsContainer.addClass('chart-graphs-container');
chartWrap.append(chartGraphsContainer);
graphConfigs.forEach((graphConfig, index) => {
const id = getRandomNumber();
var graphItem = $(`<div id=${id}>`);
graphItem.addClass('graph-item');
var columnHeaderContainer = $('<div>');
columnHeaderContainer.addClass('chart-column-title');
var columnIcon = $('<div class="icon">');
@ -786,53 +878,134 @@ $(document).ready(function () {
columnHeader.append($('<div class="title">' + graphConfig.chartTitle + '</div>'));
columnHeader.append($('<div class="title">' + Graph.getGraphPlatformText(ietype) + '</div>'));
columnHeader.append($('<div class="subtitle">' + graphConfig.chartSubtitle + '</div>'));
columnHeaderContainer.append(columnHeader);
chartColumnHeaderContainer.append(columnHeaderContainer);
chartGraphsContainer.append(graphItem);
var graphClass = $('<div>');
graphClass.addClass('graph-row');
graphItem.append(columnHeaderContainer);
graphItem.append(graphClass);
processMetricNew(labels, graphConfig.datasets, graphConfig.chartTitle, graphClass, 'graph-row-column', id);
window.setTimeout(() => {
const topPadding = getLabelsTopPadding(display.mode);
const labelsHeight = (labels.length * 55);
const chartHeight = $(graphItem).outerHeight();
const bottomPadding = (chartHeight - (topPadding + labelsHeight));
var labelsItem = $('<div>');
labelsItem.addClass('chart-labels-item');
labels.forEach((label) => {
labelsItem.append($('<div class="title">' + label + '</div>'));
});
labelsItem.css('padding-top', topPadding + 'px');
labelsItem.css('padding-bottom', bottomPadding + 'px');
setInitialItemsVisibility(labelsItem, index, display.mode);
labelsContainer.append(labelsItem);
});
});
// get the client platform labels and create labels for all the graphs
var labelsContainer = $('<div>');
labelsContainer.addClass('chart-labels-container');
labels.forEach((label) => {
labelsContainer.append($('<div class="title">' + label + '</div>'));
});
// get the legend and create legends for each graph
var graphClass = $('<div>');
graphClass.addClass('graph-row');
chartWrap.append(chartColumnHeaderContainer);
graphClass.append(labelsContainer);
chartWrap.append(graphClass);
graphConfigs.forEach((graphConfig) => {
processMetricNew(labels, graphConfig.datasets, graphConfig.chartTitle, graphClass, 'graph-row-column');
});
// might need this line for multiple graphs on a page
// var displayWidth = $(window).width();
setChartsDisplayDirection(display.mode);
adjustHeaderIcons(display.mode);
}
function processMetricNew(labels, datasets, chartTitle, container, widthClass, displayLabels) {
function processMetricNew(labels, datasets, chartTitle, container, widthClass, id) {
// ratio for consistent chart label height
var heightRatio = ((labels.length * 55 + 20) / labels.length) + (labels.length * 55);
var heightRatio = (30 + (labels.length * 55));
var chart = $('<div>');
const containerId = `legend-container-${id}`;
const legend = $(`<div id="${containerId}">`);
legend.addClass('graph-legend-container');
chart.addClass('chart');
chart.addClass(widthClass);
chart.height(heightRatio);
var canvas = $('<canvas>');
chart.append(canvas);
container.append(chart);
container.append(legend);
var context = canvas.get(0).getContext('2d');
context.canvas.height = heightRatio;
new Chart(context, {
window.setTimeout(() => {
new Chart(context, {
type: 'horizontalBar',
data: getChartDataNew(labels, datasets),
options: getChartOptions(chartTitle, displayLabels)
options: getChartOptions(chartTitle, containerId),
plugins: [htmlLegendPlugin]
});
});
}
});
function getRandomNumber() {
return Math.floor(Math.random() * 100000);
}
function resetChartsDisplay(currentDisplay) {
const newDisplayMode = getChartsDisplayMode(currentDisplay.numberOfChartsInRow);
if (currentDisplay.mode != newDisplayMode) {
currentDisplay.mode = newDisplayMode;
setChartsDisplayDirection(currentDisplay.mode);
adjustLabels(currentDisplay.mode);
adjustHeaderIcons(currentDisplay.mode);
}
}
function adjustLabels(displayMode) {
const firstLabels = $('.chart-labels-container').find('.chart-labels-item:first-child');
const labels = $('.chart-labels-container').find('.chart-labels-item');
labels.css('padding-top', getLabelsTopPadding(displayMode));
if (displayMode == 'column') {
labels.show();
}
else {
labels.hide()
firstLabels.show();
}
}
function adjustHeaderIcons(displayMode) {
const icons = $('.graph-item').find('.chart-column-title');
if (displayMode == 'rowCompact')
icons.css('flex-direction', 'column')
else
icons.css('flex-direction', 'row')
}
function getLabelsTopPadding(displayMode) {
return (displayMode == 'rowCompact') ? 105.91 : 83.912;
}
function setChartsDisplayDirection(displayMode) {
const container = $('.chart-placeholder').find('.chart-graphs-container');
if (displayMode == 'column') {
container.css('flex-direction', 'column');
}
else {
container.css('flex-direction', 'row');
}
}
function setInitialItemsVisibility(item, count, displayMode) {
if (count == 0 || displayMode == 'column') item.show();
else item.hide();
}
function getChartsDisplayMode(numberOfCharts) {
switch (numberOfCharts) {
case 4:
return window.matchMedia('(max-width: 721px)').matches ? 'column'
: window.matchMedia('(max-width: 830px)').matches ? 'rowCompact'
: 'row';
case 3:
return window.matchMedia('(max-width: 569px)').matches ? 'column'
: window.matchMedia('(max-width: 649px)').matches ? 'rowCompact'
: 'row';
case 2:
return window.matchMedia('(max-width: 500px)').matches ? 'column'
: 'row';
default:
return 'row';
}
}
});
View File
@ -1,5 +1,7 @@
# Intel® Deep Learning Streamer (Intel® DL Streamer) {#openvino_docs_dlstreamer}
@sphinxdirective
Intel® DL Streamer is a streaming media analytics framework, based on GStreamer* multimedia framework, for creating complex media analytics pipelines.
Intel® DL Streamer makes Media analytics easy:
@ -9,10 +11,13 @@ Intel® DL Streamer makes Media analytics easy:
* Analyze video and audio streams, create actionable results, capture results, and send them to the cloud
* Leverage the efficiency and computational power of Intel hardware platforms
Go to [Intel® DL Streamer documentation website](https://dlstreamer.github.io) for information on how to download, install, and use.
Go to `Intel® DL Streamer documentation website <https://dlstreamer.github.io>`__ for information on how to download, install, and use.
**Media analytics** is the analysis of audio & video streams to detect, classify, track, identify and count objects, events and people. The analyzed results can be used to take actions, coordinate events, identify patterns and gain insights across multiple domains.
**Media analytics pipelines** transform media streams into insights through audio / video processing, inference, and analytics operations across multiple IP blocks.
\* Other names and brands may be claimed as the property of others.
\* Other names and brands may be claimed as the property of others.
@endsphinxdirective
View File
@ -1,434 +1,510 @@
# Implementing a Face Beautification Algorithm {#openvino_docs_gapi_face_beautification}
## Introduction
@sphinxdirective
Introduction
############
In this tutorial you will learn:
* Basics of a sample face beautification algorithm;
* How to infer different networks inside a pipeline with G-API;
* How to run a G-API pipeline on a video stream.
## Prerequisites
Prerequisites
#############
This sample requires:
* PC with GNU/Linux or Microsoft Windows (Apple macOS is supported but was not tested)
* OpenCV 4.2 or higher built with [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) (building with [Intel® TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is a plus)
* The following pre-trained models from the [Open Model Zoo](@ref omz_models_group_intel)
* [face-detection-adas-0001](@ref omz_models_model_face_detection_adas_0001)
* [facial-landmarks-35-adas-0002](@ref omz_models_model_facial_landmarks_35_adas_0002)
* OpenCV 4.2 or higher built with `Intel® Distribution of OpenVINO™ Toolkit <https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html>`__ (building with `Intel® TBB <https://www.threadingbuildingblocks.org/intel-tbb-tutorial>`__ is a plus)
* The following pre-trained models from the :doc:`Open Model Zoo <omz_models_group_intel>`
To download the models from the Open Model Zoo, use the [Model Downloader](@ref omz_tools_downloader) tool.
* `face-detection-adas-0001 <https://docs.openvino.ai/latest/omz_models_model_face_detection_adas_0001.html#doxid-omz-models-model-face-detection-adas-0001>`__
* `facial-landmarks-35-adas-0002 <https://docs.openvino.ai/latest/omz_models_model_facial_landmarks_35_adas_0002.html#doxid-omz-models-model-facial-landmarks-35-adas-0002>`__
To download the models from the Open Model Zoo, use the :doc:`Model Downloader <omz_tools_downloader>` tool.
Face Beautification Algorithm
#############################
## Face Beautification Algorithm
We will implement a simple face beautification algorithm using a combination of modern Deep Learning techniques and traditional Computer Vision. The general idea behind the algorithm is to make face skin smoother while preserving face features like eye or mouth contrast. The algorithm identifies parts of the face using a DNN inference, applies different filters to the parts found, and then combines them into the final result using basic image arithmetic:
![Face Beautification Algorithm](../img/gapi_face_beautification_algorithm.png)
.. image:: _static/images/gapi_face_beautification_algorithm.png
Briefly the algorithm is described as follows:
- Input image \f$I\f$ is passed to unsharp mask and bilateral filters
(\f$U\f$ and \f$L\f$ respectively);
- Input image \f$I\f$ is passed to an SSD-based face detector;
- SSD result (a \f$[1 \times 1 \times 200 \times 7]\f$ blob) is parsed and converted to an array of faces;
- Input image :math:`I` is passed to unsharp mask and bilateral filters
(:math:`U` and :math:`L` respectively);
- Input image :math:`I` is passed to an SSD-based face detector;
- SSD result (a :math:`[1 \times 1 \times 200 \times 7]` blob) is parsed and converted to an array of faces;
- Every face is passed to a landmarks detector;
- Based on landmarks found for every face, three image masks are generated:
- A background mask \f$b\f$ -- indicating which areas from the original image to keep as-is;
- A face part mask \f$p\f$ -- identifying regions to preserve (sharpen).
- A face skin mask \f$s\f$ -- identifying regions to blur;
- The final result \f$O\f$ is a composition of features above calculated as \f$O = b*I + p*U + s*L\f$.
- A background mask :math:`b` -- indicating which areas from the original image to keep as-is;
- A face part mask :math:`p` -- identifying regions to preserve (sharpen).
- A face skin mask :math:`s` -- identifying regions to blur;
- The final result :math:`O` is a composition of the features above, calculated as :math:`O = b*I + p*U + s*L`.
Generating face element masks based on a limited set of features (just 35 per face, including all its parts) is not trivial and is described in the sections below.
## Constructing a G-API Pipeline
Constructing a G-API Pipeline
#############################
Declare Deep Learning Topologies
++++++++++++++++++++++++++++++++
### Declare Deep Learning Topologies
This sample uses two DNN detectors. Every network takes one input and produces one output. In G-API, networks are defined with the G_API_NET() macro:
```cpp
G_API_NET(FaceDetector, <cv::GMat(cv::GMat)>, "face_detector");
G_API_NET(LandmDetector, <cv::GMat(cv::GMat)>, "landm_detector");
```
.. code-block:: cpp
G_API_NET(FaceDetector, <cv::GMat(cv::GMat)>, "face_detector");
G_API_NET(LandmDetector, <cv::GMat(cv::GMat)>, "landm_detector");
To get more information, see Declaring Deep Learning topologies described in the "Face Analytics pipeline" tutorial.
### Describe the Processing Graph
Describe the Processing Graph
+++++++++++++++++++++++++++++
The code below generates a graph for the algorithm above:
```cpp
cv::GComputation pipeline([=]()
{
cv::GMat gimgIn; // input
cv::GMat faceOut = cv::gapi::infer<custom::FaceDetector>(gimgIn);
GArrayROI garRects = custom::GFacePostProc::on(faceOut, gimgIn, config::kConfThresh); // post-proc
cv::GArray<cv::GMat> landmOut = cv::gapi::infer<custom::LandmDetector>(garRects, gimgIn);
cv::GArray<Landmarks> garElems; // |
cv::GArray<Contour> garJaws; // |output arrays
std::tie(garElems, garJaws) = custom::GLandmPostProc::on(landmOut, garRects); // post-proc
cv::GArray<Contour> garElsConts; // face elements
cv::GArray<Contour> garFaceConts; // whole faces
std::tie(garElsConts, garFaceConts) = custom::GGetContours::on(garElems, garJaws); // interpolation
cv::GMat mskSharp = custom::GFillPolyGContours::on(gimgIn, garElsConts); // |
cv::GMat mskSharpG = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize, // |
config::kGSigma); // |
cv::GMat mskBlur = custom::GFillPolyGContours::on(gimgIn, garFaceConts); // |
cv::GMat mskBlurG = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize, // |
config::kGSigma); // |draw masks
// The first argument in mask() is Blur as we want to subtract from // |
// BlurG the next step: // |
cv::GMat mskBlurFinal = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG); // |
cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG; // |
cv::GMat mskFacesWhite = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY); // |
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesWhite); // |
cv::GMat gimgBilat = custom::GBilatFilter::on(gimgIn, config::kBSize,
config::kBSigmaCol, config::kBSigmaSp);
cv::GMat gimgSharp = custom::unsharpMask(gimgIn, config::kUnshSigma,
config::kUnshStrength);
// Applying the masks
// Custom function mask3C() should be used instead of just gapi::mask()
// as mask() provides CV_8UC1 source only (and we have CV_8U3C)
cv::GMat gimgBilatMasked = custom::mask3C(gimgBilat, mskBlurFinal);
cv::GMat gimgSharpMasked = custom::mask3C(gimgSharp, mskSharpG);
cv::GMat gimgInMasked = custom::mask3C(gimgIn, mskNoFaces);
cv::GMat gimgBeautif = gimgBilatMasked + gimgSharpMasked + gimgInMasked;
return cv::GComputation(cv::GIn(gimgIn), cv::GOut(gimgBeautif,
cv::gapi::copy(gimgIn),
garFaceConts,
garElsConts,
garRects));
});
```
The resulting graph is a mixture of G-API's standard operations, user-defined operations (namespace custom::), and DNN inference. The generic function `cv::gapi::infer<>()` allows you to trigger inference within the pipeline; networks to infer are specified as template parameters. The sample code is using two versions of `cv::gapi::infer<>()`:
.. code-block:: cpp
cv::GComputation pipeline([=]()
{
cv::GMat gimgIn; // input
cv::GMat faceOut = cv::gapi::infer<custom::FaceDetector>(gimgIn);
GArrayROI garRects = custom::GFacePostProc::on(faceOut, gimgIn, config::kConfThresh); // post-proc
cv::GArray<cv::GMat> landmOut = cv::gapi::infer<custom::LandmDetector>(garRects, gimgIn);
cv::GArray<Landmarks> garElems; // |
cv::GArray<Contour> garJaws; // |output arrays
std::tie(garElems, garJaws) = custom::GLandmPostProc::on(landmOut, garRects); // post-proc
cv::GArray<Contour> garElsConts; // face elements
cv::GArray<Contour> garFaceConts; // whole faces
std::tie(garElsConts, garFaceConts) = custom::GGetContours::on(garElems, garJaws); // interpolation
cv::GMat mskSharp = custom::GFillPolyGContours::on(gimgIn, garElsConts); // |
cv::GMat mskSharpG = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize, // |
config::kGSigma); // |
cv::GMat mskBlur = custom::GFillPolyGContours::on(gimgIn, garFaceConts); // |
cv::GMat mskBlurG = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize, // |
config::kGSigma); // |draw masks
// The first argument in mask() is Blur as we want to subtract from // |
// BlurG the next step: // |
cv::GMat mskBlurFinal = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG); // |
cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG; // |
cv::GMat mskFacesWhite = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY); // |
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesWhite); // |
cv::GMat gimgBilat = custom::GBilatFilter::on(gimgIn, config::kBSize,
config::kBSigmaCol, config::kBSigmaSp);
cv::GMat gimgSharp = custom::unsharpMask(gimgIn, config::kUnshSigma,
config::kUnshStrength);
// Applying the masks
// Custom function mask3C() should be used instead of just gapi::mask()
// as mask() provides CV_8UC1 source only (and we have CV_8U3C)
cv::GMat gimgBilatMasked = custom::mask3C(gimgBilat, mskBlurFinal);
cv::GMat gimgSharpMasked = custom::mask3C(gimgSharp, mskSharpG);
cv::GMat gimgInMasked = custom::mask3C(gimgIn, mskNoFaces);
cv::GMat gimgBeautif = gimgBilatMasked + gimgSharpMasked + gimgInMasked;
return cv::GComputation(cv::GIn(gimgIn), cv::GOut(gimgBeautif,
cv::gapi::copy(gimgIn),
garFaceConts,
garElsConts,
garRects));
});
The resulting graph is a mixture of G-API's standard operations, user-defined operations (namespace custom::), and DNN inference. The generic function ``cv::gapi::infer<>()`` allows you to trigger inference within the pipeline; networks to infer are specified as template parameters. The sample code is using two versions of ``cv::gapi::infer<>()``:
* A frame-oriented one is used to detect faces on the input frame.
* An ROI-list oriented one is used to run landmarks inference on a list of faces this version produces an array of landmarks per every face.
More on this in "Face Analytics pipeline" ([Building a GComputation](@ref gapi_ifd_gcomputation) section).
* An ROI-list oriented one is used to run landmarks inference on a list of faces; this version produces an array of landmarks for every face. More on this in "Face Analytics pipeline" (:ref:`Building a GComputation <gapi_ifd_gcomputation>` section).
### Unsharp mask in G-API
The unsharp mask \f$U\f$ for image \f$I\f$ is defined as:
Unsharp mask in G-API
+++++++++++++++++++++
\f[U = I - s * L(M(I)),\f]
The unsharp mask :math:`U` for image :math:`I` is defined as:
where \f$M()\f$ is a median filter, \f$L()\f$ is the Laplace operator, and \f$s\f$ is a strength coefficient. While G-API doesn't provide this function out-of-the-box, it is expressed naturally with the existing G-API operations:
.. math::
U = I - s * L(M(I))
where :math:`M()` is a median filter, :math:`L()` is the Laplace operator, and :math:`s` is a strength coefficient. While G-API doesn't provide this function out-of-the-box, it is expressed naturally with the existing G-API operations:
.. code-block:: cpp
inline cv::GMat custom::unsharpMask(const cv::GMat &src,
const int sigma,
const float strength)
{
cv::GMat blurred = cv::gapi::medianBlur(src, sigma);
cv::GMat laplacian = custom::GLaplacian::on(blurred, CV_8U);
return (src - (laplacian * strength));
}
```cpp
inline cv::GMat custom::unsharpMask(const cv::GMat &src,
const int sigma,
const float strength)
{
cv::GMat blurred = cv::gapi::medianBlur(src, sigma);
cv::GMat laplacian = custom::GLaplacian::on(blurred, CV_8U);
return (src - (laplacian * strength));
}
```
Note that the code snippet above is a regular C++ function defined with G-API types. Users can write functions like this to simplify graph construction; when called, such a function just adds the relevant nodes to the pipeline it is used in.
## Custom Operations
Custom Operations
#################
The face beautification graph uses custom operations extensively. This chapter focuses on the most interesting kernels; refer to G-API Kernel API for general information on defining operations and implementing kernels in G-API.
### Face detector post-processing
Face detector post-processing
+++++++++++++++++++++++++++++
A face detector output is converted to an array of faces with the following kernel:
```cpp
using VectorROI = std::vector<cv::Rect>;
GAPI_OCV_KERNEL(GCPUFacePostProc, GFacePostProc)
{
static void run(const cv::Mat &inDetectResult,
const cv::Mat &inFrame,
const float faceConfThreshold,
VectorROI &outFaces)
{
const int kObjectSize = 7;
const int imgCols = inFrame.size().width;
const int imgRows = inFrame.size().height;
const cv::Rect borders({0, 0}, inFrame.size());
outFaces.clear();
const int numOfDetections = inDetectResult.size[2];
const float *data = inDetectResult.ptr<float>();
for (int i = 0; i < numOfDetections; i++)
{
const float faceId = data[i * kObjectSize + 0];
if (faceId < 0.f) // indicates the end of detections
{
break;
}
const float faceConfidence = data[i * kObjectSize + 2];
// We can cut detections by the `conf` field
// to avoid mistakes of the detector.
if (faceConfidence > faceConfThreshold)
{
const float left = data[i * kObjectSize + 3];
const float top = data[i * kObjectSize + 4];
const float right = data[i * kObjectSize + 5];
const float bottom = data[i * kObjectSize + 6];
// These are normalized coordinates and are between 0 and 1;
// to get the real pixel coordinates we should multiply it by
// the image sizes respectively to the directions:
cv::Point tl(toIntRounded(left * imgCols),
toIntRounded(top * imgRows));
cv::Point br(toIntRounded(right * imgCols),
toIntRounded(bottom * imgRows));
outFaces.push_back(cv::Rect(tl, br) & borders);
}
}
}
};
```
.. code-block:: cpp
using VectorROI = std::vector<cv::Rect>;
GAPI_OCV_KERNEL(GCPUFacePostProc, GFacePostProc)
{
static void run(const cv::Mat &inDetectResult,
const cv::Mat &inFrame,
const float faceConfThreshold,
VectorROI &outFaces)
{
const int kObjectSize = 7;
const int imgCols = inFrame.size().width;
const int imgRows = inFrame.size().height;
const cv::Rect borders({0, 0}, inFrame.size());
outFaces.clear();
const int numOfDetections = inDetectResult.size[2];
const float *data = inDetectResult.ptr<float>();
for (int i = 0; i < numOfDetections; i++)
{
const float faceId = data[i * kObjectSize + 0];
if (faceId < 0.f) // indicates the end of detections
{
break;
}
const float faceConfidence = data[i * kObjectSize + 2];
// We can cut detections by the `conf` field
// to avoid mistakes of the detector.
if (faceConfidence > faceConfThreshold)
{
const float left = data[i * kObjectSize + 3];
const float top = data[i * kObjectSize + 4];
const float right = data[i * kObjectSize + 5];
const float bottom = data[i * kObjectSize + 6];
// These are normalized coordinates and are between 0 and 1;
// to get the real pixel coordinates we should multiply it by
// the image sizes respectively to the directions:
cv::Point tl(toIntRounded(left * imgCols),
toIntRounded(top * imgRows));
cv::Point br(toIntRounded(right * imgCols),
toIntRounded(bottom * imgRows));
outFaces.push_back(cv::Rect(tl, br) & borders);
}
}
}
};
Facial Landmarks Post-Processing
++++++++++++++++++++++++++++++++
### Facial Landmarks Post-Processing
The algorithm infers locations of face elements (like the eyes, the mouth and the head contour itself) using a generic facial landmarks detector (details) from OpenVINO™ Open Model Zoo. However, the detected landmarks as-is are not enough to generate masks — this operation requires regions of interest on the face represented by closed contours, so some interpolation is applied to get them. This landmarks processing and interpolation is performed by the following kernel:
```cpp
GAPI_OCV_KERNEL(GCPUGetContours, GGetContours)
{
static void run(const std::vector<Landmarks> &vctPtsFaceElems, // 18 landmarks of the facial elements
const std::vector<Contour> &vctCntJaw, // 17 landmarks of a jaw
std::vector<Contour> &vctElemsContours,
std::vector<Contour> &vctFaceContours)
{
size_t numFaces = vctCntJaw.size();
CV_Assert(numFaces == vctPtsFaceElems.size());
CV_Assert(vctElemsContours.size() == 0ul);
CV_Assert(vctFaceContours.size() == 0ul);
// vctFaceElemsContours will store all the face elements' contours found
// in an input image, namely 4 elements (two eyes, nose, mouth) for every detected face:
vctElemsContours.reserve(numFaces * 4);
// vctFaceElemsContours will store all the faces' contours found in an input image:
vctFaceContours.reserve(numFaces);
Contour cntFace, cntLeftEye, cntRightEye, cntNose, cntMouth;
cntNose.reserve(4);
for (size_t i = 0ul; i < numFaces; i++)
{
// The face elements contours
// A left eye:
// Approximating the lower eye contour by half-ellipse (using eye points) and storing in cntLeftEye:
cntLeftEye = getEyeEllipse(vctPtsFaceElems[i][1], vctPtsFaceElems[i][0]);
// Pushing the left eyebrow clock-wise:
cntLeftEye.insert(cntLeftEye.end(), {vctPtsFaceElems[i][12], vctPtsFaceElems[i][13],
vctPtsFaceElems[i][14]});
// A right eye:
// Approximating the lower eye contour by half-ellipse (using eye points) and storing in vctRightEye:
cntRightEye = getEyeEllipse(vctPtsFaceElems[i][2], vctPtsFaceElems[i][3]);
// Pushing the right eyebrow clock-wise:
cntRightEye.insert(cntRightEye.end(), {vctPtsFaceElems[i][15], vctPtsFaceElems[i][16],
vctPtsFaceElems[i][17]});
// A nose:
// Storing the nose points clock-wise
cntNose.clear();
cntNose.insert(cntNose.end(), {vctPtsFaceElems[i][4], vctPtsFaceElems[i][7],
vctPtsFaceElems[i][5], vctPtsFaceElems[i][6]});
// A mouth:
// Approximating the mouth contour by two half-ellipses (using mouth points) and storing in vctMouth:
cntMouth = getPatchedEllipse(vctPtsFaceElems[i][8], vctPtsFaceElems[i][9],
vctPtsFaceElems[i][10], vctPtsFaceElems[i][11]);
// Storing all the elements in a vector:
vctElemsContours.insert(vctElemsContours.end(), {cntLeftEye, cntRightEye, cntNose, cntMouth});
// The face contour:
// Approximating the forehead contour by half-ellipse (using jaw points) and storing in vctFace:
cntFace = getForeheadEllipse(vctCntJaw[i][0], vctCntJaw[i][16], vctCntJaw[i][8]);
// The ellipse is drawn clock-wise, but jaw contour points goes vice versa, so it's necessary to push
// cntJaw from the end to the begin using a reverse iterator:
std::copy(vctCntJaw[i].crbegin(), vctCntJaw[i].crend(), std::back_inserter(cntFace));
// Storing the face contour in another vector:
vctFaceContours.push_back(cntFace);
}
}
};
```
.. code-block:: cpp
GAPI_OCV_KERNEL(GCPUGetContours, GGetContours)
{
static void run(const std::vector<Landmarks> &vctPtsFaceElems, // 18 landmarks of the facial elements
const std::vector<Contour> &vctCntJaw, // 17 landmarks of a jaw
std::vector<Contour> &vctElemsContours,
std::vector<Contour> &vctFaceContours)
{
size_t numFaces = vctCntJaw.size();
CV_Assert(numFaces == vctPtsFaceElems.size());
CV_Assert(vctElemsContours.size() == 0ul);
CV_Assert(vctFaceContours.size() == 0ul);
// vctFaceElemsContours will store all the face elements' contours found
// in an input image, namely 4 elements (two eyes, nose, mouth) for every detected face:
vctElemsContours.reserve(numFaces * 4);
// vctFaceElemsContours will store all the faces' contours found in an input image:
vctFaceContours.reserve(numFaces);
Contour cntFace, cntLeftEye, cntRightEye, cntNose, cntMouth;
cntNose.reserve(4);
for (size_t i = 0ul; i < numFaces; i++)
{
// The face elements contours
// A left eye:
// Approximating the lower eye contour by half-ellipse (using eye points) and storing in cntLeftEye:
cntLeftEye = getEyeEllipse(vctPtsFaceElems[i][1], vctPtsFaceElems[i][0]);
// Pushing the left eyebrow clock-wise:
cntLeftEye.insert(cntLeftEye.end(), {vctPtsFaceElems[i][12], vctPtsFaceElems[i][13],
vctPtsFaceElems[i][14]});
// A right eye:
// Approximating the lower eye contour by half-ellipse (using eye points) and storing in vctRightEye:
cntRightEye = getEyeEllipse(vctPtsFaceElems[i][2], vctPtsFaceElems[i][3]);
// Pushing the right eyebrow clock-wise:
cntRightEye.insert(cntRightEye.end(), {vctPtsFaceElems[i][15], vctPtsFaceElems[i][16],
vctPtsFaceElems[i][17]});
// A nose:
// Storing the nose points clock-wise
cntNose.clear();
cntNose.insert(cntNose.end(), {vctPtsFaceElems[i][4], vctPtsFaceElems[i][7],
vctPtsFaceElems[i][5], vctPtsFaceElems[i][6]});
// A mouth:
// Approximating the mouth contour by two half-ellipses (using mouth points) and storing in vctMouth:
cntMouth = getPatchedEllipse(vctPtsFaceElems[i][8], vctPtsFaceElems[i][9],
vctPtsFaceElems[i][10], vctPtsFaceElems[i][11]);
// Storing all the elements in a vector:
vctElemsContours.insert(vctElemsContours.end(), {cntLeftEye, cntRightEye, cntNose, cntMouth});
// The face contour:
// Approximating the forehead contour by half-ellipse (using jaw points) and storing in vctFace:
cntFace = getForeheadEllipse(vctCntJaw[i][0], vctCntJaw[i][16], vctCntJaw[i][8]);
// The ellipse is drawn clock-wise, but jaw contour points goes vice versa, so it's necessary to push
// cntJaw from the end to the begin using a reverse iterator:
std::copy(vctCntJaw[i].crbegin(), vctCntJaw[i].crend(), std::back_inserter(cntFace));
// Storing the face contour in another vector:
vctFaceContours.push_back(cntFace);
}
}
};
The kernel takes two arrays of denormalized landmark coordinates and returns an array of the elements' closed contours and an array of the faces' closed contours; in other words, the first output is an array of contours of image areas to be sharpened and the second is an array of contours of areas to be smoothed.
Here and below `Contour` is a vector of points.
Here and below ``Contour`` is a vector of points.
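For reference, a minimal sketch of the type aliases assumed throughout these listings. ``Contour`` is stated above to be a vector of points, and ``VectorROI`` matches the alias in the face post-processing kernel; the exact ``Landmarks`` definition is an assumption (one ``cv::Point`` per landmark):

```cpp
#include <opencv2/core.hpp>
#include <vector>

// A sketch of the aliases assumed by the kernels above.
using Contour   = std::vector<cv::Point>;   // a closed contour: a vector of points
using Landmarks = std::vector<cv::Point>;   // assumed: one point per detected landmark
using VectorROI = std::vector<cv::Rect>;    // detected face boxes
```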
Get an Eye Contour
------------------
#### Get an Eye Contour
Eye contours are estimated with the following function:
```cpp
inline int custom::getLineInclinationAngleDegrees(const cv::Point &ptLeft, const cv::Point &ptRight)
{
const cv::Point residual = ptRight - ptLeft;
if (residual.y == 0 && residual.x == 0)
return 0;
else
return toIntRounded(atan2(toDouble(residual.y), toDouble(residual.x)) * 180.0 / CV_PI);
}
inline Contour custom::getEyeEllipse(const cv::Point &ptLeft, const cv::Point &ptRight)
{
Contour cntEyeBottom;
const cv::Point ptEyeCenter((ptRight + ptLeft) / 2);
const int angle = getLineInclinationAngleDegrees(ptLeft, ptRight);
const int axisX = toIntRounded(cv::norm(ptRight - ptLeft) / 2.0);
// According to research, in average a Y axis of an eye is approximately
// 1/3 of an X one.
const int axisY = axisX / 3;
// We need the lower part of an ellipse:
static constexpr int kAngEyeStart = 0;
static constexpr int kAngEyeEnd = 180;
cv::ellipse2Poly(ptEyeCenter, cv::Size(axisX, axisY), angle, kAngEyeStart, kAngEyeEnd, config::kAngDelta,
cntEyeBottom);
return cntEyeBottom;
}
```
Briefly, this function restores the bottom side of an eye by a half-ellipse based on two points in the left and right eye corners. In fact, `cv::ellipse2Poly()` is used to approximate the eye region, and the function only defines the ellipse parameters based on just two points:
- The ellipse center and the \f$X\f$ half-axis calculated by two eye Points.
- The \f$Y\f$ half-axis calculated according to the assumption that an average eye width is \f$1/3\f$ of its length.
- The start and the end angles which are 0 and 180 (refer to `cv::ellipse()` documentation).
.. code-block:: cpp
inline int custom::getLineInclinationAngleDegrees(const cv::Point &ptLeft, const cv::Point &ptRight)
{
const cv::Point residual = ptRight - ptLeft;
if (residual.y == 0 && residual.x == 0)
return 0;
else
return toIntRounded(atan2(toDouble(residual.y), toDouble(residual.x)) * 180.0 / CV_PI);
}
inline Contour custom::getEyeEllipse(const cv::Point &ptLeft, const cv::Point &ptRight)
{
Contour cntEyeBottom;
const cv::Point ptEyeCenter((ptRight + ptLeft) / 2);
const int angle = getLineInclinationAngleDegrees(ptLeft, ptRight);
const int axisX = toIntRounded(cv::norm(ptRight - ptLeft) / 2.0);
// According to research, in average a Y axis of an eye is approximately
// 1/3 of an X one.
const int axisY = axisX / 3;
// We need the lower part of an ellipse:
static constexpr int kAngEyeStart = 0;
static constexpr int kAngEyeEnd = 180;
cv::ellipse2Poly(ptEyeCenter, cv::Size(axisX, axisY), angle, kAngEyeStart, kAngEyeEnd, config::kAngDelta,
cntEyeBottom);
return cntEyeBottom;
}
Briefly, this function restores the bottom side of an eye by a half-ellipse based on two points in the left and right eye corners. In fact, ``cv::ellipse2Poly()`` is used to approximate the eye region, and the function only defines the ellipse parameters based on just two points:
- The ellipse center and the :math:`X` half-axis calculated by two eye Points.
- The :math:`Y` half-axis calculated according to the assumption that an average eye width is :math:`1/3` of its length.
- The start and the end angles which are 0 and 180 (refer to ``cv::ellipse()`` documentation).
- The angle delta: how much points to produce in the contour.
- The inclination angle of the axes.
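To make these parameters concrete, here is a small standalone sketch (with made-up eye-corner coordinates, not taken from the sample) that produces the lower half of an eye ellipse directly with ``cv::ellipse2Poly()``:

```cpp
#include <opencv2/imgproc.hpp>
#include <vector>

int main() {
    // Hypothetical eye corners: a horizontal eye 60 px wide.
    const cv::Point ptLeft{100, 120}, ptRight{160, 120};
    const cv::Point center = (ptLeft + ptRight) / 2;  // ellipse center: (130, 120)
    const int axisX = 30;                             // half of the eye width
    const int axisY = axisX / 3;                      // ~1/3 of the X half-axis
    std::vector<cv::Point> lowerEye;
    // Inclination 0 degrees, lower half of the ellipse (0..180), 1-degree step:
    cv::ellipse2Poly(center, cv::Size(axisX, axisY), 0, 0, 180, 1, lowerEye);
    return 0;
}
```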
The use of `atan2()` instead of just `atan()` in the function `custom::getLineInclinationAngleDegrees()` is essential, as it allows returning a negative value depending on the signs of `x` and `y`, so we can get the right angle even if the face is arranged upside down (provided the points are passed in the right order, of course).
The use of ``atan2()`` instead of just ``atan()`` in the function ``custom::getLineInclinationAngleDegrees()`` is essential, as it allows returning a negative value depending on the signs of ``x`` and ``y``, so we can get the right angle even if the face is arranged upside down (provided the points are passed in the right order, of course).
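A tiny standalone illustration of this point, with made-up numbers (not part of the sample): for a direction vector pointing into the third quadrant, ``atan2()`` keeps the sign information while plain ``atan()`` does not.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double kPi = 3.14159265358979323846;
    // A residual vector pointing left and up in image coordinates, e.g. (-10, -10):
    const double y = -10.0, x = -10.0;
    std::printf("atan2: %.1f deg\n", std::atan2(y, x) * 180.0 / kPi); // -135.0 (quadrant-aware)
    std::printf("atan : %.1f deg\n", std::atan(y / x) * 180.0 / kPi); //   45.0 (sign lost)
    return 0;
}
```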
Get a Forehead Contour
----------------------
#### Get a Forehead Contour
The function approximates the forehead contour:
```cpp
inline Contour custom::getForeheadEllipse(const cv::Point &ptJawLeft,
const cv::Point &ptJawRight,
const cv::Point &ptJawLower)
{
Contour cntForehead;
// The point amid the top two points of a jaw:
const cv::Point ptFaceCenter((ptJawLeft + ptJawRight) / 2);
// This will be the center of the ellipse.
// The angle between the jaw and the vertical:
const int angFace = getLineInclinationAngleDegrees(ptJawLeft, ptJawRight);
// This will be the inclination of the ellipse
// Counting the half-axis of the ellipse:
const double jawWidth = cv::norm(ptJawLeft - ptJawRight);
// A forehead width equals the jaw width, and we need a half-axis:
const int axisX = toIntRounded(jawWidth / 2.0);
const double jawHeight = cv::norm(ptFaceCenter - ptJawLower);
// According to research, in average a forehead is approximately 2/3 of
// a jaw:
const int axisY = toIntRounded(jawHeight * 2 / 3.0);
// We need the upper part of an ellipse:
static constexpr int kAngForeheadStart = 180;
static constexpr int kAngForeheadEnd = 360;
cv::ellipse2Poly(ptFaceCenter, cv::Size(axisX, axisY), angFace, kAngForeheadStart, kAngForeheadEnd,
config::kAngDelta, cntForehead);
return cntForehead;
}
```
As we have only jaw points among the detected landmarks, we have to build a half-ellipse from three jaw points: the leftmost, the rightmost and the lowest one. The forehead width is assumed to be equal to the jaw width, and the latter is calculated from the left and the right points. As for the \f$Y\f$ axis, we have no points to get it directly, and instead assume that the forehead height is about \f$2/3\f$ of the jaw height, which can be figured out from the face center (the midpoint between the left and right points) and the lowest jaw point.
### Draw Masks
.. code-block:: cpp
inline Contour custom::getForeheadEllipse(const cv::Point &ptJawLeft,
const cv::Point &ptJawRight,
const cv::Point &ptJawLower)
{
Contour cntForehead;
// The point amid the top two points of a jaw:
const cv::Point ptFaceCenter((ptJawLeft + ptJawRight) / 2);
// This will be the center of the ellipse.
// The angle between the jaw and the vertical:
const int angFace = getLineInclinationAngleDegrees(ptJawLeft, ptJawRight);
// This will be the inclination of the ellipse
// Counting the half-axis of the ellipse:
const double jawWidth = cv::norm(ptJawLeft - ptJawRight);
// A forehead width equals the jaw width, and we need a half-axis:
const int axisX = toIntRounded(jawWidth / 2.0);
const double jawHeight = cv::norm(ptFaceCenter - ptJawLower);
// According to research, in average a forehead is approximately 2/3 of
// a jaw:
const int axisY = toIntRounded(jawHeight * 2 / 3.0);
// We need the upper part of an ellipse:
static constexpr int kAngForeheadStart = 180;
static constexpr int kAngForeheadEnd = 360;
cv::ellipse2Poly(ptFaceCenter, cv::Size(axisX, axisY), angFace, kAngForeheadStart, kAngForeheadEnd,
config::kAngDelta, cntForehead);
return cntForehead;
}
As we have only jaw points among the detected landmarks, we have to build a half-ellipse from three jaw points: the leftmost, the rightmost and the lowest one. The forehead width is assumed to be equal to the jaw width, and the latter is calculated from the left and the right points. As for the :math:`Y` axis, we have no points to get it directly, and instead assume that the forehead height is about :math:`2/3` of the jaw height, which can be figured out from the face center (the midpoint between the left and right points) and the lowest jaw point.
Draw Masks
++++++++++
Once all the required contours are available, we can draw the masks:
```cpp
cv::GMat mskSharp = custom::GFillPolyGContours::on(gimgIn, garElsConts); // |
cv::GMat mskSharpG = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize, // |
config::kGSigma); // |
cv::GMat mskBlur = custom::GFillPolyGContours::on(gimgIn, garFaceConts); // |
cv::GMat mskBlurG = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize, // |
config::kGSigma); // |draw masks
// The first argument in mask() is Blur as we want to subtract from // |
// BlurG the next step: // |
cv::GMat mskBlurFinal = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG); // |
cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG; // |
cv::GMat mskFacesWhite = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY); // |
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesWhite); // |
```
.. code-block:: cpp
cv::GMat mskSharp = custom::GFillPolyGContours::on(gimgIn, garElsConts); // |
cv::GMat mskSharpG = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize, // |
config::kGSigma); // |
cv::GMat mskBlur = custom::GFillPolyGContours::on(gimgIn, garFaceConts); // |
cv::GMat mskBlurG = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize, // |
config::kGSigma); // |draw masks
// The first argument in mask() is Blur as we want to subtract from // |
// BlurG the next step: // |
cv::GMat mskBlurFinal = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG); // |
cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG; // |
cv::GMat mskFacesWhite = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY); // |
cv::GMat mskNoFaces = cv::gapi::bitwise_not(mskFacesWhite); // |
The steps to get the masks are:
* the "sharp" mask calculation:
* fill the contours that should be sharpened;
* blur that to get the "sharp" mask (`mskSharpG`);
* the "bilateral" mask calculation:
* fill all the face contours fully;
* blur that;
* subtract areas that intersect with the "sharp" mask to get the "bilateral" mask (`mskBlurFinal`);
* the background mask calculation:
* add the two previous masks;
* set all non-zero pixels of the result to 255 (with `cv::gapi::threshold()`);
* invert the output (with `cv::gapi::bitwise_not`) to get the background mask (`mskNoFaces`).
## Configuring and Running the Pipeline
* the "sharp" mask calculation:
* fill the contours that should be sharpened;
* blur that to get the "sharp" mask (``mskSharpG``);
* the "bilateral" mask calculation:
* fill all the face contours fully;
* blur that;
* subtract areas that intersect with the "sharp" mask to get the "bilateral" mask (``mskBlurFinal``);
* the background mask calculation:
* add the two previous masks;
* set all non-zero pixels of the result to 255 (with ``cv::gapi::threshold()``);
* invert the output (with ``cv::gapi::bitwise_not``) to get the background mask (``mskNoFaces``).
Configuring and Running the Pipeline
####################################
Once the graph is fully expressed, we can finally compile it and run it on real data. G-API graph compilation is the stage where the G-API framework actually understands which kernels and networks to use. This configuration happens via G-API compilation arguments.
### DNN Parameters
DNN Parameters
++++++++++++++
This sample uses the OpenVINO™ Runtime backend of the OpenVINO™ Toolkit for DL inference, which is configured in the following way:
```cpp
auto faceParams = cv::gapi::ie::Params<custom::FaceDetector>
{
/*std::string*/ faceXmlPath,
/*std::string*/ faceBinPath,
/*std::string*/ faceDevice
};
auto landmParams = cv::gapi::ie::Params<custom::LandmDetector>
{
/*std::string*/ landmXmlPath,
/*std::string*/ landmBinPath,
/*std::string*/ landmDevice
};
```
Every `cv::gapi::ie::Params<>` object is related to the network specified in its template argument. We should pass there the network type we have defined with `G_API_NET()` at the beginning of the tutorial.
Network parameters are then wrapped in `cv::gapi::NetworkPackage`:
```cpp
auto networks = cv::gapi::networks(faceParams, landmParams);
```
.. code-block:: cpp
auto faceParams = cv::gapi::ie::Params<custom::FaceDetector>
{
/*std::string*/ faceXmlPath,
/*std::string*/ faceBinPath,
/*std::string*/ faceDevice
};
auto landmParams = cv::gapi::ie::Params<custom::LandmDetector>
{
/*std::string*/ landmXmlPath,
/*std::string*/ landmBinPath,
/*std::string*/ landmDevice
};
More details in "Face Analytics Pipeline" ([Configuring the Pipeline](@ref gapi_ifd_configuration) section).
Every ``cv::gapi::ie::Params<>`` object is related to the network specified in its template argument. We should pass there the network type we have defined with ``G_API_NET()`` at the beginning of the tutorial.
Network parameters are then wrapped in ``cv::gapi::NetworkPackage``:
.. code-block:: cpp
auto networks = cv::gapi::networks(faceParams, landmParams);
More details in "Face Analytics Pipeline" (:ref:`Configuring the Pipeline <gapi_ifd_configuration>` section).
Kernel Packages
+++++++++++++++
### Kernel Packages
In this example we use many custom kernels; in addition, we use the Fluid backend to optimize out intermediate memory for G-API's standard kernels where applicable. The resulting kernel package is formed like this:
```cpp
auto customKernels = cv::gapi::kernels<custom::GCPUBilateralFilter,
custom::GCPULaplacian,
custom::GCPUFillPolyGContours,
custom::GCPUPolyLines,
custom::GCPURectangle,
custom::GCPUFacePostProc,
custom::GCPULandmPostProc,
custom::GCPUGetContours>();
auto kernels = cv::gapi::combine(cv::gapi::core::fluid::kernels(),
customKernels);
```
### Compiling the Streaming Pipeline
.. code-block:: cpp
auto customKernels = cv::gapi::kernels<custom::GCPUBilateralFilter,
custom::GCPULaplacian,
custom::GCPUFillPolyGContours,
custom::GCPUPolyLines,
custom::GCPURectangle,
custom::GCPUFacePostProc,
custom::GCPULandmPostProc,
custom::GCPUGetContours>();
auto kernels = cv::gapi::combine(cv::gapi::core::fluid::kernels(),
customKernels);
Compiling the Streaming Pipeline
++++++++++++++++++++++++++++++++
G-API optimizes execution for video streams when compiled in the "Streaming" mode.
```cpp
cv::GStreamingCompiled stream = pipeline.compileStreaming(cv::compile_args(kernels, networks));
```
More on this in "Face Analytics Pipeline" ([Configuring the pipeline](@ref gapi_ifd_configuration) section).
.. code-block:: cpp
cv::GStreamingCompiled stream = pipeline.compileStreaming(cv::compile_args(kernels, networks));
More on this in "Face Analytics Pipeline" (:ref:`Configuring the Pipeline <gapi_ifd_configuration>` section).
Running the streaming pipeline
++++++++++++++++++++++++++++++
In order to run the G-API streaming pipeline, all we need is to specify the input video source, call ``cv::GStreamingCompiled::start()``, and then fetch the pipeline processing results:
.. code-block:: cpp
if (parser.has("input"))
{
stream.setSource(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(parser.get<cv::String>("input")));
}
auto out_vector = cv::gout(imgBeautif, imgShow, vctFaceConts,
vctElsConts, vctRects);
stream.start();
avg.start();
while (stream.running())
{
if (!stream.try_pull(std::move(out_vector)))
{
// Use a try_pull() to obtain data.
// If there's no data, let UI refresh (and handle keypress)
if (cv::waitKey(1) >= 0) break;
else continue;
}
frames++;
// Drawing face boxes and landmarks if necessary:
if (flgLandmarks == true)
{
cv::polylines(imgShow, vctFaceConts, config::kClosedLine,
config::kClrYellow);
cv::polylines(imgShow, vctElsConts, config::kClosedLine,
config::kClrYellow);
}
if (flgBoxes == true)
for (auto rect : vctRects)
cv::rectangle(imgShow, rect, config::kClrGreen);
cv::imshow(config::kWinInput, imgShow);
cv::imshow(config::kWinFaceBeautification, imgBeautif);
}
### Running the streaming pipeline
In order to run the G-API streaming pipeline, all we need is to specify the input video source, call `cv::GStreamingCompiled::start()`, and then fetch the pipeline processing results:
```cpp
if (parser.has("input"))
{
stream.setSource(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(parser.get<cv::String>("input")));
}
auto out_vector = cv::gout(imgBeautif, imgShow, vctFaceConts,
vctElsConts, vctRects);
stream.start();
avg.start();
while (stream.running())
{
if (!stream.try_pull(std::move(out_vector)))
{
// Use a try_pull() to obtain data.
// If there's no data, let UI refresh (and handle keypress)
if (cv::waitKey(1) >= 0) break;
else continue;
}
frames++;
// Drawing face boxes and landmarks if necessary:
if (flgLandmarks == true)
{
cv::polylines(imgShow, vctFaceConts, config::kClosedLine,
config::kClrYellow);
cv::polylines(imgShow, vctElsConts, config::kClosedLine,
config::kClrYellow);
}
if (flgBoxes == true)
for (auto rect : vctRects)
cv::rectangle(imgShow, rect, config::kClrGreen);
cv::imshow(config::kWinInput, imgShow);
cv::imshow(config::kWinFaceBeautification, imgBeautif);
}
```
Once the results are ready and can be pulled from the pipeline, we display them on the screen and handle GUI events.
See [Running the pipeline](@ref gapi_ifd_running) section in the "Face Analytics Pipeline" tutorial for more details.
See :ref:`Running the pipeline <gapi_ifd_running>` section in the "Face Analytics Pipeline" tutorial for more details.
Conclusion
##########
## Conclusion
The tutorial has two goals: to show the use of brand-new features of G-API introduced in OpenCV 4.2, and to give a basic understanding of a sample face beautification algorithm.
The result of the algorithm application:
![Face Beautification example](../img/gapi_face_beautification_example.jpg)
.. image:: _static/images/gapi_face_beautification_example.jpg
On the test machine (Intel® Core™ i7-8700), the G-API-optimized video pipeline outperforms its serial (non-pipelined) version by a factor of 2.7, meaning that for such a non-trivial graph, proper pipelining can bring an almost 3x increase in performance.
@endsphinxdirective
On the test machine (Intel® Core™ i7-8700), the G-API-optimized video pipeline outperforms its serial (non-pipelined) version by a factor of 2.7, meaning that for such a non-trivial graph, proper pipelining can bring an almost 3x increase in performance.
View File
@ -1,152 +1,177 @@
# Building a Face Analytics Pipeline {#openvino_docs_gapi_gapi_face_analytics_pipeline}
## Overview
@sphinxdirective
Overview
########
In this tutorial you will learn:
* How to integrate Deep Learning inference in a G-API graph.
* How to run a G-API graph on a video stream and obtain data from it.
## Prerequisites
Prerequisites
#############
This sample requires:
* PC with GNU/Linux or Microsoft Windows (Apple macOS is supported but was not tested)
* OpenCV 4.2 or higher built with [Intel® Distribution of OpenVINO™ Toolkit](https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html) (building with [Intel® TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) is a plus)
* The following pre-trained models from the [Open Model Zoo](@ref omz_models_group_intel):
* [face-detection-adas-0001](@ref omz_models_model_face_detection_adas_0001)
* [age-gender-recognition-retail-0013](@ref omz_models_model_age_gender_recognition_retail_0013)
* [emotions-recognition-retail-0003](@ref omz_models_model_emotions_recognition_retail_0003)
* OpenCV 4.2 or higher built with `Intel® Distribution of OpenVINO™ Toolkit <https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html>`__ (building with `Intel® TBB <https://www.threadingbuildingblocks.org/intel-tbb-tutorial>`__ is a plus)
* The following pre-trained models from the :doc:`Open Model Zoo <omz_models_group_intel>`
To download the models from the Open Model Zoo, use the [Model Downloader](@ref omz_tools_downloader) tool.
* `face-detection-adas-0001 <https://docs.openvino.ai/latest/omz_models_model_face_detection_adas_0001.html#doxid-omz-models-model-face-detection-adas-0001>`__
* `age-gender-recognition-retail-0013 <https://docs.openvino.ai/latest/omz_models_model_age_gender_recognition_retail_0013.html#doxid-omz-models-model-age-gender-recognition-retail-0013>`__
* `emotions-recognition-retail-0003 <https://docs.openvino.ai/latest/omz_models_model_emotions_recognition_retail_0003.html#doxid-omz-models-model-emotions-recognition-retail-0003>`__
To download the models from the Open Model Zoo, use the :doc:`Model Downloader <omz_tools_downloader>` tool.
Introduction: Why G-API
#######################
## Introduction: Why G-API
Many computer vision algorithms run on a video stream rather than on individual images. Stream processing usually consists of multiple steps like decode, preprocessing, detection, tracking, classification (on detected objects), and visualization, forming a *video processing pipeline*. Moreover, many of these pipeline steps can run in parallel: modern platforms have different hardware blocks on the same chip, like decoders and GPUs, and extra accelerators can be plugged in as extensions for deep learning offload.
Given all this manifold of options and the variety of video analytics algorithms, managing such pipelines effectively quickly becomes a problem. It can surely be done manually, but this approach doesn't scale: if a change is required in the algorithm (e.g., a new pipeline step is added), or if it is ported to a new platform with different capabilities, the whole pipeline needs to be re-optimized.
Starting with version 4.2, OpenCV offers a solution to this problem. OpenCV G-API can now manage Deep Learning inference (a cornerstone of any modern analytics pipeline) together with traditional Computer Vision, as well as video capturing/decoding, all in a single pipeline. G-API takes care of pipelining itself, so if the algorithm or platform changes, the execution model adapts to it automatically.
## Pipeline Overview
Our sample application is based on the [Interactive Face Detection](@ref omz_demos_interactive_face_detection_demo_cpp) demo from the Open Model Zoo. A simplified pipeline consists of the following steps:
Pipeline Overview
#################
Our sample application is based on the `Interactive Face Detection <https://docs.openvino.ai/latest/omz_demos_interactive_face_detection_demo_cpp.html#doxid-omz-demos-interactive-face-detection-demo-cpp>`__ demo from the Open Model Zoo. A simplified pipeline consists of the following steps:
1. Image acquisition and decode
2. Detection with preprocessing
3. Classification with preprocessing for every detected object with two networks
4. Visualization
![Face Analytics Pipeline Overview](../img/gapi_face_analytics_pipeline.png)
.. image:: _static/images/gapi_face_analytics_pipeline.png
## Construct a pipeline {#gapi_ifd_constructing}
.. _gapi_ifd_constructing:
Constructing a G-API graph for a video streaming case does not differ much from a [regular usage](https://docs.opencv.org/4.5.0/d0/d1e/gapi.html#gapi_example) of G-API -- it is still about defining graph *data* (with cv::GMat, `cv::GScalar`, and `cv::GArray`) and *operations* over it. Inference also becomes an operation in the graph, but is defined in a slightly different way.
Construct a pipeline
####################
### Declare Deep Learning topologies {#gapi_ifd_declaring_nets}
Constructing a G-API graph for a video streaming case does not differ much from a `regular usage <https://docs.opencv.org/4.5.0/d0/d1e/gapi.html#gapi_example>`__ of G-API -- it is still about defining graph *data* (with cv::GMat, ``cv::GScalar``, and ``cv::GArray``) and *operations* over it. Inference also becomes an operation in the graph, but is defined in a slightly different way.
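As a reminder of what such "regular usage" looks like, here is a minimal pure-CV sketch, unrelated to this sample: graph data objects are declared empty, operations are applied to them, and the result is wrapped into a ``cv::GComputation``.

```cpp
#include <opencv2/gapi.hpp>
#include <opencv2/gapi/imgproc.hpp>

int main() {
    cv::GMat in;                                               // graph data: an "empty" input placeholder
    cv::GMat gray    = cv::gapi::BGR2Gray(in);                 // an operation over graph data
    cv::GMat blurred = cv::gapi::blur(gray, cv::Size(3, 3));   // another operation, chained on the result
    cv::GComputation simple(in, blurred);                      // the graph itself: one input, one output
    return 0;
}
```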
In contrast with traditional CV functions (see [core](https://docs.opencv.org/4.5.0/df/d1f/group__gapi__core.html) and [imgproc](https://docs.opencv.org/4.5.0/d2/d00/group__gapi__imgproc.html)) where G-API declares distinct operations for every function, inference in G-API is a single generic operation `cv::gapi::infer<>`. As usual, it is just an interface and it can be implemented in a number of ways under the hood. In OpenCV 4.2, only OpenVINO™ Runtime-based backend is available, and OpenCV's own DNN module-based backend is to come.
.. _gapi_ifd_declaring_nets:
`cv::gapi::infer<>` is _parametrized_ by the details of a topology we are going to execute. Like operations, topologies in G-API are strongly typed and are defined with a special macro `G_API_NET()`:
Declare Deep Learning topologies
++++++++++++++++++++++++++++++++
```cpp
// Face detector: takes one Mat, returns another Mat
G_API_NET(Faces, <cv::GMat(cv::GMat)>, "face-detector");
// Age/Gender recognition - takes one Mat, returns two:
// one for Age and one for Gender. In G-API, multiple-return-value operations
// are defined using std::tuple<>.
using AGInfo = std::tuple<cv::GMat, cv::GMat>;
G_API_NET(AgeGender, <AGInfo(cv::GMat)>, "age-gender-recoginition");
// Emotion recognition - takes one Mat, returns another.
G_API_NET(Emotions, <cv::GMat(cv::GMat)>, "emotions-recognition");
```
In contrast with traditional CV functions (see `core <https://docs.opencv.org/4.5.0/df/d1f/group__gapi__core.html>`__ and `imgproc <https://docs.opencv.org/4.5.0/d2/d00/group__gapi__imgproc.html>`__) where G-API declares distinct operations for every function, inference in G-API is a single generic operation ``cv::gapi::infer<>``. As usual, it is just an interface and it can be implemented in a number of ways under the hood. In OpenCV 4.2, only OpenVINO™ Runtime-based backend is available, and OpenCV's own DNN module-based backend is to come.
The ``cv::gapi::infer<>`` is *parametrized* by the details of a topology we are going to execute. Like operations, topologies in G-API are strongly typed and are defined with a special macro ``G_API_NET()``:
.. code-block:: cpp
// Face detector: takes one Mat, returns another Mat
G_API_NET(Faces, <cv::GMat(cv::GMat)>, "face-detector");
// Age/Gender recognition - takes one Mat, returns two:
// one for Age and one for Gender. In G-API, multiple-return-value operations
// are defined using std::tuple<>.
using AGInfo = std::tuple<cv::GMat, cv::GMat>;
G_API_NET(AgeGender, <AGInfo(cv::GMat)>, "age-gender-recoginition");
// Emotion recognition - takes one Mat, returns another.
G_API_NET(Emotions, <cv::GMat(cv::GMat)>, "emotions-recognition");
Similar to how operations are defined with ``G_API_OP()``, network description requires three parameters:
Similar to how operations are defined with `G_API_OP()`, network description requires three parameters:
1. A type name. Every defined topology is declared as a distinct C++ type which is used further in the program -- see below.
2. A `std::function<>`-like API signature. G-API treats networks as regular "functions" which take and return data. Here, network `Faces` (a detector) takes a `cv::GMat` and returns a `cv::GMat`, while network `AgeGender` is known to provide two outputs (age and gender blobs, respectively) -- so it has a `std::tuple<>` as a return type.
2. A ``std::function<>``-like API signature. G-API treats networks as regular "functions" which take and return data. Here, network ``Faces`` (a detector) takes a ``cv::GMat`` and returns a ``cv::GMat``, while network ``AgeGender`` is known to provide two outputs (age and gender blobs, respectively) -- so it has a ``std::tuple<>`` as a return type.
3. A topology name -- can be any non-empty string; G-API uses these names to distinguish networks internally. Names should be unique in the scope of a single graph.
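Once declared this way, a network type is used as the template argument of the generic inference operation; a one-line sketch (the full graph is built in the next section):

```cpp
// `in` is a cv::GMat graph input; `Faces` is the network type declared above.
cv::GMat detections = cv::gapi::infer<Faces>(in);
```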
## Building a GComputation {#gapi_ifd_gcomputation}
.. _gapi_ifd_gcomputation:
Building a GComputation
#######################
Now the above pipeline is expressed in G-API like this:
```cpp
cv::GComputation pp([]() {
// Declare an empty GMat - the beginning of the pipeline.
cv::GMat in;
// Run face detection on the input frame. Result is a single GMat,
// internally representing an 1x1x200x7 SSD output.
// This is a single-patch version of infer:
// - Inference is running on the whole input image;
// - Image is converted and resized to the network's expected format
// automatically.
cv::GMat detections = cv::gapi::infer<custom::Faces>(in);
// Parse SSD output to a list of ROI (rectangles) using
// a custom kernel. Note: parsing SSD may become a "standard" kernel.
cv::GArray<cv::Rect> faces = custom::PostProc::on(detections, in);
// Now run Age/Gender model on every detected face. This model has two
// outputs (for age and gender respectively).
// A special ROI-list-oriented form of infer<>() is used here:
// - First input argument is the list of rectangles to process,
// - Second one is the image where to take ROI from;
// - Crop/Resize/Layout conversion happens automatically for every image patch
// from the list
// - Inference results are also returned in form of list (GArray<>)
// - Since there're two outputs, infer<> return two arrays (via std::tuple).
cv::GArray<cv::GMat> ages;
cv::GArray<cv::GMat> genders;
std::tie(ages, genders) = cv::gapi::infer<custom::AgeGender>(faces, in);
// Recognize emotions on every face.
// ROI-list-oriented infer<>() is used here as well.
// Since custom::Emotions network produce a single output, only one
// GArray<> is returned here.
cv::GArray<cv::GMat> emotions = cv::gapi::infer<custom::Emotions>(faces, in);
// Return the decoded frame as a result as well.
// Input matrix can't be specified as output one, so use copy() here
// (this copy will be optimized out in the future).
cv::GMat frame = cv::gapi::copy(in);
// Now specify the computation's boundaries - our pipeline consumes
// one image and produces five outputs.
return cv::GComputation(cv::GIn(in),
cv::GOut(frame, faces, ages, genders, emotions));
});
```
.. code-block:: cpp
cv::GComputation pp([]() {
// Declare an empty GMat - the beginning of the pipeline.
cv::GMat in;
// Run face detection on the input frame. Result is a single GMat,
// internally representing an 1x1x200x7 SSD output.
// This is a single-patch version of infer:
// - Inference is running on the whole input image;
// - Image is converted and resized to the network's expected format
// automatically.
cv::GMat detections = cv::gapi::infer<custom::Faces>(in);
// Parse SSD output to a list of ROI (rectangles) using
// a custom kernel. Note: parsing SSD may become a "standard" kernel.
cv::GArray<cv::Rect> faces = custom::PostProc::on(detections, in);
// Now run Age/Gender model on every detected face. This model has two
// outputs (for age and gender respectively).
// A special ROI-list-oriented form of infer<>() is used here:
// - First input argument is the list of rectangles to process,
// - Second one is the image where to take ROI from;
// - Crop/Resize/Layout conversion happens automatically for every image patch
// from the list
// - Inference results are also returned in form of list (GArray<>)
// - Since there're two outputs, infer<> return two arrays (via std::tuple).
cv::GArray<cv::GMat> ages;
cv::GArray<cv::GMat> genders;
std::tie(ages, genders) = cv::gapi::infer<custom::AgeGender>(faces, in);
// Recognize emotions on every face.
// ROI-list-oriented infer<>() is used here as well.
// Since custom::Emotions network produce a single output, only one
// GArray<> is returned here.
cv::GArray<cv::GMat> emotions = cv::gapi::infer<custom::Emotions>(faces, in);
// Return the decoded frame as a result as well.
// Input matrix can't be specified as output one, so use copy() here
// (this copy will be optimized out in the future).
cv::GMat frame = cv::gapi::copy(in);
// Now specify the computation's boundaries - our pipeline consumes
// one image and produces five outputs.
return cv::GComputation(cv::GIn(in),
cv::GOut(frame, faces, ages, genders, emotions));
});
Every pipeline starts with declaring empty data objects which act as inputs to the pipeline. Then we call a generic `cv::gapi::infer<>` specialized to Faces detection network. `cv::gapi::infer<>` inherits its signature from its template parameter and in this case it expects one input cv::GMat and produces one output cv::GMat.
Every pipeline starts with declaring empty data objects which act as inputs to the pipeline. Then we call a generic ``cv::gapi::infer<>`` specialized to Faces detection network. ``cv::gapi::infer<>`` inherits its signature from its template parameter and in this case it expects one input cv::GMat and produces one output cv::GMat.
In this sample we use a pre-trained SSD-based network and its output needs to be parsed to an array of detections (object regions of interest, ROIs). It is done by a custom operation custom::PostProc, which returns an array of rectangles (of type `cv::GArray<cv::Rect>`) back to the pipeline. This operation also filters out results by a confidence threshold and these details are hidden in the kernel itself. Still, at the moment of graph construction we operate with interfaces only and don't need actual kernels to express the pipeline so the implementation of this post-processing will be listed later.
In this sample we use a pre-trained SSD-based network and its output needs to be parsed to an array of detections (object regions of interest, ROIs). It is done by a custom operation custom::PostProc, which returns an array of rectangles (of type ``cv::GArray<cv::Rect>``) back to the pipeline. This operation also filters out results by a confidence threshold and these details are hidden in the kernel itself. Still, at the moment of graph construction we operate with interfaces only and don't need actual kernels to express the pipeline so the implementation of this post-processing will be listed later.
After detection result output is parsed to an array of objects, we can run classification on any of those. G-API doesn't support syntax for in-graph loops like `for_each()` yet, but instead `cv::gapi::infer<>` comes with a special list-oriented overload.
After detection result output is parsed to an array of objects, we can run classification on any of those. G-API doesn't support syntax for in-graph loops like ``for_each()`` yet, but instead ``cv::gapi::infer<>`` comes with a special list-oriented overload.
The user can call `cv::gapi::infer<>` with a `cv::GArray` as the first argument, and then G-API assumes it needs to run the associated network on every rectangle from the given list on the given frame (second argument). The result of such an operation is also a list: a cv::GArray of `cv::GMat`.
The user can call ``cv::gapi::infer<>`` with a ``cv::GArray`` as the first argument, and then G-API assumes it needs to run the associated network on every rectangle from the given list on the given frame (second argument). The result of such an operation is also a list: a cv::GArray of ``cv::GMat``.
Since the AgeGender network itself produces two outputs, its output type for a list-based version of `cv::gapi::infer` is a tuple of arrays. We use `std::tie()` to decompose this input into two distinct objects.
Since the AgeGender network itself produces two outputs, its output type for a list-based version of ``cv::gapi::infer`` is a tuple of arrays. We use ``std::tie()`` to decompose this input into two distinct objects.
Emotions network produces a single output so its list-based inference's return type is `cv::GArray<cv::GMat>`.
Emotions network produces a single output so its list-based inference's return type is ``cv::GArray<cv::GMat>``.
## Configure the Pipeline {#gapi_ifd_configuration}
.. _gapi_ifd_configuration:
Configure the Pipeline
######################
G-API strictly separates construction from configuration -- with the idea of keeping the algorithm code itself platform-neutral. In the above listings, we only declared our operations and expressed the overall data flow, but did not even mention that we use OpenVINO™. We only described *what* we do, but not *how* we do it. Keeping these two aspects clearly separated is the design goal for G-API.
Platform-specific details arise when the pipeline is *compiled* -- i.e. turned from a declarative into an executable form. The way *how* to run things is specified via compilation arguments, and the new inference/streaming features are no exception to this rule.
G-API is built on backends which implement interfaces (see [Architecture](https://docs.opencv.org/4.5.0/de/d4d/gapi_hld.html) and [Kernels](kernel_api.md) for details) -- thus `cv::gapi::infer<>` is a function which can be implemented by different backends. In OpenCV 4.2, only OpenVINO™ Runtime backend for inference is available. Every inference backend in G-API has to provide a special parameterizable structure to express *backend-specific* neural network parameters -- and in this case, it is `cv::gapi::ie::Params`:
G-API is built on backends which implement interfaces (see `Architecture <https://docs.opencv.org/4.5.0/de/d4d/gapi_hld.html>`__ and :doc:`Kernels <openvino_docs_gapi_kernel_api>` for details); thus, ``cv::gapi::infer<>`` is a function which can be implemented by different backends. In OpenCV 4.2, only the OpenVINO™ Runtime backend for inference is available. Every inference backend in G-API has to provide a special parameterizable structure to express *backend-specific* neural network parameters, and in this case, it is ``cv::gapi::ie::Params``:
```cpp
auto det_net = cv::gapi::ie::Params<custom::Faces> {
cmd.get<std::string>("fdm"), // read cmd args: path to topology IR
cmd.get<std::string>("fdw"), // read cmd args: path to weights
cmd.get<std::string>("fdd"), // read cmd args: device specifier
};
auto age_net = cv::gapi::ie::Params<custom::AgeGender> {
cmd.get<std::string>("agem"), // read cmd args: path to topology IR
cmd.get<std::string>("agew"), // read cmd args: path to weights
cmd.get<std::string>("aged"), // read cmd args: device specifier
}.cfgOutputLayers({ "age_conv3", "prob" });
auto emo_net = cv::gapi::ie::Params<custom::Emotions> {
cmd.get<std::string>("emom"), // read cmd args: path to topology IR
cmd.get<std::string>("emow"), // read cmd args: path to weights
cmd.get<std::string>("emod"), // read cmd args: device specifier
};
```
.. code-block:: cpp
auto det_net = cv::gapi::ie::Params<custom::Faces> {
cmd.get<std::string>("fdm"), // read cmd args: path to topology IR
cmd.get<std::string>("fdw"), // read cmd args: path to weights
cmd.get<std::string>("fdd"), // read cmd args: device specifier
};
auto age_net = cv::gapi::ie::Params<custom::AgeGender> {
cmd.get<std::string>("agem"), // read cmd args: path to topology IR
cmd.get<std::string>("agew"), // read cmd args: path to weights
cmd.get<std::string>("aged"), // read cmd args: device specifier
}.cfgOutputLayers({ "age_conv3", "prob" });
auto emo_net = cv::gapi::ie::Params<custom::Emotions> {
cmd.get<std::string>("emom"), // read cmd args: path to topology IR
cmd.get<std::string>("emow"), // read cmd args: path to weights
cmd.get<std::string>("emod"), // read cmd args: device specifier
};
Here we define three parameter objects: `det_net`, `age_net`, and `emo_net`. Every object is a `cv::gapi::ie::Params` structure parametrized for a particular network we use. At the compilation stage, G-API automatically matches network parameters with their `cv::gapi::infer<>` calls in the graph using this information.
Here we define three parameter objects: ``det_net``, ``age_net``, and ``emo_net``. Every object is a ``cv::gapi::ie::Params`` structure parametrized for a particular network we use. At the compilation stage, G-API automatically matches network parameters with their ``cv::gapi::infer<>`` calls in the graph using this information.
Regardless of the topology, every parameter structure is constructed with three string arguments specific to the OpenVINO™ Runtime:
@ -155,171 +180,188 @@ Regardless of the topology, every parameter structure is constructed with three
* Device to run on: "CPU", "GPU", and others, based on your OpenVINO™ Toolkit installation. These arguments are taken from the command-line parser.
Once networks are defined and custom kernels are implemented, the pipeline is compiled for streaming:
```cpp
// Form a kernel package (with a single OpenCV-based implementation of our
// post-processing) and a network package (holding our three networks).
auto kernels = cv::gapi::kernels<custom::OCVPostProc>();
auto networks = cv::gapi::networks(det_net, age_net, emo_net);
// Compile our pipeline and pass our kernels & networks as
// parameters. This is the place where G-API learns which
// networks & kernels we're actually operating with (the graph
// description itself known nothing about that).
auto cc = pp.compileStreaming(cv::compile_args(kernels, networks));
```
`cv::GComputation::compileStreaming()` triggers a special video-oriented form of graph compilation where G-API tries to optimize throughput. The result of this compilation is an object of the special type `cv::GStreamingCompiled`; in contrast to a traditional callable `cv::GCompiled`, these objects are closer to media players in their semantics.
.. code-block:: cpp
// Form a kernel package (with a single OpenCV-based implementation of our
// post-processing) and a network package (holding our three networks).
auto kernels = cv::gapi::kernels<custom::OCVPostProc>();
auto networks = cv::gapi::networks(det_net, age_net, emo_net);
// Compile our pipeline and pass our kernels & networks as
// parameters. This is the place where G-API learns which
// networks & kernels we're actually operating with (the graph
// description itself known nothing about that).
auto cc = pp.compileStreaming(cv::compile_args(kernels, networks));
> **NOTE**: There is no need to pass metadata arguments describing the format of the input video stream in `cv::GComputation::compileStreaming()`: G-API automatically figures out the formats of the input vector and adjusts the pipeline to these formats on the fly. The user can still pass metadata there, as with the regular `cv::GComputation::compile()`, in order to fix the pipeline to a specific input format.
## Running the Pipeline {#gapi_ifd_running}
The ``cv::GComputation::compileStreaming()`` triggers a special video-oriented form of graph compilation where G-API tries to optimize throughput. The result of this compilation is an object of the special type ``cv::GStreamingCompiled``; in contrast to a traditional callable ``cv::GCompiled``, these objects are closer to media players in their semantics.
.. note::
There is no need to pass metadata arguments describing the format of the input video stream in ``cv::GComputation::compileStreaming()``: G-API automatically figures out the formats of the input vector and adjusts the pipeline to these formats on the fly. The user can still pass metadata there, as with the regular ``cv::GComputation::compile()``, in order to fix the pipeline to a specific input format.
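To make the "media player" analogy concrete, here is a rough lifecycle sketch of ``cv::GStreamingCompiled``; it deliberately reuses the sample's names (``cc``, ``in_src``, the output variables), and the full, working loop is listed in the next section.

```cpp
cc.setSource(cv::gin(in_src));   // "open" a stream
cc.start();                      // "press play"
while (cc.running()) {           // is it still "playing"?
    if (!cc.pull(cv::gout(frame, faces, out_ages, out_genders, out_emotions)))
        break;                   // the stream has ended
    // ... process the pulled results ...
}
cc.stop();                       // "press stop"
```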
.. _gapi_ifd_running:
Running the Pipeline
####################
Pipelining optimization is based on processing multiple input video frames simultaneously, running different steps of the pipeline in parallel. This is why it works best when the framework takes full control over the video stream.
The idea behind the streaming API is that the user specifies an *input source* to the pipeline and then G-API manages its execution automatically until the source ends or the user interrupts the execution. G-API pulls new image data from the source and passes it to the pipeline for processing.
Streaming sources are represented by the interface `cv::gapi::wip::IStreamSource`. Objects implementing this interface may be passed to `GStreamingCompiled` as regular inputs via `cv::gin()` helper function. In OpenCV 4.2, only one streaming source is allowed per pipeline -- this requirement will be relaxed in the future.
Streaming sources are represented by the interface ``cv::gapi::wip::IStreamSource``. Objects implementing this interface may be passed to ``GStreamingCompiled`` as regular inputs via ``cv::gin()`` helper function. In OpenCV 4.2, only one streaming source is allowed per pipeline -- this requirement will be relaxed in the future.
OpenCV comes with a great class cv::VideoCapture and by default G-API ships with a stream source class based on it -- `cv::gapi::wip::GCaptureSource`. Users can implement their own
streaming sources e.g. using [VAAPI](https://01.org/vaapi) or other Media or Networking APIs.
OpenCV comes with a great class cv::VideoCapture and by default G-API ships with a stream source class based on it -- ``cv::gapi::wip::GCaptureSource``. Users can implement their own
streaming sources e.g. using `VAAPI <https://01.org/vaapi>`__ or other Media or Networking APIs.
Sample application specifies the input source as follows:
```cpp
auto in_src = cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(input);
cc.setSource(cv::gin(in_src));
```
Please note that a GComputation may still have multiple inputs like `cv::GMat`, `cv::GScalar`, or `cv::GArray` objects. The user can pass the respective host-side types (`cv::Mat`, `cv::Scalar`, `std::vector<>`) in the input vector as well, but in Streaming mode these objects will create "endless" constant streams. Mixing a real video source stream and a const data stream is allowed.
.. code-block:: cpp
auto in_src = cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(input);
cc.setSource(cv::gin(in_src));
Running a pipeline is easy: just call `cv::GStreamingCompiled::start()` and fetch your data with the blocking `cv::GStreamingCompiled::pull()` or the non-blocking `cv::GStreamingCompiled::try_pull()`; repeat until the stream ends:
Please note that a GComputation may still have multiple inputs like ``cv::GMat``, ``cv::GScalar``, or ``cv::GArray`` objects. The user can pass the respective host-side types (``cv::Mat``, ``cv::Scalar``, ``std::vector<>``) in the input vector as well, but in Streaming mode these objects will create "endless" constant streams. Mixing a real video source stream and a const data stream is allowed.
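For illustration, a hypothetical two-input graph (not the one in this sample) could mix a real video source with a constant scalar like this; the scalar then behaves as an "endless" stream of the same value.

```cpp
// Assumes `cc2` was compiled from a graph declared with two inputs:
// a cv::GMat frame and a cv::GScalar threshold.
auto src = cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(input);
cc2.setSource(cv::gin(src, cv::Scalar(0.5)));  // video stream + constant stream
cc2.start();
```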
```cpp
// After data source is specified, start the execution
cc.start();
// Declare data objects we will be receiving from the pipeline.
cv::Mat frame; // The captured frame itself
std::vector<cv::Rect> faces; // Array of detected faces
std::vector<cv::Mat> out_ages; // Array of inferred ages (one blob per face)
std::vector<cv::Mat> out_genders; // Array of inferred genders (one blob per face)
std::vector<cv::Mat> out_emotions; // Array of classified emotions (one blob per face)
// Implement different execution policies depending on the display option
// for the best performance.
while (cc.running()) {
auto out_vector = cv::gout(frame, faces, out_ages, out_genders, out_emotions);
if (no_show) {
// This is purely a video processing. No need to balance
// with UI rendering. Use a blocking pull() to obtain
// data. Break the loop if the stream is over.
if (!cc.pull(std::move(out_vector)))
break;
} else if (!cc.try_pull(std::move(out_vector))) {
// Use a non-blocking try_pull() to obtain data.
// If there's no data, let UI refresh (and handle keypress)
if (cv::waitKey(1) >= 0) break;
else continue;
}
// At this point we have data for sure (obtained in either
// blocking or non-blocking way).
frames++;
labels::DrawResults(frame, faces, out_ages, out_genders, out_emotions);
labels::DrawFPS(frame, frames, avg.fps(frames));
if (!no_show) cv::imshow("Out", frame);
}
```
Running a pipeline is easy: just call ``cv::GStreamingCompiled::start()`` and fetch your data with the blocking ``cv::GStreamingCompiled::pull()`` or the non-blocking ``cv::GStreamingCompiled::try_pull()``; repeat until the stream ends:
.. code-block:: cpp
// After data source is specified, start the execution
cc.start();
// Declare data objects we will be receiving from the pipeline.
cv::Mat frame; // The captured frame itself
std::vector<cv::Rect> faces; // Array of detected faces
std::vector<cv::Mat> out_ages; // Array of inferred ages (one blob per face)
std::vector<cv::Mat> out_genders; // Array of inferred genders (one blob per face)
std::vector<cv::Mat> out_emotions; // Array of classified emotions (one blob per face)
// Implement different execution policies depending on the display option
// for the best performance.
while (cc.running()) {
auto out_vector = cv::gout(frame, faces, out_ages, out_genders, out_emotions);
if (no_show) {
// This is purely a video processing. No need to balance
// with UI rendering. Use a blocking pull() to obtain
// data. Break the loop if the stream is over.
if (!cc.pull(std::move(out_vector)))
break;
} else if (!cc.try_pull(std::move(out_vector))) {
// Use a non-blocking try_pull() to obtain data.
// If there's no data, let UI refresh (and handle keypress)
if (cv::waitKey(1) >= 0) break;
else continue;
}
// At this point we have data for sure (obtained in either
// blocking or non-blocking way).
frames++;
labels::DrawResults(frame, faces, out_ages, out_genders, out_emotions);
labels::DrawFPS(frame, frames, avg.fps(frames));
if (!no_show) cv::imshow("Out", frame);
}
The above code may look complex, but in fact it handles two modes: with and without a graphical user interface (GUI):
* When a sample is running in a "headless" mode (`--pure` option is set), this code simply pulls data from the pipeline with the blocking `pull()` until it ends. This is the most performant mode of execution.
* When results are also displayed on the screen, the Window System needs to take some time to refresh the window contents and handle GUI events. In this case, the demo pulls data with a non-blocking `try_pull()` until there is no more data available (this does not mark the end of the stream, it just means new data is not ready yet), and only then displays the latest obtained result and refreshes the screen. Reducing the time spent in the GUI with this trick increases the overall performance a little bit.
* When a sample is running in a "headless" mode (``--pure`` option is set), this code simply pulls data from the pipeline with the blocking ``pull()`` until it ends. This is the most performant mode of execution.
* When results are also displayed on the screen, the Window System needs to take some time to refresh the window contents and handle GUI events. In this case, the demo pulls data with a non-blocking ``try_pull()`` until there is no more data available (this does not mark the end of the stream, it just means new data is not ready yet), and only then displays the latest obtained result and refreshes the screen. Reducing the time spent in the GUI with this trick increases the overall performance a little bit.
## Comparison with Serial Mode
The sample can also run in a serial mode for reference and benchmarking purposes. In this case, a regular `cv::GComputation::compile()` is used and a regular single-frame `cv::GCompiled` object is produced; the pipelining optimization is not applied within G-API; it is the user's responsibility to acquire image frames from the `cv::VideoCapture` object and pass them to G-API.
Comparison with Serial Mode
###########################
```cpp
cv::VideoCapture cap(input);
cv::Mat in_frame, frame; // The captured frame itself
std::vector<cv::Rect> faces; // Array of detected faces
std::vector<cv::Mat> out_ages; // Array of inferred ages (one blob per face)
std::vector<cv::Mat> out_genders; // Array of inferred genders (one blob per face)
std::vector<cv::Mat> out_emotions; // Array of classified emotions (one blob per face)
while (cap.read(in_frame)) {
pp.apply(cv::gin(in_frame),
cv::gout(frame, faces, out_ages, out_genders, out_emotions),
cv::compile_args(kernels, networks));
labels::DrawResults(frame, faces, out_ages, out_genders, out_emotions);
frames++;
if (frames == 1u) {
// Start timer only after 1st frame processed -- compilation
// happens on-the-fly here
avg.start();
} else {
// Measurfe & draw FPS for all other frames
labels::DrawFPS(frame, frames, avg.fps(frames-1));
}
if (!no_show) {
cv::imshow("Out", frame);
if (cv::waitKey(1) >= 0) break;
}
}
```
The sample can also run in a serial mode for reference and benchmarking purposes. In this case, a regular ``cv::GComputation::compile()`` is used and a regular single-frame ``cv::GCompiled`` object is produced; the pipelining optimization is not applied within G-API; it is the user's responsibility to acquire image frames from the ``cv::VideoCapture`` object and pass them to G-API.
On a test machine (Intel® Core™ i5-6600), with OpenCV built with [Intel® TBB](https://www.threadingbuildingblocks.org/intel-tbb-tutorial) support, the detector network assigned to the CPU, and the classifiers to the iGPU, the pipelined sample outperforms the serial one by a factor of 1.36x (thus adding +36% in overall throughput).
.. code-block:: cpp
cv::VideoCapture cap(input);
cv::Mat in_frame, frame; // The captured frame itself
std::vector<cv::Rect> faces; // Array of detected faces
std::vector<cv::Mat> out_ages; // Array of inferred ages (one blob per face)
std::vector<cv::Mat> out_genders; // Array of inferred genders (one blob per face)
std::vector<cv::Mat> out_emotions; // Array of classified emotions (one blob per face)
while (cap.read(in_frame)) {
pp.apply(cv::gin(in_frame),
cv::gout(frame, faces, out_ages, out_genders, out_emotions),
cv::compile_args(kernels, networks));
labels::DrawResults(frame, faces, out_ages, out_genders, out_emotions);
frames++;
if (frames == 1u) {
// Start timer only after 1st frame processed -- compilation
// happens on-the-fly here
avg.start();
} else {
// Measure & draw FPS for all other frames
labels::DrawFPS(frame, frames, avg.fps(frames-1));
}
if (!no_show) {
cv::imshow("Out", frame);
if (cv::waitKey(1) >= 0) break;
}
}
On a test machine (Intel® Core™ i5-6600), with OpenCV built with `Intel® TBB <https://www.threadingbuildingblocks.org/intel-tbb-tutorial>`__ support, detector network assigned to CPU, and classifiers to iGPU, the pipelined sample outperforms the serial one by a factor of 1.36x (thus adding +36% in overall throughput).
Conclusion
###########
## Conclusion
G-API introduces a technological way to build and optimize hybrid pipelines. Switching to a new execution model does not require changes in the algorithm code expressed with G-API; only the way the graph is triggered differs.
## Listing: Post-Processing Kernel
G-API gives an easy way to plug custom code into the pipeline even if it is running in a streaming mode and processing tensor data. Inference results are represented by multi-dimensional `cv::Mat` objects so accessing those is as easy as with a regular DNN module.
Listing: Post-Processing Kernel
###############################
G-API gives an easy way to plug custom code into the pipeline even if it is running in a streaming mode and processing tensor data. Inference results are represented by multi-dimensional ``cv::Mat`` objects so accessing those is as easy as with a regular DNN module.
The OpenCV-based SSD post-processing kernel is defined and implemented in this sample as follows:
```cpp
// SSD Post-processing function - this is not a network but a kernel.
// The kernel body is declared separately, this is just an interface.
// This operation takes two Mats (detections and the source image),
// and returns a vector of ROI (filtered by a default threshold).
// Threshold (or a class to select) may become a parameter, but since
// this kernel is custom, it doesn't make a lot of sense.
G_API_OP(PostProc, <cv::GArray<cv::Rect>(cv::GMat, cv::GMat)>, "custom.fd_postproc") {
static cv::GArrayDesc outMeta(const cv::GMatDesc &, const cv::GMatDesc &) {
// This function is required for G-API engine to figure out
// what the output format is, given the input parameters.
// Since the output is an array (with a specific type),
// there's nothing to describe.
return cv::empty_array_desc();
}
};
// OpenCV-based implementation of the above kernel.
GAPI_OCV_KERNEL(OCVPostProc, PostProc) {
static void run(const cv::Mat &in_ssd_result,
const cv::Mat &in_frame,
std::vector<cv::Rect> &out_faces) {
const int MAX_PROPOSALS = 200;
const int OBJECT_SIZE = 7;
const cv::Size upscale = in_frame.size();
const cv::Rect surface({0,0}, upscale);
out_faces.clear();
const float *data = in_ssd_result.ptr<float>();
for (int i = 0; i < MAX_PROPOSALS; i++) {
const float image_id = data[i * OBJECT_SIZE + 0]; // batch id
const float confidence = data[i * OBJECT_SIZE + 2];
const float rc_left = data[i * OBJECT_SIZE + 3];
const float rc_top = data[i * OBJECT_SIZE + 4];
const float rc_right = data[i * OBJECT_SIZE + 5];
const float rc_bottom = data[i * OBJECT_SIZE + 6];
if (image_id < 0.f) { // indicates end of detections
break;
}
if (confidence < 0.5f) { // a hard-coded snapshot
continue;
}
// Convert floating-point coordinates to the absolute image
// frame coordinates; clip by the source image boundaries.
cv::Rect rc;
rc.x = static_cast<int>(rc_left * upscale.width);
rc.y = static_cast<int>(rc_top * upscale.height);
rc.width = static_cast<int>(rc_right * upscale.width) - rc.x;
rc.height = static_cast<int>(rc_bottom * upscale.height) - rc.y;
out_faces.push_back(rc & surface);
}
}
};
```
.. code-block:: cpp
// SSD Post-processing function - this is not a network but a kernel.
// The kernel body is declared separately, this is just an interface.
// This operation takes two Mats (detections and the source image),
// and returns a vector of ROI (filtered by a default threshold).
// Threshold (or a class to select) may become a parameter, but since
// this kernel is custom, it doesn't make a lot of sense.
G_API_OP(PostProc, <cv::GArray<cv::Rect>(cv::GMat, cv::GMat)>, "custom.fd_postproc") {
static cv::GArrayDesc outMeta(const cv::GMatDesc &, const cv::GMatDesc &) {
// This function is required for G-API engine to figure out
// what the output format is, given the input parameters.
// Since the output is an array (with a specific type),
// there's nothing to describe.
return cv::empty_array_desc();
}
};
// OpenCV-based implementation of the above kernel.
GAPI_OCV_KERNEL(OCVPostProc, PostProc) {
static void run(const cv::Mat &in_ssd_result,
const cv::Mat &in_frame,
std::vector<cv::Rect> &out_faces) {
const int MAX_PROPOSALS = 200;
const int OBJECT_SIZE = 7;
const cv::Size upscale = in_frame.size();
const cv::Rect surface({0,0}, upscale);
out_faces.clear();
const float *data = in_ssd_result.ptr<float>();
for (int i = 0; i < MAX_PROPOSALS; i++) {
const float image_id = data[i * OBJECT_SIZE + 0]; // batch id
const float confidence = data[i * OBJECT_SIZE + 2];
const float rc_left = data[i * OBJECT_SIZE + 3];
const float rc_top = data[i * OBJECT_SIZE + 4];
const float rc_right = data[i * OBJECT_SIZE + 5];
const float rc_bottom = data[i * OBJECT_SIZE + 6];
if (image_id < 0.f) { // indicates end of detections
break;
}
if (confidence < 0.5f) { // a hard-coded snapshot
continue;
}
// Convert floating-point coordinates to the absolute image
// frame coordinates; clip by the source image boundaries.
cv::Rect rc;
rc.x = static_cast<int>(rc_left * upscale.width);
rc.y = static_cast<int>(rc_top * upscale.height);
rc.width = static_cast<int>(rc_right * upscale.width) - rc.x;
rc.height = static_cast<int>(rc_bottom * upscale.height) - rc.y;
out_faces.push_back(rc & surface);
}
}
};
@endsphinxdirective

View File

@ -10,56 +10,64 @@
openvino_docs_gapi_face_beautification
openvino_docs_gapi_gapi_face_analytics_pipeline
@endsphinxdirective
OpenCV Graph API (G-API) is an OpenCV module targeted to make regular image and video processing fast and portable. G-API is a special module in OpenCV: in contrast with the majority of other main modules, this one acts as a framework rather than a specific CV algorithm.
G-API is positioned as a next-level optimization enabler for computer vision, focusing not on particular CV functions but on the whole algorithm optimization.
G-API provides means to define CV operations, construct graphs (in the form of expressions) using them, and finally implement and run the operations for a particular backend.
The idea behind G-API is that if an algorithm can be expressed in a special embedded language (currently in C++), the framework can capture its semantics and apply a number of optimizations to the whole pipeline automatically. Particular optimizations are selected based on which [kernels](kernel_api.md) and [backends](https://docs.opencv.org/4.5.0/dc/d1c/group__gapi__std__backends.html) are involved in the graph compilation process, for example, the graph can be offloaded to GPU via the OpenCL backend, or optimized for memory consumption with the Fluid backend. Kernels, backends, and their settings are parameters to the graph compilation, so the graph itself does not depend on any platform-specific details and can be ported easily.
The idea behind G-API is that if an algorithm can be expressed in a special embedded language (currently in C++), the framework can capture its semantics and apply a number of optimizations to the whole pipeline automatically. Particular optimizations are selected based on which :doc:`kernels <openvino_docs_gapi_kernel_api>` and `backends <https://docs.opencv.org/4.5.0/dc/d1c/group__gapi__std__backends.html>`__ are involved in the graph compilation process, for example, the graph can be offloaded to GPU via the OpenCL backend, or optimized for memory consumption with the Fluid backend. Kernels, backends, and their settings are parameters to the graph compilation, so the graph itself does not depend on any platform-specific details and can be ported easily.
> **NOTE**: Graph API (G-API) was introduced in the most recent major OpenCV 4.0 release and now is being actively developed. The API is volatile at the moment and there may be minor but compatibility-breaking changes in the future.
.. note::
Graph API (G-API) was introduced in the most recent major OpenCV 4.0 release and now is being actively developed. The API is volatile at the moment and there may be minor but compatibility-breaking changes in the future.
## G-API Concepts
G-API Concepts
##############
* *Graphs* are built by applying operations to data objects.
* API itself has no "graphs", it is expression-based instead.
* API itself has no "graphs", it is expression-based instead.
* *Data objects* do not hold actual data, only capture dependencies.
* *Operations* consume and produce data objects.
* A graph is defined by specifying its boundaries with data objects:
* What data objects are inputs to the graph?
* What are its outputs?
* What data objects are inputs to the graph?
* What are its outputs?
The paragraphs below explain the G-API programming model and development workflow.
## Programming Model
Building graphs is easy with G-API. In fact, there is no notion of graphs exposed in the API, so the user doesnt need to operate in terms of “nodes” and “edges” — instead, graphs are constructed implicitly via expressions in a "functional" way. Expression-based graphs are built using two major concepts: *[operations](kernel_api.md)* and *[data objects](https://docs.opencv.org/4.2.0/db/df1/group__gapi__data__objects.html)*.
Programming Model
#################
Building graphs is easy with G-API. In fact, there is no notion of graphs exposed in the API, so the user doesnt need to operate in terms of “nodes” and “edges” — instead, graphs are constructed implicitly via expressions in a "functional" way. Expression-based graphs are built using two major concepts: :doc:`operations <openvino_docs_gapi_kernel_api>` and `data objects <https://docs.opencv.org/4.2.0/db/df1/group__gapi__data__objects.html>`__ .
In G-API, every graph begins and ends with data objects; data objects are passed to operations which produce (“return”) their results — new data objects, which are then passed to other operations, and so on. You can declare your own operations, as G-API does not distinguish user-defined operations from its own predefined ones in any way.
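As an illustration, a minimal sketch of such an expression-based graph (``cv::gapi::BGR2Gray`` and ``cv::gapi::blur`` are standard G-API operations; the variable names are arbitrary):

.. code-block:: cpp

   #include <opencv2/gapi.hpp>
   #include <opencv2/gapi/imgproc.hpp>

   cv::GMat in;                                     // a data object: no pixels yet, only a graph node
   cv::GMat gray    = cv::gapi::BGR2Gray(in);       // operations consume and produce data objects
   cv::GMat blurred = cv::gapi::blur(gray, cv::Size(5, 5));
   cv::GComputation graph(in, blurred);             // the graph is defined by its input/output boundaries
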
After the graph is defined, it needs to be compiled for execution. During the compilation, G-API figures out what the graph looks like, which kernels are available to run the operations in the graph, how to manage heterogeneity and to optimize the execution path. The result of graph compilation is a so-called “compiled” object. This object encapsulates the execution sequence for the graph inside and operates on real image data. You can set up the compilation process using various [compilation arguments](https://docs.opencv.org/4.5.0/dc/d1c/group__gapi__std__backends.html). Backends expose some of their options as these arguments; also, actual kernels and DL network settings are passed into the framework this way.
After the graph is defined, it needs to be compiled for execution. During the compilation, G-API figures out what the graph looks like, which kernels are available to run the operations in the graph, how to manage heterogeneity and to optimize the execution path. The result of graph compilation is a so-called “compiled” object. This object encapsulates the execution sequence for the graph inside and operates on real image data. You can set up the compilation process using various `compilation arguments <https://docs.opencv.org/4.5.0/dc/d1c/group__gapi__std__backends.html>`__. Backends expose some of their options as these arguments; also, actual kernels and DL network settings are passed into the framework this way.
G-API supports graph compilation for two execution modes, *regular* and *streaming*, producing different types of compiled objects as the result.
* <strong>Regular</strong> compiled objects are represented with class GCompiled, which follows functor-like semantics and has an overloaded operator(). When called for execution on the given input data, the GCompiled functor blocks the current thread and processes the data immediately — like a regular C++ function. By default, G-API tries to optimize the execution time for latency in this compilation mode.
* **Regular** compiled objects are represented with class GCompiled, which follows functor-like semantics and has an overloaded operator(). When called for execution on the given input data, the GCompiled functor blocks the current thread and processes the data immediately — like a regular C++ function. By default, G-API tries to optimize the execution time for latency in this compilation mode.
* Starting with OpenCV 4.2, G-API can also produce GStreamingCompiled objects that better fit the asynchronous pipelined execution model. This compilation mode is called **streaming mode**, and G-API tries to optimize the overall throughput by implementing the pipelining technique as described above. We will use both in our example; a condensed sketch of both modes is shown right after this list.
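For illustration, reusing the hypothetical ``graph`` object from the sketch above (the file names are placeholders, and a real application would describe the actual video frame format for the streaming compilation):

.. code-block:: cpp

   // Regular mode: a functor-like cv::GCompiled, one blocking call per input.
   cv::Mat input = cv::imread("input.png"), output;
   cv::GCompiled compiled = graph.compile(cv::descr_of(input));
   compiled(input, output);                         // processes the data immediately

   // Streaming mode: a cv::GStreamingCompiled fed by a source and pulled in a loop.
   // Requires <opencv2/gapi/streaming/cap.hpp> for GCaptureSource.
   cv::GStreamingCompiled stream = graph.compileStreaming(cv::descr_of(input));
   stream.setSource(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>("video.mp4"));
   stream.start();
   while (stream.pull(cv::gout(output))) {
       // consume the next result here
   }
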
The overall process for the regular case is summarized in the diagram below:
![G-API Programming Model](../img/gapi_programming_model.png)
.. image:: _static/images/gapi_programming_model.png
The graph is built with operations so having operations defined (**0**) is a basic prerequisite; a constructed expression graph (**1**) forms a `cv::GComputation` object; kernels (**2**) which implement operations are the basic requirement for the graph compilation (**3**); the actual execution (**4**) is handled by a `cv::GCompiled` object which takes input and produces output data.
The graph is built with operations so having operations defined (**0**) is a basic prerequisite; a constructed expression graph (**1**) forms a ``cv::GComputation`` object; kernels (**2**) which implement operations are the basic requirement for the graph compilation (**3**); the actual execution (**4**) is handled by a ``cv::GCompiled`` object which takes input and produces output data.
Development Workflow
####################
## Development Workflow
One of the ways to organize a G-API development workflow is presented in the diagram below:
![G-API development workflow](../img/gapi_development_workflow.png)
.. image:: _static/images/gapi_development_workflow.png
Basically, it is a derivative of the programming model illustrated in the previous chapter. You start with an algorithm or a data flow in mind (**0**), map it to a graph model (**1**), then identify what operations you need (**2**) to construct this graph. These operations may already exist in G-API or be missing; in the latter case, implement the missing ones as kernels (**3**). Then decide which execution model fits your case better, pass kernels and DL networks as arguments to the compilation process (**4**), and finally switch to the execution (**5**). The process is iterative, so if you want to change anything based on the execution results, go back to steps (**0**) or (**1**) (the dashed line).
@endsphinxdirective

View File

@ -1,188 +1,212 @@
# Graph API Kernel API {#openvino_docs_gapi_kernel_api}
@sphinxdirective
The core idea behind Graph API (G-API) is portability: a pipeline built with G-API must be portable (or at least able to be made portable). It means that either it works out-of-the-box when compiled for a new platform, or G-API provides the necessary tools to make it run there, with little-to-no changes in the algorithm itself.
This idea can be achieved by separating the kernel interface from its implementation. Once a pipeline is built using kernel interfaces, it becomes implementation-neutral: the implementation details (i.e. which kernels to use) are passed at a separate stage (graph compilation).
Kernel-implementation hierarchy may look like:
![Kernel API/implementation hierarchy example](../img/gapi_kernel_implementation_hierarchy.png)
A pipeline itself then can be expressed only in terms of `A`, `B`, and so on, and choosing which implementation to use in execution becomes an external parameter.
.. image:: _static/images/gapi_kernel_implementation_hierarchy.png
## Define a Kernel
G-API provides a macro to define a new kernel interface `G_TYPED_KERNEL()`:
A pipeline itself then can be expressed only in terms of ``A``, ``B``, and so on, and choosing which implementation to use in execution becomes an external parameter.
Define a Kernel
###############
G-API provides a macro to define a new kernel interface ``G_TYPED_KERNEL()``:
.. code-block:: cpp
#include <opencv2/gapi.hpp>
G_TYPED_KERNEL(GFilter2D,
<cv::GMat(cv::GMat,int,cv::Mat,cv::Point,double,int,cv::Scalar)>,
"org.opencv.imgproc.filters.filter2D")
{
static cv::GMatDesc // outMeta's return value type
outMeta(cv::GMatDesc in , // descriptor of input GMat
int ddepth , // depth parameter
cv::Mat /* coeffs */, // (unused)
cv::Point /* anchor */, // (unused)
double /* scale */, // (unused)
int /* border */, // (unused)
cv::Scalar /* bvalue */ ) // (unused)
{
return in.withDepth(ddepth);
}
};
```cpp
#include <opencv2/gapi.hpp>
G_TYPED_KERNEL(GFilter2D,
<cv::GMat(cv::GMat,int,cv::Mat,cv::Point,double,int,cv::Scalar)>,
"org.opencv.imgproc.filters.filter2D")
{
static cv::GMatDesc // outMeta's return value type
outMeta(cv::GMatDesc in , // descriptor of input GMat
int ddepth , // depth parameter
cv::Mat /* coeffs */, // (unused)
cv::Point /* anchor */, // (unused)
double /* scale */, // (unused)
int /* border */, // (unused)
cv::Scalar /* bvalue */ ) // (unused)
{
return in.withDepth(ddepth);
}
};
```
This macro is a shortcut to a new type definition. It takes three arguments to register a new type, and requires the type body to be present (see below). The macro arguments are:
* Kernel interface name -- Also serves as a name of new type defined with this macro;
* Kernel signature -- An `std::function<>`-like signature which defines API of the kernel;
* Kernel signature -- An ``std::function<>``-like signature which defines API of the kernel;
* Kernel's unique name -- Used to identify kernel when its type information is stripped within the system.
* A kernel declaration may be seen as a function declaration -- in both cases a new entity must then be used according to the way it was defined.
Kernel signature defines kernel's usage syntax -- which parameters it takes during graph construction. Implementations can also use this signature to derive it into backend-specific callback signatures (see next chapter).
Kernel signature defines kernel's usage syntax which parameters it takes during graph construction. Implementations can also use this signature to derive it into backend-specific callback signatures (see next chapter).
Kernel may accept values of any type, and G-API dynamic types are handled in a special way. All other types are opaque to G-API and passed to kernel in `outMeta()` or in execution callbacks as-is.
Kernel may accept values of any type, and G-API dynamic types are handled in a special way. All other types are opaque to G-API and passed to kernel in ``outMeta()`` or in execution callbacks as-is.
Kernel's return value can only be of G-API dynamic type `cv::GMat`, `cv::GScalar`, or `cv::GArray<T>`. If an operation has more than one output, it should be wrapped into an `std::tuple<>` (which can contain only mentioned G-API types). Arbitrary-output-number operations are not supported.
Kernel's return value can only be of G-API dynamic type ``cv::GMat``, ``cv::GScalar``, or ``cv::GArray<T>``. If an operation has more than one output, it should be wrapped into an ``std::tuple<>`` (which can contain only mentioned G-API types). Arbitrary-output-number operations are not supported.
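For instance, a sketch of a hypothetical two-output operation (in older OpenCV versions the multiple-output form is spelled with the ``G_TYPED_KERNEL_M`` macro instead):

.. code-block:: cpp

   using GMat2 = std::tuple<cv::GMat, cv::GMat>;

   G_TYPED_KERNEL(GSplitPlanes, <GMat2(cv::GMat)>, "sample.split_planes")
   {
       static std::tuple<cv::GMatDesc, cv::GMatDesc>
       outMeta(const cv::GMatDesc &in)
       {
           return std::make_tuple(in, in);          // both outputs inherit the input descriptor
       }
   };
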
Once a kernel is defined, it can be used in pipelines with special, G-API-supplied method ``on()``. This method has the same signature as defined in kernel, so the following code is a perfectly legal construction:
.. code-block:: cpp
cv::GMat in;
cv::GMat out = GFilter2D::on(/* GMat */ in,
/* int */ -1,
/* Mat */ conv_kernel_mat,
/* Point */ cv::Point(-1,-1),
/* double */ 0.,
/* int */ cv::BORDER_DEFAULT,
/* Scalar */ cv::Scalar(0));
Once a kernel is defined, it can be used in pipelines with special, G-API-supplied method `on()`. This method has the same signature as defined in kernel, so the following code is a perfectly legal construction:
```cpp
cv::GMat in;
cv::GMat out = GFilter2D::on(/* GMat */ in,
/* int */ -1,
/* Mat */ conv_kernel_mat,
/* Point */ cv::Point(-1,-1),
/* double */ 0.,
/* int */ cv::BORDER_DEFAULT,
/* Scalar */ cv::Scalar(0));
```
This example has some verbosity, though, so usually a kernel declaration comes with a C++ function wrapper ("factory method") which enables optional parameters, more compact syntax, Doxygen comments, etc.:
```cpp
cv::GMat filter2D(cv::GMat in,
int ddepth,
cv::Mat k,
cv::Point anchor = cv::Point(-1,-1),
double scale = 0.,
int border = cv::BORDER_DEFAULT,
cv::Scalar bval = cv::Scalar(0))
{
return GFilter2D::on(in, ddepth, k, anchor, scale, border, bval);
}
```
.. code-block:: cpp
cv::GMat filter2D(cv::GMat in,
int ddepth,
cv::Mat k,
cv::Point anchor = cv::Point(-1,-1),
double scale = 0.,
int border = cv::BORDER_DEFAULT,
cv::Scalar bval = cv::Scalar(0))
{
return GFilter2D::on(in, ddepth, k, anchor, scale, border, bval);
}
So now it can be used like:
```cpp
cv::GMat in;
cv::GMat out = filter2D(in, -1, conv_kernel_mat);
```
### Extra information
In the current version, kernel declaration body (everything within the curly braces) must contain a static function `outMeta()`. This function establishes a functional dependency between operation's input and output metadata.
.. code-block:: cpp
cv::GMat in;
cv::GMat out = filter2D(in, -1, conv_kernel_mat);
Metadata is information about the data a kernel operates on. Since non-G-API types are opaque to G-API, G-API cares only about G* data descriptors (i.e. dimensions and format of `cv::GMat`, etc).
`outMeta()` is also an example of how a kernel's signature can be transformed into a derived callback: note that in this example, the outMeta() signature exactly follows the kernel signature (defined within the macro) but is different; where the kernel expects `cv::GMat`, `outMeta()` takes and returns `cv::GMatDesc` (a G-API structure metadata for `cv::GMat`).
Extra information
+++++++++++++++++
The point of `outMeta()` is to propagate metadata information within computation from inputs to outputs and infer metadata of internal (intermediate, temporary) data objects. This information is required for further pipeline optimizations, memory allocation, and other operations done by G-API framework during graph compilation.
In the current version, kernel declaration body (everything within the curly braces) must contain a static function ``outMeta()``. This function establishes a functional dependency between operation's input and output metadata.
Metadata is information about the data a kernel operates on. Since non-G-API types are opaque to G-API, G-API cares only about G* data descriptors (i.e. dimensions and format of ``cv::GMat``, etc).
The ``outMeta()`` function is also an example of how a kernel's signature can be transformed into a derived callback: note that in this example, the outMeta() signature exactly follows the kernel signature (defined within the macro) but is different; where the kernel expects ``cv::GMat``, ``outMeta()`` takes and returns ``cv::GMatDesc`` (a G-API structure metadata for ``cv::GMat``).
The point of ``outMeta()`` is to propagate metadata information within computation from inputs to outputs and infer metadata of internal (intermediate, temporary) data objects. This information is required for further pipeline optimizations, memory allocation, and other operations done by G-API framework during graph compilation.
Implement a Kernel
##################
## Implement a Kernel
Once a kernel is declared, its interface can be used to implement versions of this kernel in different backends. This concept is naturally projected from the object-oriented programming "Interface/Implementation" idiom: an interface can be implemented multiple times, and different implementations of a kernel should be substitutable with each other without breaking the algorithm (pipeline) logic (Liskov Substitution Principle).
Every backend defines its own way to implement a kernel interface. This way is regular, though: whatever the plugin is, its kernel implementation must be "derived" from the kernel interface type.
Kernel implementations are then organized into kernel packages. Kernel packages are passed to `cv::GComputation::compile()` as compile arguments, with some hints to G-API on how to select proper kernels (see more on this in "Heterogeneity"[TBD]).
Kernel implementations are then organized into kernel packages. Kernel packages are passed to ``cv::GComputation::compile()`` as compile arguments, with some hints to G-API on how to select proper kernels.
For example, the aforementioned Filter2D is implemented in "reference" CPU (OpenCV) plugin this way (NOTE this is a simplified form with improper border handling):
```cpp
#include <opencv2/gapi/cpu/gcpukernel.hpp> // GAPI_OCV_KERNEL()
#include <opencv2/imgproc.hpp> // cv::filter2D()
GAPI_OCV_KERNEL(GCPUFilter2D, GFilter2D)
{
static void
run(const cv::Mat &in, // in - derived from GMat
const int ddepth, // opaque (passed as-is)
const cv::Mat &k, // opaque (passed as-is)
const cv::Point &anchor, // opaque (passed as-is)
const double delta, // opaque (passed as-is)
const int border, // opaque (passed as-is)
const cv::Scalar &, // opaque (passed as-is)
cv::Mat &out) // out - derived from GMat (retval)
{
cv::filter2D(in, out, ddepth, k, anchor, delta, border);
}
};
```
.. code-block:: cpp
#include <opencv2/gapi/cpu/gcpukernel.hpp> // GAPI_OCV_KERNEL()
#include <opencv2/imgproc.hpp> // cv::filter2D()
GAPI_OCV_KERNEL(GCPUFilter2D, GFilter2D)
{
static void
run(const cv::Mat &in, // in - derived from GMat
const int ddepth, // opaque (passed as-is)
const cv::Mat &k, // opaque (passed as-is)
const cv::Point &anchor, // opaque (passed as-is)
const double delta, // opaque (passed as-is)
const int border, // opaque (passed as-is)
const cv::Scalar &, // opaque (passed as-is)
cv::Mat &out) // out - derived from GMat (retval)
{
cv::filter2D(in, out, ddepth, k, anchor, delta, border);
}
};
Note how the CPU (OpenCV) plugin has transformed the original kernel signature:
* Input `cv::GMat` has been substituted with `cv::Mat`, holding actual input data for the underlying OpenCV function call;
* Output `cv::GMat` has been transformed into an extra output parameter, thus `GCPUFilter2D::run()` takes one argument more than the original kernel signature.
* Input ``cv::GMat`` has been substituted with ``cv::Mat``, holding actual input data for the underlying OpenCV function call;
* Output ``cv::GMat`` has been transformed into an extra output parameter, thus ``GCPUFilter2D::run()`` takes one argument more than the original kernel signature.
The basic intuition for a kernel developer here is not to care where those cv::Mat objects come from instead of the original `cv::GMat`, and just follow the signature conventions defined by the plugin. G-API will call this method during execution and supply all the necessary information (and forward the original opaque data as-is).
The basic intuition for a kernel developer here is not to care where those cv::Mat objects come from instead of the original ``cv::GMat``, and just follow the signature conventions defined by the plugin. G-API will call this method during execution and supply all the necessary information (and forward the original opaque data as-is).
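Putting it together, a minimal sketch of selecting this implementation at compilation time (``in``/``out`` stand for previously constructed ``cv::GMat`` expressions and ``input`` for a real ``cv::Mat``; the names are illustrative):

.. code-block:: cpp

   // Group the implementations into a kernel package and pass it as a compile argument.
   auto kernels  = cv::gapi::kernels<GCPUFilter2D>();
   cv::GComputation graph(in, out);
   auto compiled = graph.compile(cv::descr_of(input), cv::compile_args(kernels));
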
## Compound Kernels
Sometimes a kernel is a single entity only at the API level. It is convenient for users, but on a particular implementation side it would be better to have multiple kernels (a subgraph) doing the thing instead. An example is `goodFeaturesToTrack()`: while in the OpenCV backend it may remain a single kernel, with Fluid it becomes compound; Fluid can handle the Harris response calculation but cannot do sparse non-maxima suppression and point extraction to an STL vector:
Compound Kernels
################
A compound kernel implementation can be defined using a generic macro `GAPI_COMPOUND_KERNEL()`:
Sometimes a kernel is a single entity only at the API level. It is convenient for users, but on a particular implementation side it would be better to have multiple kernels (a subgraph) doing the thing instead. An example is ``goodFeaturesToTrack()``: while in the OpenCV backend it may remain a single kernel, with Fluid it becomes compound; Fluid can handle the Harris response calculation but cannot do sparse non-maxima suppression and point extraction to an STL vector:
A compound kernel implementation can be defined using a generic macro ``GAPI_COMPOUND_KERNEL()``:
.. code-block:: cpp
#include <opencv2/gapi/gcompoundkernel.hpp> // GAPI_COMPOUND_KERNEL()
using PointArray2f = cv::GArray<cv::Point2f>;
G_TYPED_KERNEL(HarrisCorners,
<PointArray2f(cv::GMat,int,double,double,int,double)>,
"org.opencv.imgproc.harris_corner")
{
static cv::GArrayDesc outMeta(const cv::GMatDesc &,
int,
double,
double,
int,
double)
{
// No special metadata for arrays in G-API (yet)
return cv::empty_array_desc();
}
};
// Define Fluid-backend-local kernels which form GoodFeatures
G_TYPED_KERNEL(HarrisResponse,
<cv::GMat(cv::GMat,double,int,double)>,
"org.opencv.fluid.harris_response")
{
static cv::GMatDesc outMeta(const cv::GMatDesc &in,
double,
int,
double)
{
return in.withType(CV_32F, 1);
}
};
G_TYPED_KERNEL(ArrayNMS,
<PointArray2f(cv::GMat,int,double)>,
"org.opencv.cpu.nms_array")
{
static cv::GArrayDesc outMeta(const cv::GMatDesc &,
int,
double)
{
return cv::empty_array_desc();
}
};
GAPI_COMPOUND_KERNEL(GFluidHarrisCorners, HarrisCorners)
{
static PointArray2f
expand(cv::GMat in,
int maxCorners,
double quality,
double minDist,
int blockSize,
double k)
{
cv::GMat response = HarrisResponse::on(in, quality, blockSize, k);
return ArrayNMS::on(response, maxCorners, minDist);
}
};
// Then implement HarrisResponse as Fluid kernel and NMSresponse
// as a generic (OpenCV) kernel
It is important to distinguish a compound kernel from a G-API high-order function, i.e. a C++ function which looks like a kernel but in fact generates a subgraph. The core difference is that a compound kernel is an *implementation detail* and a kernel implementation may be either compound or not (depending on backend capabilities), while a high-order function is a "macro" in terms of G-API and so cannot act as an interface which then needs to be implemented by a backend.
@endsphinxdirective
```cpp
#include <opencv2/gapi/gcompoundkernel.hpp> // GAPI_COMPOUND_KERNEL()
using PointArray2f = cv::GArray<cv::Point2f>;
G_TYPED_KERNEL(HarrisCorners,
<PointArray2f(cv::GMat,int,double,double,int,double)>,
"org.opencv.imgproc.harris_corner")
{
static cv::GArrayDesc outMeta(const cv::GMatDesc &,
int,
double,
double,
int,
double)
{
// No special metadata for arrays in G-API (yet)
return cv::empty_array_desc();
}
};
// Define Fluid-backend-local kernels which form GoodFeatures
G_TYPED_KERNEL(HarrisResponse,
<cv::GMat(cv::GMat,double,int,double)>,
"org.opencv.fluid.harris_response")
{
static cv::GMatDesc outMeta(const cv::GMatDesc &in,
double,
int,
double)
{
return in.withType(CV_32F, 1);
}
};
G_TYPED_KERNEL(ArrayNMS,
<PointArray2f(cv::GMat,int,double)>,
"org.opencv.cpu.nms_array")
{
static cv::GArrayDesc outMeta(const cv::GMatDesc &,
int,
double)
{
return cv::empty_array_desc();
}
};
GAPI_COMPOUND_KERNEL(GFluidHarrisCorners, HarrisCorners)
{
static PointArray2f
expand(cv::GMat in,
int maxCorners,
double quality,
double minDist,
int blockSize,
double k)
{
cv::GMat response = HarrisResponse::on(in, quality, blockSize, k);
return ArrayNMS::on(response, maxCorners, minDist);
}
};
// Then implement HarrisResponse as Fluid kernel and NMSresponse
// as a generic (OpenCV) kernel
```
It is important to distinguish a compound kernel from a G-API high-order function, i.e. a C++ function which looks like a kernel but in fact generates a subgraph. The core difference is that a compound kernel is an *implementation detail* and a kernel implementation may be either compound or not (depending on backend capabilities), while a high-order function is a "macro" in terms of G-API and so cannot act as an interface which then needs to be implemented by a backend.

View File

@ -165,6 +165,7 @@ Get started with Python
Try the `Python Quick Start Example <https://docs.openvino.ai/nightly/notebooks/201-vision-monodepth-with-output.html>`__ to estimate depth in a scene using an OpenVINO monodepth model in a Jupyter Notebook inside your web browser.
Visit the :doc:`Tutorials <tutorials>` page for more Jupyter Notebooks to get you started with OpenVINO, such as:
* `OpenVINO Python API Tutorial <https://docs.openvino.ai/nightly/notebooks/002-openvino-api-with-output.html>`__
* `Basic image classification program with Hello Image Classification <https://docs.openvino.ai/nightly/notebooks/001-hello-world-with-output.html>`__
* `Convert a PyTorch model and use it for image background removal <https://docs.openvino.ai/nightly/notebooks/205-vision-background-removal-with-output.html>`__
@ -186,16 +187,17 @@ Visit the :doc:`Samples <openvino_docs_OV_UG_Samples_Overview>` page for other C
Learn OpenVINO Development Tools
++++++++++++++++++++++++++++++++
* Explore a variety of pre-trained deep learning models in the :ref:`Open Model Zoo <model_zoo.html>` and deploy them in demo applications to see how they work.
* Want to import a model from another framework and optimize its performance with OpenVINO? Visit the :ref:`Model Optimizer Developer Guide <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html>`.
* Accelerate your model's speed even further with quantization and other compression techniques using :ref:`Post-Training Optimization Tool <pot_introduction.html>`.
* Benchmark your model's inference speed with one simple command using the :ref:`Benchmark Tool <openvino_inference_engine_tools_benchmark_tool_README.html`>.
* Explore a variety of pre-trained deep learning models in the :doc:`Open Model Zoo <model_zoo>` and deploy them in demo applications to see how they work.
* Want to import a model from another framework and optimize its performance with OpenVINO? Visit the :doc:`Model Optimizer Developer Guide <openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide>`.
* Accelerate your model's speed even further with quantization and other compression techniques using :doc:`Post-Training Optimization Tool <pot_introduction>`.
* Benchmark your model's inference speed with one simple command using the :doc:`Benchmark Tool <openvino_inference_engine_tools_benchmark_tool_README>`.
## Additional Resources
Additional Resources
####################
- `Intel® Distribution of OpenVINO™ toolkit home page <https://software.intel.com/en-us/openvino-toolkit>`__
- For IoT Libraries & Code Samples, see `Intel® IoT Developer Kit <https://github.com/intel-iot-devkit>`__ .
- `OpenVINO Installation Selector Tool <https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html`>__
- `OpenVINO Installation Selector Tool <https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/download.html>`__
@endsphinxdirective

View File

@ -8,7 +8,7 @@ repo_owner = "openvinotoolkit"
repo_name = "openvino_notebooks"
artifacts_link = "http://repository.toolbox.iotg.sclab.intel.com/projects/ov-notebook/0.1.0-latest/20230309220806/dist/rst_files/"
artifacts_link = "http://repository.toolbox.iotg.sclab.intel.com/projects/ov-notebook/0.1.0-latest/20230317115622/dist/rst_files/"
blacklisted_extensions = ['.xml', '.bin']

View File

@ -1,64 +1,92 @@
# General Optimizations {#openvino_docs_deployment_optimization_guide_common}
@sphinxdirective
This article covers application-level optimization techniques, such as asynchronous execution, to improve data pipelining, pre-processing acceleration and so on.
While the techniques (e.g. pre-processing) can be specific to end-user applications, the associated performance improvements are general and shall improve any target scenario -- both latency and throughput.
@anchor inputs_pre_processing
## Inputs Pre-Processing with OpenVINO
.. _inputs_pre_processing:
Inputs Pre-Processing with OpenVINO
###################################
In many cases, a network expects a pre-processed image. It is advised not to perform any unnecessary steps in the code:
- Model Optimizer can efficiently incorporate the mean and normalization (scale) values into a model (for example, to the weights of the first convolution). For more details, see the [relevant Model Optimizer command-line options](../MO_DG/prepare_model/Additional_Optimizations.md).
- Let OpenVINO accelerate other means of [Image Pre-processing and Conversion](../OV_Runtime_UG/preprocessing_overview.md).
- Data which is already in the "on-device" memory can be input directly by using the [remote tensors API of the GPU Plugin](../OV_Runtime_UG//supported_plugins/GPU_RemoteTensor_API.md).
@anchor async_api
## Prefer OpenVINO Async API
The API of the inference requests offers Sync and Async execution. While the `ov::InferRequest::infer()` is inherently synchronous and executes immediately (effectively serializing the execution flow in the current application thread), the Async "splits" the `infer()` into `ov::InferRequest::start_async()` and `ov::InferRequest::wait()`. For more information, see the [API examples](../OV_Runtime_UG/ov_infer_request.md).
* Model Optimizer can efficiently incorporate the mean and normalization (scale) values into a model (for example, to the weights of the first convolution). For more details, see the :doc:`relevant Model Optimizer command-line options <openvino_docs_MO_DG_Additional_Optimization_Use_Cases>`.
* Let OpenVINO accelerate other means of :doc:`Image Pre-processing and Conversion <openvino_docs_OV_UG_Preprocessing_Overview>`
* Data which is already in the "on-device" memory can be input directly by using the :doc:`remote tensors API of the GPU Plugin <openvino_docs_OV_UG_supported_plugins_GPU_RemoteTensor_API>`.
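As an illustration, a minimal sketch of moving such steps into OpenVINO (assuming a ``model`` with a single ``NHWC`` ``u8`` image input; the mean/scale values are placeholders):

.. code-block:: cpp

   ov::preprocess::PrePostProcessor ppp(model);
   ppp.input().tensor()
       .set_element_type(ov::element::u8)           // what the application actually provides
       .set_layout("NHWC");
   ppp.input().preprocess()
       .convert_element_type(ov::element::f32)      // conversions become part of the model
       .mean(127.5f)
       .scale(127.5f);
   model = ppp.build();                             // compile the updated model as usual
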
A typical use case for the `ov::InferRequest::infer()` is running a dedicated application thread per source of inputs (e.g. a camera), so that every step (frame capture, processing, parsing the results, and associated logic) is kept serial within the thread.
In contrast, the `ov::InferRequest::start_async()` and `ov::InferRequest::wait()` allow the application to continue its activities and poll or wait for the inference completion when really needed. Therefore, one reason for using an asynchronous code is "efficiency".
.. _async_api:
Prefer OpenVINO Async API
#########################
The API of the inference requests offers Sync and Async execution. While the `ov::InferRequest::infer() <classov_1_1InferRequest.html#doxid-classov-1-1-infer-request-1abcb7facc9f7c4b9226a1fd343e56958d>`__ is inherently synchronous and executes immediately (effectively serializing the execution flow in the current application thread), the Async "splits" the ``infer()`` into ``ov::InferRequest::start_async()`` and ``ov::InferRequest::wait()``. For more information, see the :doc:`API examples <openvino_docs_OV_UG_Infer_request>`.
A typical use case for the ``ov::InferRequest::infer()`` is running a dedicated application thread per source of inputs (e.g. a camera), so that every step (frame capture, processing, parsing the results, and associated logic) is kept serial within the thread.
In contrast, the ``ov::InferRequest::start_async()`` and ``ov::InferRequest::wait()`` allow the application to continue its activities and poll or wait for the inference completion when really needed. Therefore, one reason for using an asynchronous code is "efficiency".
.. note::
Although the Synchronous API can be somewhat easier to start with, prefer to use the Asynchronous (callbacks-based, below) API in the production code. The reason is that it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios).
> **NOTE**: Although the Synchronous API can be somewhat easier to start with, prefer to use the Asynchronous (callbacks-based, below) API in the production code. The reason is that it is the most general and scalable way to implement the flow control for any possible number of requests (and hence both latency and throughput scenarios).
The key advantage of the Async approach is that when a device is busy with the inference, the application can do other things in parallel (e.g. populating inputs or scheduling other requests) rather than wait for the current inference to complete first.
In the example below, inference is applied to the results of the video decoding. It is possible to keep two parallel infer requests, and while the current one is processed, the input frame for the next one is being captured. This essentially hides the latency of capturing, so that the overall frame rate is rather determined only by the slowest part of the pipeline (decoding vs inference) and not by the sum of the stages.
![Intel&reg; VTune&trade; screenshot](../img/synch-vs-asynch.svg)
.. image:: _static/images/synch-vs-asynch.svg
:alt: Intel&reg; VTune&trade; screenshot
Below are example-codes for the regular and async-based approaches to compare:
- Normally, the frame is captured with OpenCV and then immediately processed:<br>
@snippet snippets/dldt_optimization_guide8.cpp part8
* Normally, the frame is captured with OpenCV and then immediately processed:<br>
- In the "true" async mode, the `NEXT` request is populated in the main (application) thread, while the `CURRENT` request is processed:<br>
@snippet snippets/dldt_optimization_guide9.cpp part9
.. doxygensnippet:: docs/snippets/dldt_optimization_guide8.cpp
:language: cpp
:fragment: [part8]
* In the "true" async mode, the ``NEXT`` request is populated in the main (application) thread, while the ``CURRENT`` request is processed:<br>
.. doxygensnippet:: docs/snippets/dldt_optimization_guide9.cpp
:language: cpp
:fragment: [part9]
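Since the snippets above live in separate source files, a condensed sketch of the "true" async pattern may be helpful (``capture_frame()``, ``fill_input()``, and ``has_more_frames()`` are hypothetical helpers; ``compiled_model`` is an existing ``ov::CompiledModel``):

.. code-block:: cpp

   ov::InferRequest curr = compiled_model.create_infer_request();
   ov::InferRequest next = compiled_model.create_infer_request();

   fill_input(curr, capture_frame());
   curr.start_async();                              // CURRENT inference runs on the device
   while (has_more_frames()) {
       fill_input(next, capture_frame());           // capture/populate NEXT in parallel
       next.start_async();
       curr.wait();                                 // CURRENT results are ready here
       // ... parse the outputs of curr ...
       std::swap(curr, next);
   }
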
The technique can be generalized to any available parallel slack. For example, you can do inference and simultaneously encode the resulting or previous frames or run further inference, like emotion detection on top of the face detection results.
Refer to the [Object Detection С++ Demo](@ref omz_demos_object_detection_demo_cpp), [Object Detection Python Demo](@ref omz_demos_object_detection_demo_python)(latency-oriented Async API showcase) and [Benchmark App Sample](../../samples/cpp/benchmark_app/README.md) for complete examples of the Async API in action.
Refer to the `Object Detection C++ Demo <https://docs.openvino.ai/latest/omz_demos_object_detection_demo_cpp.html>`__, `Object Detection Python Demo <https://docs.openvino.ai/latest/omz_demos_object_detection_demo_python.html>`__ (latency-oriented Async API showcase) and :doc:`Benchmark App Sample <openvino_inference_engine_samples_benchmark_app_README>` for complete examples of the Async API in action.
> **NOTE**: Using the Asynchronous API is a must for [throughput-oriented scenarios](./dldt_deployment_optimization_tput.md).
.. note::
### Notes on Callbacks
Keep in mind that the `ov::InferRequest::wait()` of the Async API waits for the specific request only. However, running multiple inference requests in parallel provides no guarantees on the completion order. This may complicate a possible logic based on the `ov::InferRequest::wait`. The most scalable approach is using callbacks (set via the `ov::InferRequest::set_callback`) that are executed upon completion of the request. The callback functions will be used by OpenVINO Runtime to notify you of the results (or errors).
Using the Asynchronous API is a must for :doc:`throughput-oriented scenarios <openvino_docs_deployment_optimization_guide_tput>`.
Notes on Callbacks
++++++++++++++++++++
Keep in mind that the ``ov::InferRequest::wait()`` of the Async API waits for the specific request only. However, running multiple inference requests in parallel provides no guarantees on the completion order. This may complicate a possible logic based on the ``ov::InferRequest::wait``. The most scalable approach is using callbacks (set via the ``ov::InferRequest::set_callback``) that are executed upon completion of the request. The callback functions will be used by OpenVINO Runtime to notify you of the results (or errors).
This is a more event-driven approach.
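A minimal sketch of the callback-based flow (assuming an existing ``ov::InferRequest`` named ``request``):

.. code-block:: cpp

   request.set_callback([&](std::exception_ptr ex) {
       if (ex) {
           // an error was reported for this request -- handle or log it here
           return;
       }
       // read the outputs of `request` here; keep the work minimal and thread-safe
   });
   request.start_async();   // returns immediately; the callback fires on completion
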
A few important points on the callbacks:
- It is the job of the application to ensure that any callback function is thread-safe.
- Although executed asynchronously by dedicated threads, the callbacks should NOT include heavy operations (e.g. I/O) and/or blocking calls. Work done by any callback should be kept to a minimum.
@anchor tensor_idiom
## The "get_tensor" Idiom
* It is the job of the application to ensure that any callback function is thread-safe.
* Although executed asynchronously by dedicated threads, the callbacks should NOT include heavy operations (e.g. I/O) and/or blocking calls. Work done by any callback should be kept to a minimum.
.. _tensor_idiom:
The "get_tensor" Idiom
######################
Each device within OpenVINO may have different internal requirements on the memory padding, alignment, etc., for intermediate tensors. The **input/output tensors** are also accessible by the application code.
As every `ov::InferRequest` is created by the particular instance of the `ov::CompiledModel`(that is already device-specific) the requirements are respected and the input/output tensors of the requests are still device-friendly.
As every ``ov::InferRequest`` is created by the particular instance of the ``ov::CompiledModel`` (that is already device-specific) the requirements are respected and the input/output tensors of the requests are still device-friendly.
To sum it up:
* The `get_tensor` (that offers the `data()` method to get a system-memory pointer to the content of a tensor), is a recommended way to populate the inference inputs (and read back the outputs) **from/to the host memory**:
* For example, for the GPU device, the **input/output tensors** are mapped to the host (which is fast) only when the `get_tensor` is used, while for the `set_tensor` a copy into the internal GPU structures may happen.
* In contrast, when the input tensors are already in the **on-device memory** (e.g. as a result of the video-decoding), prefer the `set_tensor` as a zero-copy way to proceed. For more details, see the [GPU device Remote tensors API](../OV_Runtime_UG//supported_plugins/GPU_RemoteTensor_API.md).
@sphinxdirective
* The ``get_tensor`` (that offers the ``data()`` method to get a system-memory pointer to the content of a tensor), is a recommended way to populate the inference inputs (and read back the outputs) **from/to the host memory**:
Consider the :ref:`API examples <in_out_tensors>` for the `get_tensor` and `set_tensor`.
* For example, for the GPU device, the **input/output tensors** are mapped to the host (which is fast) only when the ``get_tensor`` is used, while for the ``set_tensor`` a copy into the internal GPU structures may happen.
* In contrast, when the input tensors are already in the **on-device memory** (e.g. as a result of the video-decoding), prefer the ``set_tensor`` as a zero-copy way to proceed. For more details, see the :doc:`GPU device Remote tensors API <openvino_docs_OV_UG_supported_plugins_GPU_RemoteTensor_API>`.
Consider the :ref:`API examples <in_out_tensors>` for the ``get_tensor`` and ``set_tensor``.
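A condensed sketch of the idiom (assuming a single-input, single-output ``request``; the fill value is a placeholder):

.. code-block:: cpp

   ov::Tensor input = request.get_input_tensor();   // device-friendly, host-accessible memory
   float *in_data = input.data<float>();
   std::fill_n(in_data, input.get_size(), 0.f);     // write the actual input values here

   request.infer();

   ov::Tensor output = request.get_output_tensor();
   const float *out_data = output.data<float>();    // read the results back from host memory
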
@endsphinxdirective

View File

@ -14,55 +14,51 @@
openvino_docs_OV_UG_Preprocessing_Overview
openvino_docs_deployment_optimization_guide_internals
@endsphinxdirective
Runtime optimization, or deployment optimization, focuses on tuning inference parameters and execution means (e.g., the optimum number of requests executed simultaneously). Unlike model-level optimizations, they are highly specific to the hardware and case they are used for, and often come at a cost.
`ov::inference_precision` is a "typical runtime configuration" which trades accuracy for performance, allowing `fp16/bf16` execution for the layers that remain in `fp32` after quantization of the original `fp32` model.
`ov::inference_precision <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gad605a888f3c9b7598ab55023fbf44240>`__ is a "typical runtime configuration" which trades accuracy for performance, allowing ``fp16/bf16`` execution for the layers that remain in ``fp32`` after quantization of the original ``fp32`` model.
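For example, a minimal sketch of requesting a lower inference precision through the property API (the device and precision choices are illustrative):

.. code-block:: cpp

   ov::Core core;
   auto model    = core.read_model("model.xml");    // placeholder path
   auto compiled = core.compile_model(model, "CPU",
                                      ov::hint::inference_precision(ov::element::bf16));
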
Therefore, optimization should start with defining the use case. For example, if it is about processing millions of samples by overnight jobs in data centers, throughput could be prioritized over latency. On the other hand, real-time usages would likely trade off throughput to deliver the results at minimal latency. A combined scenario is also possible, targeting the highest possible throughput, while maintaining a specific latency threshold.
It is also important to understand how the full-stack application would use the inference component "end-to-end." For example, to know what stages need to be orchestrated to save workload devoted to fetching and preparing input data.
For more information on this topic, see the following articles:
@sphinxdirective
* :ref:`feature support by device <devicesupport-feature-support-matrix>`
@endsphinxdirective
* :ref:`Inputs Pre-processing with the OpenVINO <inputs_pre_processing>`
* :ref:`Async API <async_api>`
* :ref:`The 'get_tensor' Idiom <tensor_idiom>`
* For variably-sized inputs, consider :doc:`dynamic shapes <openvino_docs_OV_UG_DynamicShapes>`
* [Inputs Pre-processing with the OpenVINO](@ref inputs_pre_processing)
* [Async API](@ref async_api)
* [The 'get_tensor' Idiom](@ref tensor_idiom)
* For variably-sized inputs, consider [dynamic shapes](../OV_Runtime_UG/ov_dynamic_shapes.md)
See the :doc:`latency <openvino_docs_deployment_optimization_guide_latency>` and :doc:`throughput <openvino_docs_deployment_optimization_guide_tput>` optimization guides, for **use-case-specific optimizations**
Writing Performance-Portable Inference Applications
###################################################
See the [latency](./dldt_deployment_optimization_latency.md) and [throughput](./dldt_deployment_optimization_tput.md) optimization guides, for **use-case-specific optimizations**
## Writing Performance-Portable Inference Applications
Although inference performed in OpenVINO Runtime can be configured with a multitude of low-level performance settings, it is not recommended in most cases. Firstly, achieving the best performance with such adjustments requires deep understanding of device architecture and the inference engine.
Secondly, such optimization may not translate well to other device-model combinations. In other words, one set of execution parameters is likely to result in different performance when used under different conditions. For example:
* both the CPU and GPU support the notion of [streams](./dldt_deployment_optimization_tput_advanced.md), yet they deduce their optimal number very differently.
* Even among devices of the same type, different execution configurations can be considered optimal, as in the case of instruction sets or the number of cores for the CPU and the batch size for the GPU.
* Different models have different optimal parameter configurations, considering factors such as compute vs memory-bandwidth, inference precision, and possible model quantization.
* Execution "scheduling" impacts performance strongly and is highly device-specific, for example, GPU-oriented optimizations like batching, combining multiple inputs to achieve the optimal throughput, [do not always map well to the CPU](dldt_deployment_optimization_internals.md).
* both the CPU and GPU support the notion of :ref:`streams <openvino_docs_deployment_optimization_guide_tput_advanced>`, yet they deduce their optimal number very differently.
* Even among devices of the same type, different execution configurations can be considered optimal, as in the case of instruction sets or the number of cores for the CPU and the batch size for the GPU.
* Different models have different optimal parameter configurations, considering factors such as compute vs memory-bandwidth, inference precision, and possible model quantization.
* Execution "scheduling" impacts performance strongly and is highly device-specific, for example, GPU-oriented optimizations like batching, combining multiple inputs to achieve the optimal throughput, :doc:`do not always map well to the CPU <openvino_docs_deployment_optimization_guide_internals>`.
To make the configuration process much easier and its performance optimization more portable, the option of [Performance Hints](../OV_Runtime_UG/performance_hints.md) has been introduced. It comprises two high-level "presets" focused on either **latency** or **throughput** and, essentially, makes execution specifics irrelevant.
To make the configuration process much easier and its performance optimization more portable, the option of :doc:`Performance Hints <openvino_docs_OV_UG_Performance_Hints>` has been introduced. It comprises two high-level "presets" focused on either **latency** or **throughput** and, essentially, makes execution specifics irrelevant.
The Performance Hints functionality makes configuration transparent to the application, for example, anticipates the need for explicit (application-side) batching or streams, and facilitates parallel processing of separate infer requests for different input sources
## Additional Resources
Additional Resources
####################
* [Using Async API and running multiple inference requests in parallel to leverage throughput](@ref throughput_app_design).
* [The throughput approach implementation details for specific devices](dldt_deployment_optimization_internals.md)
* [Details on throughput](dldt_deployment_optimization_tput.md)
* [Details on latency](dldt_deployment_optimization_latency.md)
* [API examples and details](../OV_Runtime_UG/performance_hints.md).
* :ref:`Using Async API and running multiple inference requests in parallel to leverage throughput <throughput_app_design>`.
* :doc:`The throughput approach implementation details for specific devices <openvino_docs_deployment_optimization_guide_internals>`
* :doc:`Details on throughput <openvino_docs_deployment_optimization_guide_tput>`
* :doc:`Details on latency <openvino_docs_deployment_optimization_guide_latency>`
* :doc:`API examples and details <openvino_docs_OV_UG_Performance_Hints>`
@endsphinxdirective

View File

@ -8,38 +8,38 @@
openvino_docs_OV_UG_Model_caching_overview
@endsphinxdirective
A significant portion of deep learning use cases involves applications loading a single model and using a single input at a time, which is the typical "consumer" scenario.
While an application can create more than one request if needed, for example to support [asynchronous inputs population](@ref async_api), its **inference performance depends on how many requests are being inferenced in parallel** on a device.
While an application can create more than one request if needed, for example to support :ref:`asynchronous inputs population <async_api>`, its **inference performance depends on how many requests are being inferenced in parallel** on a device.
Similarly, when multiple models are served on the same device, it is important whether the models are executed simultaneously or in a chain, for example, in the inference pipeline.
As expected, the easiest way to achieve **low latency is by running only one inference at a time** on one device. Accordingly, any additional concurrency usually results in latency rising fast.
However, some conventional "root" devices (i.e., CPU or GPU) can in fact be internally composed of several "sub-devices". In many cases, letting OpenVINO leverage the "sub-devices" transparently helps to improve the application's throughput (e.g., serve multiple clients simultaneously) without degrading latency. For example, multi-socket CPUs can deliver as many requests at the same minimal latency as there are NUMA nodes in the system. Similarly, a multi-tile GPU, which is essentially multiple GPUs in a single package, can deliver multi-tile scalability with the number of inference requests, while preserving the single-tile latency.
Typically, human expertise is required to get more "throughput" out of the device, even in the inherently latency-oriented cases. OpenVINO can take this configuration burden via [high-level performance hints](../OV_Runtime_UG/performance_hints.md), the `ov::hint::PerformanceMode::LATENCY` specified for the `ov::hint::performance_mode` property for the `compile_model`.
Typically, human expertise is required to get more "throughput" out of the device, even in the inherently latency-oriented cases. OpenVINO can take this configuration burden via :doc:`high-level performance hints <openvino_docs_OV_UG_Performance_Hints>`, the `ov::hint::PerformanceMode::LATENCY <enumov_1_1hint_1_1PerformanceMode.html#doxid-group-ov-runtime-cpp-prop-api-1gga032aa530efa40760b79af14913d48d73a501069dd75f76384ba18f133fdce99c2>`__ specified for the ``ov::hint::performance_mode`` property for the ``compile_model``.
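A minimal sketch of applying the hint (the device name is a placeholder):

.. code-block:: cpp

   ov::Core core;
   auto model    = core.read_model("model.xml");    // placeholder path
   auto compiled = core.compile_model(model, "GPU",
                                      ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));
   auto request  = compiled.create_infer_request();
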
> **NOTE**: [OpenVINO performance hints](../OV_Runtime_UG/performance_hints.md) is a recommended way for performance configuration, which is both device-agnostic and future-proof.
.. note::
:doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` is a recommended way for performance configuration, which is both device-agnostic and future-proof.
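A minimal C++ sketch of this basic flow, assuming a placeholder model file name and the CPU device (not a complete application):

.. code-block:: cpp

   #include <openvino/openvino.hpp>

   ov::Core core;
   auto model = core.read_model("model.xml");
   // Ask the device to configure itself for the lowest per-request latency.
   auto compiled = core.compile_model(
       model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));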
@sphinxdirective
* feature support by device
* feature support by device
When multiple models are to be used simultaneously, consider running inference on separate devices for each of them. Finally, when multiple models are executed in parallel on a device, using additional ``ov::hint::model_priority`` may help to define relative priorities of the models. Refer to the documentation on the :ref:`OpenVINO feature support for devices <devicesupport-feature-support-matrix>` to check if your device supports the feature.
@endsphinxdirective
**First-Inference Latency and Model Load/Compile Time**
In some cases, model loading and compilation contribute to the "end-to-end" latency more than usual.
For example, when the model is used exactly once, or when it is unloaded and reloaded in a cycle, to free the memory for another inference due to on-device memory limitations.
Such a "first-inference latency" scenario may pose an additional limitation on the model load\compilation time, as inference accelerators (other than the CPU) usually require a certain level of model compilation upon loading.
The [model caching](../OV_Runtime_UG/Model_caching_overview.md) option is a way to lessen the impact over multiple application runs. If model caching is not possible, for example, it may require write permissions for the application, the CPU offers the fastest model load time almost every time.
The :doc:`model caching <openvino_docs_OV_UG_Model_caching_overview>` option is a way to lessen the impact over multiple application runs. If model caching is not possible (for example, because it requires write permissions for the application), the CPU offers the fastest model load time almost every time.
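A short sketch of enabling model caching to reduce the load/compile cost of subsequent runs; the cache directory name here is an assumption:

.. code-block:: cpp

   ov::Core core;
   core.set_property(ov::cache_dir("model_cache"));  // later compile_model() calls reuse the cached blob
   auto compiled = core.compile_model(core.read_model("model.xml"), "GPU");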
Another way of dealing with first-inference latency is using the [AUTO device selection inference mode](../OV_Runtime_UG/auto_device_selection.md). It starts inference on the CPU, while waiting for the actual accelerator to load the model. At that point, it shifts to the new device seamlessly.
Another way of dealing with first-inference latency is using the :doc:`AUTO device selection inference mode <openvino_docs_OV_UG_supported_plugins_AUTO>`. It starts inference on the CPU, while waiting for the actual accelerator to load the model. At that point, it shifts to the new device seamlessly.
Finally, note that any [throughput-oriented options](./dldt_deployment_optimization_tput.md) may significantly increase the model uptime.
Finally, note that any :doc:`throughput-oriented options <openvino_docs_deployment_optimization_guide_tput>` may significantly increase the model uptime.
@endsphinxdirective

View File

@ -1,50 +1,66 @@
# Optimizing for Throughput {#openvino_docs_deployment_optimization_guide_tput}
As described in the section on the [latency-specific considerations](./dldt_deployment_optimization_latency.md), one of the possible use cases is *delivering every single request at the minimal delay*.
@sphinxdirective
As described in the section on the :doc:`latency-specific considerations <openvino_docs_deployment_optimization_guide_latency>`, one of the possible use cases is *delivering every single request at the minimal delay*.
Throughput, on the other hand, is about inference scenarios in which a potentially **large number of inference requests are served simultaneously to improve the device utilization**.
The associated increase in latency is not linearly dependent on the number of requests executed in parallel.
A trade-off between overall throughput and serial performance of individual requests can be achieved with the right performance configuration of OpenVINO.
## Basic and Advanced Ways of Leveraging Throughput
Basic and Advanced Ways of Leveraging Throughput
################################################
There are two ways of leveraging throughput with individual devices:
* **Basic (high-level)** flow with [OpenVINO performance hints](../OV_Runtime_UG/performance_hints.md) which is inherently **portable and future-proof**.
* **Advanced (low-level)** approach of explicit **batching** and **streams**. For more details, see the [runtime inference optimizations](dldt_deployment_optimization_tput_advanced.md).
* **Basic (high-level)** flow with :doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` which is inherently **portable and future-proof**.
* **Advanced (low-level)** approach of explicit **batching** and **streams**. For more details, see the :doc:`runtime inference optimizations <openvino_docs_deployment_optimization_guide_tput_advanced>`.
In both cases, the application should be designed to execute multiple inference requests in parallel, as described in the following section.
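For the basic (hint-based) flow, a minimal sketch of requesting the throughput hint at compilation time; the device and model names are placeholders:

.. code-block:: cpp

   auto compiled = core.compile_model(
       model, "GPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));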
@anchor throughput_app_design
## Throughput-Oriented Application Design
.. _throughput_app_design:
Throughput-Oriented Application Design
######################################
In general, most throughput-oriented inference applications should:
* Expose substantial amounts of *input* parallelism (e.g. process multiple video- or audio- sources, text documents, etc).
* Decompose the data flow into a collection of concurrent inference requests that are aggressively scheduled to be executed in parallel:
* Setup the configuration for the *device* (for example, as parameters of the `ov::Core::compile_model`) via either previously introduced [low-level explicit options](dldt_deployment_optimization_tput_advanced.md) or [OpenVINO performance hints](../OV_Runtime_UG/performance_hints.md) (**preferable**):
@sphinxdirective
.. tab:: C++
.. doxygensnippet:: docs/snippets/ov_auto_batching.cpp
:language: cpp
:fragment: [compile_model]
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_auto_batching.py
:language: python
:fragment: [compile_model]
@endsphinxdirective
* Setup the configuration for the *device* (for example, as parameters of the ``ov::Core::compile_model``) via either previously introduced :doc:`low-level explicit options <openvino_docs_deployment_optimization_guide_tput_advanced>` or :doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` (**preferable**):
* Query the `ov::optimal_number_of_infer_requests` from the `ov::CompiledModel` (resulted from a compilation of the model for the device) to create the number of the requests required to saturate the device.
* Use the Async API with callbacks, to avoid any dependency on the completion order of the requests and possible device starvation, as explained in the [common-optimizations section](@ref openvino_docs_deployment_optimization_guide_common).
.. tab:: C++
## Multi-Device Execution
OpenVINO offers the automatic, scalable [multi-device inference mode](../OV_Runtime_UG/multi_device.md), which is a simple *application-transparent* way to improve throughput. There is no need to re-architecture existing applications for any explicit multi-device support: no explicit network loading to each device, no separate per-device queues, no additional logic to balance inference requests between devices, etc. For the application using it, multi-device is like any other device, as it manages all processes internally.
.. doxygensnippet:: docs/snippets/ov_auto_batching.cpp
:language: cpp
:fragment: [compile_model]
.. tab:: Python
.. doxygensnippet:: docs/snippets/ov_auto_batching.py
:language: python
:fragment: [compile_model]
* Query the ``ov::optimal_number_of_infer_requests`` from the ``ov::CompiledModel`` (resulting from the compilation of the model for the device) to create the number of requests required to saturate the device.
* Use the Async API with callbacks to avoid any dependency on the completion order of the requests and possible device starvation, as explained in the :doc:`common-optimizations section <openvino_docs_deployment_optimization_guide_common>` and sketched below.
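A condensed C++ sketch of the request pool described above; error handling is omitted and the callback body is only illustrative:

.. code-block:: cpp

   auto nireq = compiled.get_property(ov::optimal_number_of_infer_requests);
   std::vector<ov::InferRequest> requests(nireq);
   for (auto& req : requests) {
       req = compiled.create_infer_request();
       req.set_callback([&req](std::exception_ptr) {
           // Outputs of `req` are ready here: consume them, refill the inputs,
           // and call req.start_async() again to keep the device saturated.
       });
   }
   for (auto& req : requests)
       req.start_async();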
Multi-Device Execution
######################
OpenVINO offers the automatic, scalable :doc:`multi-device inference mode <openvino_docs_OV_UG_Running_on_multiple_devices>`, which is a simple *application-transparent* way to improve throughput. There is no need to re-architect existing applications for any explicit multi-device support: no explicit network loading to each device, no separate per-device queues, no additional logic to balance inference requests between devices, etc. For the application using it, multi-device is like any other device, as it manages all processes internally.
Just like with other throughput-oriented scenarios, there are several major pre-requisites for optimal multi-device performance:
* Using the [Asynchronous API](@ref async_api) and [callbacks](../OV_Runtime_UG/ov_infer_request.md) in particular.
* Providing the multi-device (and hence the underlying devices) with enough data to crunch. As the inference requests are naturally independent data pieces, the multi-device performs load-balancing at the "requests" (outermost) level to minimize the scheduling overhead.
* Using the :ref:`Asynchronous API <async_api>` and :doc:`callbacks <openvino_docs_OV_UG_Infer_request>` in particular.
* Providing the multi-device (and hence the underlying devices) with enough data to crunch. As the inference requests are naturally independent data pieces, the multi-device performs load-balancing at the "requests" (outermost) level to minimize the scheduling overhead.
Keep in mind that the resulting performance is usually a fraction of the "ideal" (plain sum) value when the devices compete for certain resources, such as the memory bandwidth, which is shared between the CPU and iGPU.
> **NOTE**: While the legacy approach of optimizing the parameters of each device separately works, the [OpenVINO performance hints](../OV_Runtime_UG/performance_hints.md) allow configuring all devices (that are part of the specific multi-device configuration) at once.
.. note::
While the legacy approach of optimizing the parameters of each device separately works, the :doc:`OpenVINO performance hints <openvino_docs_OV_UG_Performance_Hints>` allow configuring all devices (that are part of the specific multi-device configuration) at once.
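A minimal sketch of the multi-device setup, with a single hint configuring all underlying devices; the device list is only an example:

.. code-block:: cpp

   auto compiled = core.compile_model(
       model, "MULTI:GPU,CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
   // Inference requests are then created and scheduled exactly as for a single device.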
@endsphinxdirective

View File

@ -1,42 +1,62 @@
# Using Advanced Throughput Options: Streams and Batching {#openvino_docs_deployment_optimization_guide_tput_advanced}
## OpenVINO Streams
As explained in the [common-optimizations section](@ref openvino_docs_deployment_optimization_guide_common), running multiple inference requests asynchronously is important for general application efficiency.
@sphinxdirective
OpenVINO Streams
####################
As explained in the :doc:`common-optimizations section <openvino_docs_deployment_optimization_guide_common>`, running multiple inference requests asynchronously is important for general application efficiency.
Internally, every device implements a queue, which acts as a buffer, storing the inference requests until retrieved by the device at its own pace.
The devices may actually process multiple inference requests in parallel in order to improve the device utilization and overall throughput.
This configurable means of device-side parallelism is commonly referred to as **streams**.
> **NOTE**: Be aware that streams are **really executing the requests in parallel, but not in the lock step** (as the batching does), which makes the streams fully compatible with [dynamically-shaped inputs](../OV_Runtime_UG/ov_dynamic_shapes.md), while individual requests can have different shapes.
.. note::
> **NOTE**: Most OpenVINO devices (including CPU and GPU) support the streams, yet the *optimal* number of the streams is deduced very differently. More information on this topic can be found in the section [below](@ref stream_considerations).
Be aware that streams are **really executing the requests in parallel, but not in lockstep** (as batching does), which makes the streams fully compatible with :doc:`dynamically-shaped inputs <openvino_docs_OV_UG_DynamicShapes>`, as individual requests can have different shapes.
.. note::
Most OpenVINO devices (including CPU and GPU) support the streams, yet the *optimal* number of the streams is deduced very differently. More information on this topic can be found in the section `below <#number-of-streams-considerations>`__.
A few general considerations:
* Using the streams does increase the latency of an individual request:
* When the number of streams is not specified, a device creates a bare minimum of streams (usually, just one), as the latency-oriented case is default.
* See further tips for the optimal number of the streams [below](@ref throughput_advanced).
* When the number of streams is not specified, a device creates a bare minimum of streams (usually, just one), as the latency-oriented case is default.
* See further tips for the optimal number of the streams `below <#choosing-the-number-of-streams-and-or-batch-size>`__.
* Streams are memory-intensive, as every stream duplicates the intermediate buffers to do inference in parallel to the rest of the streams:
* Always prefer streams over creating multiple `ov:Compiled_Model` instances for the same model, as weights memory is shared across streams, reducing the memory consumption.
* Always prefer streams over creating multiple ``ov::CompiledModel`` instances for the same model, as weights memory is shared across streams, reducing the memory consumption.
* Keep in mind that the streams also inflate the model load (compilation) time.
For efficient asynchronous execution, the streams actually handle the inference with a dedicated pool of threads (one thread per stream).
Each time you start inference requests (potentially from different application threads), they are actually muxed into an inference queue of the particular `ov:Compiled_Model`.
Each time you start inference requests (potentially from different application threads), they are actually muxed into an inference queue of the particular ``ov::CompiledModel``.
If there is a vacant stream, it pulls the request from the queue and dispatches it for on-device execution.
There are further device-specific details, like for the CPU, in the [internals](dldt_deployment_optimization_internals.md) section.
There are further device-specific details, like for the CPU, in the :doc:`internals <openvino_docs_deployment_optimization_guide_internals>` section.
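A short sketch of setting the number of streams explicitly; the value of 4 is only an example, while ``ov::streams::AUTO`` is the portable choice:

.. code-block:: cpp

   // Explicit number of streams.
   auto compiled = core.compile_model(model, "CPU", ov::num_streams(4));
   // Or let OpenVINO pick a reasonable number for the current hardware.
   auto compiled_auto = core.compile_model(model, "CPU", ov::num_streams(ov::streams::AUTO));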
Batching
####################
## Batching
Hardware accelerators such as GPUs are optimized for massive compute parallelism, so batching helps to saturate the device and leads to higher throughput.
While the streams (described in previous section) already help to hide the communication overheads and certain bubbles in the scheduling, running multiple OpenCL kernels simultaneously is less GPU-efficient compared to calling a kernel on the multiple inputs at once.
While the streams (described in the previous section) already help to hide the communication overheads and certain bubbles in the scheduling, running multiple OpenCL kernels simultaneously is less GPU-efficient than calling a kernel on multiple inputs at once.
As explained in the next section, the batching is a must to leverage maximum throughput on the GPU.
There are several primary methods of using the batching to help application performance:
* Collecting the inputs explicitly on the application side and then **sending the batch requests to OpenVINO**:
* Although this gives flexibility with the possible batching strategies, the approach requires redesigning the application logic.
* **Sending individual requests**, while configuring OpenVINO to collect and perform inference on the requests in batch [automatically](../OV_Runtime_UG/automatic_batching.md).
* Although this gives flexibility with the possible batching strategies, the approach requires redesigning the application logic.
* **Sending individual requests**, while configuring OpenVINO to collect and perform inference on the requests in batch :doc:`automatically <openvino_docs_OV_UG_Automatic_Batching>`.
In both cases, the optimal batch size is very device-specific. As explained below, the optimal batch size also depends on the model, inference precision and other factors.
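A sketch of the explicit option, reshaping the model to a fixed batch before compilation; the batch size of 8 is an assumption:

.. code-block:: cpp

   auto model = core.read_model("model.xml");
   ov::set_batch(model, 8);  // explicit batching on the application side
   auto compiled = core.compile_model(model, "GPU");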
@anchor throughput_advanced
## Choosing the Number of Streams and/or Batch Size
Choosing the Number of Streams and/or Batch Size
################################################
Predicting the inference performance is difficult and finding optimal execution parameters requires direct experiments with measurements.
Run performance testing in the scope of development, and make sure to validate overall (*end-to-end*) application performance.
@ -46,33 +66,54 @@ In some cases, combination of streams and batching may be required to maximize t
One possible throughput optimization strategy is to **set an upper bound for latency and then increase the batch size and/or number of the streams until that tail latency is met (or the throughput is not growing anymore)**.
> **NOTE**: When playing with [dynamically-shaped inputs](../OV_Runtime_UG/ov_dynamic_shapes.md), use only the streams (no batching), as they tolerate individual requests having different shapes.
.. note::
> **NOTE**: Using the [High-Level Performance Hints](../OV_Runtime_UG/performance_hints.md) is the alternative, portable and future-proof option, allowing OpenVINO to find the best combination of streams and batching for a given scenario and a model.
When playing with :doc:`dynamically-shaped inputs <openvino_docs_OV_UG_DynamicShapes>`, use only the streams (no batching), as they tolerate individual requests having different shapes.
.. note::
Using the :doc:`High-Level Performance Hints <openvino_docs_OV_UG_Performance_Hints>` is the alternative, portable and future-proof option, allowing OpenVINO to find the best combination of streams and batching for a given scenario and a model.
Number of Streams Considerations
++++++++++++++++++++++++++++++++
@anchor stream_considerations
### Number of Streams Considerations
* Select the number of streams that is **less or equal** to the number of requests that the application would be able to run simultaneously.
* To avoid wasting resources, the number of streams should be enough to meet the *average* parallel slack rather than the peak load.
* Use the `ov::streams::AUTO` as a more portable option (that also respects the underlying hardware configuration).
* Use the `ov::streams::AUTO <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1gaddb29425af71fbb6ad3379c59342ff0e>`__ as a more portable option (that also respects the underlying hardware configuration).
* It is very important to keep these streams busy, by running as many inference requests as possible (for example, start the newly-arrived inputs immediately):
* A bare minimum of requests to saturate the device can be queried as the `ov::optimal_number_of_infer_requests` of the `ov:Compiled_Model`.
* *The maximum number of streams* for the device (per model) can be queried as the `ov::range_for_streams`.
### Batch Size Considerations
* A bare minimum of requests to saturate the device can be queried as the `ov::optimal_number_of_infer_requests <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1ga087c6da667f7c3d8374aec5f6cbba027>`__ of the ``ov::CompiledModel``.
* *The maximum number of streams* for the device (per model) can be queried as the `ov::range_for_streams <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1ga8a5d84196f6873729167aa512c34a94a>`__.
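A sketch of querying both values mentioned above:

.. code-block:: cpp

   // Minimum number of requests needed to saturate the chosen configuration.
   auto nireq = compiled.get_property(ov::optimal_number_of_infer_requests);
   // Supported range of streams for the device (per model).
   auto stream_range = core.get_property("CPU", ov::range_for_streams);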
Batch Size Considerations
+++++++++++++++++++++++++
* Select the batch size that is **equal** to the number of requests that your application is able to run simultaneously:
* Otherwise (or if the number of "available" requests fluctuates), you may need to keep several instances of the network (reshaped to the different batch size) and select the properly sized instance in the runtime accordingly.
* For OpenVINO devices that implement a dedicated heuristic internally, the `ov::optimal_batch_size` is a *device* property (that accepts the actual model as a parameter) to query the recommended batch size for the model.
* Otherwise (or if the number of "available" requests fluctuates), you may need to keep several instances of the network (reshaped to different batch sizes) and select the properly sized instance at runtime accordingly.
* For OpenVINO devices that implement a dedicated heuristic internally, the `ov::optimal_batch_size <groupov_runtime_cpp_prop_api.html#doxid-group-ov-runtime-cpp-prop-api-1ga129bad2da2fc2a40a7d746d86fc9c68d>`__ is a *device* property (that accepts the actual model as a parameter) to query the recommended batch size for the model.
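A hedged sketch of that query; the exact arguments may vary by device, and ``ov::hint::model`` is assumed here as the way to pass the model to the device heuristic:

.. code-block:: cpp

   auto batch = core.get_property("GPU", ov::optimal_batch_size, ov::hint::model(model));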
### A Few Device-specific Details
A Few Device-specific Details
+++++++++++++++++++++++++++++
* For the **GPU**:
* When the parallel slack is small, for example, only 2-4 requests executed simultaneously, then using only the streams for the GPU may suffice:
* The GPU runs 2 requests per stream, so 4 requests can be served by 2 streams.
* Alternatively, consider a single stream with 2 requests (each with a small batch size like 2), which would total the same 4 inputs in flight.
* Typically, for 4 and more requests the batching delivers better throughput.
* A batch size can be calculated as "a number of inference requests executed in parallel" divided by the "number of requests that the streams consume":
* For example, if you process 16 cameras (by 16 requests inferenced *simultaneously*) by 2 GPU streams (each can process two requests), the batch size per request is 16/(2*2)=4.
* When the parallel slack is small (for example, only 2-4 requests executed simultaneously), using only the streams for the GPU may suffice:
* The GPU runs 2 requests per stream, so 4 requests can be served by 2 streams.
* Alternatively, consider a single stream with 2 requests (each with a small batch size like 2), which would total the same 4 inputs in flight.
* Typically, for 4 and more requests the batching delivers better throughput.
* A batch size can be calculated as "a number of inference requests executed in parallel" divided by the "number of requests that the streams consume":
* For example, if you process 16 cameras (by 16 requests inferenced *simultaneously*) by 2 GPU streams (each can process two requests), the batch size per request is 16/(2*2)=4.
* For the **CPU, always use the streams first!**:
* On high-end CPUs, using moderate (2-8) batch size *in addition* to the maximum number of streams may further improve the performance.
* On high-end CPUs, using moderate (2-8) batch size *in addition* to the maximum number of streams may further improve the performance.
@endsphinxdirective

View File

@ -1,8 +1,10 @@
# OpenVINO TensorFlow Frontend Capabilities and Limitations {#openvino_docs_MO_DG_TensorFlow_Frontend}
@sphinxdirective
TensorFlow Frontend is a C++-based frontend for the conversion of TensorFlow models and is available as a preview feature starting from 2022.3.
That means that you can start experimenting with `--use_new_frontend` option passed to Model Optimizer to enjoy improved conversion time for limited scope of models
or directly loading TensorFlow models through `read_model()` method.
This means you can start experimenting with the ``--use_new_frontend`` option passed to Model Optimizer to benefit from improved conversion time for a limited scope of models,
or load TensorFlow models directly through the ``read_model()`` method.
The current limitations:
@ -10,4 +12,6 @@ The current limitations:
* There is no full parity yet between the legacy Model Optimizer TensorFlow Frontend and the new TensorFlow Frontend, so the primary path for model conversion is still the legacy frontend
* Model coverage and performance are continuously improving, so some conversion-phase failures, performance and accuracy issues might occur if a model is not yet covered.
Known unsupported models: object detection models and all models with transformation configs, models with TF1/TF2 control flow, the Complex type, and training parts
* `read_model()` method supports only `*.pb` format while Model Optimizer (or `convert_model` call) will accept other formats as well which are accepted by existing legacy frontend
* ``read_model()`` method supports only ``*.pb`` format while Model Optimizer (or ``convert_model`` call) will accept other formats as well which are accepted by existing legacy frontend
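A minimal sketch of loading a frozen TensorFlow model directly; the file name is an example:

.. code-block:: cpp

   ov::Core core;
   auto model = core.read_model("model.pb");  // the TensorFlow Frontend is selected for *.pb files
   auto compiled = core.compile_model(model, "CPU");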
@endsphinxdirective

View File

@ -1,115 +0,0 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <iostream>
#include <ie_core.hpp>
using namespace InferenceEngine;
int main(int argc, char *argv[]) {
try {
// --------------------------- 1. Load inference engine -------------------------------------
std::cout << "Loading Inference Engine" << std::endl;
Core ie;
// 2. Read a model in OpenVINO Intermediate Representation (.xml and .bin files) or ONNX (.onnx file) format
std::cout << "Loading network files" << std::endl;
CNNNetwork network;
network = ie.ReadNetwork(std::string("c:\\work\\git\\github_dldt3\\openvino\\model-optimizer\\summator.xml"));
network.setBatchSize(1);
// 3. Load network to CPU
ExecutableNetwork executableNet = ie.LoadNetwork(network, "CPU");
// 4. Create Infer Request
InferRequest inferRequest = executableNet.CreateInferRequest();
// 5. Prepare inputs
ConstInputsDataMap cInputInfo = executableNet.GetInputsInfo();
std::vector<Blob::Ptr> ptrInputBlobs;
for (const auto& input : cInputInfo) {
ptrInputBlobs.push_back(inferRequest.GetBlob(input.first));
}
InputsDataMap inputInfo;
inputInfo = network.getInputsInfo();
for (auto &item : inputInfo) {
Precision inputPrecision = Precision::FP32;
item.second->setPrecision(inputPrecision);
}
// 6. Prepare outputs
std::vector<Blob::Ptr> ptrOutputBlobs;
ConstOutputsDataMap cOutputInfo = executableNet.GetOutputsInfo();
for (const auto& output : cOutputInfo) {
ptrOutputBlobs.push_back(inferRequest.GetBlob(output.first));
}
// 7. Initialize memory state before starting
for (auto &&state : inferRequest.QueryState()) {
state.Reset();
}
//! [part1]
// input data
std::vector<float> data = { 1,2,3,4,5,6};
// infer the first utterance
for (size_t next_input = 0; next_input < data.size()/2; next_input++) {
MemoryBlob::Ptr minput = as<MemoryBlob>(ptrInputBlobs[0]);
auto minputHolder = minput->wmap();
std::memcpy(minputHolder.as<void *>(),
&data[next_input],
sizeof(float));
inferRequest.Infer();
// check states
auto states = inferRequest.QueryState();
if (states.empty()) {
throw std::runtime_error("Queried states are empty");
}
auto mstate = as<MemoryBlob>(states[0].GetState());
if (mstate == nullptr) {
throw std::runtime_error("Can't cast state to MemoryBlob");
}
auto state_buf = mstate->rmap();
float * state =state_buf.as<float*>();
std::cout << state[0] << "\n";
}
// resetting state between utterances
std::cout<<"Reset state\n";
for (auto &&state : inferRequest.QueryState()) {
state.Reset();
}
// infer the second utterance
for (size_t next_input = data.size()/2; next_input < data.size(); next_input++) {
MemoryBlob::Ptr minput = as<MemoryBlob>(ptrInputBlobs[0]);
auto minputHolder = minput->wmap();
std::memcpy(minputHolder.as<void *>(),
&data[next_input],
sizeof(float));
inferRequest.Infer();
// check states
auto states = inferRequest.QueryState();
auto mstate = as<MemoryBlob>(states[0].GetState());
auto state_buf = mstate->rmap();
float * state =state_buf.as<float*>();
std::cout << state[0] << "\n";
}
//! [part1]
}
catch (const std::exception &error) {
std::cerr << error.what() << std::endl;
return 1;
}
catch (...) {
std::cerr << "Unknown/internal exception happened" << std::endl;
return 1;
}
std::cerr << "Execution successful" << std::endl;
return 0;
}

View File

@ -0,0 +1,118 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <iostream>
#include "openvino/op/util/variable.hpp"
#include "openvino/openvino.hpp"
#include "openvino/opsets/opset11.hpp"
int main(int argc, char* argv[]) {
try {
// --------------------------- 1. Load inference engine -------------------------------------
std::cout << "Loading OpenVINO" << std::endl;
ov::Core core;
//! [model_create]
auto arg = std::make_shared<ov::opset11::Parameter>(ov::element::f32, ov::Shape{1, 1});
auto init_const = ov::opset11::Constant::create(ov::element::f32, ov::Shape{1, 1}, {0});
// The ReadValue/Assign operations must be used in pairs in the model.
// For each such a pair, its own variable object must be created.
const std::string variable_name("variable0");
auto variable = std::make_shared<ov::op::util::Variable>(
ov::op::util::VariableInfo{ov::PartialShape::dynamic(), ov::element::dynamic, variable_name});
// Creating ov::Model
auto read = std::make_shared<ov::opset11::ReadValue>(init_const, variable);
std::vector<std::shared_ptr<ov::Node>> args = {arg, read};
auto add = std::make_shared<ov::opset11::Add>(arg, read);
auto assign = std::make_shared<ov::opset11::Assign>(add, variable);
auto add2 = std::make_shared<ov::opset11::Add>(add, read);
auto res = std::make_shared<ov::opset11::Result>(add2);
auto model =
std::make_shared<ov::Model>(ov::ResultVector({res}), ov::SinkVector({assign}), ov::ParameterVector({arg}));
//! [model_create]
// 2. Read a model in OpenVINO Intermediate Representation (.xml and .bin files) or ONNX (.onnx file) format
std::cout << "Loading network files" << std::endl;
// 3. Load network to CPU
ov::CompiledModel compiled_model = core.compile_model(model, "CPU");
// 4. Create Infer Request
ov::InferRequest infer_request = compiled_model.create_infer_request();
// 5. Prepare inputs
std::vector<ov::Tensor> input_tensors;
for (const auto& input : compiled_model.inputs()) {
input_tensors.emplace_back(infer_request.get_tensor(input));
}
// 6. Prepare outputs
std::vector<ov::Tensor> output_tensors;
for (const auto& output : compiled_model.outputs()) {
output_tensors.emplace_back(infer_request.get_tensor(output));
}
// 7. Initialize memory state before starting
for (auto&& state : infer_request.query_state()) {
state.reset();
}
//! [part1]
// input data
std::vector<float> data = {1, 2, 3, 4, 5, 6};
// infer the first utterance
for (size_t next_input = 0; next_input < data.size() / 2; next_input++) {
auto minput = input_tensors[0];
std::memcpy(minput.data(), &data[next_input], sizeof(float));
infer_request.infer();
// check states
auto states = infer_request.query_state();
if (states.empty()) {
throw std::runtime_error("Queried states are empty");
}
auto mstate = states[0].get_state();
if (!mstate) {
throw std::runtime_error("Can't cast state to MemoryBlob");
}
float* state = mstate.data<float>();
std::cout << state[0] << "\n";
}
// resetting state between utterances
std::cout << "Reset state\n";
for (auto&& state : infer_request.query_state()) {
state.reset();
}
// infer the second utterance
for (size_t next_input = data.size() / 2; next_input < data.size(); next_input++) {
auto minput = input_tensors[0];
std::memcpy(minput.data(), &data[next_input], sizeof(float));
infer_request.infer();
// check states
auto states = infer_request.query_state();
auto mstate = states[0].get_state();
float* state = mstate.data<float>();
std::cout << state[0] << "\n";
}
//! [part1]
} catch (const std::exception& error) {
std::cerr << error.what() << std::endl;
return 1;
} catch (...) {
std::cerr << "Unknown/internal exception happened" << std::endl;
return 1;
}
std::cerr << "Execution successful" << std::endl;
return 0;
}

View File

@ -227,11 +227,8 @@ Demos that demonstrate inference on a particular model.
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
| `234-encodec-audio-compression <notebooks/234-encodec-audio-compression-with-output.html>`__ | Audio compression with EnCodec and OpenVINO™ | |n234-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
| `235-controlnet-stable-diffusion <notebooks/235-controlnet-stable-diffusion-with-output.html>`__ | A Text-to-Image Generation with ControlNet Conditioning and OpenVINO™ | |n235-img1| |
+-------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------+
.. raw:: html
@ -445,6 +442,8 @@ Made with `contributors-img <https://contrib.rocks>`__.
:target: https://user-images.githubusercontent.com/29454499/221933762-4ff32ecb-5e5d-4484-80e1-e9396cb3c511.png
.. |n234-img1| image:: https://github.com/facebookresearch/encodec/raw/main/thumbnail.png
:target: https://github.com/facebookresearch/encodec/raw/main/thumbnail.png
.. |n235-img1| image:: https://user-images.githubusercontent.com/29454499/224541412-9d13443e-0e42-43f2-8210-aa31820c5b44.png
:target: https://user-images.githubusercontent.com/29454499/224541412-9d13443e-0e42-43f2-8210-aa31820c5b44.png
.. |n301-img1| image:: https://user-images.githubusercontent.com/15709723/127779607-8fa34947-1c35-4260-8d04-981c41a2a2cc.png
:target: https://user-images.githubusercontent.com/15709723/127779607-8fa34947-1c35-4260-8d04-981c41a2a2cc.png
.. |n401-img1| image:: https://user-images.githubusercontent.com/4547501/141471665-82b28c86-cf64-4bfe-98b3-c314658f2d96.gif

View File

@ -53,114 +53,4 @@ void regclass_passes_Manager(py::module m) {
:param transformation: transformation instance.
:type transformation: openvino.runtime.passes.PassBase
)");
manager.def(
"register_pass",
[](ov::pass::Manager& self, const std::string& pass_name) -> void {
Common::utils::deprecation_warning("register_pass(pass_name)",
"",
"Please use register_pass(ConstantFolding()) instead.");
if (pass_name == "ConstantFolding") {
self.register_pass<ov::pass::ConstantFolding>();
}
},
py::arg("pass_name"),
R"(
This method is deprecated. Please use m.register_pass(ConstantFolding()) instead.
Register pass by name from the list of predefined passes.
:param pass_name: String to set the type of a pass.
:type pass_name: str
)");
manager.def(
"register_pass",
[](ov::pass::Manager& self,
const std::string& pass_name,
const FilePaths& file_paths,
const std::string& version) -> void {
Common::utils::deprecation_warning("register_pass(pass_name, output_files, version)",
"",
"Please use register_pass(Serialize(xml, bin, version)) instead.");
if (pass_name == "Serialize") {
self.register_pass<ov::pass::Serialize>(file_paths.first,
file_paths.second,
Common::utils::convert_to_version(version));
}
},
py::arg("pass_name"),
py::arg("output_files"),
py::arg("version") = "UNSPECIFIED",
R"(
This method is deprecated. Please use m.register_pass(Serialize(...)) instead.
Set the type of register pass for pass manager.
:param pass_name: String to set the type of a pass.
:type pass_name: str
:param output_files: Tuple which contains paths where .xml and .bin files will be saved.
:type output_files: Tuple[str, str]
:param version: Sets the version of the IR which will be generated.
Supported versions are:
- "UNSPECIFIED" (default) : Use the latest or function version
- "IR_V10" : v10 IR
- "IR_V11" : v11 IR
:type version: str
Examples
----------
1. Default Version
pass_manager = Manager()
pass_manager.register_pass("Serialize", output_files=("example.xml", "example.bin"))
2. IR version 11
pass_manager = Manager()
pass_manager.register_pass("Serialize", output_files=("example.xml", "example.bin"), version="IR_V11")
)");
manager.def(
"register_pass",
[](ov::pass::Manager& self,
const std::string& pass_name,
const std::string& xml_path,
const std::string& bin_path,
const std::string& version) -> void {
Common::utils::deprecation_warning("register_pass(pass_name, xml_path, bin_path, version",
"",
"Please use register_pass(Serialize(xml, bin, version)) instead.");
if (pass_name == "Serialize") {
self.register_pass<ov::pass::Serialize>(xml_path, bin_path, Common::utils::convert_to_version(version));
}
},
py::arg("pass_name"),
py::arg("xml_path"),
py::arg("bin_path"),
py::arg("version") = "UNSPECIFIED",
R"(
This method is deprecated. Please use m.register_pass(Serialize(...)) instead.
Set the type of register pass for pass manager.
:param pass_name: String to set the type of a pass.
:type pass_name: str
:param xml_path: Path where *.xml file will be saved.
:type xml_path: str
:param bin_path: Path where *.bin file will be saved.
:type bin_path: str
:param version: Sets the version of the IR which will be generated.
Supported versions are:
- "UNSPECIFIED" (default) : Use the latest or function version
- "IR_V10" : v10 IR
- "IR_V11" : v11 IR
:type version: str
Examples
----------
1. Default Version
pass_manager = Manager()
pass_manager.register_pass("Serialize", xml_path="example.xml", bin_path="example.bin")
2. IR version 11
pass_manager = Manager()
pass_manager.register_pass("Serialize", xml_path="example.xml", bin_path="example.bin", version="IR_V11")
)");
}

View File

@ -241,7 +241,7 @@ def test_properties_ro(ov_property_ro, expected_value):
),
(
properties.intel_cpu.sparse_weights_decompression_rate,
"SPARSE_WEIGHTS_DECOMPRESSION_RATE",
"CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE",
(
(0.1, np.float32(0.1)),
(2.0, 2.0),

View File

@ -4,6 +4,7 @@
#include <compress_quantize_weights.hpp>
#include <ngraph/opsets/opset8.hpp>
#include <ngraph/pattern/op/or.hpp>
#include <ngraph/pattern/op/wrap_type.hpp>
#include <ngraph/rt_info.hpp>
#include <ngraph/validation_util.hpp>
@ -36,7 +37,10 @@ static bool has_dequantization_subgraph(const std::shared_ptr<ngraph::Node>& fir
}
ngraph::pass::CompressQuantizeWeights::CompressQuantizeWeights() {
auto weights_pattern = pattern::wrap_type<opset8::Constant>();
auto weights_const_pattern = pattern::wrap_type<opset8::Constant>();
auto weigths_convert_pattern = pattern::wrap_type<opset8::Convert>({weights_const_pattern});
OutputVector weights_options{weights_const_pattern, weigths_convert_pattern};
auto weights_pattern = std::make_shared<pattern::op::Or>(weights_options);
auto input_low_pattern = pattern::wrap_type<opset8::Constant>();
auto input_high_pattern = pattern::wrap_type<opset8::Constant>();
auto output_low_pattern = pattern::wrap_type<opset8::Constant>();
@ -93,11 +97,14 @@ ngraph::pass::CompressQuantizeWeights::CompressQuantizeWeights() {
auto new_output_low = op::Constant::create(input_type, Shape{}, {-static_cast<float>(levels / 2)});
auto new_output_high =
std::make_shared<opset8::Add>(new_output_low, op::Constant::create(input_type, Shape{}, {levels - 1}));
const auto& weights = pattern_value_map.at(weights_pattern);
const auto& weights_const = pattern_value_map.at(weights_const_pattern);
const auto& input_low = pattern_value_map.at(input_low_pattern);
const auto& input_high = pattern_value_map.at(input_high_pattern);
const auto& fq_data_input = pattern_value_map.count(weigths_convert_pattern)
? pattern_value_map.at(weigths_convert_pattern)
: weights_const;
auto quantize =
fq->clone_with_new_inputs({weights, input_low, input_high, new_output_low, new_output_high});
fq->clone_with_new_inputs({fq_data_input, input_low, input_high, new_output_low, new_output_high});
// Convert quantized weights to low precision type
std::shared_ptr<Node> new_weights = std::make_shared<opset8::Convert>(quantize, quantized_type);
// Constant fold quantized weights
@ -106,7 +113,7 @@ ngraph::pass::CompressQuantizeWeights::CompressQuantizeWeights() {
} else {
return false;
}
new_weights->set_friendly_name(weights.get_node()->get_friendly_name());
new_weights->set_friendly_name(weights_const.get_node()->get_friendly_name());
/*
Dequantize part is performed by Convert(from low to high precision)->Subtract->Multiply subgraph.

View File

@ -0,0 +1,23 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <openvino/pass/graph_rewrite.hpp>
#include <transformations_visibility.hpp>
namespace ov {
namespace pass {
/**
* @ingroup ie_transformation_common_api
* @brief Converts TopK version 11 to TopK version 3 if TopK 11 stable attribute is set to false
*/
class TRANSFORMATIONS_API ConvertTopK11ToTopK3 : public MatcherPass {
public:
OPENVINO_RTTI("ConvertTopK11ToTopK3", "0");
ConvertTopK11ToTopK3();
};
} // namespace pass
} // namespace ov

View File

@ -92,6 +92,7 @@
#include "transformations/op_conversions/convert_softmax_upgrade.hpp"
#include "transformations/op_conversions/convert_space_to_depth.hpp"
#include "transformations/op_conversions/convert_subtract.hpp"
#include "transformations/op_conversions/convert_topk11_downgrade.hpp"
#include "transformations/op_conversions/convert_xor_to_logical_xor.hpp"
#include "transformations/op_conversions/detection_output_downgrade.hpp"
#include "transformations/op_conversions/detection_output_upgrade.hpp"
@ -209,6 +210,7 @@ bool ov::pass::CommonOptimizations::run_on_model(const std::shared_ptr<ov::Model
REGISTER_PASS(manager, ConvertROIAlign9To3)
REGISTER_PASS(manager, ConvertMulticlassNms8ToMulticlassNms9)
REGISTER_PASS(manager, ConvertXorToLogicalXor)
REGISTER_PASS(manager, ConvertTopK11ToTopK3)
auto fq_fusions = manager.register_pass<GraphRewrite>();
ADD_MATCHER(fq_fusions, FakeQuantizeMulFusion)

View File

@ -0,0 +1,43 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/op_conversions/convert_topk11_downgrade.hpp"
#include <ngraph/pattern/op/wrap_type.hpp>
#include <ngraph/rt_info.hpp>
#include <openvino/opsets/opset11.hpp>
#include <openvino/opsets/opset3.hpp>
#include "itt.hpp"
ov::pass::ConvertTopK11ToTopK3::ConvertTopK11ToTopK3() {
MATCHER_SCOPE(ConvertTopK11ToTopK3);
const auto topk_v11_pattern = pattern::wrap_type<opset11::TopK>();
const matcher_pass_callback callback = [=](pattern::Matcher& m) {
const auto topk_v11 = std::dynamic_pointer_cast<opset11::TopK>(m.get_match_root());
if (!topk_v11 || topk_v11->get_stable() || transformation_callback(topk_v11)) {
return false;
}
// downgrade only if the stable sort is NOT required
const auto topk_v3 = std::make_shared<opset3::TopK>(topk_v11->input_value(0),
topk_v11->input_value(1),
topk_v11->get_axis(),
topk_v11->get_mode(),
topk_v11->get_sort_type(),
topk_v11->get_index_element_type());
topk_v3->set_friendly_name(topk_v11->get_friendly_name());
copy_runtime_info(topk_v11, topk_v3);
replace_node(topk_v11, topk_v3);
return true;
};
auto m = std::make_shared<pattern::Matcher>(topk_v11_pattern, matcher_name);
register_matcher(m, callback);
}
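// Illustrative usage sketch (not part of this file): like other matcher passes,
// this transformation is typically run through an ov::pass::Manager, e.g.:
//
//     ov::pass::Manager manager;
//     manager.register_pass<ov::pass::ConvertTopK11ToTopK3>();
//     manager.run_passes(model);  // `model` is a std::shared_ptr<ov::Model>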

View File

@ -0,0 +1,64 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <gtest/gtest.h>
#include <memory>
#include <openvino/opsets/opset11.hpp>
#include <openvino/opsets/opset3.hpp>
#include <openvino/pass/manager.hpp>
#include <transformations/op_conversions/convert_topk11_downgrade.hpp>
#include <transformations/utils/utils.hpp>
#include "common_test_utils/ngraph_test_utils.hpp"
using namespace testing;
TEST_F(TransformationTestsF, ConvertTopK11ToTopK3) {
{
const auto input = std::make_shared<ov::opset11::Parameter>(ov::element::i32, ov::Shape{2, 3, 4});
const auto k = std::make_shared<ov::opset11::Parameter>(ov::element::i8, ov::Shape{});
const auto topk = std::make_shared<ov::opset11::TopK>(input,
k,
-2,
ov::op::TopKMode::MAX,
ov::op::TopKSortType::SORT_VALUES,
ov::element::i64,
false);
topk->set_friendly_name("topk11");
function = std::make_shared<ov::Model>(topk->outputs(), ov::ParameterVector{input, k});
manager.register_pass<ov::pass::ConvertTopK11ToTopK3>();
}
{
const auto input = std::make_shared<ov::opset3::Parameter>(ov::element::i32, ov::Shape{2, 3, 4});
const auto k = std::make_shared<ov::opset3::Parameter>(ov::element::i8, ov::Shape{});
const auto topk = std::make_shared<ov::opset3::TopK>(input,
k,
-2,
ov::op::TopKMode::MAX,
ov::op::TopKSortType::SORT_VALUES,
ov::element::i64);
topk->set_friendly_name("topk11");
function_ref = std::make_shared<ov::Model>(topk->outputs(), ov::ParameterVector{input, k});
}
}
TEST_F(TransformationTestsF, ConvertTopK11ToTopK3_fail) {
const auto input = std::make_shared<ov::opset11::Parameter>(ov::element::i32, ov::Shape{2, 3, 4});
const auto k = std::make_shared<ov::opset11::Parameter>(ov::element::i8, ov::Shape{});
const auto topk = std::make_shared<ov::opset11::TopK>(input,
k,
-2,
ov::op::TopKMode::MAX,
ov::op::TopKSortType::SORT_VALUES,
ov::element::i64,
true); // stable sort on
topk->set_friendly_name("topk11");
function = std::make_shared<ov::Model>(topk->outputs(), ov::ParameterVector{input, k});
manager.register_pass<ov::pass::ConvertTopK11ToTopK3>();
}

View File

@ -31,13 +31,18 @@ struct CompressQuantizeWeightsParams {
float zero_point_val;
};
class CompressQuantizeWeightsTests : public testing::WithParamInterface<CompressQuantizeWeightsParams>,
public TransformationTestsF {
class CompressQuantizeWeightsTests
: public testing::WithParamInterface<std::tuple<CompressQuantizeWeightsParams, element::Type>>,
public TransformationTestsF {
void SetUp() override {
TransformationTestsF::SetUp();
auto param = GetParam();
CompressQuantizeWeightsParams param;
ov::element::Type data_prc;
std::tie(param, data_prc) = GetParam();
{
auto data = opset8::Constant::create(element::f32, param.shape, param.weights);
std::shared_ptr<Node> data = opset8::Constant::create(data_prc, param.shape, param.weights);
if (data_prc == element::f16)
data = std::make_shared<opset8::Convert>(data, element::f32);
auto input_low = opset8::Constant::create(element::f32, Shape{}, {param.in_low});
auto input_high = opset8::Constant::create(element::f32, Shape{}, {param.in_high});
auto output_low = opset8::Constant::create(element::f32, Shape{}, {param.out_low});
@ -116,7 +121,11 @@ static std::vector<CompressQuantizeWeightsParams> params = {
-64.25f},
};
INSTANTIATE_TEST_SUITE_P(TransformationTests, CompressQuantizeWeightsTests, ::testing::ValuesIn(params));
static element::TypeVector data_precisions = {element::f32, element::f16};
INSTANTIATE_TEST_SUITE_P(TransformationTests,
CompressQuantizeWeightsTests,
::testing::Combine(::testing::ValuesIn(params), ::testing::ValuesIn(data_precisions)));
TEST_F(TransformationTestsF, CompressQuantizeWeightsWithDequantizationSubgraph) {
{

View File

@ -26,12 +26,12 @@ public:
/**
* @return A tensor element type
*/
virtual const element::Type& get_element_type() const = 0;
virtual const ov::element::Type& get_element_type() const = 0;
/**
* @return A tensor shape
*/
virtual const Shape& get_shape() const = 0;
virtual const ov::Shape& get_shape() const = 0;
/**
* @brief Returns the total number of elements (a product of all the dims or 1 for scalar)
@ -48,7 +48,7 @@ public:
/**
* @return Tensor's strides in bytes
*/
virtual const Strides& get_strides() const = 0;
virtual const ov::Strides& get_strides() const = 0;
/**
* @brief Provides access to the underlying host memory

View File

@ -39,7 +39,7 @@ OPENVINO_DEPRECATED("This function is deprecated and will be removed soon.")
OPENVINO_API TensorVector wrap_tensors(const std::vector<ngraph::HostTensorPtr>& tensors);
/**
* @brief Update output host tensors if they got dynamic shapee before evaluation (not allocated).
* @brief Update output host tensors if they got dynamic shape before evaluation (not allocated).
*
* Other tensors do not require an update as they are created from outputs and point to the same data blob.
*

View File

@ -8,6 +8,7 @@
#include "ngraph/runtime/host_tensor.hpp"
#include "openvino/core/core_visibility.hpp"
#include "openvino/core/deprecated.hpp"
namespace ov {
namespace op {
@ -18,42 +19,69 @@ class OPENVINO_API VariableValue {
public:
using Ptr = std::shared_ptr<VariableValue>;
/// \brief Constructs an uninitialized VariableValue.
VariableValue() = default;
VariableValue();
/// \brief Constructor for VariableValue.
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
/// \param value The data for Variable.
explicit VariableValue(ngraph::HostTensorPtr value) : m_value(std::move(value)) {}
OPENVINO_DEPRECATED(
"This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor instead.")
explicit VariableValue(ngraph::HostTensorPtr value);
/// \brief Constructor for VariableValue.
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
/// \param value Data for Variable.
/// \param reset The current state of the reset flag.
VariableValue(ngraph::HostTensorPtr value, bool reset) : m_reset(reset), m_value(std::move(value)) {}
OPENVINO_DEPRECATED(
"This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor instead.")
VariableValue(ngraph::HostTensorPtr value, bool reset);
/// \brief Returns the current stored data.
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
OPENVINO_DEPRECATED("This method is deprecated and will be removed in 2024.0 release. Please get_state() instead.")
ngraph::HostTensorPtr get_value() const;
/// \brief Sets new values for Variable.
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
/// \param value New data for Variable.
OPENVINO_DEPRECATED(
"This method is deprecated and will be removed in 2024.0 release. Please use set_state() instead.")
void set_value(const ngraph::HostTensorPtr& value);
/// \brief Sets the reset flag to a new state.
/// \param reset The new state of the reset flag.
void set_reset(bool reset) {
m_reset = reset;
}
void set_reset(bool reset);
/// \brief Returns the current reset flag state.
bool get_reset() const {
return m_reset;
}
bool get_reset() const;
explicit VariableValue(const ov::Tensor& value);
/// \brief Constructor for VariableValue.
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
/// \param value Data for Variable.
/// \param reset The current state of the reset flag.
VariableValue(const ov::Tensor& value, bool reset);
/// \brief Returns the current stored data.
const ngraph::HostTensorPtr& get_value() const {
return m_value;
}
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
const ov::Tensor& get_state() const;
/// \brief Sets new values for Variable.
/// \deprecated This method is deprecated and will be removed in 2024.0 release. Please use method with ov::Tensor
/// instead
/// \param value New data for Variable.
void set_value(const ngraph::HostTensorPtr& value) {
m_value = value;
}
void set_state(const ov::Tensor& value);
private:
bool m_reset = true;
ngraph::HostTensorPtr m_value;
ov::Tensor m_value;
};
} // namespace util
} // namespace op
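// Illustrative usage sketch (not part of this header), assuming an existing ov::Tensor `init`:
//
//     auto state = std::make_shared<ov::op::util::VariableValue>(init, /*reset=*/true);
//     state->set_state(init);                          // store new data for the variable
//     const ov::Tensor& current = state->get_state();  // read the current state back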

View File

@ -35,6 +35,12 @@ class IVariableStateInternalWrapper;
class ITensor;
class RemoteTensor;
namespace op {
namespace util {
class VariableValue;
}
} // namespace op
/**
* @brief Tensor API holding host memory
* It can throw exceptions safely for the application, where it is properly handled.
@ -64,6 +70,7 @@ protected:
friend class ov::IVariableStateInternalWrapper;
friend class InferenceEngine::IAsyncInferRequestWrapper;
friend class InferenceEngine::IVariableStateWrapper;
friend class ov::op::util::VariableValue;
public:
/// @brief Default constructor

View File

@ -96,6 +96,7 @@ bool op::v6::Assign::evaluate(const HostTensorVector& outputs,
const auto& variable_values = variable_context.get_variable_values();
OPENVINO_SUPPRESS_DEPRECATED_START
// automatically allocate memory if not provided by user
if (variable_values.find(m_variable) == variable_values.end()) {
auto host_tensor =
@ -106,6 +107,7 @@ bool op::v6::Assign::evaluate(const HostTensorVector& outputs,
const auto var_value = variable_values.find(m_variable)->second;
var_value->set_reset(false);
const auto& buffer = var_value->get_value();
OPENVINO_SUPPRESS_DEPRECATED_END
buffer->set_unary(inputs[0]);
outputs[0]->set_unary(inputs[0]);

View File

@ -108,7 +108,9 @@ bool op::v6::ReadValue::evaluate(const HostTensorVector& outputs,
// initial value (inputs[0]) is not supported, use zeros
auto zero_const = make_shared<v0::Constant>(inputs[0]->get_element_type(), inputs[0]->get_shape(), 0);
auto zero_tensor = make_shared<HostTensor>(zero_const);
OPENVINO_SUPPRESS_DEPRECATED_START
const auto& input_tensor = use_context ? var_value->second->get_value() : zero_tensor;
OPENVINO_SUPPRESS_DEPRECATED_END
outputs[0]->set_unary(input_tensor);
void* input = input_tensor->get_data_ptr();

View File

@ -0,0 +1,143 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "openvino/op/util/variable_value.hpp"
#include <memory>
#include "ngraph/node.hpp"
#include "ngraph/runtime/host_tensor.hpp"
#include "openvino/core/deprecated.hpp"
#include "openvino/core/shape.hpp"
#include "openvino/runtime/allocator.hpp"
#include "openvino/runtime/itensor.hpp"
#include "openvino/runtime/tensor.hpp"
#include "shape_util.hpp"
namespace {
class TensorWrapper : public ngraph::runtime::HostTensor {
public:
TensorWrapper(const ov::Tensor& tensor)
: ngraph::runtime::HostTensor(tensor.get_element_type(), tensor.get_shape(), tensor.data()),
tensor(tensor) {}
ov::Tensor tensor;
};
/**
* @brief Tensor that contains a HostTensorPtr inside
*/
class HostTensorWrapper : public ov::ITensor {
public:
ngraph::HostTensorPtr tensor;
HostTensorWrapper(const ngraph::HostTensorPtr& tensor) : tensor{tensor}, m_type(tensor->get_element_type()) {
const auto& p_shape = tensor->get_partial_shape();
if (p_shape.is_static()) {
m_shape = p_shape.to_shape();
} else {
OPENVINO_SUPPRESS_DEPRECATED_START
m_shape = ov::util::make_dynamic_shape();
OPENVINO_SUPPRESS_DEPRECATED_END
}
update_strides();
}
const ov::element::Type& get_element_type() const override {
return m_type;
}
void set_shape(ov::Shape shape) override {
tensor->set_shape(shape);
m_shape = shape;
update_strides();
}
const ov::Shape& get_shape() const override {
return m_shape;
}
const ov::Strides& get_strides() const override {
OPENVINO_ASSERT(get_element_type().bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
return m_strides;
}
size_t get_size() const override {
return ov::shape_size(m_shape);
}
size_t get_byte_size() const override {
return get_size() * m_type.size();
}
void* data(const ov::element::Type& element_type) const override {
return tensor->get_data_ptr();
}
private:
ov::element::Type m_type;
ov::Shape m_shape;
ov::Strides m_strides;
void update_strides() {
if (m_type.bitwidth() >= 8) {
m_strides.clear();
m_strides.resize(m_shape.size());
auto size = m_strides.size();
for (size_t i = 0; i < size; i++) {
size_t value(m_type.size());
size_t dim(m_shape[size - 1 - i]);
if (i) {
value = m_strides[size - i] * dim;
}
m_strides[size - i - 1] = value;
}
}
}
};
} // namespace
ov::op::util::VariableValue::VariableValue() = default;
OPENVINO_SUPPRESS_DEPRECATED_START
ov::op::util::VariableValue::VariableValue(ngraph::HostTensorPtr value)
: m_value(ov::Tensor{std::make_shared<HostTensorWrapper>(value), {}}) {}
ov::op::util::VariableValue::VariableValue(ngraph::HostTensorPtr value, bool reset)
: m_reset(reset),
m_value(ov::Tensor{std::make_shared<HostTensorWrapper>(value), {}}) {}
ngraph::HostTensorPtr ov::op::util::VariableValue::get_value() const {
if (auto wrapper = std::dynamic_pointer_cast<HostTensorWrapper>(m_value._impl))
return wrapper->tensor;
return std::make_shared<TensorWrapper>(m_value);
}
void ov::op::util::VariableValue::set_value(const ngraph::HostTensorPtr& value) {
m_value = ov::Tensor{std::make_shared<HostTensorWrapper>(value), {}};
}
OPENVINO_SUPPRESS_DEPRECATED_END
void ov::op::util::VariableValue::set_reset(bool reset) {
m_reset = reset;
}
bool ov::op::util::VariableValue::get_reset() const {
return m_reset;
}
ov::op::util::VariableValue::VariableValue(const ov::Tensor& value) : m_value(value) {}
ov::op::util::VariableValue::VariableValue(const ov::Tensor& value, bool reset) : m_reset(reset), m_value(value) {}
const ov::Tensor& ov::op::util::VariableValue::get_state() const {
return m_value;
}
void ov::op::util::VariableValue::set_state(const ov::Tensor& value) {
m_value = value;
}

View File

@ -255,11 +255,6 @@ class XmlSerializer : public ov::AttributeVisitor {
}
}
if (ir_version < 11) {
// ops for serialized body function are provided in reversed order
std::reverse(output.begin(), output.end());
}
return output;
}
@ -836,7 +831,10 @@ void ngfunction_2_ir(pugi::xml_node& netXml,
const bool exec_graph = is_exec_graph(model);
auto sorted_ops = model.get_ordered_ops();
if (version >= 11) {
// get_ordered_ops() returns operations after a topological sort. The topological sort reverses the order of
// Parameters and Results, so we need to put them into sorted_ops separately to ensure the correct order of
// inputs and outputs.
{
std::vector<std::shared_ptr<ov::Node>> result;
result.reserve(sorted_ops.size());
for (const auto& param : model.get_parameters()) {

View File

@ -21,258 +21,4 @@ size_t ITensor::get_byte_size() const {
return (get_size() * get_element_type().bitwidth() + 8 - 1) / 8;
}
/**
* @brief View tensor to external memory
* The tensor doesn't own the external memory
*/
class ViewTensor : public ITensor {
public:
ViewTensor(const element::Type element_type, const Shape& shape, void* ptr)
: m_element_type{element_type},
m_shape{shape},
m_capacity{shape},
m_ptr{ptr} {
OPENVINO_ASSERT(m_ptr != nullptr);
OPENVINO_ASSERT(m_element_type != element::undefined && m_element_type != element::dynamic);
update_strides();
}
void* data(const element::Type& element_type) const override {
if (element_type != element::undefined && element_type != element::dynamic) {
OPENVINO_ASSERT(element_type == get_element_type(),
"Tensor data with element type ",
get_element_type(),
", is not representable as pointer to ",
element_type);
}
return m_ptr;
}
const element::Type& get_element_type() const override {
return m_element_type;
}
const Shape& get_shape() const override {
return m_shape;
}
void set_shape(ov::Shape new_shape) override {
OPENVINO_ASSERT(shape_size(new_shape) <= ov::shape_size(m_capacity), "Could set new shape: ", new_shape);
m_shape = std::move(new_shape);
update_strides();
}
const Strides& get_strides() const override {
OPENVINO_ASSERT(m_element_type.bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
m_element_type);
return m_strides;
}
protected:
void update_strides() {
if (m_element_type.bitwidth() < 8)
return;
auto& shape = get_shape();
m_strides.clear();
if (!shape.empty()) {
m_strides.resize(shape.size());
m_strides.back() = m_element_type.size();
std::copy(shape.rbegin(), shape.rend() - 1, m_strides.rbegin() + 1);
std::partial_sum(m_strides.rbegin(), m_strides.rend(), m_strides.rbegin(), std::multiplies<size_t>());
}
}
element::Type m_element_type;
Shape m_shape;
Shape m_capacity;
Strides m_strides;
void* m_ptr;
};
/**
* @brief View tensor on external memory with strides
*/
class StridedViewTensor : public ViewTensor {
public:
StridedViewTensor(const element::Type element_type, const Shape& shape, void* ptr, const Strides& strides)
: ViewTensor{element_type, shape, ptr} {
OPENVINO_ASSERT(
get_element_type().bitwidth() >= 8,
"Could not create strided access tensor for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
// Save default strides
auto shape_strides = m_strides;
// Change strides
m_strides = strides;
OPENVINO_ASSERT(m_shape.size() == m_strides.size());
for (size_t i = 0; i < m_strides.size(); ++i) {
OPENVINO_ASSERT(shape_strides[i] <= m_strides[i],
"shape stride: ",
shape_strides[i],
", stride: ",
m_strides[i]);
OPENVINO_ASSERT((m_strides[i] % get_element_type().size()) == 0,
"shape stride: ",
shape_strides[i],
", stride: ",
m_strides[i]);
if (i) {
OPENVINO_ASSERT(m_strides[i - 1] >= m_strides[i] * shape[i],
"Strides: ",
m_strides,
" are incompatible with shapes: ",
m_shape);
}
}
}
void set_shape(ov::Shape new_shape) override {
OPENVINO_ASSERT(m_capacity.size() == new_shape.size(),
"Cannot set new shape: ",
new_shape,
" for tensor with strides! Shapes are not compatible.");
for (size_t i = 0; i < new_shape.size(); i++) {
OPENVINO_ASSERT(m_capacity[i] >= new_shape[i],
"Cannot set new shape: ",
new_shape,
" for tensor with strides! Dimension: ",
i,
" is not compatible.");
}
m_shape = std::move(new_shape);
}
};
/**
* @brief Creates view tensor on external memory
*
* @param element_type Tensor element type
* @param shape Tensor shape
* @param ptr Pointer to external memory
* @param byte_strides Tensor strides
*
* @return Shared pointer to tensor interface
*/
std::shared_ptr<ITensor> make_tensor(const element::Type element_type,
const Shape& shape,
void* ptr,
const Strides& byte_strides) {
return byte_strides.empty() ? std::make_shared<ViewTensor>(element_type, shape, ptr)
: std::make_shared<StridedViewTensor>(element_type, shape, ptr, byte_strides);
}
/**
* @brief Tensor with allocated memory
* Tensor owns the memory
*/
class AllocatedTensor : public ViewTensor {
public:
AllocatedTensor(const element::Type element_type, const Shape& shape, const Allocator& allocator)
: ViewTensor{element_type,
shape,
[&] {
OPENVINO_ASSERT(allocator, "Allocator was not initialized");
return const_cast<Allocator&>(allocator).allocate(element_type.size() * shape_size(shape));
}()},
m_allocator{allocator} {}
~AllocatedTensor() {
m_allocator.deallocate(m_ptr, get_byte_size());
}
void set_shape(ov::Shape new_shape) override {
auto old_byte_size = get_byte_size();
m_shape = std::move(new_shape);
if (get_byte_size() > old_byte_size) {
m_allocator.deallocate(m_ptr, old_byte_size);
m_ptr = m_allocator.allocate(get_byte_size());
}
update_strides();
}
private:
Allocator m_allocator;
};
/**
* @brief Creates allocated tensor
*
* @param element_type Tensor element type
* @param shape Tensor shape
* @param allocator Tensor allocator
*
* @return Shared pointer to tensor interface
*/
std::shared_ptr<ITensor> make_tensor(const element::Type element_type, const Shape& shape, const Allocator& allocator) {
return std::make_shared<AllocatedTensor>(element_type, shape, allocator);
}
/**
* @brief ROI tensor on other tensor
* ROI tensor holds the owner
*/
class RoiTensor : public ITensor {
public:
RoiTensor(const std::shared_ptr<ITensor>& owner, const Coordinate& begin, const Coordinate& end) : m_owner{owner} {
OPENVINO_ASSERT(owner->get_element_type().bitwidth() >= 8,
"ROI Tensor for types with bitwidths less then 8 bit is not implemented. Tensor type: ",
owner->get_element_type());
auto owner_shape = owner->get_shape();
OPENVINO_ASSERT(owner_shape.size() == begin.size());
OPENVINO_ASSERT(begin.size() == end.size());
m_shape.resize(begin.size());
for (size_t i = 0; i < begin.size(); ++i) {
OPENVINO_ASSERT(begin[i] <= owner_shape[i]);
OPENVINO_ASSERT(end[i] <= owner_shape[i]);
m_shape[i] = end[i] - begin[i];
OPENVINO_ASSERT(m_shape[i] <= owner_shape[i]);
}
auto& strides = get_strides();
m_offset = std::inner_product(begin.begin(), begin.end(), strides.begin(), static_cast<size_t>(0));
}
const element::Type& get_element_type() const override {
return m_owner->get_element_type();
}
const Strides& get_strides() const override {
return m_owner->get_strides();
}
const Shape& get_shape() const override {
return m_shape;
}
void set_shape(ov::Shape new_shape) override {
OPENVINO_THROW("Shapes cannot be changed for ROI Tensor");
}
void* data(const element::Type& element_type) const override {
auto owner_data = m_owner->data(element_type);
return static_cast<uint8_t*>(owner_data) + m_offset;
}
private:
std::shared_ptr<ITensor> m_owner;
size_t m_offset;
Shape m_shape;
};
/**
* @brief Creates ROI tensor
*
* @param other Tensor that owns the memory
* @param begin Begin coordinates
* @param end End coordinates
*
* @return Shared pointer to tensor interface
*/
std::shared_ptr<ITensor> make_tensor(const std::shared_ptr<ITensor>& other,
const Coordinate& begin,
const Coordinate& end) {
return std::make_shared<RoiTensor>(other, begin, end);
}
} // namespace ov

View File

@ -8,27 +8,29 @@
#include "common_test_utils/common_utils.hpp"
#include "common_test_utils/file_utils.hpp"
#include "openvino/opsets/opset1.hpp"
#include "openvino/pass/serialize.hpp"
#include "openvino/util/file_util.hpp"
#include "read_ir.hpp"
#include "util/test_common.hpp"
class SerializationDeterministicityTest : public ov::test::TestsCommon {
class DeterministicityCommon {
protected:
std::string m_out_xml_path_1;
std::string m_out_bin_path_1;
std::string m_out_xml_path_2;
std::string m_out_bin_path_2;
std::string m_out_xml_path_1{};
std::string m_out_bin_path_1{};
std::string m_out_xml_path_2{};
std::string m_out_bin_path_2{};
std::string filePrefix{};
void SetUp() override {
std::string filePrefix = CommonTestUtils::generateTestFilePrefix();
void SetupFileNames() {
filePrefix = CommonTestUtils::generateTestFilePrefix();
m_out_xml_path_1 = filePrefix + "1" + ".xml";
m_out_bin_path_1 = filePrefix + "1" + ".bin";
m_out_xml_path_2 = filePrefix + "2" + ".xml";
m_out_bin_path_2 = filePrefix + "2" + ".bin";
}
void TearDown() override {
void RemoveFiles() {
std::remove(m_out_xml_path_1.c_str());
std::remove(m_out_xml_path_2.c_str());
std::remove(m_out_bin_path_1.c_str());
@ -55,6 +57,17 @@ protected:
}
};
class SerializationDeterministicityTest : public ov::test::TestsCommon, public DeterministicityCommon {
protected:
void SetUp() override {
SetupFileNames();
}
void TearDown() override {
RemoveFiles();
}
};
#ifdef ENABLE_OV_ONNX_FRONTEND
TEST_F(SerializationDeterministicityTest, BasicModel) {
@ -130,3 +143,158 @@ TEST_F(SerializationDeterministicityTest, ModelWithConstants) {
ASSERT_TRUE(files_equal(xml_1, xml_2));
ASSERT_TRUE(files_equal(bin_1, bin_2));
}
class SerializationDeterministicityInputOutputTest : public testing::TestWithParam<ov::pass::Serialize::Version>,
public DeterministicityCommon {
protected:
std::string input0Name{"input0"};
std::string input1Name{"input1"};
std::string output0Name{"output0"};
std::string output1Name{"output1"};
std::string xmlFileName{};
void SetupFileNames() {
DeterministicityCommon::SetupFileNames();
xmlFileName = filePrefix + "_TestModel.xml";
}
void RemoveFiles() {
DeterministicityCommon::RemoveFiles();
std::remove(xmlFileName.c_str());
}
void SetUp() override {
SetupFileNames();
}
void TearDown() override {
RemoveFiles();
}
};
TEST_P(SerializationDeterministicityInputOutputTest, FromOvModel) {
auto irVersion = GetParam();
std::shared_ptr<ov::Model> modelRef;
{
auto parameter0 = std::make_shared<ov::opset1::Parameter>(ov::element::f32, ov::Shape{1, 3, 22, 22});
parameter0->set_friendly_name("input0");
auto result0 = std::make_shared<ov::opset1::Result>(parameter0);
result0->set_friendly_name("output0");
auto parameter1 = std::make_shared<ov::opset1::Parameter>(ov::element::f32, ov::Shape{1, 3, 22, 22});
parameter1->set_friendly_name("input1");
auto result1 = std::make_shared<ov::opset1::Result>(parameter1);
result1->set_friendly_name("output1");
modelRef =
std::make_shared<ov::Model>(ov::NodeVector{result0, result1}, ov::ParameterVector{parameter0, parameter1});
}
auto& expected1 = modelRef;
ov::pass::Serialize(m_out_xml_path_1, m_out_bin_path_1, irVersion).run_on_model(modelRef);
auto expected2 = ov::test::readModel(m_out_xml_path_1, m_out_bin_path_1);
ov::pass::Serialize(m_out_xml_path_2, m_out_bin_path_2, irVersion).run_on_model(expected2);
EXPECT_EQ(input0Name, expected1->input(0).get_node()->get_friendly_name());
EXPECT_EQ(input1Name, expected1->input(1).get_node()->get_friendly_name());
EXPECT_EQ(output0Name, expected1->output(0).get_node()->get_friendly_name());
EXPECT_EQ(output1Name, expected1->output(1).get_node()->get_friendly_name());
EXPECT_EQ(input0Name, expected2->input(0).get_node()->get_friendly_name());
EXPECT_EQ(input1Name, expected2->input(1).get_node()->get_friendly_name());
EXPECT_EQ(output0Name, expected2->output(0).get_node()->get_friendly_name());
EXPECT_EQ(output1Name, expected2->output(1).get_node()->get_friendly_name());
std::ifstream xml_1(m_out_xml_path_1, std::ios::in | std::ios::binary);
std::ifstream xml_2(m_out_xml_path_2, std::ios::in | std::ios::binary);
EXPECT_TRUE(files_equal(xml_1, xml_2));
}
TEST_P(SerializationDeterministicityInputOutputTest, FromIrModel) {
auto irVersion = GetParam();
std::string irModel_1stPart = R"V0G0N(<?xml version="1.0"?>
<net name="Model0" version=")V0G0N";
std::string irModel_2ndPart = R"V0G0N(">
<layers>
<layer id="0" name="input0" type="Parameter" version="opset1">
<data shape="1,3,22,22" element_type="f32" />
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>3</dim>
<dim>22</dim>
<dim>22</dim>
</port>
</output>
</layer>
<layer id="1" name="input1" type="Parameter" version="opset1">
<data shape="1,3,22,22" element_type="f32" />
<output>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>3</dim>
<dim>22</dim>
<dim>22</dim>
</port>
</output>
</layer>
<layer id="2" name="output0" type="Result" version="opset1">
<input>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>3</dim>
<dim>22</dim>
<dim>22</dim>
</port>
</input>
</layer>
<layer id="3" name="output1" type="Result" version="opset1">
<input>
<port id="0" precision="FP32">
<dim>1</dim>
<dim>3</dim>
<dim>22</dim>
<dim>22</dim>
</port>
</input>
</layer>
</layers>
<edges>
<edge from-layer="0" from-port="0" to-layer="2" to-port="0" />
<edge from-layer="1" from-port="0" to-layer="3" to-port="0" />
</edges>
<rt_info />
</net>
)V0G0N";
std::string strVersion = irVersion == ov::pass::Serialize::Version::IR_V11 ? "11" : "10";
std::string irModel = irModel_1stPart + strVersion + irModel_2ndPart;
{
std::ofstream xmlFile;
xmlFile.open(xmlFileName);
xmlFile << irModel;
xmlFile.close();
}
auto expected1 = ov::test::readModel(xmlFileName, "");
ov::pass::Serialize(m_out_xml_path_1, "", irVersion).run_on_model(expected1);
auto expected2 = ov::test::readModel(m_out_xml_path_1, "");
ov::pass::Serialize(m_out_xml_path_2, "", irVersion).run_on_model(expected2);
EXPECT_EQ(input0Name, expected1->input(0).get_node()->get_friendly_name());
EXPECT_EQ(input1Name, expected1->input(1).get_node()->get_friendly_name());
EXPECT_EQ(output0Name, expected1->output(0).get_node()->get_friendly_name());
EXPECT_EQ(output1Name, expected1->output(1).get_node()->get_friendly_name());
EXPECT_EQ(input0Name, expected2->input(0).get_node()->get_friendly_name());
EXPECT_EQ(input1Name, expected2->input(1).get_node()->get_friendly_name());
EXPECT_EQ(output0Name, expected2->output(0).get_node()->get_friendly_name());
EXPECT_EQ(output1Name, expected2->output(1).get_node()->get_friendly_name());
std::ifstream xml_1(m_out_xml_path_1, std::ios::in | std::ios::binary);
std::ifstream xml_2(m_out_xml_path_2, std::ios::in | std::ios::binary);
EXPECT_TRUE(files_equal(xml_2, xml_1));
}
INSTANTIATE_TEST_CASE_P(DeterministicityInputOutput,
SerializationDeterministicityInputOutputTest,
::testing::Values(ov::pass::Serialize::Version::IR_V10, ov::pass::Serialize::Version::IR_V11));

View File

@ -81,7 +81,8 @@ TEST(attributes, interpolate_op4) {
TEST(attributes, interpolate_op11) {
NodeBuilder::get_ops().register_factory<opset11::Interpolate>();
const auto img = make_shared<op::Parameter>(element::f32, Shape{1, 3, 32, 32});
const auto scales = op::v0::Constant::create(element::f32, {1}, {1.0});
const auto scales = op::v0::Constant::create(element::f32, {2}, {2.0, 2.0});
const auto axes = op::v0::Constant::create(element::i32, {2}, {2, 3});
op::v11::Interpolate::InterpolateAttrs attrs;
attrs.mode = op::v11::Interpolate::InterpolateMode::BILINEAR_PILLOW;
@ -93,7 +94,7 @@ TEST(attributes, interpolate_op11) {
attrs.antialias = true;
attrs.cube_coeff = -0.75;
auto interpolate = make_shared<opset11::Interpolate>(img, scales, attrs);
auto interpolate = make_shared<opset11::Interpolate>(img, scales, axes, attrs);
NodeBuilder builder(interpolate, {img, scales});
auto g_interpolate = ov::as_type_ptr<opset11::Interpolate>(builder.create());

View File

@ -430,3 +430,6 @@ IE_CPU.onnx_bool_init_and
IE_CPU.onnx_model_top_k_repeating_1D
IE_CPU.onnx_model_top_k_repeating
IE_CPU.onnx_model_top_k_repeating_unsorted
# Accuracy regression - Ticket 105909
IE_CPU.onnx_model_attention_qkv_hidden_sizes

View File

@ -25,10 +25,21 @@ ov_add_test_target(
)
# Test model generating
ov_check_pip_packages(REQUIREMENTS_FILE "${CMAKE_CURRENT_SOURCE_DIR}/requirements.txt"
MESSAGE_MODE WARNING
WARNING_MESSAGE "PaddlePaddle frontend unit tests will be skipped"
RESULT_VAR paddlepaddle_FOUND)
set(PADDLE_REQ "${CMAKE_CURRENT_SOURCE_DIR}/requirements.txt")
if(PYTHONINTERP_FOUND)
execute_process(
COMMAND ${PYTHON_EXECUTABLE} "${CMAKE_CURRENT_SOURCE_DIR}/paddle_pip_check.py" ${PADDLE_REQ}
RESULT_VARIABLE EXIT_CODE
OUTPUT_VARIABLE OUTPUT_TEXT
ERROR_VARIABLE ERROR_TEXT)
endif()
if(NOT EXIT_CODE EQUAL 0)
set(paddlepaddle_FOUND OFF)
message(WARNING "Python requirement file ${PADDLE_REQ} is not installed, PaddlePaddle frontend unit tests will be skipped")
else()
set(paddlepaddle_FOUND ON)
endif()
set(TEST_PADDLE_MODELS_DIRNAME test_model_zoo/paddle_test_models)
target_compile_definitions(${TARGET_NAME} PRIVATE -D TEST_PADDLE_MODELS_DIRNAME=\"${TEST_PADDLE_MODELS_DIRNAME}/\")

View File

@ -0,0 +1,20 @@
import pkg_resources
import re
import sys

# Check that the PaddlePaddle frontend test requirements are installed.
# Usage: python paddle_pip_check.py <requirements.txt>
req_file = sys.argv[1]

try:
    pkg_resources.require(open(req_file, mode='r'))
except Exception as inst:
    # paddlepaddle pins protobuf to <=3.20.0, which may conflict with the protobuf version
    # installed for other components. Ignore exactly this conflict and re-resolve the
    # requirements as if protobuf 3.20.0 were installed; any other failure is re-raised.
    pattern = re.compile(r"protobuf .*, Requirement.parse\('protobuf<=3\.20\.0,>=3\.1\.0'\), {'paddlepaddle'}")
    result = pattern.findall(str(inst))
    if len(result) == 0:
        raise inst
    else:
        env = pkg_resources.Environment()
        env['protobuf'].clear()
        env.add(pkg_resources.DistInfoDistribution(project_name="protobuf", version="3.20.0"))
        ws = pkg_resources.working_set
        reqs = pkg_resources.parse_requirements(open(req_file, mode='r'))
        dists = ws.resolve(reqs, env, replace_conflicting=True)

View File

@ -18,6 +18,7 @@ OutputVector translate_floor_divide(NodeContext& context) {
num_inputs_check(context, 2, 2);
auto x = context.get_input(0);
auto y = context.get_input(1);
align_eltwise_input_types(context, x, y, true);
auto div = context.mark_node(std::make_shared<v1::Divide>(x, y, true));
return {context.mark_node(std::make_shared<v0::Floor>(div))};
};

View File

@ -21,4 +21,4 @@ OutputVector translate_floordiv(NodeContext& context) {
} // namespace op
} // namespace pytorch
} // namespace frontend
} // namespace ov
} // namespace ov

View File

@ -65,7 +65,7 @@ OutputVector translate_full_like(NodeContext& context) {
auto input = context.get_input(0);
auto value = context.get_input(1);
auto sizes = context.mark_node(std::make_shared<v3::ShapeOf>(input, element::i32));
if (context.get_input_size() == 7) {
if (context.get_input_size() == 7 && !context.input_is_none(2)) {
return {base_translate_full_with_convert(context, sizes, value, 2)};
}
auto out = context.input_is_none(3) ? input : context.get_input(3);
@ -113,7 +113,7 @@ OutputVector translate_zeros_like(NodeContext& context) {
auto input = context.get_input(0);
auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {0}));
auto sizes = context.mark_node(std::make_shared<v3::ShapeOf>(input, element::i32));
if (context.get_input_size() == 6) {
if (context.get_input_size() == 6 && !context.input_is_none(1)) {
return {base_translate_full_with_convert(context, sizes, value, 1)};
}
auto out = context.input_is_none(2) ? input : context.get_input(2);
@ -153,7 +153,7 @@ OutputVector translate_ones_like(NodeContext& context) {
auto input = context.get_input(0);
auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {1}));
auto sizes = context.mark_node(std::make_shared<v3::ShapeOf>(input, element::i32));
if (context.get_input_size() == 6) {
if (context.get_input_size() == 6 && !context.input_is_none(1)) {
return {base_translate_full_with_convert(context, sizes, value, 1)};
}
auto out = context.input_is_none(2) ? input : context.get_input(2);
@ -172,7 +172,7 @@ OutputVector translate_new_ones(NodeContext& context) {
};
OutputVector translate_empty(NodeContext& context) {
num_inputs_check(context, 1, 2);
num_inputs_check(context, 1, 5);
auto sizes = context.get_input(0);
// In OV uninitialised data is not supported, so we create a tensor filled with zeros with a given shape and type.
auto value = context.mark_node(v0::Constant::create(element::f32, Shape{}, {0}));
@ -185,8 +185,7 @@ OutputVector translate_empty(NodeContext& context) {
}
return {empty};
};
} // namespace op
} // namespace pytorch
} // namespace frontend
} // namespace ov
} // namespace ov

View File

@ -254,6 +254,7 @@ const std::map<std::string, PytorchCreatorFunction> get_supported_ops() {
{"aten::narrow", op::translate_narrow},
{"aten::ne", op::translate_1to1_match_2_inputs_align_types<opset10::NotEqual>},
{"aten::neg", op::translate_neg},
{"aten::new_empty", op::translate_new_zeros},
{"aten::new_full", op::translate_new_full},
{"aten::new_ones", op::translate_new_ones},
{"aten::new_zeros", op::translate_new_zeros},

View File

@ -16,6 +16,7 @@ namespace tensorflow {
namespace op {
OutputVector translate_matrix_diag_op(const NodeContext& node) {
default_op_checks(node, 1, {"MatrixDiag", "MATRIX_DIAG"});
// The translation of MatrixDiag to the OpenVINO opset relies on padding the input tensor with zeros,
// reshaping it to a special form, and cutting off the unneeded padding part.
// Here is a basic idea described by an example,
@ -27,7 +28,6 @@ OutputVector translate_matrix_diag_op(const NodeContext& node) {
// Reshape to tensor of a shape [12] equal to [1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0]
// Cut off last 3 elements and get [1, 0, 0, 0, 2, 0, 0, 0, 3] and reshape to [3, 3]
// This idea is generalized to higher rank tensors
TENSORFLOW_OP_VALIDATION(node, node.get_input_size() > 0, "MatrixDiag must have at least one input.");
// diagonal is the single input to MatrixDiag operation and has a shape [I, J, ..., M, N]
auto diagonal = node.get_input(0);
auto diagonal_type = diagonal.get_element_type();

View File

@ -100,6 +100,7 @@ add_library(${TARGET_NAME}_plugin_api INTERFACE)
target_include_directories(${TARGET_NAME}_plugin_api INTERFACE
$<TARGET_PROPERTY:openvino_gapi_preproc,INTERFACE_INCLUDE_DIRECTORIES>
$<TARGET_PROPERTY:openvino::core::dev,INTERFACE_INCLUDE_DIRECTORIES>
$<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/dev_api>
$<BUILD_INTERFACE:${PUBLIC_HEADERS_DIR}>
$<BUILD_INTERFACE:${PUBLIC_HEADERS_DIR}/ie>)

View File

@ -38,7 +38,7 @@ class IAsyncInferRequest;
class OPENVINO_RUNTIME_API ICompiledModel : public std::enable_shared_from_this<ICompiledModel> {
public:
/**
* @brief Main constructor for ICompiledModel interface
* @brief Constructor for ICompiledModel interface
*
* @param model OpenVINO model representation
*
@ -56,6 +56,28 @@ public:
const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor =
std::make_shared<ov::threading::CPUStreamsExecutor>(ov::threading::IStreamsExecutor::Config{"Callback"}));
/**
* @brief Constructor for ICompiledModel interface with remote context
*
* @param model OpenVINO model representation
*
* @param plugin Pointer to plugin
*
* @param context Remote context
*
* @param task_executor Task executor (CPUStreamsExecutor by default)
*
* @param callback_executor Callback executor (CPUStreamsExecutor by default)
*/
ICompiledModel(
const std::shared_ptr<const ov::Model>& model,
const std::shared_ptr<const ov::IPlugin>& plugin,
const ov::RemoteContext& context,
const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor =
std::make_shared<ov::threading::CPUStreamsExecutor>(ov::threading::IStreamsExecutor::Config{"Default"}),
const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor =
std::make_shared<ov::threading::CPUStreamsExecutor>(ov::threading::IStreamsExecutor::Config{"Callback"}));
/**
* @brief Gets all outputs from compiled model
*
@ -112,12 +134,13 @@ public:
*
* @return OpenVINO RemoteContext
*/
virtual ov::RemoteContext get_context() const = 0;
std::shared_ptr<ov::IRemoteContext> get_context() const;
private:
std::shared_ptr<const ov::IPlugin> m_plugin;
std::vector<ov::Output<const ov::Node>> m_inputs;
std::vector<ov::Output<const ov::Node>> m_outputs;
ov::RemoteContext m_context;
std::shared_ptr<ov::threading::ITaskExecutor> m_task_executor = nullptr; //!< Holds a task executor
std::shared_ptr<ov::threading::ITaskExecutor> m_callback_executor = nullptr; //!< Holds a callback executor

View File

@ -18,7 +18,7 @@
#include "openvino/runtime/common.hpp"
#include "openvino/runtime/icompiled_model.hpp"
#include "openvino/runtime/icore.hpp"
#include "openvino/runtime/remote_context.hpp"
#include "openvino/runtime/iremote_context.hpp"
#include "openvino/runtime/threading/executor_manager.hpp"
namespace InferenceEngine {
@ -153,7 +153,7 @@ public:
*
* @return A remote context object
*/
virtual ov::RemoteContext create_context(const ov::AnyMap& remote_properties) const = 0;
virtual std::shared_ptr<ov::IRemoteContext> create_context(const ov::AnyMap& remote_properties) const = 0;
/**
* @brief Provides a default remote context instance if supported by a plugin
@ -161,7 +161,7 @@ public:
*
* @return The default context.
*/
virtual ov::RemoteContext get_default_context(const ov::AnyMap& remote_properties) const = 0;
virtual std::shared_ptr<ov::IRemoteContext> get_default_context(const ov::AnyMap& remote_properties) const = 0;
/**
* @brief Creates a compiled model from a previously exported model using the plugin implementation

View File

@ -0,0 +1,66 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
/**
* @brief OpenVINO Runtime Remote Context interface
* @file openvino/runtime/iremote_context.hpp
*/
#pragma once
#include <memory>
#include "openvino/core/any.hpp"
#include "openvino/core/shape.hpp"
#include "openvino/core/type/element_type.hpp"
#include "openvino/runtime/common.hpp"
#include "openvino/runtime/iremote_tensor.hpp"
namespace ov {
class OPENVINO_RUNTIME_API IRemoteContext : public std::enable_shared_from_this<IRemoteContext> {
public:
/**
* @brief Returns the name of the device on which the underlying object is allocated.
* Abstract method.
* @return A device name string in fully specified format `<device_name>[.<device_id>[.<tile_id>]]` (e.g. GPU.0.1).
*/
virtual const std::string& get_device_name() const = 0;
/**
* @brief Returns a map of device-specific parameters required for low-level
* operations with the underlying object.
* Parameters include device/context handles, access flags,
* etc. The contents of the returned map depend on the remote execution context that is
* currently set on the device (working scenario).
* Abstract method.
* @return A map of name/Any elements.
*/
virtual const ov::AnyMap& get_property() const = 0;
/**
* @brief Allocates memory tensor in device memory or wraps user-supplied memory handle
* using the specified tensor description and low-level device-specific parameters.
* Returns a pointer to the object that implements the RemoteTensor interface.
* @param type Defines the element type of the tensor.
* @param shape Defines the shape of the tensor.
* @param params Map of the low-level tensor object parameters.
* @return Pointer to a plugin object that implements the RemoteTensor interface.
*/
virtual std::shared_ptr<ov::IRemoteTensor> create_tensor(const ov::element::Type& type,
const ov::Shape& shape,
const ov::AnyMap& params = {}) = 0;
/**
* @brief Creates a host tensor that is friendly for the device in the current context.
* For example, a GPU context may allocate USM host memory (if the corresponding extension is available),
* which could be more efficient than regular host memory.
* @param type Tensor element type.
* @param shape Tensor shape.
* @return A tensor instance with device friendly memory.
*/
virtual std::shared_ptr<ov::ITensor> create_host_tensor(const ov::element::Type type, const ov::Shape& shape);
};
} // namespace ov
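// A minimal illustrative sketch of how a device plugin might implement the interface above.
// The class name, device name, and empty property map are hypothetical placeholders; a real
// plugin returns its own device-specific ov::IRemoteTensor objects from create_tensor() and
// may also override create_host_tensor() to provide device-friendly host memory.

#include <string>

#include "openvino/core/except.hpp"
#include "openvino/runtime/iremote_context.hpp"

namespace example_plugin {

class ExampleRemoteContext : public ov::IRemoteContext {
public:
    const std::string& get_device_name() const override {
        return m_device_name;
    }

    const ov::AnyMap& get_property() const override {
        // Low-level context parameters (handles, access flags, ...) would be reported here.
        return m_properties;
    }

    std::shared_ptr<ov::IRemoteTensor> create_tensor(const ov::element::Type& type,
                                                     const ov::Shape& shape,
                                                     const ov::AnyMap& params) override {
        // A real plugin allocates device memory here (or wraps a handle passed via `params`)
        // and returns its own ov::IRemoteTensor implementation.
        OPENVINO_THROW("ExampleRemoteContext::create_tensor is a sketch and is not implemented");
    }

private:
    std::string m_device_name = "EXAMPLE";
    ov::AnyMap m_properties;
};

}  // namespace example_plugin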

View File

@ -19,9 +19,14 @@ namespace intel_auto {
static constexpr Property<bool> device_bind_buffer{"DEVICE_BIND_BUFFER"};
/**
* @brief auto/multi device setting that enable/disable CPU as acceleration (or helper device) at the beginning
* @brief auto device setting that enables/disables CPU as an acceleration (or helper) device at the beginning
*/
static constexpr Property<bool> enable_startup_fallback{"ENABLE_STARTUP_FALLBACK"};
/**
* @brief auto device setting that enables/disables runtime fallback to other devices when inference fails on the
* currently selected device
*/
static constexpr Property<bool> enable_runtime_fallback{"ENABLE_RUNTIME_FALLBACK"};
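// Illustrative usage from application code (hypothetical `core` and `model` objects): both
// fallback switches can be passed as properties when compiling a model on the AUTO device:
//
//     auto compiled = core.compile_model(model, "AUTO",
//                                        ov::intel_auto::enable_startup_fallback(true),
//                                        ov::intel_auto::enable_runtime_fallback(false));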
} // namespace intel_auto
} // namespace ov

View File

@ -47,7 +47,21 @@ namespace intel_cpu {
*/
static constexpr Property<bool> denormals_optimization{"CPU_DENORMALS_OPTIMIZATION"};
static constexpr Property<float> sparse_weights_decompression_rate{"SPARSE_WEIGHTS_DECOMPRESSION_RATE"};
/**
* @brief This property defines threshold for sparse weights decompression feature activation
* @ingroup ov_runtime_cpu_prop_cpp_api
*
* The sparse weights decompression feature allows packing weights for Matrix Multiplication operations directly in the
* CPU plugin at the model compilation stage and storing non-zero values in a special packed format. Then, during the
* execution of the model, the weights are unpacked and used in the computational kernel. Since the weights are loaded
* from DDR/L3 cache in the packed format, this significantly decreases memory consumption and, as a consequence,
* improves inference performance. The following code shows how to set the sparse rate value.
*
* @code
* core.set_property(ov::intel_cpu::sparse_weights_decompression_rate(0.8));
* @endcode
*/
static constexpr Property<float> sparse_weights_decompression_rate{"CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE"};
} // namespace intel_cpu
} // namespace ov

View File

@ -19,7 +19,6 @@
#include "openvino/runtime/remote_tensor.hpp"
namespace InferenceEngine {
class RemoteContext;
class IPluginWrapper;
class ICompiledModelWrapper;
class Core;
@ -31,9 +30,11 @@ class Core;
class CoreImpl;
class Plugin;
class IPlugin;
class IRemoteContext;
class ISyncInferRequest;
class IInferencePluginWrapper;
class IExecutableNetworkWrapper;
class ICompiledModel;
class CompiledModel;
/**
@ -45,7 +46,7 @@ class CompiledModel;
*/
class OPENVINO_RUNTIME_API RemoteContext {
protected:
std::shared_ptr<InferenceEngine::RemoteContext> _impl; //!< Pointer to the remote context implementation.
std::shared_ptr<IRemoteContext> _impl; //!< Pointer to the remote context implementation.
std::vector<std::shared_ptr<void>> _so; //!< Reference to the shared object that loaded implementation.
/**
@ -54,8 +55,7 @@ protected:
* @param so Plugin to use. This is required to ensure that RemoteContext can work properly even if a plugin
* object is destroyed.
*/
RemoteContext(const std::shared_ptr<InferenceEngine::RemoteContext>& impl,
const std::vector<std::shared_ptr<void>>& so);
RemoteContext(const std::shared_ptr<IRemoteContext>& impl, const std::vector<std::shared_ptr<void>>& so);
friend class InferenceEngine::Core;
friend class InferenceEngine::IPluginWrapper;
friend class InferenceEngine::ICompiledModelWrapper;
@ -66,6 +66,7 @@ protected:
friend class ov::ISyncInferRequest;
friend class ov::IInferencePluginWrapper;
friend class ov::IExecutableNetworkWrapper;
friend class ov::ICompiledModel;
friend class ov::CompiledModel;
public:

View File

@ -121,9 +121,7 @@ Any CompiledModel::get_property(const std::string& name) const {
RemoteContext CompiledModel::get_context() const {
OV_COMPILED_MODEL_CALL_STATEMENT({
auto ctx = _impl->get_context();
auto so_vec = ctx._so;
so_vec.emplace_back(_so);
return {ctx._impl, so_vec};
return {ctx, {_so}};
});
}

View File

@ -11,6 +11,7 @@
#include "ie_ngraph_utils.hpp"
#include "ie_remote_blob.hpp"
#include "openvino/core/except.hpp"
#include "openvino/runtime/iremote_context.hpp"
#include "openvino/runtime/itensor.hpp"
#include "openvino/runtime/remote_context.hpp"
@ -27,12 +28,12 @@
namespace ov {
void RemoteContext::type_check(const RemoteContext& tensor,
void RemoteContext::type_check(const RemoteContext& context,
const std::map<std::string, std::vector<std::string>>& type_info) {
auto remote_impl = dynamic_cast<const ie::RemoteContext*>(tensor._impl.get());
auto remote_impl = context._impl;
OPENVINO_ASSERT(remote_impl != nullptr, "Context was not initialized using remote implementation");
if (!type_info.empty()) {
auto params = remote_impl->getParams();
auto params = remote_impl->get_property();
for (auto&& type_info_value : type_info) {
auto it_param = params.find(type_info_value.first);
OPENVINO_ASSERT(it_param != params.end(), "Parameter with key ", type_info_value.first, " not found");
@ -53,43 +54,33 @@ RemoteContext::~RemoteContext() {
_impl = {};
}
RemoteContext::RemoteContext(const ie::RemoteContext::Ptr& impl, const std::vector<std::shared_ptr<void>>& so)
RemoteContext::RemoteContext(const std::shared_ptr<ov::IRemoteContext>& impl,
const std::vector<std::shared_ptr<void>>& so)
: _impl{impl},
_so{so} {
OPENVINO_ASSERT(_impl != nullptr, "RemoteContext was not initialized.");
}
std::string RemoteContext::get_device_name() const {
OV_REMOTE_CONTEXT_STATEMENT(return _impl->getDeviceName());
OV_REMOTE_CONTEXT_STATEMENT(return _impl->get_device_name());
}
RemoteTensor RemoteContext::create_tensor(const element::Type& type, const Shape& shape, const AnyMap& params) {
OV_REMOTE_CONTEXT_STATEMENT({
auto blob = _impl->CreateBlob(
{ie::details::convertPrecision(type), shape, ie::TensorDesc::getLayoutByRank(shape.size())},
params);
blob->allocate();
return {ov::make_tensor(blob), {_so}};
auto tensor = _impl->create_tensor(type, shape, params);
return {tensor, {_so}};
});
}
Tensor RemoteContext::create_host_tensor(const element::Type element_type, const Shape& shape) {
OV_REMOTE_CONTEXT_STATEMENT({
auto blob = _impl->CreateHostBlob(
{ie::details::convertPrecision(element_type), shape, ie::TensorDesc::getLayoutByRank(shape.size())});
blob->allocate();
return {ov::make_tensor(blob), {_so}};
auto tensor = _impl->create_host_tensor(element_type, shape);
return {tensor, {_so}};
});
}
AnyMap RemoteContext::get_params() const {
AnyMap paramMap;
OV_REMOTE_CONTEXT_STATEMENT({
for (auto&& param : _impl->getParams()) {
paramMap.emplace(param.first, Any{param.second, _so});
}
});
return paramMap;
OV_REMOTE_CONTEXT_STATEMENT(return _impl->get_property());
}
} // namespace ov

View File

@ -250,7 +250,7 @@ public:
return ov::legacy_convert::convert_compiled_model(
m_plugin->compile_model(ov::legacy_convert::convert_model(network, m_plugin->is_new_api()),
ov::any_copy(config),
ov::RemoteContext{context, {}}));
ov::RemoteContext{ov::legacy_convert::convert_remote_context(context), {}}));
}
ov::SoPtr<InferenceEngine::IExecutableNetworkInternal> LoadNetwork(
@ -286,12 +286,12 @@ public:
}
std::shared_ptr<InferenceEngine::RemoteContext> CreateContext(const InferenceEngine::ParamMap& params) override {
return m_plugin->create_context(params)._impl;
return ov::legacy_convert::convert_remote_context(m_plugin->create_context(params));
}
std::shared_ptr<InferenceEngine::RemoteContext> GetDefaultContext(
const InferenceEngine::ParamMap& params) override {
return m_plugin->get_default_context(params)._impl;
return ov::legacy_convert::convert_remote_context(m_plugin->get_default_context(params));
}
std::shared_ptr<InferenceEngine::IExecutableNetworkInternal> ImportNetwork(
@ -312,7 +312,9 @@ public:
const std::shared_ptr<InferenceEngine::RemoteContext>& context,
const std::map<std::string, std::string>& config) override {
return ov::legacy_convert::convert_compiled_model(
m_plugin->import_model(networkModel, ov::RemoteContext{context, {}}, ov::any_copy(config)));
m_plugin->import_model(networkModel,
ov::RemoteContext{ov::legacy_convert::convert_remote_context(context), {}},
ov::any_copy(config)));
}
void SetCore(std::weak_ptr<InferenceEngine::ICore> core) override {
@ -415,7 +417,7 @@ public:
}
std::shared_ptr<InferenceEngine::RemoteContext> GetContext() const override {
return m_model->get_context()._impl;
return ov::legacy_convert::convert_remote_context(m_model->get_context());
}
std::shared_ptr<ov::ICompiledModel> get_compiled_model() {
@ -789,3 +791,102 @@ std::shared_ptr<::ov::IAsyncInferRequest> ov::legacy_convert::convert_infer_requ
}
return std::make_shared<InferenceEngine::IAsyncInferRequestWrapper>(request);
}
namespace ov {
class RemoteContextWrapper : public InferenceEngine::RemoteContext {
private:
std::shared_ptr<ov::IRemoteContext> m_context;
public:
RemoteContextWrapper(const std::shared_ptr<ov::IRemoteContext>& context) : m_context(context) {}
const std::shared_ptr<ov::IRemoteContext>& get_context() {
return m_context;
}
std::string getDeviceName() const noexcept override {
return m_context->get_device_name();
}
InferenceEngine::RemoteBlob::Ptr CreateBlob(const InferenceEngine::TensorDesc& tensorDesc,
const InferenceEngine::ParamMap& params = {}) override {
return std::dynamic_pointer_cast<InferenceEngine::RemoteBlob>(ov::tensor_to_blob(
m_context->create_tensor(InferenceEngine::details::convertPrecision(tensorDesc.getPrecision()),
tensorDesc.getBlockingDesc().getBlockDims(),
params)));
}
InferenceEngine::MemoryBlob::Ptr CreateHostBlob(const InferenceEngine::TensorDesc& tensorDesc) override {
return std::dynamic_pointer_cast<InferenceEngine::MemoryBlob>(ov::tensor_to_blob(
m_context->create_host_tensor(InferenceEngine::details::convertPrecision(tensorDesc.getPrecision()),
tensorDesc.getBlockingDesc().getBlockDims())));
}
InferenceEngine::ParamMap getParams() const override {
return m_context->get_property();
}
};
} // namespace ov
namespace InferenceEngine {
class IRemoteContextWrapper : public ov::IRemoteContext {
private:
std::shared_ptr<InferenceEngine::RemoteContext> m_context;
mutable std::string m_name;
mutable ov::AnyMap m_params;
public:
IRemoteContextWrapper(const std::shared_ptr<InferenceEngine::RemoteContext>& context) : m_context(context) {}
const std::shared_ptr<InferenceEngine::RemoteContext>& get_context() {
return m_context;
}
const std::string& get_device_name() const override {
m_name = m_context->getDeviceName();
return m_name;
}
const ov::AnyMap& get_property() const override {
m_params = m_context->getParams();
return m_params;
}
std::shared_ptr<ov::IRemoteTensor> create_tensor(const ov::element::Type& type,
const ov::Shape& shape,
const ov::AnyMap& params = {}) override {
InferenceEngine::TensorDesc desc(InferenceEngine::details::convertPrecision(type),
shape,
InferenceEngine::TensorDesc::getLayoutByDims(shape));
auto blob = m_context->CreateBlob(desc, params);
blob->allocate();
return std::dynamic_pointer_cast<ov::IRemoteTensor>(ov::make_tensor(blob));
}
std::shared_ptr<ov::ITensor> create_host_tensor(const ov::element::Type type, const ov::Shape& shape) override {
InferenceEngine::TensorDesc desc(InferenceEngine::details::convertPrecision(type),
shape,
InferenceEngine::TensorDesc::getLayoutByDims(shape));
auto blob = m_context->CreateHostBlob(desc);
blob->allocate();
return ov::make_tensor(blob);
}
};
} // namespace InferenceEngine
std::shared_ptr<InferenceEngine::RemoteContext> ov::legacy_convert::convert_remote_context(
const std::shared_ptr<ov::IRemoteContext>& context) {
if (auto ctx = std::dynamic_pointer_cast<InferenceEngine::IRemoteContextWrapper>(context)) {
return ctx->get_context();
}
return std::make_shared<ov::RemoteContextWrapper>(context);
}
std::shared_ptr<ov::IRemoteContext> ov::legacy_convert::convert_remote_context(
const std::shared_ptr<InferenceEngine::RemoteContext>& context) {
if (auto ctx = std::dynamic_pointer_cast<ov::RemoteContextWrapper>(context)) {
return ctx->get_context();
}
return std::make_shared<InferenceEngine::IRemoteContextWrapper>(context);
}

View File

@ -7,10 +7,12 @@
#include "cpp/ie_cnn_network.h"
#include "cpp_interfaces/interface/ie_iinfer_request_internal.hpp"
#include "cpp_interfaces/interface/ie_iplugin_internal.hpp"
#include "ie_remote_blob.hpp"
#include "openvino/core/model.hpp"
#include "openvino/runtime/iasync_infer_request.hpp"
#include "openvino/runtime/icompiled_model.hpp"
#include "openvino/runtime/iplugin.hpp"
#include "openvino/runtime/iremote_context.hpp"
namespace ov {
namespace legacy_convert {
@ -34,6 +36,11 @@ std::shared_ptr<::InferenceEngine::IInferRequestInternal> convert_infer_request(
std::shared_ptr<::ov::IAsyncInferRequest> convert_infer_request(
const std::shared_ptr<::InferenceEngine::IInferRequestInternal>& request);
std::shared_ptr<InferenceEngine::RemoteContext> convert_remote_context(
const std::shared_ptr<ov::IRemoteContext>& context);
std::shared_ptr<ov::IRemoteContext> convert_remote_context(
const std::shared_ptr<InferenceEngine::RemoteContext>& context);
} // namespace legacy_convert
} // namespace ov

View File

@ -43,7 +43,7 @@ ov::SoPtr<InferenceEngine::IExecutableNetworkInternal> ov::CoreImpl::LoadNetwork
}
InferenceEngine::RemoteContext::Ptr ov::CoreImpl::GetDefaultContext(const std::string& deviceName) {
return get_default_context(deviceName)._impl;
return ov::legacy_convert::convert_remote_context(get_default_context(deviceName)._impl);
}
InferenceEngine::CNNNetwork ov::CoreImpl::ReadNetwork(const std::string& modelPath, const std::string& binPath) const {
@ -64,7 +64,7 @@ ov::SoPtr<InferenceEngine::IExecutableNetworkInternal> ov::CoreImpl::LoadNetwork
const std::map<std::string, std::string>& config) {
OV_ITT_SCOPE(FIRST_INFERENCE, InferenceEngine::itt::domains::IE_LT, "Core::LoadNetwork::RemoteContext");
if (network.getFunction()) {
ov::RemoteContext ctx{context, {nullptr}};
ov::RemoteContext ctx{ov::legacy_convert::convert_remote_context(context), {nullptr}};
auto compiled_model =
compile_model(ov::legacy_convert::convert_model(network, isNewAPI()), ctx, any_copy(config));
return {ov::legacy_convert::convert_compiled_model(compiled_model._ptr), compiled_model._so};
@ -207,7 +207,7 @@ std::vector<std::string> ov::CoreImpl::GetAvailableDevices() const {
InferenceEngine::RemoteContext::Ptr ov::CoreImpl::CreateContext(const std::string& deviceName,
const InferenceEngine::ParamMap& params) {
return create_context(deviceName, params)._impl;
return ov::legacy_convert::convert_remote_context(create_context(deviceName, params)._impl);
}
/**

View File

@ -4,6 +4,7 @@
#include "openvino/runtime/icompiled_model.hpp"
#include "dev/converter_utils.hpp"
#include "icompiled_model_wrapper.hpp"
#include "openvino/core/model.hpp"
#include "openvino/runtime/properties.hpp"
@ -13,7 +14,15 @@ ov::ICompiledModel::ICompiledModel(const std::shared_ptr<const ov::Model>& model
const std::shared_ptr<const ov::IPlugin>& plugin,
const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor,
const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor)
: ICompiledModel(model, plugin, {}, task_executor, callback_executor) {}
ov::ICompiledModel::ICompiledModel(const std::shared_ptr<const ov::Model>& model,
const std::shared_ptr<const ov::IPlugin>& plugin,
const ov::RemoteContext& context,
const std::shared_ptr<ov::threading::ITaskExecutor>& task_executor,
const std::shared_ptr<ov::threading::ITaskExecutor>& callback_executor)
: m_plugin(plugin),
m_context(context),
m_task_executor(task_executor),
m_callback_executor(callback_executor) {
OPENVINO_ASSERT(m_plugin);
@ -92,3 +101,12 @@ const std::shared_ptr<ov::threading::ITaskExecutor> ov::ICompiledModel::get_task
const std::shared_ptr<ov::threading::ITaskExecutor> ov::ICompiledModel::get_callback_executor() const {
return m_callback_executor;
}
std::shared_ptr<ov::IRemoteContext> ov::ICompiledModel::get_context() const {
// Legacy compiled models keep the remote context inside the wrapped executable network.
if (auto wrapper = dynamic_cast<const InferenceEngine::ICompiledModelWrapper*>(this)) {
return ov::legacy_convert::convert_remote_context(wrapper->get_executable_network()->GetContext());
}
// Otherwise return the context the model was compiled with, or fall back to the plugin's default context.
if (m_context._impl)
return m_context._impl;
return m_plugin->get_default_context({});
}

View File

@ -77,10 +77,6 @@ ov::Any InferenceEngine::ICompiledModelWrapper::get_property(const std::string&
}
}
ov::RemoteContext InferenceEngine::ICompiledModelWrapper::get_context() const {
return {m_model->GetContext(), {}};
}
std::shared_ptr<InferenceEngine::IExecutableNetworkInternal>
InferenceEngine::ICompiledModelWrapper::get_executable_network() {
return m_model;

View File

@ -23,8 +23,6 @@ public:
ov::Any get_property(const std::string& name) const override;
ov::RemoteContext get_context() const override;
std::shared_ptr<InferenceEngine::IExecutableNetworkInternal> get_executable_network();
std::shared_ptr<const InferenceEngine::IExecutableNetworkInternal> get_executable_network() const;

View File

@ -49,7 +49,7 @@ std::shared_ptr<ov::ICompiledModel> IPluginWrapper::compile_model(const std::sha
return ov::legacy_convert::convert_compiled_model(
update_exec_network(m_old_plugin->LoadNetwork(ov::legacy_convert::convert_model(model, is_new_api()),
any_copy(properties),
context._impl)));
ov::legacy_convert::convert_remote_context(context._impl))));
}
void IPluginWrapper::set_property(const ov::AnyMap& properties) {
@ -64,12 +64,12 @@ ov::Any IPluginWrapper::get_property(const std::string& name, const ov::AnyMap&
}
}
ov::RemoteContext IPluginWrapper::create_context(const ov::AnyMap& remote_properties) const {
return ov::RemoteContext{m_old_plugin->CreateContext(remote_properties), {nullptr}};
std::shared_ptr<ov::IRemoteContext> IPluginWrapper::create_context(const ov::AnyMap& remote_properties) const {
return ov::legacy_convert::convert_remote_context(m_old_plugin->CreateContext(remote_properties));
}
ov::RemoteContext IPluginWrapper::get_default_context(const ov::AnyMap& remote_properties) const {
return ov::RemoteContext{m_old_plugin->GetDefaultContext(remote_properties), {nullptr}};
std::shared_ptr<ov::IRemoteContext> IPluginWrapper::get_default_context(const ov::AnyMap& remote_properties) const {
return ov::legacy_convert::convert_remote_context(m_old_plugin->GetDefaultContext(remote_properties));
}
std::shared_ptr<ov::ICompiledModel> IPluginWrapper::import_model(std::istream& model,
@ -82,7 +82,9 @@ std::shared_ptr<ov::ICompiledModel> IPluginWrapper::import_model(std::istream& m
const ov::RemoteContext& context,
const ov::AnyMap& properties) const {
return ov::legacy_convert::convert_compiled_model(
update_exec_network(m_old_plugin->ImportNetwork(model, context._impl, any_copy(properties))));
update_exec_network(m_old_plugin->ImportNetwork(model,
ov::legacy_convert::convert_remote_context(context._impl),
any_copy(properties))));
}
ov::SupportedOpsMap IPluginWrapper::query_model(const std::shared_ptr<const ov::Model>& model,

View File

@ -80,7 +80,7 @@ public:
*
* @return Remote context
*/
ov::RemoteContext create_context(const ov::AnyMap& remote_properties) const override;
std::shared_ptr<ov::IRemoteContext> create_context(const ov::AnyMap& remote_properties) const override;
/**
* @brief Create default remote context
@ -89,7 +89,7 @@ public:
*
* @return Remote context
*/
ov::RemoteContext get_default_context(const ov::AnyMap& remote_properties) const override;
std::shared_ptr<ov::IRemoteContext> get_default_context(const ov::AnyMap& remote_properties) const override;
/**
* @brief Import model to the plugin

View File

@ -0,0 +1,12 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "openvino/runtime/iremote_context.hpp"
#include "dev/make_tensor.hpp"
std::shared_ptr<ov::ITensor> ov::IRemoteContext::create_host_tensor(const ov::element::Type type,
const ov::Shape& shape) {
return ov::make_tensor(type, shape);
}

View File

@ -16,346 +16,4 @@ namespace ov {
IRemoteTensor::~IRemoteTensor() = default;
/**
* @brief Tensor that contains an InferenceEngine::Blob inside
* Blob owns the memory
*/
class BlobTensor : public ITensor {
mutable element::Type m_type;
mutable Shape m_shape;
mutable Strides m_strides;
void update_strides() {
if (get_element_type().bitwidth() >= 8) {
const auto& element_strides = blob->getTensorDesc().getBlockingDesc().getStrides();
const size_t elem_size = get_element_type().size();
m_strides.clear();
m_strides.resize(element_strides.size());
std::transform(element_strides.begin(),
element_strides.end(),
m_strides.begin(),
[&elem_size](size_t stride) {
return stride * elem_size;
});
}
}
public:
std::shared_ptr<ie::Blob> blob;
BlobTensor(const InferenceEngine::Blob::Ptr& blob) : blob{blob} {
auto remote_impl = dynamic_cast<InferenceEngine::RemoteBlob*>(blob.get());
OPENVINO_ASSERT(!remote_impl);
OPENVINO_ASSERT(blob);
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
update_strides();
}
const element::Type& get_element_type() const override {
m_type = InferenceEngine::details::convertPrecision(blob->getTensorDesc().getPrecision());
return m_type;
}
void set_shape(ov::Shape shape) override {
blob->setShape({shape.begin(), shape.end()});
update_strides();
}
const Shape& get_shape() const override {
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
return m_shape;
}
const Strides& get_strides() const override {
OPENVINO_ASSERT(get_element_type().bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
return m_strides;
}
size_t get_size() const override {
return blob->size();
}
size_t get_byte_size() const override {
return blob->byteSize();
}
void* data(const element::Type& element_type) const override {
OPENVINO_ASSERT(blob != nullptr, "Tensor was not initialized.");
#define TYPE_CHECK(TYPE) (dynamic_cast<const ie::TBlob<TYPE>*>(blob.get()) != nullptr)
auto host_accesable_implementation = TYPE_CHECK(bool) || TYPE_CHECK(int8_t) || TYPE_CHECK(uint8_t) ||
TYPE_CHECK(int16_t) || TYPE_CHECK(uint16_t) || TYPE_CHECK(int32_t) ||
TYPE_CHECK(uint32_t) || TYPE_CHECK(int64_t) || TYPE_CHECK(uint64_t) ||
TYPE_CHECK(float) || TYPE_CHECK(double);
#undef TYPE_CHECK
OPENVINO_ASSERT(host_accesable_implementation,
"Tensor implementation type dose not contains host accessable data");
if (element_type != element::undefined && element_type.is_static()) {
OPENVINO_ASSERT(element_type == get_element_type(),
"Tensor data with element type ",
get_element_type(),
", is not representable as pointer to ",
element_type);
}
// since we don't use byte offsets, we need to explicitly multiply by element_size
auto byte_offset = blob->getTensorDesc().getBlockingDesc().getOffsetPadding() * get_element_type().size();
OPENVINO_ASSERT((get_element_type().bitwidth() >= 8) || (byte_offset == 0),
"ROI access for types with bitwidths less then 8 bit is not implemented. Tensor type: ",
get_element_type());
return byte_offset + InferenceEngine::as<InferenceEngine::MemoryBlob>(blob)->rmap().as<uint8_t*>();
}
};
/**
* @brief Tensor that contains an InferenceEngine::RemoteBlob inside
* Blob owns the memory
*/
class RemoteBlobTensor : public IRemoteTensor {
mutable element::Type m_type;
mutable Shape m_shape;
mutable Strides m_strides;
mutable ov::AnyMap m_properties;
mutable std::string m_dev_name;
public:
std::shared_ptr<ie::RemoteBlob> blob;
RemoteBlobTensor(const InferenceEngine::RemoteBlob::Ptr& blob) : blob{blob} {
OPENVINO_ASSERT(blob);
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
}
const element::Type& get_element_type() const override {
m_type = InferenceEngine::details::convertPrecision(blob->getTensorDesc().getPrecision());
return m_type;
}
void set_shape(ov::Shape shape) override {
blob->setShape({shape.begin(), shape.end()});
}
const Shape& get_shape() const override {
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
return m_shape;
}
const Strides& get_strides() const override {
OPENVINO_ASSERT(get_element_type().bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
const auto& element_strides = blob->getTensorDesc().getBlockingDesc().getStrides();
const size_t elem_size = get_element_type().size();
m_strides.clear();
m_strides.resize(element_strides.size());
std::transform(element_strides.begin(), element_strides.end(), m_strides.begin(), [&elem_size](size_t stride) {
return stride * elem_size;
});
return m_strides;
}
size_t get_size() const override {
return blob->size();
}
size_t get_byte_size() const override {
return blob->byteSize();
}
const AnyMap& get_properties() const override {
m_properties = blob->getParams();
return m_properties;
}
const std::string& get_device_name() const override {
m_dev_name = blob->getDeviceName();
return m_dev_name;
}
};
/**
* @brief Create InferenceEngine::RemoteBlob from the Tensor
*/
class TensorRemoteBlob : public ie::RemoteBlob {
public:
TensorRemoteBlob(const std::shared_ptr<ITensor>& tensor)
: ie::RemoteBlob{ie::TensorDesc{ie::details::convertPrecision(tensor->get_element_type()),
tensor->get_shape(),
ie::TensorDesc::getLayoutByRank(tensor->get_shape().size())}},
tensor{std::dynamic_pointer_cast<ov::IRemoteTensor>(tensor)} {
OPENVINO_ASSERT(this->tensor);
}
AnyMap getParams() const override {
return tensor->get_properties();
}
std::string getDeviceName() const noexcept override {
try {
return tensor->get_device_name();
} catch (...) {
return {};
}
}
std::shared_ptr<ie::RemoteContext> getContext() const noexcept override {
return {};
}
void allocate() noexcept override {}
bool deallocate() noexcept override {
return true;
}
ie::LockedMemory<void> buffer() noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<const void> cbuffer() const noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<void> rwmap() noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<const void> rmap() const noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<void> wmap() noexcept override {
return {nullptr, nullptr, 0};
}
const std::shared_ptr<ie::IAllocator>& getAllocator() const noexcept override {
return m_allocator;
}
void* getHandle() const noexcept override {
return nullptr;
}
std::shared_ptr<IRemoteTensor> tensor;
private:
std::shared_ptr<ie::IAllocator> m_allocator;
};
/**
* @brief Create InferenceEngine::TBlob<T> from the tensor
*
* @tparam T Blob data type
*/
template <typename T>
class TensorMemoryBlob : public ie::TBlob<T> {
public:
~TensorMemoryBlob() override = default;
explicit TensorMemoryBlob(const std::shared_ptr<ITensor>& tensor_) try : ie
::TBlob<T>{[&] {
auto element_type = tensor_->get_element_type();
auto shape = tensor_->get_shape();
ie::SizeVector blk_order(shape.size());
std::iota(blk_order.begin(), blk_order.end(), 0);
ie::SizeVector dim_offset(shape.size(), 0);
ie::SizeVector blk_strides;
auto byte_strides = element_type.bitwidth() >= 8 ? tensor_->get_strides() : Strides{};
if (byte_strides.empty()) {
blk_strides = ov::row_major_strides(shape);
} else {
blk_strides.resize(byte_strides.size());
std::transform(byte_strides.begin(),
byte_strides.end(),
blk_strides.begin(),
[&element_type](size_t byte_stride) {
OPENVINO_ASSERT(byte_stride % element_type.size() == 0,
"Limitation: Stride in bytes ",
byte_stride,
" should be divisible by size of element ",
element_type.size());
return byte_stride / element_type.size();
});
}
return ie::TensorDesc{ie::details::convertPrecision(element_type),
shape,
ie::BlockingDesc{shape, blk_order, 0, dim_offset, blk_strides}};
}(),
static_cast<T*>(tensor_->data()),
tensor_->get_byte_size()},
tensor{tensor_} {
OPENVINO_ASSERT(!std::dynamic_pointer_cast<ov::IRemoteTensor>(tensor));
}
catch (const std::exception& ex) {
throw ov::Exception(ex.what());
}
void setShape(const ie::SizeVector& dims) override {
tensor->set_shape(dims);
ie::TBlob<T>::setShape(dims);
}
std::shared_ptr<ITensor> tensor;
};
std::shared_ptr<ITensor> make_tensor(const std::shared_ptr<ie::Blob>& blob) {
#define ELSE_IF(type) \
else if (auto tblob = dynamic_cast<const TensorMemoryBlob<type>*>(blob.get())) { \
return tblob->tensor; \
}
if (blob == nullptr) {
return {};
} else if (auto remote_blob = std::dynamic_pointer_cast<TensorRemoteBlob>(blob)) {
return remote_blob->tensor;
} else if (auto remote_blob = std::dynamic_pointer_cast<InferenceEngine::RemoteBlob>(blob)) {
return std::make_shared<RemoteBlobTensor>(remote_blob);
}
ELSE_IF(float)
ELSE_IF(double)
ELSE_IF(int8_t)
ELSE_IF(int8_t)
ELSE_IF(int16_t)
ELSE_IF(int32_t)
ELSE_IF(int64_t)
ELSE_IF(uint8_t)
ELSE_IF(uint8_t)
ELSE_IF(uint16_t)
ELSE_IF(uint32_t)
ELSE_IF(uint64_t)
ELSE_IF(int8_t)
ELSE_IF(bool) else {
return std::make_shared<BlobTensor>(blob);
}
#undef IF
}
ie::Blob::Ptr tensor_to_blob(const std::shared_ptr<ITensor>& tensor) {
if (tensor == nullptr) {
return {};
} else if (auto blob_tensor = std::dynamic_pointer_cast<BlobTensor>(tensor)) {
return blob_tensor->blob;
} else if (auto blob_tensor = std::dynamic_pointer_cast<RemoteBlobTensor>(tensor)) {
return blob_tensor->blob;
} else if (auto blob_tensor = dynamic_cast<const BlobTensor*>(tensor.get())) {
return blob_tensor->blob;
} else if (std::dynamic_pointer_cast<ov::IRemoteTensor>(tensor)) {
return std::make_shared<TensorRemoteBlob>(tensor);
} else {
#define CASE(precision, T) \
case element::precision: \
return std::make_shared<TensorMemoryBlob<T>>(tensor);
switch (tensor->get_element_type()) {
CASE(f32, float);
CASE(f64, double);
CASE(i4, int8_t);
CASE(i8, int8_t);
CASE(i16, int16_t);
CASE(i32, int32_t);
CASE(i64, int64_t);
CASE(u4, uint8_t);
CASE(u8, uint8_t);
CASE(u16, uint16_t);
CASE(u32, uint32_t);
CASE(u64, uint64_t);
CASE(u1, int8_t);
CASE(boolean, bool);
case element::f16:
return std::make_shared<TensorMemoryBlob<int16_t>>(tensor);
case element::bf16:
return std::make_shared<TensorMemoryBlob<int16_t>>(tensor);
default:
OPENVINO_THROW("Unsupported element type");
}
#undef CASE
}
OPENVINO_THROW("Cannot convert tensor to blob!");
}
} // namespace ov

View File

@ -13,7 +13,7 @@
#include "openvino/op/util/op_types.hpp"
#include "openvino/runtime/icompiled_model.hpp"
#include "openvino/runtime/iinfer_request.hpp"
#include "openvino/runtime/remote_context.hpp"
#include "openvino/runtime/iremote_context.hpp"
#include "openvino/runtime/tensor.hpp"
namespace {
@ -137,7 +137,7 @@ void ov::ISyncInferRequest::convert_batched_tensors() {
auto tmp_shape = item.second.at(0).get_shape();
auto tmp_et = item.second.at(0).get_element_type();
tmp_shape[0] = item.second.size();
ov::RemoteContext remote_context;
std::shared_ptr<ov::IRemoteContext> remote_context;
ov::Tensor input_tensor;
try {
auto net = get_compiled_model();
@ -146,8 +146,8 @@ void ov::ISyncInferRequest::convert_batched_tensors() {
}
} catch (const ov::NotImplemented&) {
}
if (remote_context._impl) {
input_tensor = remote_context.create_host_tensor(tmp_et, tmp_shape);
if (remote_context) {
input_tensor = ov::Tensor(remote_context->create_host_tensor(tmp_et, tmp_shape), {});
} else {
input_tensor = ov::Tensor(tmp_et, tmp_shape);
}
@ -251,7 +251,7 @@ void ov::ISyncInferRequest::check_tensor(const ov::Output<const ov::Node>& port,
" expecting ",
port.get_shape(),
".");
OPENVINO_ASSERT(tensor.data() != nullptr, "Tensor data equal nullptr!");
OPENVINO_ASSERT(tensor.is<ov::RemoteTensor>() || tensor.data() != nullptr, "Tensor data equal nullptr!");
}
void ov::ISyncInferRequest::allocate_tensor(const ov::Output<const ov::Node>& port,

View File

@ -0,0 +1,614 @@
// Copyright (C) 2018-2023 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "dev/make_tensor.hpp"
#include <memory>
#include "dev/make_tensor.hpp"
#include "ie_blob.h"
#include "ie_ngraph_utils.hpp"
#include "ie_remote_blob.hpp"
#include "openvino/runtime/iremote_tensor.hpp"
#include "openvino/runtime/properties.hpp"
namespace ov {
/**
* @brief View tensor to external memory
* The tensor doesn't own the external memory
*/
class ViewTensor : public ITensor {
public:
ViewTensor(const element::Type element_type, const Shape& shape, void* ptr)
: m_element_type{element_type},
m_shape{shape},
m_capacity{shape},
m_ptr{ptr} {
OPENVINO_ASSERT(m_ptr != nullptr);
OPENVINO_ASSERT(m_element_type != element::undefined && m_element_type.is_static());
update_strides();
}
void* data(const element::Type& element_type) const override {
if (element_type != element::undefined && element_type != element::dynamic) {
OPENVINO_ASSERT(element_type == get_element_type(),
"Tensor data with element type ",
get_element_type(),
", is not representable as pointer to ",
element_type);
}
return m_ptr;
}
const element::Type& get_element_type() const override {
return m_element_type;
}
const Shape& get_shape() const override {
return m_shape;
}
void set_shape(ov::Shape new_shape) override {
OPENVINO_ASSERT(shape_size(new_shape) <= ov::shape_size(m_capacity), "Could not set new shape: ", new_shape);
m_shape = std::move(new_shape);
update_strides();
}
const Strides& get_strides() const override {
OPENVINO_ASSERT(m_element_type.bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
m_element_type);
return m_strides;
}
protected:
void update_strides() {
if (m_element_type.bitwidth() < 8)
return;
auto& shape = get_shape();
m_strides.clear();
if (!shape.empty()) {
m_strides.resize(shape.size());
m_strides.back() = m_element_type.size();
std::copy(shape.rbegin(), shape.rend() - 1, m_strides.rbegin() + 1);
std::partial_sum(m_strides.rbegin(), m_strides.rend(), m_strides.rbegin(), std::multiplies<size_t>());
}
}
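// Worked example for the computation above: for shape {2, 3, 4} and element type f32
// (4 bytes per element) the result is byte strides {48, 16, 4}, i.e. stepping one along
// the outermost dimension skips 3 * 4 * 4 = 48 bytes.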
element::Type m_element_type;
Shape m_shape;
Shape m_capacity;
Strides m_strides;
void* m_ptr;
};
/**
* @brief View tensor on external memory with strides
*/
class StridedViewTensor : public ViewTensor {
public:
StridedViewTensor(const element::Type element_type, const Shape& shape, void* ptr, const Strides& strides)
: ViewTensor{element_type, shape, ptr} {
OPENVINO_ASSERT(
get_element_type().bitwidth() >= 8,
"Could not create strided access tensor for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
// Save default strides
auto shape_strides = m_strides;
// Change strides
m_strides = strides;
OPENVINO_ASSERT(m_shape.size() == m_strides.size());
for (size_t i = 0; i < m_strides.size(); ++i) {
OPENVINO_ASSERT(shape_strides[i] <= m_strides[i],
"shape stride: ",
shape_strides[i],
", stride: ",
m_strides[i]);
OPENVINO_ASSERT((m_strides[i] % get_element_type().size()) == 0,
"stride: ",
m_strides[i],
" is not divisible by the element size: ",
get_element_type().size());
if (i) {
OPENVINO_ASSERT(m_strides[i - 1] >= m_strides[i] * shape[i],
"Strides: ",
m_strides,
" are incompatible with shapes: ",
m_shape);
}
}
}
void set_shape(ov::Shape new_shape) override {
OPENVINO_ASSERT(m_capacity.size() == new_shape.size(),
"Cannot set new shape: ",
new_shape,
" for tensor with strides! Shapes are not compatible.");
for (size_t i = 0; i < new_shape.size(); i++) {
OPENVINO_ASSERT(m_capacity[i] >= new_shape[i],
"Cannot set new shape: ",
new_shape,
" for tensor with strides! Dimension: ",
i,
" is not compatible.");
}
m_shape = std::move(new_shape);
}
};
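// Illustrative example (not part of this file): the checks above accept, for instance, an f32
// tensor of shape {2, 3} with byte strides {32, 4} - each row occupies 8 floats of storage, of
// which only the first 3 are addressed - while strides smaller than the packed ones are rejected.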
/**
* @brief Creates view tensor on external memory
*
* @param element_type Tensor element type
* @param shape Tensor shape
 * @param ptr Pointer to external memory
* @param byte_strides Tensor strides
*
* @return Shared pointer to tensor interface
*/
std::shared_ptr<ITensor> make_tensor(const element::Type element_type,
const Shape& shape,
void* ptr,
const Strides& byte_strides) {
return byte_strides.empty() ? std::make_shared<ViewTensor>(element_type, shape, ptr)
: std::make_shared<StridedViewTensor>(element_type, shape, ptr, byte_strides);
}
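// Minimal usage sketch (illustrative only, not part of this translation unit): wrapping a
// caller-owned buffer. The buffer must outlive the returned tensor, and empty strides select
// the default packed row-major layout.
//
//   std::vector<float> data(2 * 3, 0.f);
//   std::shared_ptr<ov::ITensor> view =
//       ov::make_tensor(ov::element::f32, ov::Shape{2, 3}, data.data(), ov::Strides{});
//   // view->data() aliases data.data(); destroying `view` does not free the buffer.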
/**
* @brief Tensor with allocated memory
* Tensor owns the memory
*/
class AllocatedTensor : public ViewTensor {
public:
AllocatedTensor(const element::Type element_type, const Shape& shape, const Allocator& allocator)
: ViewTensor{element_type,
shape,
[&] {
OPENVINO_ASSERT(allocator, "Allocator was not initialized");
return const_cast<Allocator&>(allocator).allocate(element_type.size() * shape_size(shape));
}()},
m_allocator{allocator} {}
~AllocatedTensor() {
m_allocator.deallocate(m_ptr, get_byte_size());
}
void set_shape(ov::Shape new_shape) override {
auto old_byte_size = get_byte_size();
m_shape = std::move(new_shape);
if (get_byte_size() > old_byte_size) {
m_allocator.deallocate(m_ptr, old_byte_size);
m_ptr = m_allocator.allocate(get_byte_size());
}
update_strides();
}
private:
Allocator m_allocator;
};
/**
* @brief Creates allocated tensor
*
* @param element_type Tensor element type
* @param shape Tensor shape
* @param allocator Tensor allocator
*
* @return Shared pointer to tensor interface
*/
std::shared_ptr<ITensor> make_tensor(const element::Type element_type, const Shape& shape, const Allocator& allocator) {
return std::make_shared<AllocatedTensor>(element_type, shape, allocator);
}
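// Minimal usage sketch (illustrative only): creating an owning tensor through an ov::Allocator.
// The allocator is assumed to wrap a valid, initialized implementation - the assertion in
// AllocatedTensor rejects an uninitialized one.
//
//   ov::Allocator allocator;  // assumed to provide a working allocate/deallocate pair
//   auto owned = ov::make_tensor(ov::element::u8, ov::Shape{1, 3, 224, 224}, allocator);
//   // `owned` allocates 1 * 3 * 224 * 224 bytes and releases them in ~AllocatedTensor().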
/**
 * @brief ROI tensor on top of another tensor
 * The ROI tensor keeps the owner tensor alive
*/
class RoiTensor : public ITensor {
public:
RoiTensor(const std::shared_ptr<ITensor>& owner, const Coordinate& begin, const Coordinate& end) : m_owner{owner} {
OPENVINO_ASSERT(owner->get_element_type().bitwidth() >= 8,
"ROI Tensor for types with bitwidths less then 8 bit is not implemented. Tensor type: ",
owner->get_element_type());
auto owner_shape = owner->get_shape();
OPENVINO_ASSERT(owner_shape.size() == begin.size());
OPENVINO_ASSERT(begin.size() == end.size());
m_shape.resize(begin.size());
for (size_t i = 0; i < begin.size(); ++i) {
OPENVINO_ASSERT(begin[i] <= owner_shape[i]);
OPENVINO_ASSERT(end[i] <= owner_shape[i]);
m_shape[i] = end[i] - begin[i];
OPENVINO_ASSERT(m_shape[i] <= owner_shape[i]);
}
auto& strides = get_strides();
m_offset = std::inner_product(begin.begin(), begin.end(), strides.begin(), static_cast<size_t>(0));
}
const element::Type& get_element_type() const override {
return m_owner->get_element_type();
}
const Strides& get_strides() const override {
return m_owner->get_strides();
}
const Shape& get_shape() const override {
return m_shape;
}
void set_shape(ov::Shape new_shape) override {
OPENVINO_THROW("Shapes cannot be changed for ROI Tensor");
}
void* data(const element::Type& element_type) const override {
auto owner_data = m_owner->data(element_type);
return static_cast<uint8_t*>(owner_data) + m_offset;
}
private:
std::shared_ptr<ITensor> m_owner;
size_t m_offset;
Shape m_shape;
};
/**
* @brief Creates ROI tensor
*
 * @param other Tensor that owns the memory
* @param begin Begin coordinates
* @param end End coordinates
*
* @return Shared pointer to tensor interface
*/
std::shared_ptr<ITensor> make_tensor(const std::shared_ptr<ITensor>& other,
const Coordinate& begin,
const Coordinate& end) {
return std::make_shared<RoiTensor>(other, begin, end);
}
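// Worked example (illustrative only): for an owner of shape {4, 4} and type f32 (strides {16, 4}),
// begin {1, 1} and end {3, 4} give an ROI of shape {2, 3} and a byte offset of 1 * 16 + 1 * 4 = 20,
// so RoiTensor::data() returns the owner's pointer advanced by 20 bytes.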
/**
 * @brief Tensor that wraps an InferenceEngine::Blob
* Blob owns the memory
*/
class BlobTensor : public ITensor {
mutable element::Type m_type;
mutable Shape m_shape;
mutable Strides m_strides;
void update_strides() {
if (get_element_type().bitwidth() >= 8) {
const auto& element_strides = blob->getTensorDesc().getBlockingDesc().getStrides();
const size_t elem_size = get_element_type().size();
m_strides.clear();
m_strides.resize(element_strides.size());
std::transform(element_strides.begin(),
element_strides.end(),
m_strides.begin(),
[&elem_size](size_t stride) {
return stride * elem_size;
});
}
}
public:
std::shared_ptr<ie::Blob> blob;
BlobTensor(const InferenceEngine::Blob::Ptr& blob) : blob{blob} {
auto remote_impl = dynamic_cast<InferenceEngine::RemoteBlob*>(blob.get());
OPENVINO_ASSERT(!remote_impl);
OPENVINO_ASSERT(blob);
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
update_strides();
}
const element::Type& get_element_type() const override {
m_type = InferenceEngine::details::convertPrecision(blob->getTensorDesc().getPrecision());
return m_type;
}
void set_shape(ov::Shape shape) override {
blob->setShape({shape.begin(), shape.end()});
update_strides();
}
const Shape& get_shape() const override {
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
return m_shape;
}
const Strides& get_strides() const override {
OPENVINO_ASSERT(get_element_type().bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
return m_strides;
}
size_t get_size() const override {
return blob->size();
}
size_t get_byte_size() const override {
return blob->byteSize();
}
void* data(const element::Type& element_type) const override {
OPENVINO_ASSERT(blob != nullptr, "Tensor was not initialized.");
#define TYPE_CHECK(TYPE) (dynamic_cast<const ie::TBlob<TYPE>*>(blob.get()) != nullptr)
auto host_accessible_implementation = TYPE_CHECK(bool) || TYPE_CHECK(int8_t) || TYPE_CHECK(uint8_t) ||
TYPE_CHECK(int16_t) || TYPE_CHECK(uint16_t) || TYPE_CHECK(int32_t) ||
TYPE_CHECK(uint32_t) || TYPE_CHECK(int64_t) || TYPE_CHECK(uint64_t) ||
TYPE_CHECK(float) || TYPE_CHECK(double);
#undef TYPE_CHECK
OPENVINO_ASSERT(host_accessible_implementation,
"Tensor implementation type does not contain host accessible data");
if (element_type != element::undefined && element_type.is_static()) {
OPENVINO_ASSERT(element_type == get_element_type(),
"Tensor data with element type ",
get_element_type(),
", is not representable as pointer to ",
element_type);
}
// since we don't use byte offsets, we need to explicitly multiply by element_size
auto byte_offset = blob->getTensorDesc().getBlockingDesc().getOffsetPadding() * get_element_type().size();
OPENVINO_ASSERT((get_element_type().bitwidth() >= 8) || (byte_offset == 0),
"ROI access for types with bitwidths less then 8 bit is not implemented. Tensor type: ",
get_element_type());
return byte_offset + InferenceEngine::as<InferenceEngine::MemoryBlob>(blob)->rmap().as<uint8_t*>();
}
};
/**
 * @brief Tensor that wraps an InferenceEngine::RemoteBlob
* Blob owns the memory
*/
class RemoteBlobTensor : public IRemoteTensor {
mutable element::Type m_type;
mutable Shape m_shape;
mutable Strides m_strides;
mutable ov::AnyMap m_properties;
mutable std::string m_dev_name;
public:
std::shared_ptr<ie::RemoteBlob> blob;
RemoteBlobTensor(const InferenceEngine::RemoteBlob::Ptr& blob) : blob{blob} {
OPENVINO_ASSERT(blob);
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
}
const element::Type& get_element_type() const override {
m_type = InferenceEngine::details::convertPrecision(blob->getTensorDesc().getPrecision());
return m_type;
}
void set_shape(ov::Shape shape) override {
blob->setShape({shape.begin(), shape.end()});
}
const Shape& get_shape() const override {
m_shape = blob->getTensorDesc().getBlockingDesc().getBlockDims();
return m_shape;
}
const Strides& get_strides() const override {
OPENVINO_ASSERT(get_element_type().bitwidth() >= 8,
"Could not get strides for types with bitwidths less then 8 bit. Tensor type: ",
get_element_type());
const auto& element_strides = blob->getTensorDesc().getBlockingDesc().getStrides();
const size_t elem_size = get_element_type().size();
m_strides.clear();
m_strides.resize(element_strides.size());
std::transform(element_strides.begin(), element_strides.end(), m_strides.begin(), [&elem_size](size_t stride) {
return stride * elem_size;
});
return m_strides;
}
size_t get_size() const override {
return blob->size();
}
size_t get_byte_size() const override {
return blob->byteSize();
}
const AnyMap& get_properties() const override {
m_properties = blob->getParams();
return m_properties;
}
const std::string& get_device_name() const override {
m_dev_name = blob->getDeviceName();
return m_dev_name;
}
};
/**
* @brief Create InferenceEngine::RemoteBlob from the Tensor
*/
class TensorRemoteBlob : public ie::RemoteBlob {
public:
TensorRemoteBlob(const std::shared_ptr<ITensor>& tensor)
: ie::RemoteBlob{ie::TensorDesc{ie::details::convertPrecision(tensor->get_element_type()),
tensor->get_shape(),
ie::TensorDesc::getLayoutByRank(tensor->get_shape().size())}},
tensor{std::dynamic_pointer_cast<ov::IRemoteTensor>(tensor)} {
OPENVINO_ASSERT(this->tensor);
}
AnyMap getParams() const override {
return tensor->get_properties();
}
std::string getDeviceName() const noexcept override {
try {
return tensor->get_device_name();
} catch (...) {
return {};
}
}
std::shared_ptr<ie::RemoteContext> getContext() const noexcept override {
return {};
}
void allocate() noexcept override {}
bool deallocate() noexcept override {
return true;
}
ie::LockedMemory<void> buffer() noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<const void> cbuffer() const noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<void> rwmap() noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<const void> rmap() const noexcept override {
return {nullptr, nullptr, 0};
}
ie::LockedMemory<void> wmap() noexcept override {
return {nullptr, nullptr, 0};
}
const std::shared_ptr<ie::IAllocator>& getAllocator() const noexcept override {
return m_allocator;
}
void* getHandle() const noexcept override {
return nullptr;
}
std::shared_ptr<IRemoteTensor> tensor;
private:
std::shared_ptr<ie::IAllocator> m_allocator;
};
/**
* @brief Create InferenceEngine::TBlob<T> from the tensor
*
* @tparam T Blob data type
*/
template <typename T>
class TensorMemoryBlob : public ie::TBlob<T> {
public:
~TensorMemoryBlob() override = default;
explicit TensorMemoryBlob(const std::shared_ptr<ITensor>& tensor_) try : ie
::TBlob<T>{[&] {
auto element_type = tensor_->get_element_type();
auto shape = tensor_->get_shape();
ie::SizeVector blk_order(shape.size());
std::iota(blk_order.begin(), blk_order.end(), 0);
ie::SizeVector dim_offset(shape.size(), 0);
ie::SizeVector blk_strides;
auto byte_strides = element_type.bitwidth() >= 8 ? tensor_->get_strides() : Strides{};
if (byte_strides.empty()) {
blk_strides = ov::row_major_strides(shape);
} else {
blk_strides.resize(byte_strides.size());
std::transform(byte_strides.begin(),
byte_strides.end(),
blk_strides.begin(),
[&element_type](size_t byte_stride) {
OPENVINO_ASSERT(byte_stride % element_type.size() == 0,
"Limitation: Stride in bytes ",
byte_stride,
" should be divisible by size of element ",
element_type.size());
return byte_stride / element_type.size();
});
}
return ie::TensorDesc{ie::details::convertPrecision(element_type),
shape,
ie::BlockingDesc{shape, blk_order, 0, dim_offset, blk_strides}};
}(),
static_cast<T*>(tensor_->data()),
tensor_->get_byte_size()},
tensor{tensor_} {
OPENVINO_ASSERT(!std::dynamic_pointer_cast<ov::IRemoteTensor>(tensor));
}
catch (const std::exception& ex) {
throw ov::Exception(ex.what());
}
void setShape(const ie::SizeVector& dims) override {
tensor->set_shape(dims);
ie::TBlob<T>::setShape(dims);
}
std::shared_ptr<ITensor> tensor;
};
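// Worked example for the stride conversion in the TensorMemoryBlob constructor above: an f32
// tensor with byte strides {48, 16, 4} produces blk_strides {12, 4, 1}, since ie::BlockingDesc
// expects strides in elements rather than bytes.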
std::shared_ptr<ITensor> make_tensor(const std::shared_ptr<ie::Blob>& blob) {
#define ELSE_IF(type) \
else if (auto tblob = dynamic_cast<const TensorMemoryBlob<type>*>(blob.get())) { \
return tblob->tensor; \
}
if (blob == nullptr) {
return {};
} else if (auto remote_blob = std::dynamic_pointer_cast<TensorRemoteBlob>(blob)) {
return remote_blob->tensor;
} else if (auto remote_blob = std::dynamic_pointer_cast<InferenceEngine::RemoteBlob>(blob)) {
return std::make_shared<RemoteBlobTensor>(remote_blob);
}
// Note: i4/u4/u1 tensors are stored via TensorMemoryBlob<int8_t>/<uint8_t> (see tensor_to_blob
// below), so a single check per storage type is sufficient here.
ELSE_IF(float)
ELSE_IF(double)
ELSE_IF(int8_t)
ELSE_IF(int16_t)
ELSE_IF(int32_t)
ELSE_IF(int64_t)
ELSE_IF(uint8_t)
ELSE_IF(uint16_t)
ELSE_IF(uint32_t)
ELSE_IF(uint64_t)
ELSE_IF(bool) else {
return std::make_shared<BlobTensor>(blob);
}
#undef ELSE_IF
}
ie::Blob::Ptr tensor_to_blob(const std::shared_ptr<ITensor>& tensor) {
if (tensor == nullptr) {
return {};
} else if (auto blob_tensor = std::dynamic_pointer_cast<BlobTensor>(tensor)) {
return blob_tensor->blob;
} else if (auto blob_tensor = std::dynamic_pointer_cast<RemoteBlobTensor>(tensor)) {
return blob_tensor->blob;
} else if (auto blob_tensor = dynamic_cast<const BlobTensor*>(tensor.get())) {
return blob_tensor->blob;
} else if (std::dynamic_pointer_cast<ov::IRemoteTensor>(tensor)) {
return std::make_shared<TensorRemoteBlob>(tensor);
} else {
#define CASE(precision, T) \
case element::precision: \
return std::make_shared<TensorMemoryBlob<T>>(tensor);
switch (tensor->get_element_type()) {
CASE(f32, float);
CASE(f64, double);
CASE(i4, int8_t);
CASE(i8, int8_t);
CASE(i16, int16_t);
CASE(i32, int32_t);
CASE(i64, int64_t);
CASE(u4, uint8_t);
CASE(u8, uint8_t);
CASE(u16, uint16_t);
CASE(u32, uint32_t);
CASE(u64, uint64_t);
CASE(u1, int8_t);
CASE(boolean, bool);
case element::f16:
return std::make_shared<TensorMemoryBlob<int16_t>>(tensor);
case element::bf16:
return std::make_shared<TensorMemoryBlob<int16_t>>(tensor);
default:
OPENVINO_THROW("Unsupported element type");
}
#undef CASE
}
OPENVINO_THROW("Cannot convert tensor to blob!");
}
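// Note on the two conversion helpers above: a Blob produced by tensor_to_blob() unwraps back to
// its original ITensor in make_tensor(), and a Blob-backed tensor unwraps back to its original
// Blob. f16 and bf16 tensors are exposed through TBlob<int16_t>, i.e. the 16-bit payload is
// reinterpreted rather than converted.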
} // namespace ov

View File

@ -93,20 +93,14 @@ ov::SoPtr<ov::ICompiledModel> ov::Plugin::import_model(std::istream& networkMode
ov::RemoteContext ov::Plugin::create_context(const AnyMap& params) const {
OV_PLUGIN_CALL_STATEMENT({
auto remote = m_ptr->create_context(params);
auto so = remote._so;
if (m_so)
so.emplace_back(m_so);
return {remote._impl, so};
return {remote, {m_so}};
});
}
ov::RemoteContext ov::Plugin::get_default_context(const AnyMap& params) const {
OV_PLUGIN_CALL_STATEMENT({
auto remote = m_ptr->get_default_context(params);
auto so = remote._so;
if (m_so)
so.emplace_back(m_so);
return {remote._impl, so};
return {remote, {m_so}};
});
}

View File

@ -257,7 +257,7 @@ ExecutableNetwork Core::ImportNetwork(std::istream& networkModel,
auto parsed = ov::parseDeviceNameIntoConfig(deviceName, config);
auto exec = _impl->get_plugin(deviceName)
.import_model(networkModel,
ov::RemoteContext{std::dynamic_pointer_cast<RemoteContext>(context), {}},
ov::RemoteContext{ov::legacy_convert::convert_remote_context(context), {}},
ov::any_copy(parsed._config));
return {ov::legacy_convert::convert_compiled_model(exec._ptr), exec._so};
}

Some files were not shown because too many files have changed in this diff.