Compare commits

55 commits, releases/v… 2020.4

| Author | SHA1 | Date |
|---|---|---|
| | 023e7c2c3f | |
| | 34ddb70f7d | |
| | 21e092122f | |
| | 92c1333653 | |
| | c26ec8b312 | |
| | 32054ff180 | |
| | 7cff005ada | |
| | 06707cc53f | |
| | fff93d8f05 | |
| | 637ddd5dfb | |
| | fa4c5e8e38 | |
| | c9fc6f0531 | |
| | c9eb6ae62b | |
| | eef56ca80c | |
| | 36f1c00e02 | |
| | 5c43765011 | |
| | bbfc9bbc14 | |
| | 9c607528ef | |
| | ae9e0510f0 | |
| | 76af547c17 | |
| | 5e97a3123f | |
| | 532dec140b | |
| | c41c6294f9 | |
| | 3bbe88e659 | |
| | 2f3d5f68cd | |
| | 843f81a1cc | |
| | c596707a09 | |
| | cf60baf2f0 | |
| | aeb70036d7 | |
| | dea04dae8c | |
| | 14b44803ba | |
| | 06286f2aae | |
| | 97e5fc4bae | |
| | 47218284b2 | |
| | 6079a35b81 | |
| | 4f4352f301 | |
| | a67d74c41f | |
| | 26c563132d | |
| | dc1ca195dd | |
| | f5ad3e6f89 | |
| | 6c736ce001 | |
| | 30ab6534e1 | |
| | 259a4c25ce | |
| | 347930008c | |
| | 4fa251483a | |
| | 30f8af70fc | |
| | 3fc6d8a188 | |
| | 66c8df6a87 | |
| | e53eb86334 | |
| | 2df99d4263 | |
| | deab4d38b0 | |
| | 412428f1dd | |
| | 167c96a8af | |
| | b7363ba711 | |
| | 5cef9f3734 | |
@@ -1,5 +1,5 @@
 # [OpenVINO™ Toolkit](https://01.org/openvinotoolkit) - Deep Learning Deployment Toolkit repository
-[](https://github.com/openvinotoolkit/openvino/releases/tag/2020.3.0)
+[](https://github.com/openvinotoolkit/openvino/releases/tag/2020.4.0)
 [](LICENSE)

 This toolkit allows developers to deploy pre-trained deep learning models

@@ -52,14 +52,15 @@ as a part of [Intel® Distribution of OpenVINO™].
 ## Build on Linux\* Systems

 The software was validated on:
+- Ubuntu\* 18.04 (64-bit) with default GCC\* 7.5.0
 - Ubuntu\* 16.04 (64-bit) with default GCC\* 5.4.0
 - CentOS\* 7.4 (64-bit) with default GCC\* 4.8.5

 ### Software Requirements
 - [CMake]\* 3.11 or higher
 - GCC\* 4.8 or higher to build the Inference Engine
-- Python 2.7 or higher for Inference Engine Python API wrapper
-- (Optional) [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 19.41.14441].
+- Python 3.5 or higher for Inference Engine Python API wrapper
+- (Optional) [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 20.13.16352].

 ### Build Steps
 1. Clone submodules:

@@ -77,7 +78,7 @@ The software was validated on:
    ```
 3. By default, the build enables the Inference Engine GPU plugin to infer models
    on your Intel® Processor Graphics. This requires you to
-   [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 19.41.14441]
+   [Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 20.13.16352]
    before running the build. If you don't want to use the GPU plugin, use the
    `-DENABLE_CLDNN=OFF` CMake build option and skip the installation of the
    Intel® Graphics Compute Runtime for OpenCL™ Driver.

@@ -202,7 +203,7 @@ Native compilation of the Inference Engine is the most straightforward solution.

 This compilation was tested on the following configuration:

-* Host: Ubuntu\* 16.04 (64-bit, Intel® Core™ i7-6700K CPU @ 4.00GHz × 8)
+* Host: Ubuntu\* 18.04 (64-bit, Intel® Core™ i7-6700K CPU @ 4.00GHz × 8)
 * Target: Raspbian\* Stretch (32-bit, ARMv7, Raspberry Pi\* 3)

 1. Install Docker\*:

@@ -337,7 +338,7 @@ The software was validated on:
 - [CMake]\* 3.11 or higher
 - Microsoft\* Visual Studio 2017, 2019 or [Intel® C++ Compiler] 18.0
 - (Optional) Intel® Graphics Driver for Windows* (26.20) [driver package].
-- Python 3.4 or higher for Inference Engine Python API wrapper
+- Python 3.5 or higher for Inference Engine Python API wrapper

 ### Build Steps

@@ -454,7 +455,7 @@ The software was validated on:

 - [CMake]\* 3.11 or higher
 - Clang\* compiler from Xcode\* 10.1 or higher
-- Python\* 3.4 or higher for the Inference Engine Python API wrapper
+- Python\* 3.5 or higher for the Inference Engine Python API wrapper

 ### Build Steps

@@ -574,8 +575,7 @@ This section describes how to build Inference Engine for Android x86 (64-bit) op

 ## Use Custom OpenCV Builds for Inference Engine

-> **NOTE**: The recommended and tested version of OpenCV is 4.3. The minimum
-supported version is 3.4.0.
+> **NOTE**: The recommended and tested version of OpenCV is 4.4.0.

 Required versions of OpenCV packages are downloaded automatically during the
 building Inference Engine library. If the build script can not find and download

@@ -691,7 +691,7 @@ This target collects all dependencies, prepares the nGraph package and copies it

 [Intel® Distribution of OpenVINO™]:https://software.intel.com/en-us/openvino-toolkit
 [CMake]:https://cmake.org/download/
-[Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 19.41.14441]:https://github.com/intel/compute-runtime/releases/tag/19.41.14441
+[Install Intel® Graphics Compute Runtime for OpenCL™ Driver package 20.13.16352]:https://github.com/intel/compute-runtime/releases/tag/20.13.16352
 [MKL-DNN repository]:https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_lnx_2019.0.5.20190502.tgz
 [MKL-DNN repository for Windows]:(https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_win_2019.0.5.20190502.zip)
 [OpenBLAS]:https://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int64.zip/download
@@ -27,8 +27,14 @@ endif()

 if (ENABLE_THREAD_SANITIZER)
     set(SANITIZER_COMPILER_FLAGS "-g -fsanitize=thread -fno-omit-frame-pointer")
-    set(SANITIZER_LINKER_FLAGS "-fsanitize=thread -static-libsan")
+    set(SANITIZER_LINKER_FLAGS "-fsanitize=thread")
+    if(CMAKE_CXX_COMPILER_ID MATCHES "^(Apple)?Clang$" AND NOT WIN32)
+        if(CMAKE_CXX_COMPILER_VERSION VERSION_GREATER_EQUAL 8.0)
+            set(SANITIZER_LINKER_FLAGS "${SANITIZER_LINKER_FLAGS} -fuse-ld=lld")
+        else()
+            set(SANITIZER_LINKER_FLAGS "${SANITIZER_LINKER_FLAGS} -static-libsan")
+        endif()
+    endif()

     set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${SANITIZER_COMPILER_FLAGS}")
     set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${SANITIZER_COMPILER_FLAGS}")
     set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${SANITIZER_LINKER_FLAGS}")

@@ -79,7 +79,7 @@ function(ie_build_samples)
                                    MINGW64 CMAKE_BUILD_TYPE CMAKE_MACOSX_RPATH)
         unset(${var})
     endforeach()
+    include(sanitizer)
     add_subdirectory(samples)
 endfunction()
@@ -19,7 +19,7 @@ set(VPU_SUPPORTED_FIRMWARES usb-ma2450 usb-ma2x8x pcie-ma248x)
 #
 # Default packages
 #

-set(FIRMWARE_PACKAGE_VERSION 1216)
+set(FIRMWARE_PACKAGE_VERSION 1223)
 set(VPU_CLC_MA2X8X_VERSION "movi-cltools-20.02.0")

 #
@@ -1,2 +1,2 @@
-numpy
-cython>=0.29
+numpy==1.13.3
+cython==0.29.17

@@ -1,2 +1,2 @@
-opencv-python==3.4.4
-numpy==1.18.1
+opencv-python==3.4.4.19
+numpy==1.13.3
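The requirements hunks above replace open-ended specifiers (a bare `numpy`, `cython>=0.29`) with exact `==` pins, which makes builds reproducible. A small sketch of the distinction, using a simplified requirement parser (the regex here is illustrative, not pip's full grammar):

```python
import re

def parse_requirement(line):
    """Split a pip requirement line into (name, operator, version)."""
    m = re.match(r"^([A-Za-z0-9_.-]+)\s*(==|>=|<=|~=|>|<)\s*([\w.]+)$", line.strip())
    if not m:
        return (line.strip(), None, None)  # unpinned, e.g. bare "numpy"
    return m.groups()

# Exact pins (==) fix the resolved version; bare names and >= ranges do not.
assert parse_requirement("numpy") == ("numpy", None, None)
assert parse_requirement("cython>=0.29") == ("cython", ">=", "0.29")
assert parse_requirement("cython==0.29.17") == ("cython", "==", "0.29.17")
```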
@@ -814,8 +814,8 @@ cdef class ExecutableNetwork:
         current_request = self.requests[0]
         current_request.infer(inputs)
         res = {}
-        for out in current_request._outputs_list:
-            res[out] = deepcopy(current_request.output_blobs[out].buffer)
+        for name, value in current_request.output_blobs.items():
+            res[name] = deepcopy(value.buffer)
         return res
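Both sides of the hunk above `deepcopy` each output buffer before returning, because the request object reuses its buffers on the next `infer` call. A minimal Python sketch of why the copy matters, using a hypothetical `FakeRequest` stand-in (not the real API):

```python
from copy import deepcopy

class FakeRequest:
    """Hypothetical stand-in for an inference request that reuses one output buffer."""
    def __init__(self):
        self.output_blobs = {"prob": [0.0]}

    def infer(self, value):
        self.output_blobs["prob"][0] = value  # overwrites the shared buffer in place

req = FakeRequest()
req.infer(1.0)
kept_reference = req.output_blobs["prob"]       # aliases the internal buffer
kept_copy = deepcopy(req.output_blobs["prob"])  # snapshot, like the hunk above

req.infer(2.0)                  # the next inference reuses the same buffer
assert kept_reference == [2.0]  # the alias silently changed
assert kept_copy == [1.0]       # the deepcopy preserved the first result
```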
@@ -229,12 +229,14 @@ void InferenceEnginePython::IENetwork::serialize(const std::string &path_to_xml,

 const std::vector <InferenceEngine::CNNLayerPtr>
 InferenceEnginePython::IENetwork::getLayers() {
+    IE_SUPPRESS_DEPRECATED_START
     std::vector<InferenceEngine::CNNLayerPtr> result;
     std::vector<InferenceEngine::CNNLayerPtr> sorted_layers = InferenceEngine::details::CNNNetSortTopologically(*actual);
     for (const auto &layer : sorted_layers) {
         result.emplace_back(layer);
     }
     return result;
+    IE_SUPPRESS_DEPRECATED_END
 }

 PyObject* InferenceEnginePython::IENetwork::getFunction() {
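`getLayers` above relies on `CNNNetSortTopologically` to return layers in dependency order. A minimal sketch of topological sorting with Kahn's algorithm on a toy layer graph (layer names are made up for illustration):

```python
from collections import deque

def sort_topologically(edges, nodes):
    """Kahn's algorithm: repeatedly emit nodes whose inputs have all been emitted."""
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for src, dst in edges:
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    return order

# toy layer graph: input -> conv -> relu -> output
edges = [("input", "conv"), ("conv", "relu"), ("relu", "output")]
order = sort_topologically(edges, ["relu", "output", "input", "conv"])
assert order == ["input", "conv", "relu", "output"]
```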
@@ -1,4 +1,4 @@
 cython==0.29.17
 opencv-python==3.4.4.19
 pytest==4.0.1
 attrs==19.1.0
 pytest-html==1.19.0
@@ -22,12 +22,12 @@
 namespace InferenceEngine {

 /**
- * @deprecated Use InferenceEngine::Core instead. Will be removed in 2020.3
+ * @deprecated Use InferenceEngine::Core instead. Will be removed in 2021.1
  * @brief This class is a C++ API wrapper for IInferencePlugin.
  *
  * It can throw exceptions safely for the application, where it is properly handled.
  */
-class INFERENCE_ENGINE_DEPRECATED("Use InferenceEngine::Core instead. Will be removed in 2020.3") InferencePlugin {
+class INFERENCE_ENGINE_DEPRECATED("Use InferenceEngine::Core instead. Will be removed in 2021.1") InferencePlugin {
     IE_SUPPRESS_DEPRECATED_START
     InferenceEnginePluginPtr actual;
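The hunk above only pushes the removal date of the deprecated `InferencePlugin` class from 2020.3 to 2021.1; the `INFERENCE_ENGINE_DEPRECATED` macro makes the compiler warn at every use site. A rough Python analogue of the same pattern, using the standard `warnings` module (the class here is illustrative, not part of any real API):

```python
import warnings

class InferencePluginLike:
    """Hypothetical class that warns on construction, mirroring a deprecation macro."""
    def __init__(self):
        warnings.warn(
            "Use the Core-style API instead. Will be removed in a future release.",
            DeprecationWarning, stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    InferencePluginLike()

assert len(caught) == 1
assert issubclass(caught[0].category, DeprecationWarning)
```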
@@ -21,10 +21,10 @@ namespace InferenceEngine {
 namespace details {

 /**
- * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3
+ * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1
  * @brief This class enables range loops for CNNNetwork objects
  */
-class INFERENCE_ENGINE_INTERNAL("Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3")
+class INFERENCE_ENGINE_INTERNAL("Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1")
 CNNNetworkIterator {
     IE_SUPPRESS_DEPRECATED_START
@@ -16,6 +16,7 @@
 namespace InferenceEngine {
 namespace details {

+INFERENCE_ENGINE_INTERNAL("Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1")
 INFERENCE_ENGINE_API_CPP(std::vector<CNNLayerPtr>) CNNNetSortTopologically(const ICNNNetwork& network);

 } // namespace details
@@ -126,7 +126,7 @@ public:
     const SizeVector& getDims() const;

     /**
-     * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3
+     * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1
      * @brief Returns an owner of this data layer, parent layer in di-graph
      * @return A weak pointer to CNNLayer that creates this data
      */

@@ -147,7 +147,7 @@ public:
     void setName(const std::string& newName);

     /**
-     * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3
+     * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1
      * @brief Privates child layers in di-graph
      * @return A map of child layers
      */
@@ -2049,7 +2049,7 @@ public:
 };

 /**
- * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3
+ * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1
  * @brief This class represents a standard ScatterUpdate layer
  */
 class INFERENCE_ENGINE_INTERNAL_CNNLAYER_CLASS(ScatterUpdateLayer): public CNNLayer {

@@ -2063,7 +2063,7 @@ public:
 };

 /**
- * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3
+ * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1
  * @brief This class represents a standard ScatterElementsUpdate layer
  */
 class INFERENCE_ENGINE_INTERNAL_CNNLAYER_CLASS(ScatterElementsUpdateLayer): public CNNLayer {

@@ -2077,7 +2077,7 @@ public:
 };

 /**
- * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2020.3
+ * @deprecated Migrate to IR v10 and work with ngraph::Function directly. The method will be removed in 2021.1
  * @brief This class represents an onnx ExperimentalDetectronPriorGridGenerator Layer
  */
 class INFERENCE_ENGINE_INTERNAL_CNNLAYER_CLASS(ExperimentalDetectronPriorGridGeneratorLayer): public CNNLayer {
@@ -123,11 +123,13 @@ DECLARE_VPU_CONFIG_VALUE(NDHWC);
 DECLARE_VPU_CONFIG_KEY(CUSTOM_LAYERS);

 /**
+ * @deprecated IR statistic is not available in IR v10. The option will be removed in 2021.1
  * @brief Ignore statistic in IR by plugin.
  * Plugin could use statistic present in IR in order to try to improve calculations precision.
  * If you don't want statistic to be used enable this option.
  * This option should be used with values: CONFIG_VALUE(YES) or CONFIG_VALUE(NO) (default)
  */
+INFERENCE_ENGINE_DEPRECATED("IR statistic is not available in IR v10. The option will be removed in 2021.1")
 DECLARE_VPU_CONFIG_KEY(IGNORE_IR_STATISTIC);

 /**
@@ -382,6 +382,9 @@ int main(int argc, char* argv[]) {
                 trim(strLine);
                 labels.push_back(strLine);
             }
+            inputFile.close();
+        } else {
+            throw std::logic_error("Cannot read label file");
         }

         ClassificationResult classificationResult(outputBlob, images, batchSize, FLAGS_nt, labels);
@@ -71,8 +71,8 @@ cldnn::device_info clDNNEngine::GetDeviceInfo(const std::map<std::string, std::s
 }

 InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneNetwork(const InferenceEngine::ICNNNetwork& network) const {
-    std::shared_ptr<ICNNNetwork> clonedNetwork(nullptr);
-    if (network.getFunction()) {
+    std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
+    if (clonedNetwork->getFunction()) {
         const auto transformations_callback = [](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
             // DepthToSpace node implementation supports only equal input/output tensors with rank <= 5
             // Reshape->Permute->Reshape pattern in theory can change output rank, so this check is added to be sure

@@ -84,8 +84,7 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneNetwork(const InferenceEngin
             return std::dynamic_pointer_cast<const ::ngraph::opset2::Gelu>(node) ||
                    std::dynamic_pointer_cast<const ::ngraph::opset3::ShuffleChannels>(node);
         };
-        CNNNetwork net(network.getFunction());
-        auto nGraphFunc = net.getFunction();
+        auto nGraphFunc = clonedNetwork->getFunction();
         // Disable shape inference (WA for generic operations)
         ::ngraph::op::GenericIE::DisableReshape noReshape(nGraphFunc);

@@ -94,9 +93,7 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneNetwork(const InferenceEngin
         ngraph::pass::ConvertOpSet3ToOpSet2(transformations_callback).run_on_function(nGraphFunc);
         ngraph::pass::ConvertOpSet2ToOpSet1(transformations_callback).run_on_function(nGraphFunc);
         ngraph::pass::ConvertOpSet1ToLegacy(transformations_callback).run_on_function(nGraphFunc);
-        clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, network);
-    } else {
-        clonedNetwork = cloneNet(network);
+        clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
     }

     auto implNetwork = std::dynamic_pointer_cast<InferenceEngine::details::CNNNetworkImpl>(clonedNetwork);
@@ -3518,10 +3518,29 @@ void Program::AddConstantBlobInput(cldnn::topology& topology, InferenceEngine::C
         return false;
     };

+    // WA to inconsistency between input and const 1d tensors
+    // For Concat along batch we go with batch interpretation
+    // For Gather input we go with batch interpretation
+    bool needsBatchInterpretation = false;
+    if (constDims.size() == 1) {
+        for (auto next : GetNextLayers(layer->outData[0])) {
+            if (LayerTypeFromStr(next->type) == Concatenate) {
+                auto nextConcat = as<InferenceEngine::ConcatLayer*>(next);
+                if (nextConcat->_axis == cldnn::concatenation::concatenation_axis::along_b) {
+                    needsBatchInterpretation = true;
+                    break;
+                }
+            } else if (LayerTypeFromStr(next->type) == Gather) {
+                needsBatchInterpretation = true;
+                break;
+            }
+        }
+    }
+
     // If quantize on weights has per-channel ranges, we have to swap channel and batch dimensions, because
     // quantization should be applied per output channel of weights
     // TODO: Check if it's still needed once LowPrecisionTransformations ready
-    if (inputToConstQuantize(layer)) {
+    if (inputToConstQuantize(layer) || needsBatchInterpretation) {
         constTensor.batch[0] = constTensor.count();
         constTensor.feature[0] = 1;
     }
@@ -3862,11 +3881,13 @@ void Program::CreateStridedSlicePrimitive(cldnn::topology& topology, InferenceEn
     tmp = stridedSliceLayer->GetParamAsUInts("shrink_axis_mask");
     std::vector<uint8_t> shrink_axis_mask(tmp.begin(), tmp.end());

+    auto out_size = CldnnTensorFromIEDims(stridedSliceLayer->outData[0]->getTensorDesc().getDims());
+
     std::string stridedSliceLayerName = layer_type_name_ID(layer);
     auto stridedSlicePrim = cldnn::strided_slice(
             stridedSliceLayerName,
             inputPrimitives[0], inputPrimitives[1], inputPrimitives[2], inputPrimitives[3],
-            begin_mask, end_mask, new_axis_mask, shrink_axis_mask);
+            begin_mask, end_mask, new_axis_mask, shrink_axis_mask, out_size);

     topology.add(stridedSlicePrim);
     AddPrimitiveToProfiler(stridedSliceLayerName, layer);
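The hunk above starts passing the precomputed output size to the strided-slice primitive. That size is fully determined by the begin/end/stride inputs and the masks, so it can be computed up front. A pure-Python 1-D sketch of the mask semantics (modeled loosely on the usual TF/IE convention where a set mask bit means "ignore this bound and use the tensor edge"; an assumption for illustration):

```python
def strided_slice_1d(data, begin, end, stride, begin_mask=0, end_mask=0):
    """1-D strided slice: a mask bit of 1 means 'ignore the bound, use the edge'."""
    b = 0 if begin_mask & 1 else begin
    e = len(data) if end_mask & 1 else end
    return data[b:e:stride]

data = list(range(10))
assert strided_slice_1d(data, 2, 8, 2) == [2, 4, 6]
assert strided_slice_1d(data, 2, 8, 2, begin_mask=1) == [0, 2, 4, 6]
assert strided_slice_1d(data, 2, 8, 2, end_mask=1) == [2, 4, 6, 8]
```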
@@ -359,7 +359,7 @@ void GNAPluginNS::backend::AMIntelDNN::InitDeinterleaveComponentPrivate(intel_dn
     comp.operation = kDnnDeinterleaveOp;
     comp.macro_operation = kDnnMacroOpNone;
     comp.orientation_in = kDnnInterleavedOrientation;
-    comp.orientation_out = kDnnNonInterleavedOrientation;
+    comp.orientation_out = kDnnInterleavedOrientation;
     comp.output_scale_factor = output_scale_factor;
     comp.input_scale_factor = output_scale_factor;
     if (!postInitMem) {

@@ -1524,6 +1524,7 @@ void GNAPluginNS::backend::AMIntelDNN::InitGNAStruct(intel_nnet_type_t *ptr_nnet
             THROW_GNA_EXCEPTION << "Encountered activation component before pooling component at." << i;
         } else {
             const auto poolMode = reinterpret_cast<Gna2PoolingMode*>(gnaUserAllocator(sizeof(Gna2PoolingMode)));
+            IE_ASSERT(poolMode != nullptr);
             *poolMode = (comp.op.maxpool.do_sum_not_max) ? Gna2PoolingModeSum : Gna2PoolingModeMax;
             const auto poolWindow = create_shape1D_parameter(comp.op.maxpool.num_inputs);
             const auto poolStride = create_shape1D_parameter(comp.op.maxpool.num_inputs_step);

@@ -1583,6 +1584,7 @@ void GNAPluginNS::backend::AMIntelDNN::InitGNAStruct(intel_nnet_type_t *ptr_nnet
         case kDnnPiecewiselinearOp:
 #if GNA_LIB_VER == 2
         {
+            IE_ASSERT(gnaOperation->Operands != nullptr);
             auto& outputTensor = const_cast<Gna2Tensor&>(*gnaOperation->Operands[OutOpIdx]);
             outputTensor.Data = comp.ptr_outputs;
             outputTensor.Type = Gna2DataTypeFromBytes(comp.num_bytes_per_output);
@@ -80,7 +80,7 @@ static const char *intel_dnn_softmax_name[kSoftmaxNumType] = {
 };

 typedef enum {
-    kDnnUnknownOrientation,
+    kDnnUnknownOrientation = 100,
     kDnnInterleavedOrientation,
     kDnnNonInterleavedOrientation,
     kDnnNumOrientation
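The hunk above gives `kDnnUnknownOrientation` the explicit value 100 instead of the default 0, so a zero-initialized field no longer silently looks like a valid "unknown" enumerator. The same idea in Python with `enum.IntEnum` (values mirror the C enum's auto-increment from 100; the class name is illustrative):

```python
from enum import IntEnum

class Orientation(IntEnum):
    # An explicit sentinel keeps "unknown" clearly outside the dense range
    # starting at 0, so an accidental zero-initialization is detectable.
    UNKNOWN = 100
    INTERLEAVED = 101       # auto-increments from the sentinel in C
    NON_INTERLEAVED = 102

assert Orientation.UNKNOWN == 100
assert Orientation.INTERLEAVED == 101
assert 0 not in {int(o) for o in Orientation}
```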
@@ -199,9 +199,17 @@ class ScaleFactorPerLayer<InferenceEngine::CNNLayer *> {

         if (cnnLayer->type == "Const") {
             auto blob = cnnLayer->blobs["custom"];
-            if (blob->getTensorDesc().getPrecision() == InferenceEngine::Precision::FP16) {
+            auto blob_precision = blob->getTensorDesc().getPrecision();
+
+            if (blob_precision != InferenceEngine::Precision::FP32 && blob_precision != InferenceEngine::Precision::FP16) {
+                quant->_dst_quant.scale = 1.0f;
+                return true;
+            }
+
+            if (blob_precision == InferenceEngine::Precision::FP16) {
                 blob = make_fp32_blob(blob);
             }

             auto max_val = std::numeric_limits<float>::min();
             auto min_val = std::numeric_limits<float>::max();
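The context lines above seed a min/max scan with `std::numeric_limits<float>::min()` and `::max()`. Note that in C++, `numeric_limits<float>::min()` is the smallest *positive* normal value, so seeding `max_val` with it only works when at least one element is non-negative; `lowest()` or negative infinity is the safe seed. A Python sketch using infinities, which handles all-negative data correctly:

```python
import math

def blob_range(values):
    """Scan for min/max; seeding from +/- infinity makes negative-only data work."""
    max_val = -math.inf  # NOT the smallest positive float
    min_val = math.inf
    for v in values:
        max_val = max(max_val, v)
        min_val = min(min_val, v)
    return min_val, max_val

assert blob_range([-3.0, -1.0, -2.0]) == (-3.0, -1.0)
assert blob_range([0.5, 2.5]) == (0.5, 2.5)
```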
@@ -9,6 +9,7 @@
 #if GNA_LIB_VER == 2
 #include "gna2_model_debug_log.hpp"
 #include "gna2-model-api.h"
+#include <details/ie_exception.hpp>

 #include <cstdint>
 #include <fstream>

@@ -52,6 +53,7 @@ template <class T>
 bool NextElement(T & elementIndex, const Gna2Shape& total) {
     if (total.NumberOfDimensions == 0) return false;
     auto idx = total.NumberOfDimensions - 1;
+    IE_ASSERT(idx < GNA2_SHAPE_MAXIMUM_NUMBER_OF_DIMENSIONS);
     while (elementIndex[idx] + 1 >= total.Dimensions[idx] && idx > 0) {
         idx--;
     }
@@ -60,6 +60,7 @@ Gna2Tensor HelperGna2TensorInit3D(uint32_t x, uint32_t y, uint32_t z, Gna2DataTy

 Gna2Tensor * createGna2Tensor1D(uint32_t x, uint32_t byteSize, void* data) {
     const auto input = reinterpret_cast<Gna2Tensor*>(gnaUserAllocator(sizeof(Gna2Tensor)));
+    IE_ASSERT(input != nullptr);
     *input = HelperGna2TensorInit1D(x, Gna2DataTypeFromBytes(byteSize), data);
     return input;
 }

@@ -74,6 +75,7 @@ Gna2Tensor * createGna2TensorPwl(uint32_t x, void* data) {

 Gna2Tensor * createGna2BiasTensor1D(uint32_t x, uint32_t byteSize, void* data) {
     const auto input = reinterpret_cast<Gna2Tensor*>(gnaUserAllocator(sizeof(Gna2Tensor)));
+    IE_ASSERT(input != nullptr);
     if (byteSize == 8) {
         *input = HelperGna2TensorInit1D(x, Gna2DataTypeCompoundBias, data);
     } else {

@@ -84,24 +86,28 @@ Gna2Tensor * createGna2BiasTensor1D(uint32_t x, uint32_t byteSize, void* data) {

 Gna2Tensor * createGna2Tensor2D(uint32_t x, uint32_t y, uint32_t byteSize, void* data) {
     const auto input = reinterpret_cast<Gna2Tensor*>(gnaUserAllocator(sizeof(Gna2Tensor)));
+    IE_ASSERT(input != nullptr);
     *input = HelperGna2TensorInit2D(x, y, Gna2DataTypeFromBytes(byteSize), data);
     return input;
 }

 Gna2Tensor * createGna2Tensor3D(uint32_t x, uint32_t y, uint32_t z, uint32_t byteSize, void* data) {
     const auto input = reinterpret_cast<Gna2Tensor*>(gnaUserAllocator(sizeof(Gna2Tensor)));
+    IE_ASSERT(input != nullptr);
     *input = HelperGna2TensorInit3D(x, y, z, Gna2DataTypeFromBytes(byteSize), data);
     return input;
 }

 uint32_t* create_uint32_parameter(uint32_t value) {
     const auto param = reinterpret_cast<uint32_t*>(gnaUserAllocator(sizeof(uint32_t)));
+    IE_ASSERT(param != nullptr);
     *param = value;
     return param;
 }

 Gna2Shape* create_shape1D_parameter(uint32_t x) {
     const auto shp = reinterpret_cast<Gna2Shape*>(gnaUserAllocator(sizeof(Gna2Shape)));
+    IE_ASSERT(shp != nullptr);
     shp->NumberOfDimensions = 1;
     shp->Dimensions[0] = x;
     return shp;
@@ -25,7 +25,7 @@
 #include "gna_plugin_log.hpp"

 uint8_t* GNADeviceHelper::alloc(uint32_t size_requested, uint32_t *size_granted) {
-    void * memPtr;
+    void * memPtr = nullptr;
 #if GNA_LIB_VER == 1
     memPtr = GNAAlloc(nGNAHandle, size_requested, size_granted);
 #else
@@ -337,6 +337,7 @@ void GNAGraphCompiler::ConvolutionPrimitive(InferenceEngine::CNNLayerPtr layer)
 void GNAGraphCompiler::PowerPrimitive(InferenceEngine::CNNLayerPtr layer) {
     auto& power = dynamic_cast<PowerLayer&>(*layer.get());
     auto quantized = InferenceEngine::getInjectedData<QuantizedLayerParams>(layer);
     IE_ASSERT(gnaFlags->sw_fp32 ? (quantized == nullptr) : (quantized != nullptr));

     if (power.power != 1.0) {
         THROW_IE_EXCEPTION << "[GNA plugin] unsupported power factor, expected 1 but was " << power.power;

@@ -386,29 +387,14 @@ void GNAGraphCompiler::PowerPrimitive(InferenceEngine::CNNLayerPtr layer) {

     if (gnaFlags->sw_fp32) {
         gnamem->readonly().push_value(ptr_weights, power.scale, num_rows_out, 64);
-        gnamem->readonly().push_value(ptr_biases, power.scale, num_rows_out, 64);
+        gnamem->readonly().push_value(ptr_biases, power.offset, num_rows_out, 64);
     } else {
-        auto weightsScaledIdentity = power.scale;
-        auto biasesScaledIdentity = power.scale;
-        if (quantized != nullptr) {
-            weightsScaledIdentity = quantized->_weights_quant.scale * weightsScaledIdentity;
-            biasesScaledIdentity = quantized->_bias_quant.scale * biasesScaledIdentity;
-        }
-
-        auto weightQuantizedIdentity = FLOAT_TO_INT16(std::min(weightsScaledIdentity, static_cast<float>(INT16_MAX)));
-        auto biasesQuantizedIdentity = FLOAT_TO_INT16(std::min(biasesScaledIdentity, static_cast<float>(INT16_MAX)));
-        gnamem->readonly().push_value<int16_t>(ptr_weights, weightQuantizedIdentity, num_rows_out, 64);
-        gnamem->readonly().push_value<int32_t>(ptr_biases, biasesQuantizedIdentity, num_rows_out, 64);
-    }
-
-    if (power.offset != 0.0f) {
-        if (quantized == nullptr) {
-            gnamem->readonly().push_value(ptr_biases, 0.0f, num_rows_out, 64);
-        } else {
-            gnamem->readonly().push_value<int32_t>(ptr_biases, 0, num_rows_out, 64);
-        }
-    } else {
-        gnamem->readonly().push_value(ptr_biases, 0.0f, num_rows_out, 64);
+        auto quantizedScale = FLOAT_TO_INT16(std::min(quantized->_weights_quant.scale * power.scale,
+                static_cast<float>(INT16_MAX)));
+        auto quantizedOffset = FLOAT_TO_INT32(std::min(quantized->_dst_quant.scale * power.offset,
+                static_cast<float>(INT32_MAX)));
+        gnamem->readonly().push_value<int16_t>(ptr_weights, quantizedScale, num_rows_out, 64);
+        gnamem->readonly().push_value<int32_t>(ptr_biases, quantizedOffset, num_rows_out, 64);
     }
 }
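The rewritten `PowerPrimitive` above clamps the scaled weight and offset to `INT16_MAX` / `INT32_MAX` before converting, so an oversized scale factor saturates instead of overflowing the integer type. A Python sketch of that saturate-then-round step (only the upper bound is clamped here, mirroring the `std::min` in the hunk; the function name is illustrative):

```python
INT16_MAX = 32767

def float_to_int16_saturated(x):
    """Scale-to-int conversion with upper-bound saturation, like the FLOAT_TO_INT16 clamp above."""
    return int(round(min(x, float(INT16_MAX))))

assert float_to_int16_saturated(100.4) == 100
assert float_to_int16_saturated(1e9) == INT16_MAX  # saturates instead of overflowing
```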
@@ -1417,6 +1403,7 @@ void GNAGraphCompiler::PermutePrimitive(InferenceEngine::CNNLayerPtr layer) {
     }
     auto layerOrder = layer->GetParamAsInts("order");
+    auto quantized = InferenceEngine::getInjectedData<QuantizedLayerParams>(layer);
     IE_ASSERT(!layer->insData.empty());
     auto inputs = layer->insData.begin()->lock();
     auto inputsOrder = inputs->getTensorDesc().getDims();
     auto outputs = layer->outData.front();
@@ -176,6 +176,63 @@ inline std::pair<InferenceEngine::CNNLayerPtr, int> CNNNetCheckNextLayerSkipCer
     return CNNNetCheckNextLayerSkipCertain(outLayer->second, 0, 0, bOnlyCheck, shouldSkip);
 }

+/**
+ * @brief return all layers reachable from given one
+ * @param layer
+ * @param oDataIdx - -1 means iterate over all odata indexes
+ * @param shouldSkip
+ * @return
+ */
+template <class Layer>
+inline std::vector<CNNLayerPtr> CNNNetGetAllNextLayersSkipCertain(Layer layer, int oDataIdx, const std::function<bool(CNNLayerPtr)> &shouldSkip) {
+    // TODO: need to have generic function that creates slice of the graph : starting from given layer
+    // and skipped all non functional - ending up into functional one
+
+    std::list<CNNLayerPtr> currentSet;
+    std::vector<CNNLayerPtr> resultSet;
+
+    std::vector<std::map<std::string, CNNLayerPtr>> start;
+    if (oDataIdx == -1) {
+        for (int i = 0; i != layer->outData.size(); i++) {
+            start.push_back(layer->outData[i]->getInputTo());
+        }
+    } else {
+        start.push_back(layer->outData[oDataIdx]->getInputTo());
+    }
+
+    auto separate_layers = [&currentSet, &resultSet, &shouldSkip](std::map<std::string, CNNLayerPtr>& inputTo) {
+        for (auto &&bfsLayer : inputTo) {
+            if (shouldSkip(bfsLayer.second)) {
+                currentSet.push_back(bfsLayer.second);
+                continue;
+            }
+            resultSet.push_back(bfsLayer.second);
+        }
+    };
+
+    int startIdx, endIdx;
+    if (oDataIdx == -1) {
+        startIdx = 0;
+        endIdx = layer->outData.size();
+    } else {
+        startIdx = oDataIdx;
+        endIdx = oDataIdx + 1;
+    }
+
+    for (int i = startIdx; i != endIdx; i++) {
+        separate_layers(layer->outData[i]->getInputTo());
+    }
+
+    while (!currentSet.empty()) {
+        auto currentLayer = currentSet.front();
+        currentSet.pop_front();
+        for (auto && oData : currentLayer->outData) {
+            separate_layers(oData->getInputTo());
+        }
+    }
+    return resultSet;
+}
+
 /// @brief alias for strict checkNextLayer (false)
 template <class Layer>
 inline std::pair<InferenceEngine::CNNLayerPtr, int> CNNNetGetNextLayerSkipCertain(Layer layer, int oidx, int iidx,
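The new `CNNNetGetAllNextLayersSkipCertain` above walks a layer's successors breadth-first, and any node the `shouldSkip` predicate accepts is "looked through" to its own successors rather than returned. The same traversal in a compact Python sketch (the graph and layer names are made up for illustration):

```python
from collections import deque

def next_layers_skip(graph, start, should_skip):
    """Collect successors of `start`, looking through any node `should_skip` accepts."""
    pending = deque(graph.get(start, []))
    result = []
    while pending:
        node = pending.popleft()
        if should_skip(node):
            pending.extend(graph.get(node, []))  # skip it: enqueue its successors
        else:
            result.append(node)
    return result

# conv -> reshape (non-functional) -> relu ; conv -> pool
graph = {"conv": ["reshape", "pool"], "reshape": ["relu"]}
found = next_layers_skip(graph, "conv", lambda n: n == "reshape")
assert sorted(found) == ["pool", "relu"]  # reshape itself is looked through
```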
@@ -474,7 +531,31 @@ inline void CNNNetworkInsertLayer(CNNLayerPtr after,
 }

 /**
- * @brief remove givven layer from topology, currently only layers with one input data and one output data supported
+ * @brief returns previous layers and outData index for it
+ * @tparam T
+ * @param origin
+ * @param acceptanceCriteria
+ * @param idx
  */
+template <class T>
+std::vector<std::pair<CNNLayerPtr, int> > CNNNetGetPrevLayersSkip(CNNLayerPtr origin, const T &acceptanceCriteria, int idx = -1) {
+    std::vector<std::pair<CNNLayerPtr, int> > prevLayers;
+    for (int i = idx == -1 ? 0 : idx; CNNNetHasPrevLayer(origin.get(), i) && (idx == -1 || i == idx); i++) {
+        auto prevLayer = CNNNetPrevLayer(origin, i);
+        if (acceptanceCriteria(prevLayer)) {
+            prevLayers.push_back({prevLayer, CNNLayerFindOutDataIdx(origin, i)});
+        } else {
+            // if for some input we need to look in upper layers - original index not used here intentionally
+            auto prevPrevLayers = CNNNetGetPrevLayersSkip(prevLayer, acceptanceCriteria);
+            prevLayers.insert(prevLayers.end(), prevPrevLayers.begin(), prevPrevLayers.end());
+        }
+    }
+
+    return prevLayers;
+}
+
+/**
+ * @brief remove given layer from topology, currently only layers with one input data and one output data supported
+ */
 inline void CNNNetworkRemoveLayer(CNNLayerPtr layer) {
     if (!layer) {
@@ -8,6 +8,9 @@
 #include <ios>
 #include <iomanip>
 #include <map>
 #include <ie_algorithm.hpp>
 #include <ie_common.h>
 #include <ie_precision.hpp>

 #if defined __INTEL_COMPILER || defined _MSC_VER
 #include <malloc.h>
@@ -119,15 +122,26 @@ const std::map<Gna2OperationType, std::vector<uint32_t>> GnaParamSize{
         sizeof(Gna2Shape),
         sizeof(Gna2Shape)}},
     {Gna2OperationTypeCopy, {sizeof(Gna2Shape)}},
+    {Gna2OperationTypeTransposition, {sizeof(Gna2Shape)}},
 };

-void GNAModelSerial::Import(void *basePointer, size_t gnaGraphSize, std::istream & is) {
+void GNAModelSerial::Import(void *basePointer,
+        size_t gnaGraphSize,
+        std::istream & is,
+        std::shared_ptr<GNAPluginNS::InputDesc> inputsDesc,
+        std::vector<GNAPluginNS::OutputDesc> &desc,
+        InferenceEngine::InputsDataMap& inputsDataMap,
+        InferenceEngine::OutputsDataMap& outputsDataMap) {
     is.exceptions(std::istream::failbit);

+    ImportInputs(is, basePointer, inputsDesc, inputsDataMap);
+    ImportOutputs(is, basePointer, desc, outputsDataMap);
+
     for (auto operation = gna2Model->Operations; operation != gna2Model->Operations + gna2Model->NumberOfOperations; ++operation) {
         readNBits<32>(operation->Type, is);
         readBits(operation->NumberOfOperands, is);
         operation->Operands = static_cast<Gna2Tensor const **>(gnaUserAllocator(sizeof(Gna2Tensor*) * operation->NumberOfOperands));
+        IE_ASSERT(operation->Operands != nullptr);
         for (uint32_t i = 0; i < operation->NumberOfOperands; i++) {
             Gna2Tensor t{};
             readBits(t, is);

@@ -145,11 +159,10 @@ void GNAModelSerial::Import(void *basePointer, size_t gnaGraphSize, std::istream
             case Gna2OperationTypeFullyConnectedAffine:
             case Gna2OperationTypeConvolution:
             case Gna2OperationTypeCopy:
+            case Gna2OperationTypeTransposition:
                 break;
             case Gna2OperationTypeRecurrent:
                 THROW_GNA_EXCEPTION << "Importing of recurrent operation not supported";
-            case Gna2OperationTypeTransposition:
-                THROW_GNA_EXCEPTION << "Importing of transposition operation not supported";
             default:
                 THROW_GNA_EXCEPTION << "Importing of unknown GNA operation type(" << operation->Type << ") not supported";
             }

@@ -158,8 +171,9 @@ void GNAModelSerial::Import(void *basePointer, size_t gnaGraphSize, std::istream
         else
             operation->Parameters = nullptr;
         for (uint32_t i = 0; i < operation->NumberOfParameters; i++) {
-            uint32_t paramSize;
+            uint32_t paramSize = 0;
             readBits(paramSize, is);
             IE_ASSERT(operation->Parameters != nullptr);
             if (paramSize == 0) {
                 operation->Parameters[i] = nullptr;
|
||||
continue;
|
||||
@@ -235,11 +249,12 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
};
|
||||
|
||||
auto convert_to_serial = [getOffsetFromBase](const GNAModelSerial::RuntimeEndPoint& ep) {
|
||||
ModelHeader::EndPoint out;
|
||||
RuntimeEndPoint out;
|
||||
out.elements_count = ep.elements_count;
|
||||
out.descriptor_offset = offsetFromBase(ep.descriptor_ptr);
|
||||
out.scaleFactor = ep.scaleFactor;
|
||||
out.element_size = ep.element_size;
|
||||
out.orientation = ep.orientation;
|
||||
return out;
|
||||
};
|
||||
/**
|
||||
@@ -256,15 +271,21 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
header.gnaMemSize = gnaGraphSize;
|
||||
header.layersCount = layers.size();
|
||||
header.nGroup = guessGrouping(*gna2Model);
|
||||
header.input = convert_to_serial(input);
|
||||
header.output = convert_to_serial(output);
|
||||
|
||||
header.nInputs = inputs.size();
|
||||
header.nOutputs = outputs.size();
|
||||
header.nRotateRows = nRotateRows;
|
||||
header.nRotateColumns = nRotateColumns;
|
||||
|
||||
|
||||
writeBits(header, os);
|
||||
|
||||
for (const auto &input : inputs) {
|
||||
writeBits(convert_to_serial(input), os);
|
||||
}
|
||||
for (const auto &output : outputs) {
|
||||
writeBits(convert_to_serial(output), os);
|
||||
}
|
||||
|
||||
for (const auto & layer : layers) {
|
||||
writeBits(static_cast<uint32_t>(layer.Type), os);
|
||||
writeBits(layer.NumberOfOperands, os);
|
||||
@@ -284,11 +305,10 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
case Gna2OperationTypeFullyConnectedAffine:
|
||||
case Gna2OperationTypeConvolution:
|
||||
case Gna2OperationTypeCopy:
|
||||
case Gna2OperationTypeTransposition:
|
||||
break;
|
||||
case Gna2OperationTypeRecurrent:
|
||||
THROW_GNA_EXCEPTION << "Exporting of recurrent operation not supported";
|
||||
case Gna2OperationTypeTransposition:
|
||||
THROW_GNA_EXCEPTION << "Exporting of interleave operation not supported";
|
||||
default:
|
||||
THROW_GNA_EXCEPTION << "Exporting of unknown GNA operation type(" << layer.Type << ") not supported";
|
||||
}
|
||||
@@ -314,9 +334,18 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
}
|
||||
#else
|
||||
|
||||
void GNAModelSerial::Import(void *basePointer, size_t gnaGraphSize, std::istream & is) {
|
||||
void GNAModelSerial::Import(void *basePointer,
|
||||
size_t gnaGraphSize,
|
||||
std::istream & is,
|
||||
std::shared_ptr<GNAPluginNS::InputDesc> inputsDesc,
|
||||
std::vector<GNAPluginNS::OutputDesc> &desc,
|
||||
InferenceEngine::InputsDataMap& inputsDataMap,
|
||||
InferenceEngine::OutputsDataMap& outputsDataMap) {
|
||||
is.exceptions(std::istream::failbit);
|
||||
|
||||
ImportInputs(is, basePointer, inputsDesc, inputsDataMap);
|
||||
ImportOutputs(is, basePointer, desc, outputsDataMap);
|
||||
|
||||
auto readPwl = [&is, basePointer](intel_pwl_func_t & value) {
|
||||
readBits(value.nSegments, is);
|
||||
if (value.nSegments != 0) {
|
||||
@@ -466,11 +495,12 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
};
|
||||
|
||||
auto convert_to_serial = [getOffsetFromBase](const GNAModelSerial::RuntimeEndPoint& ep){
|
||||
ModelHeader::EndPoint out;
|
||||
RuntimeEndPoint out;
|
||||
out.elements_count = ep.elements_count;
|
||||
out.element_size = ep.element_size;
|
||||
out.descriptor_offset = offsetFromBase(ep.descriptor_ptr);
|
||||
out.scaleFactor = ep.scaleFactor;
|
||||
out.orientation = ep.orientation;
|
||||
return out;
|
||||
};
|
||||
/**
|
||||
@@ -486,14 +516,16 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
header.gnaMemSize = gnaGraphSize;
|
||||
header.layersCount = layers.size();
|
||||
header.nGroup = ptr_nnet->nGroup;
|
||||
header.input = convert_to_serial(input);
|
||||
header.output = convert_to_serial(output);
|
||||
header.nInputs = 1;
|
||||
header.nOutputs = 1;
|
||||
header.headerSize = sizeof(ModelHeader);
|
||||
header.nRotateRows = nRotateRows;
|
||||
header.nRotateColumns = nRotateColumns;
|
||||
|
||||
|
||||
writeBits(header, os);
|
||||
writeBits(convert_to_serial(inputs[0]), os);
|
||||
writeBits(convert_to_serial(outputs[0]), os);
|
||||
|
||||
for (auto & layer : layers) {
|
||||
writeBits(layer.nInputColumns, os);
|
||||
@@ -572,3 +604,108 @@ void GNAModelSerial::Export(void * basePointer, size_t gnaGraphSize, std::ostrea
|
||||
}
|
||||
|
||||
#endif
|
||||
|
||||
std::vector<GNAModelSerial::RuntimeEndPoint> GNAModelSerial::serializeOutputs(const InferenceEngine::OutputsDataMap& outputsDataMap,
        const std::vector<GNAPluginNS::OutputDesc>& outputsDesc) {
    std::vector<GNAModelSerial::RuntimeEndPoint> endPoints;
    std::size_t outputIndex = 0;
    for (auto const &output : outputsDataMap) {
        auto outputName = output.first;
        auto inputDims = output.second->getTensorDesc().getDims();
        uint32_t elementsCount = static_cast<uint32_t>(InferenceEngine::details::product(inputDims.begin(), inputDims.end()));

        GNAModelSerial::RuntimeEndPoint endPoint(outputsDesc[outputIndex].scale_factor,
                outputsDesc[outputIndex].ptrs[0],
                outputsDesc[outputIndex].num_bytes_per_element,
                elementsCount,
                outputsDesc[outputIndex].orientation);
        endPoints.push_back(endPoint);
        outputIndex++;
    }
    return endPoints;
}

std::vector<GNAModelSerial::RuntimeEndPoint> GNAModelSerial::serializeInputs(const InferenceEngine::InputsDataMap& inputsDataMap,
        std::shared_ptr<GNAPluginNS::InputDesc> inputDesc) {
    std::vector<GNAModelSerial::RuntimeEndPoint> endPoints;

    std::size_t inputIndex = 0;
    for (auto const& input : inputsDataMap) {
        auto inputName = input.first;
        auto inputDims = input.second->getTensorDesc().getDims();

        double scaleFactor = inputDesc->getScaleFactor(inputIndex);
        std::vector<void *> descriptor_ptr = inputDesc->getPtrInputsGlobal(inputName);
        IE_ASSERT(descriptor_ptr.size() > 0);
        uint32_t element_size = 2u;
        uint32_t elementsCount = static_cast<uint32_t>(InferenceEngine::details::product(inputDims.begin(), inputDims.end()));
        intel_dnn_orientation_t orientation = inputDesc->getOrientation(inputName);

        GNAModelSerial::RuntimeEndPoint endPoint(scaleFactor,
                descriptor_ptr[0],
                element_size,
                elementsCount,
                orientation);
        endPoints.push_back(endPoint);
        inputIndex++;
    }
    return endPoints;
}

void GNAModelSerial::ImportInputs(std::istream &is,
        void* basePtr,
        std::shared_ptr<GNAPluginNS::InputDesc> inputsDesc,
        InferenceEngine::InputsDataMap& dataMap) {
    dataMap.clear();

    for (auto inputIndex = 0; inputIndex < modelHeader.nInputs; inputIndex++) {
        std::string name = "input" + std::to_string(inputIndex);
        RuntimeEndPoint input;
        is.read(reinterpret_cast<char *>(&input), sizeof(input));
        inputsDesc->getPtrInputsGlobal(name).push_back(reinterpret_cast<float*>(reinterpret_cast<uint8_t *> (basePtr) + input.descriptor_offset));
        inputsDesc->orientation_in[name] = input.orientation;

        auto inputDims = InferenceEngine::SizeVector({modelHeader.nGroup, input.elements_count / modelHeader.nGroup});

        dataMap[name] = std::make_shared<InferenceEngine::InputInfo>();
        dataMap[name]->setInputData(std::make_shared<InferenceEngine::Data>(name,
                InferenceEngine::TensorDesc(
                        InferenceEngine::Precision::FP32,
                        inputDims,
                        InferenceEngine::Layout::NC)));
        inputsDesc->inputScaleFactors.push_back(input.scaleFactor);
    }
}

void GNAModelSerial::ImportOutputs(std::istream &is,
        void* basePtr,
        std::vector<GNAPluginNS::OutputDesc> &desc,
        InferenceEngine::OutputsDataMap& dataMap) {
    desc.clear();
    dataMap.clear();
    desc.resize(modelHeader.nOutputs);

    for (auto outputIndex = 0; outputIndex < modelHeader.nOutputs; outputIndex++) {
        std::string name = "output" + std::to_string(outputIndex);
        RuntimeEndPoint output;
        is.read(reinterpret_cast<char *>(&output), sizeof(output));
        GNAPluginNS::OutputDesc description;
        description.ptrs.push_back(reinterpret_cast<float*>(reinterpret_cast<uint8_t *> (basePtr) + output.descriptor_offset));
        description.orientation = kDnnInterleavedOrientation;
        description.orientation = output.orientation;
        description.num_bytes_per_element = output.element_size;
        description.scale_factor = output.scaleFactor;

        auto outputDims = InferenceEngine::SizeVector({modelHeader.nGroup, output.elements_count / modelHeader.nGroup});
        dataMap[name] = std::make_shared<InferenceEngine::Data>(name,
                InferenceEngine::TensorDesc(
                        InferenceEngine::Precision::FP32,
                        outputDims,
                        InferenceEngine::Layout::NC));
        desc.at(outputIndex) = description;
    }
}

void GNAModelSerial::setHeader(ModelHeader header) {
    modelHeader = header;
}
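Both the import and export paths above rely on the same idea: pointers into the GNA memory region are never serialized directly; they are stored as byte offsets from a base pointer (`descriptor_offset`) and rebased against the importer's own allocation. A minimal sketch of that offset round trip, with the helper names `toOffset`/`fromOffset` being illustrative rather than from the codebase:

```cpp
#include <cstdint>

// Serialize a pointer inside a memory region as a byte offset from its base.
uint64_t toOffset(const void* base, const void* ptr) {
    return static_cast<uint64_t>(reinterpret_cast<const uint8_t*>(ptr) -
                                 reinterpret_cast<const uint8_t*>(base));
}

// Rebase a stored offset against a (possibly different) base allocation.
void* fromOffset(void* base, uint64_t offset) {
    return reinterpret_cast<uint8_t*>(base) + offset;
}
```

This is why the importer can place the graph at any address: only `base + offset` matters, not the original pointer value.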
@@ -7,7 +7,10 @@
#include <istream>
#include <vector>
#include <utility>
#include "gna-api.h"

#include <gna-api.h>
#include "descriptions/gna_input_desc.hpp"
#include "descriptions/gna_output_desc.hpp"
#include "gna_plugin_log.hpp"
#if GNA_LIB_VER == 2
#include "gna2-model-api.h"

@@ -20,18 +23,19 @@
 * 1.0 - basic support
 * 1.1 - added memory information
 * 2.0 - for use with GNA2 library
 * 2.1 - multiple i/o support
 */
#if GNA_LIB_VER == 2
#define HEADER_MAJOR 2
#define HEADER_MINOR 0
#define HEADER_MINOR 1
#else
#define HEADER_MAJOR 1
#define HEADER_MINOR 1
#define HEADER_MINOR 2
#endif
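The hunk above bumps only the minor version (2.0 to 2.1 for GNA2, 1.1 to 1.2 otherwise) when multiple-i/o support is added. A common convention for such major/minor schemes, sketched below under the assumption that minor bumps only append fields (the source does not spell out its exact compatibility policy, so `canImport` is an illustrative rule, not the plugin's actual check):

```cpp
#include <cstdint>

// Hypothetical file-format version pair (fields suffixed with '_' to avoid
// the legacy major()/minor() macros from <sys/sysmacros.h>).
struct Version {
    uint16_t major_;
    uint16_t minor_;
};

// A reader accepts a file when the major version matches exactly and the
// file's minor version is not newer than the reader's own.
bool canImport(Version file, Version reader) {
    return file.major_ == reader.major_ && file.minor_ <= reader.minor_;
}
```

Under this rule, a 2.1-aware reader still imports 2.0 files, while 1.x files and future 2.2 files are rejected.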
/**
 * @brief Header version 1.0
 * @brief Header version 2.1
 */
struct ModelHeader {
    /**
@@ -74,27 +78,8 @@ struct ModelHeader {
    uint32_t nRotateRows = 0u;
    uint32_t nRotateColumns = 0u;

    struct EndPoint {
        /**
         * if the scale factor differs from the one passed to inference, the network might need to be requantized
         */
        float scaleFactor = 0.f;
        /**
         * Offset in bytes of pointer descriptor
         */
        uint64_t descriptor_offset = 0ull;
        /**
         * Endpoint resolution in bytes.
         */
        uint32_t element_size = 0u;
        /**
         * Number of elements
         */
        uint32_t elements_count = 0u;
    };
    EndPoint input;
    EndPoint output;
    uint32_t nInputs = 0u;
    uint32_t nOutputs = 0u;

    /**
     * Reserved Data might be here
@@ -127,15 +112,23 @@ class GNAModelSerial {
         * Number of elements
         */
        uint32_t elements_count = 0;
        /**
         * Offset in bytes of pointer descriptor
         */
        uint64_t descriptor_offset = 0ull;

        intel_dnn_orientation_t orientation = kDnnUnknownOrientation;

        RuntimeEndPoint() = default;
        RuntimeEndPoint(double scaleFactor,
                void* descriptor_ptr,
                uint32_t element_size,
                uint32_t elements_count) : scaleFactor(scaleFactor),
                uint32_t elements_count,
                intel_dnn_orientation_t orientation) : scaleFactor(scaleFactor),
                descriptor_ptr(descriptor_ptr),
                element_size(element_size),
                elements_count(elements_count) {
                elements_count(elements_count),
                orientation(orientation) {
        }
    };
    using MemoryType = std::vector<std::pair<void*, uint32_t>>;

@@ -146,11 +139,23 @@ private:
#else
    intel_nnet_type_t *ptr_nnet;
#endif
    RuntimeEndPoint input, output;
    std::vector<RuntimeEndPoint> inputs;
    std::vector<RuntimeEndPoint> outputs;
    uint32_t nRotateRows = 0;
    uint32_t nRotateColumns = 0;

    MemoryType states, *pstates = nullptr;
    ModelHeader modelHeader;

    void ImportInputs(std::istream &is,
            void* basePtr,
            std::shared_ptr<GNAPluginNS::InputDesc> inputsDesc,
            InferenceEngine::InputsDataMap& dataMap);

    void ImportOutputs(std::istream &is,
            void* basePtr,
            std::vector<GNAPluginNS::OutputDesc> &desc,
            InferenceEngine::OutputsDataMap& dataMap);

public:
#if GNA_LIB_VER == 2
@@ -160,8 +165,12 @@ private:

    GNAModelSerial(
            Gna2Model * model,
            RuntimeEndPoint input,
            RuntimeEndPoint output) : gna2Model(model), input(input), output(output) {
            const std::shared_ptr<GNAPluginNS::InputDesc> inputDesc,
            const std::vector<GNAPluginNS::OutputDesc>& outputsDesc,
            const InferenceEngine::InputsDataMap& inputsDataMap,
            const InferenceEngine::OutputsDataMap& outputsDataMap) : gna2Model(model),
            inputs(serializeInputs(inputsDataMap, inputDesc)),
            outputs(serializeOutputs(outputsDataMap, outputsDesc)) {
    }

#else
@@ -183,8 +192,12 @@ private:
     */
    GNAModelSerial(
            intel_nnet_type_t *ptr_nnet,
            RuntimeEndPoint input,
            RuntimeEndPoint output) : ptr_nnet(ptr_nnet), input(input), output(output) {
            const std::shared_ptr<GNAPluginNS::InputDesc> inputDesc,
            const std::vector<GNAPluginNS::OutputDesc>& outputsDesc,
            const InferenceEngine::InputsDataMap& inputsDataMap,
            const InferenceEngine::OutputsDataMap& outputsDataMap) : ptr_nnet(ptr_nnet),
            inputs(serializeInputs(inputsDataMap, inputDesc)),
            outputs(serializeOutputs(outputsDataMap, outputsDesc)) {
    }
#endif

@@ -219,7 +232,13 @@ private:
     * @param basePointer
     * @param is - stream without header structure - TBD: a header might be needed
     */
    void Import(void *basePointer, size_t gnaGraphSize, std::istream &is);
    void Import(void *basePointer,
            size_t gnaGraphSize,
            std::istream & is,
            std::shared_ptr<GNAPluginNS::InputDesc> inputsDesc,
            std::vector<GNAPluginNS::OutputDesc> &desc,
            InferenceEngine::InputsDataMap& inputsDataMap,
            InferenceEngine::OutputsDataMap& outputsDataMap);

    /**
     * save gna graph to an output stream
@@ -231,4 +250,13 @@ private:
    void Export(void *basePtr,
            size_t gnaGraphSize,
            std::ostream &os) const;

    static std::vector<GNAModelSerial::RuntimeEndPoint> serializeOutputs(const InferenceEngine::OutputsDataMap& outputsDataMap,
            const std::vector<GNAPluginNS::OutputDesc>& outputsDesc);

    static std::vector<GNAModelSerial::RuntimeEndPoint> serializeInputs(const InferenceEngine::InputsDataMap& inputsDataMap,
            const std::shared_ptr<GNAPluginNS::InputDesc>);

    void setHeader(ModelHeader header);
};
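The `writeBits`/`readBits` helpers used throughout the serializer copy a trivially-copyable value's raw bytes to and from a stream. A minimal sketch of that pattern (named `writeRaw`/`readRaw` here to make clear these are hypothetical stand-ins, not the plugin's actual helpers, and assuming both sides share endianness and struct layout):

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Write the raw bytes of a trivially-copyable value to a stream.
template <class T>
void writeRaw(const T& obj, std::ostream& os) {
    os.write(reinterpret_cast<const char*>(&obj), sizeof(T));
}

// Read the raw bytes of a trivially-copyable value back from a stream.
template <class T>
void readRaw(T& obj, std::istream& is) {
    is.read(reinterpret_cast<char*>(&obj), sizeof(T));
}
```

Round-tripping an endpoint-like struct through an in-memory stream recovers every field, which is exactly why the header layout (field order, sizes, padding) is versioned with `HEADER_MAJOR`/`HEADER_MINOR`: any layout change silently breaks old files.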
@@ -373,6 +373,7 @@ void GNAPlugin::LoadNetwork(ICNNNetwork &network) {
    passes->registerPass<InsertDiagonalLayerPass>();
    passes->registerPass<HandleMultipleActivationsForTheLayerPass>();
    passes->registerPass<SubstituteScaleShiftBroadCastPass>();
    passes->registerPass<FuseMultipleIdentitiesPass>();
    passIdx = passes->run(passIdx);
};

@@ -1140,13 +1141,15 @@ InferenceEngine::IExecutableNetwork::Ptr GNAPlugin::ImportNetwork(const std::str
#else
    auto serial = GNAModelSerial(&std::get<0>(nnets.back())->obj, mt);
#endif
    serial.Import(basePtr, header.gnaMemSize, inputStream);

    inputsDesc->getPtrInputsGlobal("input").push_back(reinterpret_cast<float*>(reinterpret_cast<uint8_t *> (basePtr) + header.input.descriptor_offset));
    // TODO: import of multioutput network not supported
    outputsDesc.resize(1);
    auto &outputDesc = outputsDesc.front();
    outputDesc.ptrs.push_back(reinterpret_cast<float*>(reinterpret_cast<uint8_t *> (basePtr) + header.output.descriptor_offset));
    serial.setHeader(header);
    serial.Import(basePtr,
            header.gnaMemSize,
            inputStream,
            inputsDesc,
            outputsDesc,
            inputsDataMap,
            outputsDataMap);

#if GNA_LIB_VER == 2
    auto getOrientation = [](Gna2Operation & gnaOperation) {
@@ -1160,32 +1163,10 @@ InferenceEngine::IExecutableNetwork::Ptr GNAPlugin::ImportNetwork(const std::str
    };
#endif

#if GNA_LIB_VER == 2
    inputsDesc->orientation_in["input"] = getOrientation(std::get<0>(gnaModels.back())->obj.Operations[0]);
    outputDesc.orientation = getOrientation(std::get<0>(gnaModels.back())->obj.Operations[std::get<0>(gnaModels.back())->obj.NumberOfOperations - 1]);
#else
#if GNA_LIB_VER == 1
    inputsDesc->orientation_in["input"] = getOrientation(std::get<0>(nnets.back())->obj.pLayers[0]);
    outputDesc.orientation = getOrientation(std::get<0>(nnets.back())->obj.pLayers[std::get<0>(nnets.back())->obj.nLayers - 1]);
    outputsDesc[0].orientation = getOrientation(std::get<0>(nnets.back())->obj.pLayers[std::get<0>(nnets.back())->obj.nLayers - 1]);
#endif
    outputDesc.num_bytes_per_element = header.output.element_size;

    auto outputDims = SizeVector({header.nGroup, header.output.elements_count / header.nGroup});
    auto inputDims = SizeVector({header.nGroup, header.input.elements_count / header.nGroup});

    inputsDataMap["input"] = std::make_shared<InputInfo>();
    inputsDataMap["input"]->setInputData(make_shared<Data>("input",
            TensorDesc(
                    Precision::FP32,
                    inputDims,
                    Layout::NC)));
    outputsDataMap["output"] = make_shared<Data>("output",
            TensorDesc(
                    Precision::FP32,
                    outputDims,
                    Layout::NC));

    outputDesc.scale_factor = header.output.scaleFactor;
    inputsDesc->inputScaleFactors.push_back(header.input.scaleFactor);

    num_rotate_rows = header.nRotateRows;
    num_rotate_columns = header.nRotateColumns;

@@ -1214,9 +1195,11 @@ void GNAPlugin::Export(const std::string &fileName) {
        THROW_GNA_EXCEPTION << " network not loaded";
    }

#if GNA_LIB_VER == 1
    if (inputsDesc->ptr_inputs_global_id.size() != 1) {
        THROW_GNA_EXCEPTION << " exporting network with multiple inputs not supported";
    }
#endif

    std::fstream outStream(fileName, ios_base::out | ios_base::binary);

@@ -1229,19 +1212,16 @@ void GNAPlugin::Export(const std::string &fileName) {
#endif
}
#if GNA_LIB_VER == 2
    auto serial = GNAModelSerial(&std::get<0>(gnaModels.front())->obj,
    Gna2Model* modelToSerial = &std::get<0>(gnaModels.front())->obj;
#else
    auto serial = GNAModelSerial(&std::get<0>(nnets.front())->obj,
    intel_nnet_type_t* modelToSerial = &std::get<0>(nnets.front())->obj;
#endif
            {inputsDesc->inputScaleFactors.front(),
             inputsDesc->ptr_inputs_global_storage.front()[0],
             2,
             static_cast<uint32_t>(InferenceEngine::details::product(inputsDataMap.begin()->second->getTensorDesc().getDims()))},
            {outputsDesc.front().scale_factor,
             outputsDesc.front().ptrs.front(),
             outputsDesc.front().num_bytes_per_element,
             static_cast<uint32_t>(InferenceEngine::details::product(outputsDataMap.begin()->second->getTensorDesc().getDims()))})
            .SetInputRotation(dnn->num_rotate_rows, dnn->num_rotate_columns);
    auto serial = GNAModelSerial(modelToSerial,
            inputsDesc,
            outputsDesc,
            inputsDataMap,
            outputsDataMap)
            .SetInputRotation(dnn->num_rotate_rows, dnn->num_rotate_columns);

    for (auto && memoryConnection : graphCompiler.memory_connection) {
        serial.AddState(memoryConnection.second.gna_ptr, memoryConnection.second.reserved_size);

@@ -71,7 +71,7 @@ void Config::UpdateFromMap(const std::map<std::string, std::string>& config) {
    key.erase(0, 1);
    try {
        input_index = std::stoi(key);
        if (input_index < 0 | input_index > 99) {
        if (input_index > 99) {
            throw std::out_of_range("");
        }
    } catch (std::invalid_argument&) {
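The Config hunk above is worth a closer look: the removed line used bitwise `|` where logical `||` was meant, and the replacement keeps only the upper-bound check. A self-contained sketch of the conventional form with logical `||`, where `parseInputIndex` and the error message are hypothetical, not the plugin's actual code:

```cpp
#include <stdexcept>
#include <string>

// Parse a per-input index from a config key suffix and range-check it.
int parseInputIndex(const std::string& key) {
    int input_index = std::stoi(key);  // throws std::invalid_argument for non-numeric keys
    if (input_index < 0 || input_index > 99) {  // logical ||, short-circuiting
        throw std::out_of_range("input index must be in [0, 99]");
    }
    return input_index;
}
```

With integer operands the bitwise `|` happens to give the same truth value here, but it does not short-circuit and reads as a bug; `||` states the intent.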
@@ -107,6 +107,9 @@ class LayerInfo {
    bool isConcatAlignFilter() const noexcept {
        return isOfType("ConcatAlignFilter");
    }
    bool isLink() const noexcept {
        return isOfType("Link");
    }
    bool isAffineFilter() const noexcept {
        return isOfType("AffineFilter");
    }
@@ -204,6 +207,7 @@ class LayerInfo {
    if (layerOrder == std::vector<int>({ 0, 3, 2, 1 })) {
        return true; // supported case
    }
    IE_ASSERT(!layer->insData.empty());
    auto inputs = layer->insData.begin()->lock();
    auto inputsOrder = inputs->getTensorDesc().getDims();

@@ -40,7 +40,6 @@ public:

    // length of current cycle
    std::list<cnt_type> permuteCycles;
    int seqId = 0;
    bool newSeq = false;

    for (int i = 0; i != orderVec.size();) {

@@ -609,31 +609,6 @@ void InsertIdentityLayerPass::run() {
    }
}

/**
 * @brief returns previous layers and the insData index for each of them
 * @tparam T
 * @param origin
 * @param acceptanceCriteria
 * @param idx
 */
// gives previous layers while skipping certain layers according to the expression
template <class T>
std::vector<std::pair<CNNLayerPtr, int> > CNNNetGetPrevLayersSkip(CNNLayerPtr origin, const T &acceptanceCriteria, int idx = -1) {
    std::vector<std::pair<CNNLayerPtr, int> > prevLayers;
    for (int i = idx == -1 ? 0 : idx; CNNNetHasPrevLayer(origin.get(), i) && (idx == -1 || i == idx); i++) {
        auto prevLayer = CNNNetPrevLayer(origin, i);
        if (acceptanceCriteria(prevLayer)) {
            prevLayers.push_back({prevLayer, CNNLayerFindOutDataIdx(origin, i)});
        } else {
            // if we need to look further up for some input, the original index is intentionally not used here
            auto prevPrevLayers = CNNNetGetPrevLayersSkip(prevLayer, acceptanceCriteria);
            prevLayers.insert(prevLayers.end(), prevPrevLayers.begin(), prevPrevLayers.end());
        }
    }

    return prevLayers;
}

void InsertCopyLayerPass::run() {
    for (auto & l : *pLayers) {
        if (l->insData.empty()) continue;

@@ -1084,6 +1059,78 @@ void RemoveConstPass::run() {
    transformer.fullTrim();
}

void FuseMultipleIdentitiesPass::run() {
    for (auto &l : *pLayers) {
        if (l->insData.empty()) continue;

        auto isNonFunctional = [](CNNLayerPtr ptr) {
            return LayerInfo(ptr).isNonFunctional();
        };
        auto eltwise = dynamic_cast<InferenceEngine::EltwiseLayer *>(l.get());
        auto concat = dynamic_cast<InferenceEngine::ConcatLayer *>(l.get());

        if (LayerInfo(l).isNonFunctional() || LayerInfo(l).has32BInput())
            continue;
        gnalog() << "CNNNetPrevLayer skip non functional from :: " << l->name;
        auto prevLayersReached = CNNNetGetPrevLayersSkip(l, [](CNNLayerPtr ptr) {
            return !LayerInfo(ptr).isNonFunctional();
        });
        prevLayersReached.erase(std::remove_if(prevLayersReached.begin(),
                prevLayersReached.end(),
                [] (const std::pair<CNNLayerPtr, int> & candidate) {
                    return LayerInfo(candidate.first).isLink();
                }), prevLayersReached.end());

        if (prevLayersReached.size() != 1 && eltwise == nullptr && concat == nullptr) {
            std::stringstream layers;
            for (auto && prevLayer : prevLayersReached) {
                layers << prevLayer.first->name;
                layers << ", ";
            }
            THROW_GNA_LAYER_EXCEPTION(l) << "unsupported case: connected to "
                    << (prevLayersReached.empty() ? "zero" : "multiple") << " outputs : " << layers.str();
        }
        auto prevLayer = prevLayersReached.front().first;
        auto outDataIdx = prevLayersReached.front().second;
        gnalog() << ", reached " << prevLayer->name << " at " << outDataIdx << std::endl;

        if (!LayerInfo(prevLayer).has32BOutput())
            continue;

        std::vector<CNNLayerPtr> resultSet = CNNNetGetAllNextLayersSkipCertain(prevLayer, outDataIdx, isNonFunctional);

        // now the result set should have all needed layers
        // check whether the result set already contains an identity layer
        CNNLayerPtr alreadyIdentity;
        for (auto &&res : resultSet) {
            if (LayerInfo(res).isIdentity()) {
                alreadyIdentity = res;
                break;
            }
        }
        if (!alreadyIdentity) {
            continue;
        } else {
            // just figure out how to connect to that "already identity"
            // 1st stage - disconnect the given layer from its previous layer
            auto directPrev = l->insData.front().lock()->getCreatorLayer().lock();
            auto oDataIdx = CNNLayerFindOutDataIdx(directPrev, 0);
            auto &inputTo = directPrev->outData[oDataIdx]->getInputTo();
            for (auto inIterator = inputTo.begin(); inIterator != inputTo.end(); inIterator++) {
                if (inIterator->second == l) {
                    inputTo.erase(inIterator);
                    break;
                }
            }
            l->insData.clear();

            // 2nd stage - set up the new connection
            l->insData.push_back(alreadyIdentity->outData.front());
            alreadyIdentity->outData.front()->getInputTo()[l->name] = l;
        }
    }
}
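The pass above filters "Link" layers out of `prevLayersReached` with the erase-remove idiom: `std::remove_if` compacts the elements to keep at the front of the vector and returns an iterator to the leftover tail, which a single `erase` then trims. A minimal sketch on plain strings (`dropLinks` is an illustrative helper, not from the codebase):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Remove every "Link" entry from a list of layer names using erase-remove.
std::vector<std::string> dropLinks(std::vector<std::string> layers) {
    layers.erase(std::remove_if(layers.begin(), layers.end(),
                                [](const std::string& n) { return n == "Link"; }),
                 layers.end());
    return layers;
}
```

Calling `erase` once on the `remove_if` result is linear in the vector size; erasing matches one by one inside a loop would be quadratic and invalidate iterators along the way.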
int PassManager::run(int index) {
// #define PLOT
#ifdef PLOT

@@ -149,6 +149,11 @@ DECL_PASS_BEFORE_COPY(UnrollTI);
 */
DECL_PASS_BEFORE_COPY(RemoveConst);

/**
 * @brief removes extra identity layers for multi-output cases
 */
DECL_PASS(FuseMultipleIdentities);

struct PassManagerSettings {
    Policy policy;
    /// @brief whether to run passes before copy
@@ -139,7 +139,7 @@ private:

    friend INFERENCE_ENGINE_API_CPP(std::shared_ptr<CNNNetworkImpl>)
    convertFunctionToICNNNetwork(const std::shared_ptr<const ::ngraph::Function>& graph,
                                 const ICNNNetwork& nGraphImpl);
                                 const ICNNNetwork& nGraphImpl, bool keep_constant_inputs);

    /**
     * @brief Reshape on the same shape

@@ -63,9 +63,9 @@ ngraph::op::GenericIE::GenericIE(const ngraph::NodeVector& inputs,
    : GenericIE(as_output_vector(inputs), params, type, outputs) {}

ngraph::op::GenericIE::GenericIE(const ngraph::OutputVector& inputs,
        const std::map<std::string, InferenceEngine::Parameter>& params,
        const std::string type, const std::vector<PortIE>& outputs)
    : Op(inputs), params(params), outputs(outputs), type(type), initialized(0) {
        const std::map<std::string, InferenceEngine::Parameter>& params_,
        const std::string type_, const std::vector<PortIE>& outputs_)
    : Op(inputs), params(params_), outputs(outputs_), type(type_), initialized(0) {
    constructor_validate_and_infer_types();
}

@@ -179,7 +179,9 @@ CNNNetwork details::ReadNetwork(const std::string& modelPath, const std::string&
        THROW_IE_EXCEPTION << "Weights file " << bPath << " cannot be opened!";

    // read model with weights
    return reader->read(modelStream, binStream, exts);
    auto network = reader->read(modelStream, binStream, exts);
    modelStream.close();
    return network;
}
// read model without weights
return reader->read(modelStream, exts);
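The `ReadNetwork` change above stores the result, closes the stream explicitly, and only then returns, instead of returning with the stream still open. With `std::ifstream` the destructor would close the file anyway, but closing explicitly releases the OS handle at a well-defined point rather than whenever the stream goes out of scope. A small sketch of the same pattern (`readAll` is an illustrative helper, not from the codebase):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Read a whole file into a string, releasing the handle before returning.
std::string readAll(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();
    in.close();  // explicit close; the destructor would also do this
    return buf.str();
}
```

The explicit close matters most on platforms where an open handle blocks subsequent operations (renaming or deleting the file) that may happen before the stream's destructor runs.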
||||
@@ -15,7 +15,8 @@ namespace InferenceEngine {
|
||||
namespace details {
|
||||
|
||||
INFERENCE_ENGINE_API_CPP(std::shared_ptr<CNNNetworkImpl>)
|
||||
convertFunctionToICNNNetwork(const std::shared_ptr<const ::ngraph::Function>& graph, const ICNNNetwork &network);
|
||||
convertFunctionToICNNNetwork(const std::shared_ptr<const ::ngraph::Function>& graph,
|
||||
const ICNNNetwork &network, bool keep_constant_inputs = false);
|
||||
|
||||
} // namespace details
|
||||
} // namespace InferenceEngine
|
||||
|
||||
@@ -24,6 +24,8 @@
|
||||
#include "ngraph_ops/pad_ie.hpp"
|
||||
#include "ngraph_ops/onehot_ie.hpp"
|
||||
#include "ngraph_ops/power.hpp"
|
||||
#include "ngraph_ops/prior_box_clustered_ie.hpp"
|
||||
#include "ngraph_ops/prior_box_ie.hpp"
|
||||
#include "ngraph_ops/proposal_ie.hpp"
|
||||
#include "ngraph_ops/relu_ie.hpp"
|
||||
#include "ngraph_ops/scaleshift.hpp"
|
||||
@@ -472,20 +474,6 @@ InferenceEngine::details::CNNLayerCreator::CNNLayerCreator(const std::shared_ptr
        return res;
    });

    addSpecificCreator({"PriorBox"}, [](const std::shared_ptr<::ngraph::Node>& node,
                                        const std::map<std::string, std::string> params) -> CNNLayerPtr {
        THROW_IE_EXCEPTION << "PriorBox operation has a form that is not supported." << node->get_friendly_name()
                           << " should be replaced by constant during constant folding.";
        return nullptr;
    });

    addSpecificCreator({"PriorBoxClustered"}, [](const std::shared_ptr<::ngraph::Node>& node,
                                                 const std::map<std::string, std::string> params) -> CNNLayerPtr {
        THROW_IE_EXCEPTION << "PriorBoxClustered operation has a form that is not supported." << node->get_friendly_name()
                           << " should be replaced by constant during constant folding.";
        return nullptr;
    });
}

CNNLayerPtr InferenceEngine::details::CNNLayerCreator::create() {
@@ -499,7 +487,9 @@ CNNLayerPtr InferenceEngine::details::CNNLayerCreator::create() {
    return res;
}

std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_ptr<const ::ngraph::Function>& graph, const ICNNNetwork &network) {
std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_ptr<const ::ngraph::Function> &graph,
                                                             const ICNNNetwork &network,
                                                             bool keep_constant_inputs) {
    IE_PROFILING_AUTO_SCOPE(convertFunctionToICNNNetwork)
    const auto createCNNLayer = [](const std::shared_ptr<::ngraph::Node> &node) -> CNNLayerPtr {
        class NGraphCNNLayer: public CNNLayer {
@@ -565,6 +555,10 @@ std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_p
        std::make_shared<Builder::NodeConverter<::ngraph::op::PadIE>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::v1::Power>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::PowerIE>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::PriorBox>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::PriorBoxClustered>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::PriorBoxClusteredIE>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::PriorBoxIE>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::Proposal>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::ProposalIE>>(),
        std::make_shared<Builder::NodeConverter<::ngraph::op::Relu>>(),
@@ -715,7 +709,7 @@ std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_p
    for (const auto &layer : nodes)
        op_names.insert(layer->get_name());

    bool keep_constants = ::ngraph::op::util::has_op_with_type<::ngraph::op::FakeQuantize>(graph);
    bool keep_constants = keep_constant_inputs || ::ngraph::op::util::has_op_with_type<::ngraph::op::FakeQuantize>(graph);

    // Create layers and output data
    for (const auto &layer : nodes) {
@@ -766,6 +760,20 @@ std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_p
        cnnLayer->insData.resize(inputCount);

        for (size_t i = 0; i < layer->get_output_size(); i++) {
            // Memory node with index = 1 has no inputs according to the specification.
            // For proper conversion, we must cut off all the layers and data nodes above ReadValue,
            // if they are connected only with this layer.
            // Now MO generates only constants or constant sub-graphs as input to ReadValue op.
            if (std::dynamic_pointer_cast<::ngraph::op::Constant>(layer)) {
                bool all_to_read_value = !layer->output(i).get_target_inputs().empty();
                for (const auto &output_input : layer->output(i).get_target_inputs()) {
                    all_to_read_value
                        &= dynamic_cast<ngraph::op::ReadValue *>(output_input.get_node()) != nullptr;
                }
                if (all_to_read_value)
                    continue;
            }

            if (cnnLayer->type == "Memory" && cnnLayer->params["index"] == "0") {
                cnnLayer->outData.clear();
                continue;
@@ -773,7 +781,6 @@ std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_p
            std::string outName = layer->get_friendly_name();
            if (layer->get_output_size() != 1) outName += "." + std::to_string(i);
            DataPtr &ptr = cnnNetworkImpl->getData(outName.c_str());

            SizeVector dims;
            dims = layer->get_output_shape(i);
            for (const auto &dim : dims) {
@@ -889,6 +896,7 @@ std::shared_ptr<CNNNetworkImpl> convertFunctionToICNNNetwork(const std::shared_p
    for (const auto &ext : ::ngraph::op::GenericIE::getExtensions(graph)) {
        cnnNetworkImpl->AddExtension(ext, nullptr);
    }

    return cnnNetworkImpl;
}
}  // namespace details

@@ -232,7 +232,8 @@ std::vector<CNNLayerPtr> ConstTransformer::foldConstSubgraphsInternal(const std:
    static std::vector<std::string> skipConstInfer = {
        "FakeQuantize",
        "Quantize",
        "CumSum"  // Const inference function for CumSum is not implemented!
        "CumSum",  // Const inference function for CumSum is not implemented
        "Convolution"  // Const inference function for Convolution is not implemented
    };

const std::map<std::string, bool> ConstTransformer::getConstLayers(const std::vector<CNNLayerPtr>& sortedLayers) {

@@ -34,6 +34,8 @@
#include "ngraph_ops/onehot_ie.hpp"
#include "ngraph_ops/pad_ie.hpp"
#include "ngraph_ops/power.hpp"
#include "ngraph_ops/prior_box_clustered_ie.hpp"
#include "ngraph_ops/prior_box_ie.hpp"
#include "ngraph_ops/proposal_ie.hpp"
#include "ngraph_ops/relu_ie.hpp"
#include "ngraph_ops/selu_ie.hpp"
@@ -1473,6 +1475,136 @@ CNNLayer::Ptr NodeConverter<ngraph::op::ProposalIE>::createLayer(const std::shar
    return res;
}

template <>
CNNLayer::Ptr NodeConverter<ngraph::op::PriorBoxClusteredIE>::createLayer(
    const std::shared_ptr<ngraph::Node>& layer) const {
    LayerParams params = {layer->get_friendly_name(), "PriorBoxClustered",
                          details::convertPrecision(layer->get_output_element_type(0))};
    auto res = std::make_shared<InferenceEngine::CNNLayer>(params);
    auto castedLayer = ngraph::as_type_ptr<ngraph::op::PriorBoxClusteredIE>(layer);
    if (castedLayer == nullptr) THROW_IE_EXCEPTION << "Cannot get " << params.type << " layer " << params.name;

    auto attr = castedLayer->get_attrs();
    std::string param;
    for (const auto& val : attr.widths) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["width"] = param;

    param.clear();
    for (const auto& val : attr.heights) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["height"] = param;

    param.clear();
    for (const auto& val : attr.variances) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["variance"] = param;

    if (std::abs(attr.step_heights - attr.step_widths) < 1e-5) {
        res->params["step"] = asString(attr.step_widths);
    } else {
        res->params["step_w"] = asString(attr.step_widths);
        res->params["step_h"] = asString(attr.step_heights);
    }
    res->params["offset"] = asString(attr.offset);
    res->params["clip"] = asString(attr.clip ? 1 : 0);
    res->params["flip"] = "1";

    return res;
}

template <>
CNNLayer::Ptr NodeConverter<ngraph::op::PriorBoxClustered>::createLayer(
    const std::shared_ptr<ngraph::Node>& layer) const {
    THROW_IE_EXCEPTION << "PriorBoxClustered operation must be converted to PriorBoxClusteredIE operation.";
}
template <>
CNNLayer::Ptr NodeConverter<ngraph::op::PriorBoxIE>::createLayer(const std::shared_ptr<ngraph::Node>& layer) const {
    LayerParams params = {layer->get_friendly_name(), "PriorBox",
                          details::convertPrecision(layer->get_output_element_type(0))};
    auto res = std::make_shared<InferenceEngine::CNNLayer>(params);
    auto castedLayer = ngraph::as_type_ptr<ngraph::op::PriorBoxIE>(layer);
    auto layer_info = params.type + " layer " + params.name;

    if (castedLayer == nullptr) THROW_IE_EXCEPTION << "Cannot get " << layer_info;

    auto attr = castedLayer->get_attrs();
    std::string param;

    auto data_pshape = castedLayer->get_input_partial_shape(0);
    if (data_pshape.is_dynamic()) THROW_IE_EXCEPTION << "Dynamic 0-port input of " << layer_info << " is not supported";
    auto data_shape = data_pshape.to_shape();
    if (data_shape.size() != 4) THROW_IE_EXCEPTION << layer_info << " has " << data_shape.size() << " items in 0-port input, 4 expected";

    auto img_pshape = castedLayer->get_input_partial_shape(1);
    if (img_pshape.is_dynamic()) THROW_IE_EXCEPTION << "Dynamic 1-port input of " << layer_info << " is not supported";
    auto img_shape = img_pshape.to_shape();
    if (img_shape.size() != 4) THROW_IE_EXCEPTION << layer_info << " has " << img_shape.size() << " items in 1-port input, 4 expected";

    if (!attr.scale_all_sizes) {
        // mxnet-like PriorBox
        auto img_H = img_shape[2];
        auto data_H = data_shape[2];
        if (attr.step == -1)
            attr.step = 1. * img_H / data_H;
        else
            attr.step *= img_H;
        for (auto& size : attr.min_size)
            size *= img_H;
    }

    for (const auto& val : attr.max_size) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["max_size"] = param;

    param.clear();
    for (const auto& val : attr.min_size) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["min_size"] = param;

    param.clear();
    for (const auto& val : attr.aspect_ratio) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["aspect_ratio"] = param;

    param.clear();
    for (const auto& val : attr.variance) {
        if (!param.empty()) param += ",";
        param += asString(val);
    }
    res->params["variance"] = param;

    res->params["step"] = asString(attr.step);
    res->params["offset"] = asString(attr.offset);
    res->params["clip"] = asString(attr.clip ? 1 : 0);
    res->params["flip"] = asString(attr.flip ? 1 : 0);
    res->params["scale_all_sizes"] = asString(attr.scale_all_sizes ? 1 : 0);

    res->params["density"] = asString(attr.density);
    res->params["fixed_size"] = asString(attr.fixed_size);
    res->params["fixed_ratio"] = asString(attr.fixed_ratio);

    return res;
}

template <>
CNNLayer::Ptr NodeConverter<ngraph::op::PriorBox>::createLayer(const std::shared_ptr<ngraph::Node>& layer) const {
    THROW_IE_EXCEPTION << "PriorBox operation must be converted to PriorBoxIE operation.";
}

template <>
CNNLayer::Ptr NodeConverter<ngraph::op::PowerIE>::createLayer(const std::shared_ptr<ngraph::Node>& layer) const {
    LayerParams params = {layer->get_friendly_name(), "Power",
@@ -272,6 +272,48 @@ void CombineData(DataPtr& master, DataPtr& slave) {
    }
}

/**
 * Preserve output data name and update output data map of the network
 *
 * @param in_data name to update
 * @param out_data name to preserve
 * @param net output data map to update with in_data
 */
template <typename NET>
void SaveOutputDataName(InferenceEngine::DataPtr in_data, InferenceEngine::DataPtr out_data, NET &net) {
    // TODO: update outputs of the network if out_data was output
    if (out_data->getInputTo().empty()) {
        auto data_name = out_data->getName();
        in_data->setName(data_name);
    }
}

/**
 * void SaveOutputDataName(InferenceEngine::DataPtr in_data, InferenceEngine::DataPtr out_data, NET &net), where
 * NET = ICNNNetwork
 */
void SaveOutputDataName(InferenceEngine::DataPtr in_data, InferenceEngine::DataPtr out_data, ICNNNetwork& net) {
    if (out_data->getInputTo().empty()) {
        InferenceEngine::OutputsDataMap outputs_data_map;
        net.getOutputsInfo(outputs_data_map);
        auto out_data_name = out_data->getName();
        in_data->setName(out_data_name);
        if (outputs_data_map.count(out_data_name)) {
            auto parent_layer_ptr = in_data->getCreatorLayer().lock();
            IE_ASSERT(parent_layer_ptr != nullptr);
            auto parent_layer_name = parent_layer_ptr->name;
            size_t in_data_out_index = 0;
            for (size_t ind = 0; ind < parent_layer_ptr->outData.size(); ++ind) {
                if (parent_layer_ptr->outData[ind] == in_data) {
                    in_data_out_index = ind;
                }
            }
            net.addOutput(parent_layer_name, in_data_out_index);
        }
    }
}
/**
 * Remove layer from graph
 * May be applied only for inplace layer. One input, one output,
@@ -279,7 +321,8 @@ void CombineData(DataPtr& master, DataPtr& slave) {
 *
 * @param layer to remove from graph
 */
void RemoveLayer(CNNLayerPtr& layer) {
template <typename NET>
void RemoveLayer(CNNLayerPtr& layer, NET &net) {
    IE_ASSERT(layer->insData.size() == 1);
    IE_ASSERT(layer->outData.size() == 1);

@@ -299,10 +342,8 @@ void RemoveLayer(CNNLayerPtr& layer) {
    // transfer output connections into parent data
    CombineData(in_data, out_data);

    // Save name for output data
    if (out_data->getInputTo().empty()) {
        in_data->setName(out_data->getName());
    }
    // save name for output data and update network output
    SaveOutputDataName(in_data, out_data, net);
}

/************************************************************/
@@ -1371,7 +1412,7 @@ void fixConvertLayers(NET &net) {
        }
    }
    for (auto &layer : to_remove) {
        RemoveLayer(layer);
        RemoveLayer(layer, net);
    }
}
@@ -21,6 +21,8 @@ public:
    ~GemmTransformation() override {};
    bool canBeTransformed(const TransformationContext& context, const CNNLayer& layer) const override;
    void transform(TransformationContext& context, CNNLayer& layer) const override;

    bool isQuantized(const CNNLayer& layer) const noexcept override;
};

IE_SUPPRESS_DEPRECATED_END

@@ -83,6 +83,8 @@ protected:
        const std::vector<float>& originalWeightsDequantizationShifts,
        std::vector<float>& dequantizationScales,
        std::vector<float>& dequantizationShifts) const;

    static bool getDequantizationDimIsSupported(const CNNLayer& weightableLayer);
};

typedef std::shared_ptr<WeightableLayerTransformation> WeightableLayerTransformationPtr;

@@ -135,7 +135,6 @@ void ConcatTransformation::transform(TransformationContext& context, CNNLayer& c

    dequantizationScale = maxOutputInterval / (dataPrecision.max - dataPrecision.min);
    const float max = maxOutputInterval / ((dataPrecision.max - dataPrecision.min) / dataPrecision.max);
    const float min = maxOutputInterval / ((dataPrecision.max - dataPrecision.min) / dataPrecision.min);
    dequantizationShift = outputLowValue - min;
@@ -25,15 +25,6 @@
using namespace InferenceEngine;
using namespace InferenceEngine::details;

bool getDequantizationValuesAreBroadcasted(const CNNLayer& fullyConnected) {
    const DataPtr inputData = fullyConnected.insData[0].lock();
    if (inputData == nullptr) {
        THROW_IE_LPT_EXCEPTION(fullyConnected) << "input data is absent";
    }

    return inputData->getDims().size() == 3ul;
}

bool FullyConnectedTransformation::canBeTransformed(const TransformationContext& context, const CNNLayer& fullyConnected) const {
    if (!WeightableLayerTransformation::canBeTransformed(context, fullyConnected)) {
        return false;
@@ -72,7 +63,12 @@ bool FullyConnectedTransformation::canBeTransformed(const TransformationContext&
    std::vector<float> dequantizationShifts;
    fillFromDequantizationLayer(*scaleShift, dequantizationScales, dequantizationShifts);

    if ((inTensorDims.size() == 3ul) && (!DequantizationDetails::isPerTensor(dequantizationScales, dequantizationShifts))) {
    const bool dequantizationDimIsSupported = !getDequantizationDimIsSupported(fullyConnected);
    if ((!dequantizationDimIsSupported) &&
        (!DequantizationDetails::isPerTensor(dequantizationScales, dequantizationShifts) ||
        // if asymmetric quantization is not supported then no shifts for dequantizationDimIsSupported = false case:
        // in this case we can not dequantize with shifts
        (!supportAsymmetricQuantization && (dequantizationShifts[0] != 0.f)))) {
        return false;
    }
@@ -318,7 +314,7 @@ void FullyConnectedTransformation::calculateDequantizationForSymmetric(
    const auto prevDequantizationScaleBuffer = CNNNetworkHelper::getFloatData(CNNNetworkHelper::getBlob(scaleShift, "weights"));
    const auto prevDequantizationShiftBuffer = CNNNetworkHelper::getFloatData(CNNNetworkHelper::getBlob(scaleShift, "biases"));

    const bool dequantizationValuesAreBroadcasted = getDequantizationValuesAreBroadcasted(fullyConnected);
    const bool dequantizationValuesAreBroadcasted = !getDequantizationDimIsSupported(fullyConnected);
    for (size_t i = 0; i < outputChannelsCount; ++i) {
        dequantizationScales[i] =
            prevDequantizationScaleBuffer.get()[0] *
@@ -401,7 +397,7 @@ void FullyConnectedTransformation::calculateDequantizationForAsymmetric(
        THROW_IE_EXCEPTION << "Unexpected layer type to calculate quantization values " << scaleShift->type;
    }

    const bool dequantizationValuesAreBroadcasted = getDequantizationValuesAreBroadcasted(fullyConnected);
    const bool dequantizationValuesAreBroadcasted = !getDequantizationDimIsSupported(fullyConnected);

    dequantizationScales.resize(outputChannelsCount);
    dequantizationShifts.resize(outputChannelsCount);
@@ -412,10 +408,10 @@ void FullyConnectedTransformation::calculateDequantizationForAsymmetric(
            prevDequantizationScaleBuffer.get()[0] *
            (originalWeightsDequantizationScales.size() == 0 ?
                1.0 :
                (originalWeightsDequantizationScales.size() == 1 ? originalWeightsDequantizationScales[0] : originalWeightsDequantizationScales[i]));
                originalWeightsDequantizationScales[((originalWeightsDequantizationScales.size() == 1) || dequantizationValuesAreBroadcasted) ? 0 : i]);
    }

    if (CNNNetworkHelper::isQuantizedConstWeights(fullyConnected)) {
    if (CNNNetworkHelper::isQuantizedConstWeights(fullyConnected) && (!dequantizationValuesAreBroadcasted)) {
        const Blob::Ptr weightsBlob = CNNNetworkHelper::getWeights(fullyConnected, roundQuantizedValues);
        const auto weightsBuffer = CNNNetworkHelper::getFloatData(weightsBlob);
        const Blob::Ptr biasesBlob = CNNNetworkHelper::getBiases(fullyConnected);
@@ -432,7 +428,7 @@ void FullyConnectedTransformation::calculateDequantizationForAsymmetric(

        for (size_t w = 0; w < inputChannelsCount; ++w) {
            const float kernel = weightsBuffer.get()[channel * inputChannelsCount + w];
            const float shift = dequantizationValuesAreBroadcasted ? prevDequantizationShiftBuffer.get()[0] : prevDequantizationShiftBuffer.get()[w];
            const float shift = prevDequantizationShiftBuffer.get()[w];
            sum1 += kernel * shift * weightsDequantizationScale;
            sum2 += kernel * dataZeroPoints[w] * weightsDequantizationScale;
        }
@@ -133,3 +133,8 @@ void GemmTransformation::transform(TransformationContext& context, CNNLayer& gem

    addDequantizationLayer(context, gemm, dequantizationScales, dequantizationShifts);
}

bool GemmTransformation::isQuantized(const CNNLayer& layer) const noexcept {
    // weightable layer version overriding
    return true;
}

@@ -128,6 +128,15 @@ bool WeightableLayerTransformation::isPrecisionPreserved(const CNNLayer& layer)
    return false;
}

bool WeightableLayerTransformation::getDequantizationDimIsSupported(const CNNLayer& fullyConnected) {
    const DataPtr inputData = fullyConnected.insData[0].lock();
    if (inputData == nullptr) {
        THROW_IE_LPT_EXCEPTION(fullyConnected) << "input data is absent";
    }

    return inputData->getDims().size() != 3ul;
}

void WeightableLayerTransformation::updateLayerBiases(
    TransformationContext& context,
    const CNNLayer& weightableLayer,
@@ -135,7 +144,17 @@ void WeightableLayerTransformation::updateLayerBiases(
    std::vector<float>& dequantizationScales,
    std::vector<float>& dequantizationShifts,
    std::vector<float>& biasesShifts) const {
    if (!std::all_of(dequantizationShifts.begin(), dequantizationShifts.end(), [](float value) { return value == 0.0; })) {
    const bool dequantizationShiftsAreZero = std::all_of(
        dequantizationShifts.begin(),
        dequantizationShifts.end(),
        [](float value) { return value == 0.0; });

    const bool dequantizationDimIsNotSupported = !getDequantizationDimIsSupported(weightableLayer);
    CNNLayerPtr biasesLayer = CNNNetworkHelper::getParent(weightableLayer, 2);

    // we need to correct biases if dequantization shifts values are not zero or
    // dequantization dimension is not supported (as a result dequantization shifts can not be calculated)
    if ((dequantizationDimIsNotSupported && (biasesLayer != nullptr)) || (!dequantizationShiftsAreZero)) {
        const DataPtr insData = weightableLayer.insData[0].lock();
        if (insData == nullptr) {
            THROW_IE_LPT_EXCEPTION(weightableLayer) << "input data is absent";
@@ -144,7 +163,6 @@ void WeightableLayerTransformation::updateLayerBiases(

    std::shared_ptr<float> biasesBufferPtr;
    Blob::Ptr biasesBlob;
    CNNLayerPtr biasesLayer = CNNNetworkHelper::getParent(weightableLayer, 2);
    if (biasesLayer == nullptr) {
        if (weightableLayer.outData.size() != 1ul) {
            THROW_IE_LPT_EXCEPTION(weightableLayer) << "unexpected output data count " << weightableLayer.outData.size();
@@ -661,6 +661,13 @@ MKLDNNMemoryDesc::operator InferenceEngine::TensorDesc() const {
        blkDims.push_back(8);
        layout = Layout::BLOCKED;
        break;
    case memory::gOdhwi8o:
        order = {0, 1, 2, 3, 4, 5, 1};
        blkDims = dims;
        blkDims[1] = blkDims[1] / 8 + (blkDims[1] % 8 ? 1 : 0);
        blkDims.push_back(8);
        layout = Layout::BLOCKED;
        break;
    case memory::nChw16c:
        order = {0, 1, 2, 3, 1};
        blkDims = dims;
@@ -676,6 +683,13 @@ MKLDNNMemoryDesc::operator InferenceEngine::TensorDesc() const {
        blkDims.push_back(16);
        layout = Layout::BLOCKED;
        break;
    case memory::gOdhwi16o:
        order = {0, 1, 2, 3, 4, 5, 1};
        blkDims = dims;
        blkDims[1] = blkDims[1] / 16 + (blkDims[1] % 16 ? 1 : 0);
        blkDims.push_back(16);
        layout = Layout::BLOCKED;
        break;
    case memory::Ohwi8o:
        order = {0, 1, 2, 3, 0};
        blkDims = dims;
@@ -1267,6 +1281,13 @@ MKLDNNMemoryDesc::MKLDNNMemoryDesc(const TensorDesc& tDesc):
    } else if (blkdDims[6] == 16) {
        mkldnnFormat = memory::format::Goidhw16g;
    }
} else if (order.size() == 7 &&
           order[0] == 0 && order[1] == 1 && order[2] == 2 && order[3] == 3 && order[4] == 4 && order[5] == 5 && order[6] == 1) {
    if (blkdDims[6] == 8) {
        mkldnnFormat = memory::format::gOdhwi8o;
    } else if (blkdDims[6] == 16) {
        mkldnnFormat = memory::format::gOdhwi16o;
    }
} else if (order.size() == 8 &&
           order[0] == 0 && order[1] == 1 && order[2] == 3 && order[3] == 4 && order[4] == 2 && order[5] == 5 &&
           order[6] == 1 && order[7] == 2) {
@@ -182,8 +182,6 @@ void argmax_many_classes_has_axis(const float* src_data, float* dst_data, Shape
    vmask_type vmask;
    int s_index = i0 * dim * after_num + ib1 * block_size;

    std::memset(reinterpret_cast<void*>(&vmax_values[0]), 0, sizeof(vmax_values));

    auto vswap_func = [&](int index1, int index2) {
        vtmp = vmax_values[index1];
        vmax_values[index1] = _mm_uni_blendv_ps(vmax_values[index1], vmax_values[index2], vmask);

@@ -157,7 +157,7 @@ void MKLDNNDepthwiseNode::createDescriptor(const std::vector<InferenceEngine::Te
                                           const std::vector<InferenceEngine::TensorDesc> &outputDesc) {
    MKLDNNMemoryDesc in_candidate(inputDesc[0]);
    MKLDNNMemoryDesc out_candidate(inputDesc[0]);
    MKLDNNDims weightDims({in_candidate.getDims()[1]});
    MKLDNNDims weightDims({in_candidate.getDims().ndims() == 1 ? in_candidate.getDims()[0] : in_candidate.getDims()[1]});

    MKLDNNMemoryDesc wgh_candidate{weightDims, in_candidate.getDataType(), memory::x};
@@ -209,32 +209,34 @@ void MKLDNNFullyConnectedNode::setPostOps(mkldnn::primitive_attr &attr, bool ini
            PostOpsIntBlobMemory.push_back(MKLDNNMemoryPtr(new MKLDNNMemory(getEngine())));
            PostOpsIntBlobMemory[blob_idx]->Create(depthwiseDims, memory::data_type::f32, memory::format::x);

            PostOpsIntBlobMemory[blob_idx]->SetData(memory::data_type::f32, memory::x,
                                                    depthwiseLayer->_weights->buffer(),
                                                    depthwiseLayer->_weights->size() *
                                                    MKLDNNExtensionUtils::sizeOfDataType(memory::data_type::f32));

            // In case ndims == 3 graph optimizer allows fusing only if all weights values are the same
            if (depthwiseNode->isBroadcast() || ndims == 3) {
                float broadcastValue = static_cast<float *>(PostOpsIntBlobMemory[blob_idx]->GetData())[0];
                for (int i = 1; i < PostOpsIntBlobMemory[blob_idx]->GetPrimitiveDescriptor().desc().data.dims[0]; i++) {
                float broadcastValue = static_cast<float *>(depthwiseLayer->_weights->buffer())[0];
                for (int i = 0; i < PostOpsIntBlobMemory[blob_idx]->GetPrimitiveDescriptor().desc().data.dims[0]; i++) {
                    static_cast<float *>(PostOpsIntBlobMemory[blob_idx]->GetData())[i] = broadcastValue;
                }
            } else {
                PostOpsIntBlobMemory[blob_idx]->SetData(memory::data_type::f32, memory::x,
                                                        depthwiseLayer->_weights->buffer(),
                                                        depthwiseLayer->_weights->size() *
                                                        MKLDNNExtensionUtils::sizeOfDataType(memory::data_type::f32));
            }

            if (depthwiseNode->getAlgorithm() == depthwise_scale_shift) {
                PostOpsIntBlobMemory.push_back(MKLDNNMemoryPtr(new MKLDNNMemory(getEngine())));
                PostOpsIntBlobMemory[blob_idx + 1]->Create(depthwiseDims, memory::data_type::f32,
                                                           memory::format::x);
                PostOpsIntBlobMemory[blob_idx + 1]->SetData(memory::data_type::f32, memory::x,
                                                            depthwiseLayer->_biases->buffer(),
                                                            depthwiseLayer->_biases->size() *
                                                            MKLDNNExtensionUtils::sizeOfDataType(memory::data_type::f32));
                PostOpsIntBlobMemory[blob_idx + 1]->Create(depthwiseDims, memory::data_type::f32, memory::format::x);

                // In case ndims == 3 graph optimizer allows fusing only if all biases values are the same
                if (depthwiseNode->isBroadcast() || ndims == 3) {
                    float broadcastValue = static_cast<float *>(PostOpsIntBlobMemory[blob_idx + 1]->GetData())[0];
                    for (int i = 1; i < PostOpsIntBlobMemory[blob_idx + 1]->GetPrimitiveDescriptor().desc().data.dims[0]; i++) {
                    float broadcastValue = static_cast<float *>(depthwiseLayer->_biases->buffer())[0];
                    for (int i = 0; i < PostOpsIntBlobMemory[blob_idx + 1]->GetPrimitiveDescriptor().desc().data.dims[0]; i++) {
                        static_cast<float *>(PostOpsIntBlobMemory[blob_idx + 1]->GetData())[i] = broadcastValue;
                    }
                } else {
                    PostOpsIntBlobMemory[blob_idx + 1]->SetData(memory::data_type::f32, memory::x,
                                                                depthwiseLayer->_biases->buffer(),
                                                                depthwiseLayer->_biases->size() *
                                                                MKLDNNExtensionUtils::sizeOfDataType(memory::data_type::f32));
                }

                ops.append_depthwise(depthwiseNode->getAlgorithm(),
@@ -667,7 +667,8 @@ private:
};

MKLDNNNormalizeNode::MKLDNNNormalizeNode(const InferenceEngine::CNNLayerPtr& layer, const mkldnn::engine& eng, MKLDNNWeightsSharing::Ptr &cache)
        : MKLDNNNode(layer, eng, cache) {}
        : MKLDNNNode(layer, eng, cache), src_data_size(0lu), dst_data_size(0lu), weights_data_size(0lu),
          input_prec(Precision::UNSPECIFIED), output_prec(Precision::UNSPECIFIED), weights_prec(Precision::UNSPECIFIED) {}

void MKLDNNNormalizeNode::getSupportedDescriptors() {
    if (!descs.empty())
@@ -120,13 +120,18 @@ void MKLDNNReorderNode::createReorderPrimitive(const mkldnn::memory::desc &srcDe
    // Code block below tries to detect such cases and reinterpret data planar formats (e.g. nchw)
    // as grouped weights planar formats (e.g. goihw) since they have same physical memory layout.
    if (MKLDNNMemory::GetPlainFormat(src_blocked->GetDims()) == src_blocked->GetFormat() &&
        MKLDNNMemory::IsGroupedFormat(dst_blocked->GetFormat())) {
        src_blocked->GetDims().size() + 1 == dst_blocked->GetDims().size()) {
        try {
            mkldnn::memory::dims newDims = dst_blocked->GetDims();
            mkldnn::memory::format newFormat;
            newFormat = src_blocked->GetDims().size() == 4 ? memory::goihw :
                        src_blocked->GetDims().size() == 5 ? memory::goidhw :
                        src_blocked->GetFormat();
            if (MKLDNNMemory::IsGroupedFormat(dst_blocked->GetFormat())) {
                newFormat = src_blocked->GetDims().size() == 4 ? memory::goihw :
                            src_blocked->GetDims().size() == 5 ? memory::goidhw :
                            src_blocked->GetFormat();
            } else {
                newFormat = src_blocked->GetDims().size() == 4 ? memory::ncdhw :
                            src_blocked->GetFormat();
            }

            auto newDesc = mkldnn::memory::desc(newDims, src_blocked->GetDataType(), newFormat);
            src_blocked->Create(newDesc, srcPtr, false);
@@ -413,6 +413,16 @@ std::shared_ptr<ngraph::Node> V10Parser::createNode(const std::vector<ngraph::Ou
        std::make_shared<LayerCreator<ngraph::op::v1::ReduceLogicalOr>>("ReduceLogicalOr"),
    };

    // Check that the operation is in the default opsets
    auto isDefaultOpSet = [](const std::string& version) -> bool {
        for (size_t i = 1; i <= 3; i++) {
            std::string opset_name = "opset" + std::to_string(i);
            if (version == opset_name)
                return true;
        }
        return false;
    };

    for (size_t i = 0; i < inputs.size(); i++) {
        if (!inputs[i].get_node())
            THROW_IE_EXCEPTION << params.type << " layer " << params.name << " with id: " << params.layerId
@@ -423,21 +433,23 @@ std::shared_ptr<ngraph::Node> V10Parser::createNode(const std::vector<ngraph::Ou
    }

    std::shared_ptr<ngraph::Node> ngraphNode;
    // Try to create operation from creators
    for (const auto& creator : creators) {
        if (creator->shouldCreate(params.type)) {
            bool useCreator = false;
            // Check that opset is registered
            useCreator |= opsets.find(params.version) == opsets.end();
            if (!useCreator) {
                // Check that creator can create operation with the version from opset
                const auto opset = opsets.at(params.version);
                // Opset should contain the same version of operation or not contain operation with current type
                useCreator |= opset.contains_type(creator->getNodeType()) || !opset.contains_type(params.type);
    if (isDefaultOpSet(params.version)) {
        // Try to create operation from creators
        for (const auto& creator : creators) {
            if (creator->shouldCreate(params.type)) {
                bool useCreator = false;
                // Check that opset is registered
                useCreator |= opsets.find(params.version) == opsets.end();
                if (!useCreator) {
                    // Check that creator can create operation with the version from opset
                    const auto opset = opsets.at(params.version);
                    // Opset should contain the same version of operation or not contain operation with current type
                    useCreator |= opset.contains_type(creator->getNodeType()) || !opset.contains_type(params.type);
                }
                if (useCreator)
                    ngraphNode = creator->createLayer(inputs, node, binStream, params);
                break;
            }
            if (useCreator)
                ngraphNode = creator->createLayer(inputs, node, binStream, params);
            break;
        }
    }
@@ -0,0 +1,43 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>

#include <transformations_visibility.hpp>

#include <ngraph/op/op.hpp>
#include <ngraph/op/experimental/layers/prior_box_clustered.hpp>

namespace ngraph {
namespace op {

class TRANSFORMATIONS_API PriorBoxClusteredIE : public Op {
public:
    static constexpr NodeTypeInfo type_info{"PriorBoxClusteredIE", 1};
    const NodeTypeInfo& get_type_info() const override { return type_info; }

    /// \brief Constructs a PriorBoxClusteredIE operation
    ///
    /// \param input Layer for which prior boxes are computed
    /// \param image Input to which prior boxes are scaled
    /// \param attrs PriorBoxClustered attributes
    PriorBoxClusteredIE(const Output<Node>& input,
                        const Output<Node>& image,
                        const ngraph::op::PriorBoxClusteredAttrs& attrs);

    void validate_and_infer_types() override;

    std::shared_ptr<Node> copy_with_new_args(const NodeVector& new_args) const override;

    const PriorBoxClusteredAttrs& get_attrs() const { return m_attrs; }

private:
    PriorBoxClusteredAttrs m_attrs;
};

}  // namespace op
}  // namespace ngraph

@@ -0,0 +1,42 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>

#include <transformations_visibility.hpp>

#include "ngraph/op/op.hpp"
#include "ngraph/op/experimental/layers/prior_box.hpp"

namespace ngraph {
namespace op {

class TRANSFORMATIONS_API PriorBoxIE : public Op {
public:
    static constexpr NodeTypeInfo type_info{"PriorBoxIE", 1};
    const NodeTypeInfo& get_type_info() const override { return type_info; }

    /// \brief Constructs a PriorBoxIE operation
    ///
    /// \param input Layer for which prior boxes are computed
    /// \param image Input to which prior boxes are scaled
    /// \param attrs PriorBox attributes
    PriorBoxIE(const Output<Node>& input,
               const Output<Node>& image,
               const ngraph::op::PriorBoxAttrs& attrs);

    void validate_and_infer_types() override;

    std::shared_ptr<Node> copy_with_new_args(const NodeVector& new_args) const override;

    const PriorBoxAttrs& get_attrs() const { return m_attrs; }

private:
    PriorBoxAttrs m_attrs;
};

}  // namespace op
}  // namespace ngraph
@@ -16,6 +16,8 @@

// This pass must be called first in pipeline
NGRAPH_PASS(InitNodeInfo, ::ngraph::pass)
NGRAPH_PASS(ConvertPriorBox, ::ngraph::pass) // WA: ConvertPriorBox must be executed before CF
NGRAPH_PASS(ConstantFolding, ::ngraph::pass)
NGRAPH_PASS(RemoveFilteringBoxesBySize, ::ngraph::pass) // Resolves dynamism (replaces NonZero), CF needed
NGRAPH_PASS(ConstantFolding, ::ngraph::pass)
NGRAPH_PASS(StridedSliceOptimization, ::ngraph::pass) // depends on CF

@@ -0,0 +1,33 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <vector>
#include <memory>

#include <transformations_visibility.hpp>

#include <ngraph/pass/graph_rewrite.hpp>

namespace ngraph {
namespace pass {

class TRANSFORMATIONS_API ConvertPriorBox;

}  // namespace pass
}  // namespace ngraph

class ngraph::pass::ConvertPriorBox: public ngraph::pass::GraphRewrite {
public:
    ConvertPriorBox() : GraphRewrite() {
        convert_prior_box();
        convert_prior_box_clustered();
    }

private:
    void convert_prior_box();

    void convert_prior_box_clustered();
};
@@ -0,0 +1,39 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "ngraph_ops/prior_box_clustered_ie.hpp"

#include <memory>

#include "ngraph/op/constant.hpp"

using namespace std;
using namespace ngraph;

constexpr NodeTypeInfo op::PriorBoxClusteredIE::type_info;

op::PriorBoxClusteredIE::PriorBoxClusteredIE(const Output<Node>& input, const Output<Node>& image,
                                             const PriorBoxClusteredAttrs& attrs)
    : Op({input, image}), m_attrs(attrs) {
    constructor_validate_and_infer_types();
}

void op::PriorBoxClusteredIE::validate_and_infer_types() {
    if (get_input_partial_shape(0).is_dynamic() || get_input_partial_shape(1).is_dynamic()) {
        set_output_type(0, element::f32, PartialShape::dynamic(3));
        return;
    }

    auto input_shape = get_input_shape(0);
    auto image_shape = get_input_shape(1);

    size_t num_priors = m_attrs.widths.size();

    set_output_type(0, element::f32, Shape {1, 2, 4 * input_shape[2] * input_shape[3] * num_priors});
}

shared_ptr<Node> op::PriorBoxClusteredIE::copy_with_new_args(const NodeVector& new_args) const {
    check_new_args_count(this, new_args);
    return make_shared<PriorBoxClusteredIE>(new_args.at(0), new_args.at(1), m_attrs);
}
@@ -0,0 +1,36 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "ngraph_ops/prior_box_ie.hpp"

#include <memory>

#include "ngraph/op/constant.hpp"

using namespace std;
using namespace ngraph;

constexpr NodeTypeInfo op::PriorBoxIE::type_info;

op::PriorBoxIE::PriorBoxIE(const Output<Node>& input, const Output<Node>& image, const PriorBoxAttrs& attrs)
    : Op({input, image}), m_attrs(attrs) {
    constructor_validate_and_infer_types();
}

void op::PriorBoxIE::validate_and_infer_types() {
    if (get_input_partial_shape(0).is_dynamic() || get_input_partial_shape(1).is_dynamic()) {
        set_output_type(0, element::f32, PartialShape::dynamic(3));
        return;
    }
    auto input_shape = get_input_shape(0);
    auto image_shape = get_input_shape(1);

    set_output_type(0, element::f32, Shape {
        1, 2, 4 * input_shape[2] * input_shape[3] * static_cast<size_t>(op::PriorBox::number_of_priors(m_attrs))});
}

shared_ptr<Node> op::PriorBoxIE::copy_with_new_args(const NodeVector& new_args) const {
    check_new_args_count(this, new_args);
    return make_shared<PriorBoxIE>(new_args.at(0), new_args.at(1), m_attrs);
}
@@ -5,6 +5,7 @@
#include <memory>

#include "transformations/common_optimizations/common_optimizations.hpp"
#include "transformations/convert_opset1_to_legacy/convert_prior_to_ie_prior.hpp"
#include "transformations/depth_to_space_fusion.hpp"
#include "transformations/optimize_strided_slice.hpp"
#include "transformations/convert_scatter_elements_to_scatter.hpp"

@@ -17,7 +17,8 @@ void ngraph::pass::ConvertDivide::convert_divide() {

ngraph::graph_rewrite_callback callback = [](pattern::Matcher& m) {
    auto div = std::dynamic_pointer_cast<ngraph::opset1::Divide> (m.get_match_root());
    if (!div) {
    // We cannot apply this transformation to integer input data types
    if (!div || div->input(0).get_element_type().is_integral()) {
        return false;
    }

@@ -0,0 +1,294 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "transformations/convert_opset1_to_legacy/convert_prior_to_ie_prior.hpp"

#include <memory>
#include <vector>

#include <ngraph/opsets/opset3.hpp>
#include <ngraph/opsets/opset1.hpp>

#include <ngraph_ops/prior_box_ie.hpp>
#include <ngraph_ops/prior_box_clustered_ie.hpp>
#include <ngraph/rt_info.hpp>

void ngraph::pass::ConvertPriorBox::convert_prior_box() {
    auto data = std::make_shared<pattern::op::Label>(element::i64, Shape{1, 1, 1, 1});
    auto axes = ngraph::opset1::Constant::create(element::i64, Shape{1}, {0});
    auto image = std::make_shared<pattern::op::Label>(element::i64, Shape{1, 1, 1, 1});

    ngraph::op::PriorBoxAttrs attr;
    attr.min_size = {162.0f};
    attr.max_size = {213.0f};
    attr.aspect_ratio = {2.0f, 3.0f};
    attr.variance = {0.1f, 0.1f, 0.2f, 0.2f};
    attr.step = 64.0f;
    attr.offset = 0.5f;
    attr.clip = 0;
    attr.flip = 1;
    attr.scale_all_sizes = true;

    auto prior_box = std::make_shared<ngraph::opset1::PriorBox>(data, image, attr);
    auto unsqueeze = std::make_shared<ngraph::opset1::Unsqueeze> (prior_box, axes);

    ngraph::graph_rewrite_callback callback = [](pattern::Matcher& m) {
        auto unsqueeze = std::dynamic_pointer_cast<ngraph::opset1::Unsqueeze> (m.get_match_root());
        if (!unsqueeze) {
            return false;
        }
        auto prior_box_node = std::dynamic_pointer_cast<ngraph::opset1::PriorBox> (unsqueeze->input_value(0).get_node_shared_ptr());

        if (!prior_box_node) {
            return false;
        }

        // vector of nGraph nodes that will be replaced
        ngraph::NodeVector ops_to_replace{unsqueeze, prior_box_node};

        std::shared_ptr<Node> input_1(prior_box_node->input_value(0).get_node_shared_ptr());
        std::shared_ptr<Node> input_2(prior_box_node->input_value(1).get_node_shared_ptr());

        auto convert1 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_1);
        auto convert2 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_2);

        if (convert1 && convert2) {
            ops_to_replace.push_back(convert1);
            ops_to_replace.push_back(convert2);
            input_1 = convert1->input_value(0).get_node_shared_ptr();
            input_2 = convert2->input_value(0).get_node_shared_ptr();
        }

        auto strided_slice1 = std::dynamic_pointer_cast<ngraph::opset1::StridedSlice> (input_1);
        auto strided_slice2 = std::dynamic_pointer_cast<ngraph::opset1::StridedSlice> (input_2);

        if (!strided_slice1 || !strided_slice2) {
            return false;
        }

        ops_to_replace.push_back(strided_slice1);
        ops_to_replace.push_back(strided_slice2);

        // Check that StridedSlice1 cuts H,W dims for PriorBox
        auto begin = std::dynamic_pointer_cast<ngraph::opset1::Constant> (strided_slice1->input_value(1).get_node_shared_ptr());
        auto end = std::dynamic_pointer_cast<ngraph::opset1::Constant> (strided_slice1->input_value(2).get_node_shared_ptr());
        auto stride = std::dynamic_pointer_cast<ngraph::opset1::Constant> (strided_slice1->input_value(3).get_node_shared_ptr());

        if (!begin || !end || !stride) {
            return false;
        }

        auto begin_val = begin->get_vector<int64_t>();
        auto end_val = end->get_vector<int64_t>();
        auto stride_val = stride->get_vector<int64_t>();

        if (begin_val.size() != 1 && begin_val[0] != 2) {
            return false;
        }

        if (end_val.size() != 1 && end_val[0] != 4) {
            return false;
        }

        if (stride_val.size() != 1 && stride_val[0] != 1) {
            return false;
        }

        // TODO: should we check second StridedSlice?
        input_1 = strided_slice1->input_value(0).get_node_shared_ptr();
        input_2 = strided_slice2->input_value(0).get_node_shared_ptr();

        convert1 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_1);
        convert2 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_2);

        if (convert1 && convert2) {
            ops_to_replace.push_back(convert1);
            ops_to_replace.push_back(convert2);
            input_1 = convert1->input_value(0).get_node_shared_ptr();
            input_2 = convert2->input_value(0).get_node_shared_ptr();
        }

        // the input can be either ShapeOf-1 or ShapeOf-3
        std::shared_ptr<ngraph::op::Op> shape_of1 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf> (input_1);
        std::shared_ptr<ngraph::op::Op> shape_of2 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf> (input_2);

        if (!shape_of1 || !shape_of2) {
            shape_of1 = std::dynamic_pointer_cast<ngraph::opset3::ShapeOf>(input_1);
            shape_of2 = std::dynamic_pointer_cast<ngraph::opset3::ShapeOf>(input_2);
        }
        if (!shape_of1 || !shape_of2) {
            return false;
        }
        // keep this code for a while in case we decide to run this transformation again in the opset1->legacy
        // the input can be either ShapeOf or Convert(ShapeOf)
        // if (!shape_of1 || !shape_of2) {
        //     auto shapeof1_convert = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_1);
        //     auto shapeof2_convert = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_2);
        //     if (!shapeof1_convert || !shapeof2_convert)
        //         return false;
        //     shape_of1 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf>(shapeof1_convert->input_value(0).get_node_shared_ptr());
        //     shape_of2 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf>(shapeof2_convert->input_value(0).get_node_shared_ptr());
        //     if (!shape_of1 || !shape_of2)
        //         return false;
        //     ops_to_replace.push_back(shapeof1_convert);
        //     ops_to_replace.push_back(shapeof2_convert);
        // }

        ops_to_replace.push_back(shape_of1);
        ops_to_replace.push_back(shape_of2);

        auto prior_box_ie = std::make_shared<ngraph::op::PriorBoxIE> (shape_of1->input_value(0),
                                                                      shape_of2->input_value(0),
                                                                      prior_box_node->get_attrs());

        prior_box_ie->set_friendly_name(unsqueeze->get_friendly_name());

        // Nodes in copy runtime info function should be in topological order
        std::reverse(ops_to_replace.begin(), ops_to_replace.end());
        ngraph::copy_runtime_info(ops_to_replace, prior_box_ie);
        ngraph::replace_node(m.get_match_root(), prior_box_ie);
        return true;
    };

    auto m = std::make_shared<ngraph::pattern::Matcher>(unsqueeze, "CPUFusion.ConvertPriorBoxToPriorBoxIE");
    this->add_matcher(m, callback, PassProperty::CHANGE_DYNAMIC_STATE);
}

void ngraph::pass::ConvertPriorBox::convert_prior_box_clustered() {
    auto data = std::make_shared<pattern::op::Label>(element::i64, Shape{1, 1, 1, 1});
    auto axes = ngraph::opset1::Constant::create(element::i64, Shape{1}, {0});
    auto image = std::make_shared<pattern::op::Label>(element::i64, Shape{1, 1, 1, 1});

    ngraph::op::PriorBoxClusteredAttrs attr;
    attr.widths = {0.1f, 0.1f, 0.2f, 0.2f};
    attr.heights = {0.1f, 0.1f, 0.2f, 0.2f};
    attr.variances = {0.1f, 0.1f, 0.2f, 0.2f};
    attr.step_widths = 64.0f;
    attr.step_heights = 64.0f;
    attr.offset = 0.5f;
    attr.clip = false;

    auto prior_box = std::make_shared<ngraph::opset1::PriorBoxClustered>(data, image, attr);
    auto unsqueeze = std::make_shared<ngraph::opset1::Unsqueeze> (prior_box, axes);

    ngraph::graph_rewrite_callback callback = [](pattern::Matcher& m) {
        auto unsqueeze = std::dynamic_pointer_cast<ngraph::opset1::Unsqueeze> (m.get_match_root());
        if (!unsqueeze) {
            return false;
        }
        auto prior_box_node = std::dynamic_pointer_cast<ngraph::opset1::PriorBoxClustered> (unsqueeze->get_argument(0));

        if (!prior_box_node) {
            return false;
        }

        // vector of nGraph nodes that will be replaced
        ngraph::NodeVector ops_to_replace{unsqueeze, prior_box_node};

        std::shared_ptr<Node> input_1(prior_box_node->input_value(0).get_node_shared_ptr());
        std::shared_ptr<Node> input_2(prior_box_node->input_value(1).get_node_shared_ptr());

        auto convert1 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_1);
        auto convert2 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_2);

        if (convert1 && convert2) {
            ops_to_replace.push_back(convert1);
            ops_to_replace.push_back(convert2);
            input_1 = convert1->input_value(0).get_node_shared_ptr();
            input_2 = convert2->input_value(0).get_node_shared_ptr();
        }

        auto strided_slice1 = std::dynamic_pointer_cast<ngraph::opset1::StridedSlice> (input_1);
        auto strided_slice2 = std::dynamic_pointer_cast<ngraph::opset1::StridedSlice> (input_2);

        if (!strided_slice1 || !strided_slice2) {
            return false;
        }

        ops_to_replace.push_back(strided_slice1);
        ops_to_replace.push_back(strided_slice2);

        // Check that StridedSlice1 cuts H,W dims for PriorBox
        auto begin = std::dynamic_pointer_cast<ngraph::opset1::Constant> (strided_slice1->get_argument(1));
        auto end = std::dynamic_pointer_cast<ngraph::opset1::Constant> (strided_slice1->get_argument(2));
        auto stride = std::dynamic_pointer_cast<ngraph::opset1::Constant> (strided_slice1->get_argument(3));

        if (!begin || !end || !stride) {
            return false;
        }

        auto begin_val = begin->get_vector<int64_t>();
        auto end_val = end->get_vector<int64_t>();
        auto stride_val = stride->get_vector<int64_t>();

        if (begin_val.size() != 1 && begin_val[0] != 2) {
            return false;
        }

        if (end_val.size() != 1 && end_val[0] != 4) {
            return false;
        }

        if (stride_val.size() != 1 && stride_val[0] != 1) {
            return false;
        }

        // TODO: should we check second StridedSlice?
        input_1 = strided_slice1->input_value(0).get_node_shared_ptr();
        input_2 = strided_slice2->input_value(0).get_node_shared_ptr();

        convert1 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_1);
        convert2 = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_2);

        if (convert1 && convert2) {
            ops_to_replace.push_back(convert1);
            ops_to_replace.push_back(convert2);
            input_1 = convert1->input_value(0).get_node_shared_ptr();
            input_2 = convert2->input_value(0).get_node_shared_ptr();
        }

        // the input can be either ShapeOf-1 or ShapeOf-3
        std::shared_ptr<ngraph::op::Op> shape_of1 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf> (input_1);
        std::shared_ptr<ngraph::op::Op> shape_of2 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf> (input_2);

        if (!shape_of1 || !shape_of2) {
            shape_of1 = std::dynamic_pointer_cast<ngraph::opset3::ShapeOf>(input_1);
            shape_of2 = std::dynamic_pointer_cast<ngraph::opset3::ShapeOf>(input_2);
        }
        if (!shape_of1 || !shape_of2) {
            return false;
        }
        // keep this code for a while in case we decide to run this transformation again in the opset1->legacy
        // the input can be either ShapeOf or Convert(ShapeOf)
        // if (!shape_of1 || !shape_of2) {
        //     auto shapeof1_convert = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_1);
        //     auto shapeof2_convert = std::dynamic_pointer_cast<ngraph::opset1::Convert> (input_2);
        //     if (!shapeof1_convert || !shapeof2_convert)
        //         return false;
        //     shape_of1 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf>(shapeof1_convert->input_value(0).get_node_shared_ptr());
        //     shape_of2 = std::dynamic_pointer_cast<ngraph::opset1::ShapeOf>(shapeof2_convert->input_value(0).get_node_shared_ptr());
        //     if (!shape_of1 || !shape_of2)
        //         return false;
        //     ops_to_replace.push_back(shapeof1_convert);
        //     ops_to_replace.push_back(shapeof2_convert);
        // }

        ops_to_replace.push_back(shape_of1);
        ops_to_replace.push_back(shape_of2);

        auto prior_box_ie = std::make_shared<ngraph::op::PriorBoxClusteredIE> (shape_of1->get_argument(0),
                                                                               shape_of2->get_argument(0),
                                                                               prior_box_node->get_attrs());
        prior_box_ie->set_friendly_name(unsqueeze->get_friendly_name());

        // Nodes in copy runtime info function should be in topological order
        std::reverse(ops_to_replace.begin(), ops_to_replace.end());
        ngraph::copy_runtime_info(ops_to_replace, prior_box_ie);
        ngraph::replace_node(unsqueeze, prior_box_ie);
        return true;
    };

    auto m = std::make_shared<ngraph::pattern::Matcher>(unsqueeze, "CPUFusion.ConvertPriorBoxClusteredToPriorBoxClusteredIE");
    this->add_matcher(m, callback, PassProperty::CHANGE_DYNAMIC_STATE);
}
@@ -41,10 +41,6 @@ void ngraph::pass::ConvertStridedSliceToCrop::convert_strided_slice_to_crop() {

auto input_shape = slice->get_input_shape(0);
auto output_shape = slice->get_output_shape(0);
// MKLDNN: "Crop supports only 2d, 4d and 5d blobs."
if (input_shape.size() != 2 && input_shape.size() != 4 && input_shape.size() != 5) {
return false;
}

auto begin = begin_node->cast_vector<int64_t>();
auto end = end_node->cast_vector<int64_t>();
@@ -201,6 +197,12 @@ void ngraph::pass::ConvertStridedSliceToCrop::convert_strided_slice_to_crop() {
new_ops.push_back(data_node);
}

auto data_node_shape = data_node->get_output_shape(0);
// MKLDNN: "Crop supports only 2d, 4d and 5d blobs."
if (data_node_shape.size() != 2 && data_node_shape.size() != 4 && data_node_shape.size() != 5) {
return false;
}

// Crop
data_node = std::make_shared<ngraph::op::CropIE> (data_node, axes, dim, offset);
data_node->set_friendly_name(slice->get_friendly_name());

@@ -42,22 +42,37 @@ void ngraph::pass::ConvertTopKToTopKIE::convert_topk_to_topk_ie() {
topk->get_sort_type());
new_ops.push_back(topk_ie);

Output<Node> element_output;
Output<Node> index_output;
// insert Convert if index element type not equal to i32
if (topk->get_index_element_type() == element::i32) {
// insert Convert if index element type not equal to i32 and output #1 of TopK has consumers
if (topk->get_index_element_type() == element::i32 || topk->get_output_target_inputs(1).size() == 0) {
element_output = topk_ie->output(0);
index_output = topk_ie->output(1);
} else {
topk_ie->set_friendly_name(topk->get_friendly_name());
} else if (topk->get_output_target_inputs(0).size() == 0) {
index_output = std::make_shared<opset1::Convert>(topk_ie->output(1), topk->get_index_element_type());
new_ops.push_back(index_output.get_node_shared_ptr());

// workaround for naming output #1 of TopK
index_output.get_node_shared_ptr()->set_friendly_name(topk->get_friendly_name() + ".1");
} else {
// create fake convert for 0 output, it is a workaround in purpose of correct output names preserving
element_output = std::make_shared<opset1::Convert>(topk_ie->output(0), topk->get_output_element_type(0));
index_output = std::make_shared<opset1::Convert>(topk_ie->output(1), topk->get_index_element_type());
new_ops.push_back(element_output.get_node_shared_ptr());
new_ops.push_back(index_output.get_node_shared_ptr());

// workaround for naming two outputs of TopK
element_output.get_node_shared_ptr()->set_friendly_name(topk->get_friendly_name() + ".0");
index_output.get_node_shared_ptr()->set_friendly_name(topk->get_friendly_name() + ".1");
}

topk_ie->set_friendly_name(topk->get_friendly_name());
ngraph::copy_runtime_info(topk, new_ops);
topk->output(0).replace(topk_ie->output(0));
topk->output(0).replace(element_output);
topk->output(1).replace(index_output);
return true;
};

auto m = std::make_shared<ngraph::pattern::Matcher>(topk, "ConvertTopKToTopKIE");
this->add_matcher(m, callback, PassProperty::CHANGE_DYNAMIC_STATE);
}
}

@@ -20,24 +20,40 @@ void ngraph::pass::ConvertTopK3::convert_topk3() {
if (!topk) {
return false;
}
Output<Node> last;
Output<Node> last0;
Output<Node> last1;
ngraph::NodeVector new_ops;

auto new_topk = std::make_shared<ngraph::opset2::TopK>(topk->input_value(0), topk->input_value(1),
topk->get_axis(), topk->get_mode(), topk->get_sort_type(), element::i32);
new_ops.push_back(new_topk);
// if the output is the i32 then it matches behavior of the v1::TopK otherwise need to insert Convert
if (topk->get_index_element_type() == element::i32) {
last = new_topk->output(1);
// if the output is the i32 or output #1 has no consumers
// then it matches behavior of the v1::TopK otherwise need to insert Convert
if (topk->get_index_element_type() == element::i32 || topk->get_output_target_inputs(1).size() == 0) {
last0 = new_topk->output(0);
last1 = new_topk->output(1);
new_topk->set_friendly_name(topk->get_friendly_name());
} else if (topk->get_output_target_inputs(0).size() == 0) {
last1 = std::make_shared<ngraph::opset2::Convert>(new_topk->output(1), topk->get_index_element_type());
new_ops.push_back(last1.get_node_shared_ptr());

// workaround for naming two outputs of TopK
last1.get_node_shared_ptr()->set_friendly_name(topk->get_friendly_name() + ".1");
} else {
last = std::make_shared<ngraph::opset2::Convert>(new_topk->output(1), topk->get_index_element_type());
new_ops.push_back(last.get_node_shared_ptr());
// create fake convert for 0 output, it is a workaround in purpose of correct output names preserving
last0 = std::make_shared<ngraph::opset2::Convert>(new_topk->output(0), topk->get_output_element_type(0));
last1 = std::make_shared<ngraph::opset2::Convert>(new_topk->output(1), topk->get_index_element_type());
new_ops.push_back(last0.get_node_shared_ptr());
new_ops.push_back(last1.get_node_shared_ptr());

// workaround for naming two outputs of TopK
last0.get_node_shared_ptr()->set_friendly_name(topk->get_friendly_name() + ".0");
last1.get_node_shared_ptr()->set_friendly_name(topk->get_friendly_name() + ".1");
}

new_topk->set_friendly_name(topk->get_friendly_name());
ngraph::copy_runtime_info(topk, new_ops);
topk->output(0).replace(new_topk->output(0));
topk->output(1).replace(last);
topk->output(0).replace(last0);
topk->output(1).replace(last1);
return true;
};


@@ -30,7 +30,7 @@ bool check_block_first(const ngraph::Shape& shape_input, const ngraph::Shape& sh
is_transformation_valid &= (expected_shape == shape_reshape_before);

// x'' = transpose(x', [0, K + 1, K + 2, 1, K + 3, 2, K + 4, 3, ..., K + (K + 1), K])
ngraph::AxisVector expected_permutation = {0, spatial_dims + 1};
ngraph::AxisVector expected_permutation = {0, static_cast<size_t>(spatial_dims + 1)};
for (uint64_t i = 2; i < shape_input.size(); ++i) {
expected_permutation.push_back(spatial_dims + i);
expected_permutation.push_back(i - 1);
@@ -38,7 +38,7 @@ bool check_block_first(const ngraph::Shape& shape_input, const ngraph::Shape& sh
is_transformation_valid &= (expected_permutation == permutation);

// y = reshape(x'', [N, C / (block_size ^ K), D1 * block_size, D2 * block_size, D3 * block_size, ..., DK * block_size])
expected_shape = {shape_input[0], c_dim};
expected_shape = {shape_input[0], static_cast<size_t>(c_dim)};
for (uint64_t i = 2; i < shape_input.size(); ++i)
expected_shape.push_back(shape_input[i] * possible_block_size);
is_transformation_valid &= (expected_shape == shape_reshape_after);
@@ -57,7 +57,7 @@ bool check_depth_first(const ngraph::Shape& shape_input, const ngraph::Shape& sh
uint64_t c_dim = shape_input[1] / std::pow(possible_block_size, spatial_dims);

// x' = reshape(data, [N, C / (block_size ^ K), block_size, block_size, ..., block_size, D1, D2, ..., DK])
ngraph::Shape expected_shape = {shape_input[0], c_dim};
ngraph::Shape expected_shape = {shape_input[0], static_cast<size_t>(c_dim)};
for (uint64_t i = 0; i < spatial_dims; ++i)
expected_shape.push_back(possible_block_size);
for (uint64_t i = 2; i < shape_input.size(); ++i)
@@ -73,7 +73,7 @@ bool check_depth_first(const ngraph::Shape& shape_input, const ngraph::Shape& sh
is_transformation_valid &= (expected_permutation == permutation);

// y = reshape(x'', [N, C / (block_size ^ K), D1 * block_size, D2 * block_size, D3 * block_size, ..., DK * block_size])
expected_shape = {shape_input[0], c_dim};
expected_shape = {shape_input[0], static_cast<size_t>(c_dim)};
for (uint64_t i = 2; i < shape_input.size(); ++i)
expected_shape.push_back(shape_input[i] * possible_block_size);
is_transformation_valid &= (expected_shape == shape_reshape_after);

@@ -26,7 +26,7 @@ namespace vpu {

template <typename T>
Optional<int> parseNumber(const std::string& s) {
T value;
auto value = T{};
if ((std::istringstream(s) >> value >> std::ws).eof()) {
return {value};
}

@@ -39,7 +39,7 @@ void dynamicToStaticShapeBinaryEltwise(std::shared_ptr<ngraph::Node> eltwise) {
const auto diff = std::abs(lhsRank.get_length() - rhsRank.get_length());
if (diff) {
auto & broadcastInput = lhsRank.get_length() < rhsRank.get_length() ? lhsInput : rhsInput;
const auto broadcastConst = ngraph::opset3::Constant::create(broadcastInput.get_element_type(), {static_cast<uint64_t>(diff)}, {1});
const auto broadcastConst = ngraph::opset3::Constant::create(broadcastInput.get_element_type(), {static_cast<size_t>(diff)}, {1});
broadcastInput = std::make_shared<ngraph::opset3::Concat>(ngraph::OutputVector{broadcastConst, broadcastInput}, 0);
}


@@ -392,8 +392,17 @@ inline Stage ModelObj::addNewStage(
// runAllocator
//

VPU_DECLARE_ENUM(EnableShapeAllocation,
    YES,
    NO)

VPU_DECLARE_ENUM(CheckOnlyCMX,
    YES,
    NO)

AllocationResult runAllocator(
    const Model& model,
    bool onlyCheckCMX = false);
    EnableShapeAllocation = EnableShapeAllocation::NO,
    CheckOnlyCMX = CheckOnlyCMX::NO);

}  // namespace vpu

@@ -84,9 +84,11 @@ void BackEnd::getMetaData(
stageMeta.layerName = "<Extra>";
stageMeta.layerType = "<Extra>";
} else {
stageMeta.layerName = stage->origLayer()->name;
stageMeta.layerType = stage->origLayer()->type;
visitedLayers.insert(stage->origLayer());
const auto& origLayer = stage->origLayer();
stageMeta.layerName = origLayer->params.count("originalLayersNames") ? origLayer->params["originalLayersNames"] :
origLayer->name;
stageMeta.layerType = origLayer->type;
visitedLayers.insert(origLayer);
}

return stageMeta;

@@ -184,9 +184,9 @@ CustomLayer::CustomLayer(std::string configDir, const pugi::xml_node& customLayer
         stageOrder.emplace(stageNum, CustomKernel{kernel, _configDir});
     }

-    VPU_THROW_UNLESS(stageOrder.begin()->first == 0,
+    VPU_THROW_UNLESS(!stageOrder.empty() && stageOrder.begin()->first == 0,
         "Error while binding %s custom layer: Stage 0 is not found.", _layerName);
-    VPU_THROW_UNLESS(stageOrder.rbegin()->first == stageOrder.size() - 1,
+    VPU_THROW_UNLESS(!stageOrder.empty() && stageOrder.rbegin()->first == stageOrder.size() - 1,
         "Error while binding %s custom layer: Kernels should have stage id from 0 to N.", _layerName);

     for (auto& stage : stageOrder) {
@@ -430,6 +430,19 @@ bool checkHWRestrictions(
     int kernelSizeX, int kernelSizeY,
     int kernelStride,
     HwOpMode mode, HwOpType type) {
+    // Workaround for HW ops failing on too-wide inputs:
+    // HW operations (primarily Pooling) can use only part of the
+    // available CMX, up to 1014 * 128 bits (i.e. 1014 * 16 bytes).
+    // Provided HwOpMode is 16x16, the HW needs to read up to 16 lines
+    // of the input tensor, so each line must not exceed 1014 bytes,
+    // or 507 pixels if precision is FP16.
+    // More details are available in ticket #-33366.
+    if (inTileWidth > 507) {
+        return false;
+    }
+
     const int chansPerBlock = 1 << static_cast<int>(mode);
     int noOfBlocks = divUp(inTileChannels, chansPerBlock);
@@ -193,10 +193,10 @@ void PassImpl::wrapInLoop(const Model& model, const StageList& subgraph) {
         loopEndOutputs.push_back(originalOutput);
         const auto rule = IterationRule{Dim::N, 0, 1, -1};
         endIterationComponents.emplace(std::make_pair(loopEndOutputs.size() - 1, rule), loopEndInputs.size() - 1);
-    } else {
-        for (const auto& consumerEdge : originalOutput->consumerEdges()) {
-            if (subgraph.has(consumerEdge->consumer()))
-                model->replaceStageInput(consumerEdge, output);
-        }
     }
+    for (const auto& consumerEdge : originalOutput->consumerEdges()) {
+        if (subgraph.has(consumerEdge->consumer()))
+            model->replaceStageInput(consumerEdge, output);
+    }
 }
@@ -458,7 +458,7 @@ void PassImpl::packDataInCmx(const Model& model) {
         return DataLoopStatus::NextChild;
     });

-    auto allocRes = runAllocator(model, true);
+    auto allocRes = runAllocator(model, EnableShapeAllocation::NO, CheckOnlyCMX::YES);
     env.log->trace("Allocation result : %v", allocRes.status);

     if (allocRes.status != AllocationStatus::OK) {
@@ -25,7 +25,7 @@ namespace vpu {
 // runAllocator
 //

-AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
+AllocationResult runAllocator(const Model& model, EnableShapeAllocation enableShapeAllocation, CheckOnlyCMX checkOnlyCmx) {
     VPU_PROFILE(runAllocator);

     auto& allocator = model->getAllocator();
@@ -40,7 +40,7 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
     // Allocate Const/Input/Output datas.
     //

-    if (!onlyCheckCMX) {
+    if (checkOnlyCmx == CheckOnlyCMX::NO) {
         auto result = allocator.preprocess(model);
         if (result.status != vpu::AllocationStatus::OK) {
             return result;
@@ -86,14 +86,14 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
     // Allocate stage outputs.
     //

-    const auto allocateStageOutputs = [onlyCheckCMX, &allocator](const Stage& stage) -> AllocationResult {
+    const auto allocateStageOutputs = [checkOnlyCmx, &allocator](const Stage& stage) -> AllocationResult {
         for (const auto& output : stage->outputs()) {
-            if (onlyCheckCMX && output->memReqs() != MemoryType::CMX) {
+            if (checkOnlyCmx == CheckOnlyCMX::YES && output->memReqs() != MemoryType::CMX) {
                 continue;
             }

             if (!allocator.allocateData(output)) {
-                if (output->memReqs() == MemoryType::CMX && !onlyCheckCMX) {
+                if (output->memReqs() == MemoryType::CMX && checkOnlyCmx == CheckOnlyCMX::NO) {
                     if (allocator.removeCMXCandidates(output)) {
                         if (allocator.allocateData(output)) {
                             continue;
@@ -123,7 +123,7 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
     // Allocate stage temporary buffers.
     //

-    if (!onlyCheckCMX) {
+    if (checkOnlyCmx == CheckOnlyCMX::NO) {
         for (const auto& tempBufferEdge : stage->tempBufferEdges()) {
             if (!allocator.allocateData(tempBufferEdge->tempBuffer())) {
                 allocator.setNeedToAllocNonIntermData();
@@ -157,7 +157,7 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
     //

     for (const auto& input : stage->inputs()) {
-        if (onlyCheckCMX && input->memReqs() != MemoryType::CMX) {
+        if (checkOnlyCmx == CheckOnlyCMX::YES && input->memReqs() != MemoryType::CMX) {
             continue;
         }
@@ -168,7 +168,7 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
     // Release stage temporary buffers.
     //

-    if (!onlyCheckCMX) {
+    if (checkOnlyCmx == CheckOnlyCMX::NO) {
         for (const auto& tempBufferEdge : stage->tempBufferEdges()) {
             allocator.freeData(tempBufferEdge->tempBuffer());
         }
@@ -195,7 +195,7 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {

     if (const auto& parentEdge = data->parentDataToShapeEdge()) {
         const auto& parent = parentEdge->parent();
-        if (parent->usage() == DataUsage::Intermediate && (!onlyCheckCMX || parent->memReqs() == MemoryType::CMX)) {
+        if (parent->usage() == DataUsage::Intermediate && (checkOnlyCmx == CheckOnlyCMX::NO || parent->memReqs() == MemoryType::CMX)) {
             allocator.freeData(parent);
         }
     }
@@ -205,9 +205,11 @@ AllocationResult runAllocator(const Model& model, bool onlyCheckCMX) {
     // Allocate shape for all datas
     //

-    for (auto data : model->datas()) {
-        const auto shapeLocation = allocator.allocateShape(data);
-        data->setShapeAllocationInfo(shapeLocation);
+    if (enableShapeAllocation == EnableShapeAllocation::YES) {
+        for (auto data : model->datas()) {
+            const auto shapeLocation = allocator.allocateShape(data);
+            data->setShapeAllocationInfo(shapeLocation);
+        }
     }

     return AllocationResult();
@@ -233,7 +235,7 @@ void PassImpl::run(const Model& model) {
     // Allocate all resources
     //

-    auto allocRes = runAllocator(model);
+    auto allocRes = runAllocator(model, EnableShapeAllocation::YES);
     IE_ASSERT(allocRes.status == AllocationStatus::OK);

     //
@@ -160,7 +160,7 @@ void PassImpl::run(const Model& model) {
         model->replaceStageInput(consumerEdge, copyOutput);
     }

-    auto allocRes = runAllocator(model, true);
+    auto allocRes = runAllocator(model, EnableShapeAllocation::NO, CheckOnlyCMX::YES);
     if (allocRes.status != AllocationStatus::OK) {
         model->replaceStageOutput(copyProducer->outputEdge(0), copyInput);
@@ -171,7 +171,7 @@ void PassImpl::run(const Model& model) {
         .childSW(swStage)
         .done();

-    auto allocRes = runAllocator(model, true);
+    auto allocRes = runAllocator(model, EnableShapeAllocation::NO, CheckOnlyCMX::YES);
     if (allocRes.status == AllocationStatus::OK) {
         // TODO: try to merge more than one SW stage?
         break;
@@ -160,7 +160,9 @@ void ParsedConfig::parse(const std::map<std::string, std::string>& config) {
     setOption(_compileConfig.hwExtraSplit, switches, config, VPU_CONFIG_KEY(HW_EXTRA_SPLIT));
     setOption(_compileConfig.injectSwOps, switches, config, VPU_CONFIG_KEY(HW_INJECT_STAGES));
     setOption(_compileConfig.mergeHwPoolToConv, switches, config, VPU_CONFIG_KEY(HW_POOL_CONV_MERGE));
+IE_SUPPRESS_DEPRECATED_START
     setOption(_compileConfig.ignoreIRStatistic, switches, config, VPU_CONFIG_KEY(IGNORE_IR_STATISTIC));
+IE_SUPPRESS_DEPRECATED_END
     setOption(_compileConfig.hwDilation, switches, config, VPU_CONFIG_KEY(HW_DILATION));
     setOption(_compileConfig.forceDeprecatedCnnConversion, switches, config, VPU_CONFIG_KEY(FORCE_DEPRECATED_CNN_CONVERSION));
     setOption(_compileConfig.disableReorder, switches, config, VPU_CONFIG_KEY(DISABLE_REORDER));
@@ -266,6 +266,8 @@ void FrontEnd::parseConcat(
     const ie::CNNLayerPtr& layer,
     const DataVector& inputs,
     const DataVector& outputs) const {
+    VPU_THROW_UNLESS(layer != nullptr, "parseConcat expects valid CNNLayerPtr, actually got nullptr");
+
     VPU_THROW_UNLESS(!inputs.empty(),
         "{} layer with name {} must have no less than 1 input, "
         "actually provided 0 inputs", layer->type, layer->name);
@@ -275,10 +277,8 @@ void FrontEnd::parseConcat(

     auto output = outputs[0];

-    auto concat = std::dynamic_pointer_cast<ie::ConcatLayer>(layer);
-    VPU_THROW_UNLESS(layer != nullptr,
-        "{} layer with name {} must be able to convert to ie::ConcatLayer",
-        layer->type, layer->name);
+    const auto& concat = std::dynamic_pointer_cast<ie::ConcatLayer>(layer);
+    VPU_THROW_UNLESS(concat != nullptr, "{} layer with name {} must be convertible to ie::ConcatLayer", layer->type, layer->name);

     VPU_THROW_UNLESS(concat->_axis < output->desc().numDims(),
         "{} layer with name {} must have axis attribute no greater than number of "
@@ -128,9 +128,8 @@ private:

 void FrontEnd::parseReduce(const Model& model, const ie::CNNLayerPtr& _layer, const DataVector& inputs, const DataVector& outputs) const {
     auto layer = std::dynamic_pointer_cast<ie::ReduceLayer>(_layer);
-    VPU_THROW_UNLESS(layer != nullptr,
-        "Layer {} of type {} is nullptr",
-        layer->name, layer->type);
+    VPU_THROW_UNLESS(layer != nullptr, "parseReduce expects valid ReduceLayer, actually got nullptr");

     VPU_THROW_UNLESS(inputs.size() == 2,
         "Layer {} of type {} expects {} inputs, but provided {}",
         layer->name, layer->type, 2, inputs.size());
@@ -107,6 +107,7 @@ Engine::Engine(std::shared_ptr<IMvnc> mvnc) :

     _pluginName = "MYRIAD";

+IE_SUPPRESS_DEPRECATED_START
     _config = {
         { KEY_VPU_HW_STAGES_OPTIMIZATION, "ON" },
         { KEY_LOG_LEVEL, "LOG_NONE" },
@@ -120,6 +121,7 @@ Engine::Engine(std::shared_ptr<IMvnc> mvnc) :
         { KEY_CONFIG_FILE, "" },
         { KEY_DEVICE_ID, "" },
     };
+IE_SUPPRESS_DEPRECATED_END
 }

 InferenceEngine::ExecutableNetwork Engine::ImportNetwork(
@@ -17,6 +17,7 @@
 #include <ie_core.hpp>
 #include <net_pass.h>

+#include <ngraph/opsets/opset3.hpp>
 #include <ngraph/function.hpp>
 #include <ngraph/variant.hpp>
 #include <ngraph/op/maximum.hpp>
@@ -680,4 +681,25 @@ TEST(CNNNGraphImplTests, TestCheckStats) {
     ASSERT_EQ(nullptr, _stats);
 }

+TEST(CNNNGraphImplTests, CanSetBatchReadValue) {
+    std::shared_ptr<ngraph::Function> ngraph;
+    {
+        auto input = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::f32, ngraph::Shape{1, 2});
+        auto constant = std::make_shared<ngraph::opset3::Constant>(ngraph::element::f32, ngraph::Shape{1, 2},
+                                                                   std::vector<float>{1, 2});
+
+        auto read_value = std::make_shared<ngraph::opset3::ReadValue>(constant, "variable_id");
+        auto add = std::make_shared<ngraph::opset3::Add>(input, read_value);
+        auto result = std::make_shared<ngraph::op::Result>(add);
+
+        ngraph::ParameterVector params = {input};
+        ngraph::ResultVector results = {result};
+
+        ngraph = std::make_shared<ngraph::Function>(results, params);
+    }
+
+    InferenceEngine::details::CNNNetworkNGraphImpl cnnNet(ngraph);
+    auto status = cnnNet.getCNNNetwork()->setBatchSize(4, nullptr);
+    EXPECT_EQ(status, StatusCode::OK);
+}
 IE_SUPPRESS_DEPRECATED_END
@@ -60,9 +60,11 @@ protected:
     /* validates a read network with the reference map of CNN layers */
     void compareWithRef(const InferenceEngine::CNNNetwork& network,
                         const std::vector<InferenceEngine::CNNLayerPtr>& refLayersVec) {
+IE_SUPPRESS_DEPRECATED_START
         ASSERT_NO_THROW(FuncTestUtils::compareLayerByLayer<std::vector<InferenceEngine::CNNLayerPtr>>(
             InferenceEngine::details::CNNNetSortTopologically(network),
             refLayersVec, false));
+IE_SUPPRESS_DEPRECATED_END
     }

     const std::string _modelPath = "NetReader_test.xml";
@@ -30,16 +30,6 @@ TEST_F(NGraphReaderTests, ReadPriorBoxClusteredNetwork) {
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="15" name="in3" type="Parameter" version="opset1">
|
||||
<data element_type="f32" shape="1,2,32400"/>
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="2" name="shape_of1" type="ShapeOf" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
@@ -182,63 +172,19 @@ TEST_F(NGraphReaderTests, ReadPriorBoxClusteredNetwork) {
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="concat" id="16" type="Concat" version="opset1">
|
||||
<data axis="1"/>
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
<port id="1" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>4</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="10" name="output" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>4</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
<layer id="13" name="output_2" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
<layer id="14" name="output_3" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
|
||||
<edge from-layer="0" from-port="0" to-layer="13" to-port="0"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="6" to-port="0"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="14" to-port="0"/>
|
||||
<edge from-layer="2" from-port="1" to-layer="5" to-port="0"/>
|
||||
<edge from-layer="6" from-port="1" to-layer="7" to-port="0"/>
|
||||
<edge from-layer="3" from-port="1" to-layer="5" to-port="1"/>
|
||||
@@ -251,90 +197,66 @@ TEST_F(NGraphReaderTests, ReadPriorBoxClusteredNetwork) {
|
||||
<edge from-layer="7" from-port="4" to-layer="8" to-port="1"/>
|
||||
<edge from-layer="8" from-port="2" to-layer="11" to-port="0"/>
|
||||
<edge from-layer="12" from-port="0" to-layer="11" to-port="1"/>
|
||||
-        <edge from-layer="11" from-port="2" to-layer="16" to-port="1"/>
-        <edge from-layer="16" from-port="2" to-layer="10" to-port="0"/>
-        <edge from-layer="15" from-port="0" to-layer="16" to-port="0"/>
+        <edge from-layer="11" from-port="2" to-layer="10" to-port="0"/>
</edges>
|
||||
</net>
|
||||
)V0G0N";
|
||||
std::string modelV5 = R"V0G0N(
|
||||
<net name="Network" version="5" precision="FP32" batch="1">
|
||||
<layers>
|
||||
<layer name="in2" type="Input" precision="FP32" id="0">
|
||||
<data originalLayersNames="in2" />
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="in1" type="Input" precision="FP32" id="1">
|
||||
<data originalLayersNames="in1" />
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="in3" type="Input" precision="FP32" id="2">
|
||||
<data originalLayersNames="in3" />
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="Constant_49" type="Const" precision="FP32" id="3">
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</output>
|
||||
<blobs>
|
||||
<custom offset="0" size="259200" precision="FP32" />
|
||||
</blobs>
|
||||
</layer>
|
||||
<layer name="concat" type="Concat" precision="FP32" id="4">
|
||||
<data axis="1" originalLayersNames="concat" />
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>4</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="2" from-port="0" to-layer="4" to-port="0" />
|
||||
<edge from-layer="3" from-port="0" to-layer="4" to-port="1" />
|
||||
</edges>
|
||||
<layers>
|
||||
<layer id="0" name="in1" type="Input" precision="FP32">
|
||||
<output>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="1" name="in2" type="Input" precision="FP32">
|
||||
<output>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="ExpandDims" id="2" type="PriorBoxClustered" precision="FP32">
|
||||
<data clip="0" step_h="16.000000" step_w="16.000000" flip="1" height="44,10,30,19,94,32,61,53,17" offset="0.500000" step="16.000000" variance="0.1,0.1,0.2,0.2" width="86,13,57,39,68,34,142,50,23" originalLayersNames="ExpandDims,prior,shape_of1,shape_of2,ss1,ss2"/>
|
||||
<input>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
<port id="2">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="3">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>32400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="0" to-layer="2" to-port="1"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="2" to-port="2"/>
|
||||
</edges>
|
||||
</net>
|
||||
)V0G0N";
|
||||
|
||||
-    compareIRs(model, modelV5, 259200, [](Blob::Ptr& weights) {
+    compareIRs(model, modelV5, 50, [](Blob::Ptr& weights) {
auto* buffer = weights->buffer().as<int64_t*>();
|
||||
buffer[0] = 2;
|
||||
buffer[1] = 4;
|
||||
@@ -369,16 +291,6 @@ TEST_F(NGraphReaderTests, ReadPriorBoxNetwork) {
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="15" name="in3" type="Parameter" version="opset1">
|
||||
<data element_type="f32" shape="1,2,14400"/>
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="2" name="shape_of1" type="ShapeOf" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
@@ -520,63 +432,19 @@ TEST_F(NGraphReaderTests, ReadPriorBoxNetwork) {
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="concat" id="16" type="Concat" version="opset1">
|
||||
<data axis="1"/>
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
<port id="1" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>4</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="10" name="output" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>4</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
<layer id="13" name="output_2" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
<layer id="14" name="output_3" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="0" to-layer="2" to-port="0"/>
|
||||
<edge from-layer="0" from-port="0" to-layer="13" to-port="0"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="6" to-port="0"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="14" to-port="0"/>
|
||||
<edge from-layer="2" from-port="1" to-layer="5" to-port="0"/>
|
||||
<edge from-layer="6" from-port="1" to-layer="7" to-port="0"/>
|
||||
<edge from-layer="3" from-port="1" to-layer="5" to-port="1"/>
|
||||
@@ -589,90 +457,66 @@ TEST_F(NGraphReaderTests, ReadPriorBoxNetwork) {
|
||||
<edge from-layer="7" from-port="4" to-layer="8" to-port="1"/>
|
||||
<edge from-layer="8" from-port="2" to-layer="11" to-port="0"/>
|
||||
<edge from-layer="12" from-port="0" to-layer="11" to-port="1"/>
|
||||
-        <edge from-layer="11" from-port="2" to-layer="16" to-port="0"/>
-        <edge from-layer="15" from-port="0" to-layer="16" to-port="1"/>
-        <edge from-layer="16" from-port="2" to-layer="10" to-port="0"/>
+        <edge from-layer="11" from-port="2" to-layer="10" to-port="0"/>
</edges>
|
||||
</net>
|
||||
)V0G0N";
|
||||
std::string modelV5 = R"V0G0N(
|
||||
<net name="Network" version="5" precision="FP32" batch="1">
|
||||
<layers>
|
||||
<layer name="in2" type="Input" precision="FP32" id="0">
|
||||
<data originalLayersNames="in2" />
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="in1" type="Input" precision="FP32" id="1">
|
||||
<data originalLayersNames="in1" />
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="Constant_49" type="Const" precision="FP32" id="2">
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</output>
|
||||
<blobs>
|
||||
<custom offset="0" size="115200" precision="FP32" />
|
||||
</blobs>
|
||||
</layer>
|
||||
<layer name="in3" type="Input" precision="FP32" id="3">
|
||||
<data originalLayersNames="in3" />
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="concat" type="Concat" precision="FP32" id="4">
|
||||
<data axis="1" originalLayersNames="concat" />
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>4</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="2" from-port="0" to-layer="4" to-port="0" />
|
||||
<edge from-layer="3" from-port="0" to-layer="4" to-port="1" />
|
||||
</edges>
|
||||
<layers>
|
||||
<layer id="0" name="in1" type="Input" precision="FP32">
|
||||
<output>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="1" name="in2" type="Input" precision="FP32">
|
||||
<output>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="ExpandDims" id="2" type="PriorBox" precision="FP32">
|
||||
<data density="" fixed_ratio="" fixed_size="" aspect_ratio="2,0.5" clip="0" flip="0" img_h="0" img_size="0" img_w="0" max_size="" min_size="51.200001,72.407555" offset="0.500000" scale_all_sizes="0" step="17.066666666666666" step_h="0" step_w="0" variance="0.1,0.1,0.2,0.2" originalLayersNames="ExpandDims,prior,shape_of1,shape_of2,ss1,ss2"/>
|
||||
<input>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>768</dim>
|
||||
<dim>30</dim>
|
||||
<dim>30</dim>
|
||||
</port>
|
||||
<port id="2">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
<dim>512</dim>
|
||||
<dim>512</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="3">
|
||||
<dim>1</dim>
|
||||
<dim>2</dim>
|
||||
<dim>14400</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="0" to-layer="2" to-port="1"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="2" to-port="2"/>
|
||||
</edges>
|
||||
</net>
|
||||
)V0G0N";
|
||||
|
||||
-    compareIRs(model, modelV5, 115200, [](Blob::Ptr& weights) {
+    compareIRs(model, modelV5, 40, [](Blob::Ptr& weights) {
auto* buffer = weights->buffer().as<int64_t*>();
|
||||
buffer[0] = 2;
|
||||
buffer[1] = 4;
|
||||
|
||||
@@ -3,6 +3,7 @@
|
||||
//
|
||||
|
||||
#include <string>
|
||||
#include <generic_ie.hpp>
|
||||
#include "ngraph_reader_tests.hpp"
|
||||
TEST_F(NGraphReaderTests, ReadProposalNetwork) {
|
||||
std::string model_v10 = R"V0G0N(
|
||||
@@ -305,3 +306,100 @@ TEST_F(NGraphReaderTests, ReadProposalNetwork_2) {
|
||||
|
||||
compareIRs(model_v10, model_v6, 32);
|
||||
}
|
||||
|
||||
TEST_F(NGraphReaderTests, ReadExtensionProposalNetwork) {
|
||||
std::string model_v10 = R"V0G0N(
|
||||
<net name="Network" version="10">
|
||||
<layers>
|
||||
<layer id="0" name="in1" type="Parameter" version="opset1">
|
||||
<data element_type="f32" shape="1,12,34,62"/>
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>12</dim>
|
||||
<dim>34</dim>
|
||||
<dim>62</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="1" name="in2" type="Parameter" version="opset1">
|
||||
<data element_type="f32" shape="1,24,34,62"/>
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>24</dim>
|
||||
<dim>34</dim>
|
||||
<dim>62</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="2" name="in3" type="Const" version="opset1">
|
||||
<data offset="0" size="24"/>
|
||||
<output>
|
||||
<port id="0" precision="I64">
|
||||
<dim>3</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer name="proposal" type="Proposal" precision="FP32" id="3" version="extension">
|
||||
<data feat_stride="16" base_size="16" min_size="16" ratio="2.669000" scale="4.000000,6.000000,9.000000,16.000000,24.000000,32.000000" pre_nms_topn="6000" post_nms_topn="200" nms_thresh="0.600000"/>
|
||||
<input>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>12</dim>
|
||||
<dim>34</dim>
|
||||
<dim>62</dim>
|
||||
</port>
|
||||
<port id="2">
|
||||
<dim>1</dim>
|
||||
<dim>24</dim>
|
||||
<dim>34</dim>
|
||||
<dim>62</dim>
|
||||
</port>
|
||||
<port id="3">
|
||||
<dim>3</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="3" precision="FP32">
|
||||
<dim>1000</dim>
|
||||
<dim>5</dim>
|
||||
</port>
|
||||
<port id="4" precision="FP32">
|
||||
<dim>1000</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="4" name="output" type="Result" version="opset1">
|
||||
<input>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>200</dim>
|
||||
<dim>5</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="0" to-layer="3" to-port="1"/>
|
||||
<edge from-layer="1" from-port="0" to-layer="3" to-port="2"/>
|
||||
<edge from-layer="2" from-port="0" to-layer="3" to-port="3"/>
|
||||
<edge from-layer="3" from-port="4" to-layer="4" to-port="0"/>
|
||||
</edges>
|
||||
</net>
|
||||
)V0G0N";
|
||||
|
||||
Core ie;
|
||||
Blob::Ptr weights;
|
||||
|
||||
weights = make_shared_blob<uint8_t>(TensorDesc(Precision::U8, {24}, Layout::C));
|
||||
weights->allocate();
|
||||
CommonTestUtils::fill_data(weights->buffer().as<float *>(), weights->size() / sizeof(float));
|
||||
|
||||
auto func = ie.ReadNetwork(model_v10, weights).getFunction();
|
||||
for (auto op : func->get_ordered_ops()) {
|
||||
if (op->get_friendly_name() == "proposal" && op->get_type_info() == ngraph::op::GenericIE::type_info) {
|
||||
return;
|
||||
}
|
||||
}
|
||||
FAIL() << "Custom proposal layer is not a Generic operation!";
|
||||
}
|
||||
|
||||
@@ -1,218 +0,0 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <gtest/gtest.h>

#include "common_test_utils/test_common.hpp"
#include <string>
#include <memory>

#include <ngraph/opsets/opset3.hpp>
#include <ngraph/function.hpp>
#include <transformations/init_node_info.hpp>
#include <ngraph/pass/constant_folding.hpp>
#include <ngraph/ops.hpp>
#include "ngraph_test_utils.hpp"

using namespace testing;

TEST(TransformationTests, ConstFoldingPriorBox) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);

    {
        auto in = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2});
        ngraph::op::PriorBoxAttrs attrs;
        attrs.min_size = {256.0f};
        attrs.max_size = {315.0f};
        attrs.aspect_ratio = {2.0f};
        attrs.flip = true;
        attrs.scale_all_sizes = true;

        auto layer_shape = ngraph::opset3::Constant::create<int64_t>(ngraph::element::i64, ngraph::Shape{2}, {1, 1});
        auto image_shape = ngraph::opset3::Constant::create<int64_t>(ngraph::element::i64, ngraph::Shape{2}, {300, 300});
        auto pb = std::make_shared<ngraph::opset3::PriorBox>(layer_shape, image_shape, attrs);
        auto res = std::make_shared<ngraph::opset3::Result>(pb);
        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{in});
        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConstantFolding().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto layer_shape = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2});
        auto const_prior_box = ngraph::opset3::Constant::create<float>(ngraph::element::f32, ngraph::Shape{2, 16},
            { -0.426667, -0.426667, 0.426667, 0.426667, -0.473286, -0.473286, 0.473286, 0.473286,
              -0.603398, -0.301699, 0.603398, 0.301699, -0.301699, -0.603398, 0.301699, 0.603398,
              0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
            });
        auto res = std::make_shared<ngraph::opset3::Result>(const_prior_box);
        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{layer_shape});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;

    auto fused = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());
    auto ref = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());

    EXPECT_TRUE(fused != nullptr);
    EXPECT_TRUE(ref != nullptr);
    EXPECT_TRUE(fused->get_vector<float>() == ref->get_vector<float>());
}

TEST(TransformationTests, ConstFoldingPriorBoxClustered) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);

    {
        auto in = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2});
        ngraph::op::PriorBoxClusteredAttrs attrs;
        attrs.widths = {4.0f, 2.0f, 3.2f};
        attrs.heights = {1.0f, 2.0f, 1.1f};

        auto layer_shape = ngraph::opset3::Constant::create<int64_t>(ngraph::element::i64, ngraph::Shape{2}, {2, 2});
        auto image_shape = ngraph::opset3::Constant::create<int64_t>(ngraph::element::i64, ngraph::Shape{2}, {300, 300});
        auto pb = std::make_shared<ngraph::opset3::PriorBoxClustered>(layer_shape, image_shape, attrs);
        auto res = std::make_shared<ngraph::opset3::Result>(pb);
        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{in});
        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConstantFolding().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto layer_shape = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2});
        auto const_prior_box = ngraph::opset3::Constant::create<float>(ngraph::element::f32, ngraph::Shape{2, 48},
            { -0.00666667, -0.00166667, 0.00666667, 0.00166667, -0.00333333, -0.00333333, 0.00333333,
              0.00333333, -0.00533333, -0.00183333, 0.00533333, 0.00183333, -0.00333333, -0.00166667,
              0.01, 0.00166667, 0, -0.00333333, 0.00666667, 0.00333333, -0.002, -0.00183333, 0.00866667,
              0.00183333, -0.00666667, 0.00166667, 0.00666667, 0.005, -0.00333333, 0, 0.00333333,
              0.00666667, -0.00533333, 0.0015, 0.00533333, 0.00516667, -0.00333333, 0.00166667, 0.01,
              0.005, 0, 0, 0.00666667, 0.00666667, -0.002, 0.0015, 0.00866667, 0.00516667, 0.1, 0.1,
              0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
            });
        auto res = std::make_shared<ngraph::opset3::Result>(const_prior_box);
        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{layer_shape});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;

    auto fused = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());
    auto ref = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());

    EXPECT_TRUE(fused != nullptr);
    EXPECT_TRUE(ref != nullptr);
    EXPECT_TRUE(fused->get_vector<float>() == ref->get_vector<float>());
}

TEST(TransformationTests, ConstFoldingPriorBoxSubgraph) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);

    {
        auto in = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2, 3, 1, 1});
        auto in_2 = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2, 3, 300, 300});
        ngraph::op::PriorBoxAttrs attrs;
        attrs.min_size = {256.0f};
        attrs.max_size = {315.0f};
        attrs.aspect_ratio = {2.0f};
        attrs.flip = true;
        attrs.scale_all_sizes = true;

        auto layer_shape = std::make_shared<ngraph::opset3::ShapeOf>(in);
        auto image_shape = std::make_shared<ngraph::opset3::ShapeOf>(in_2);

        auto begin = ngraph::opset3::Constant::create(ngraph::element::i64, ngraph::Shape{1}, {2});
        auto end = ngraph::opset3::Constant::create(ngraph::element::i64, ngraph::Shape{1}, {4});
        auto stride = ngraph::opset3::Constant::create(ngraph::element::i64, ngraph::Shape{1}, {1});
        auto ss_data = std::make_shared<ngraph::opset3::StridedSlice>(layer_shape, begin, end, stride,
            std::vector<int64_t>{0}, std::vector<int64_t>{0});

        auto ss_image = std::make_shared<ngraph::opset3::StridedSlice>(image_shape, begin, end, stride,
            std::vector<int64_t>{0}, std::vector<int64_t>{0});
        auto pb = std::make_shared<ngraph::opset3::PriorBox>(ss_data, ss_image, attrs);
        auto res = std::make_shared<ngraph::opset3::Result>(pb);
        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{in, in_2});
        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConstantFolding().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto layer_shape = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2});
        auto const_prior_box = ngraph::opset3::Constant::create<float>(ngraph::element::f32, ngraph::Shape{2, 16},
            { -0.426667, -0.426667, 0.426667, 0.426667, -0.473286, -0.473286, 0.473286, 0.473286,
              -0.603398, -0.301699, 0.603398, 0.301699, -0.301699, -0.603398, 0.301699, 0.603398,
              0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1
            });
        auto res = std::make_shared<ngraph::opset3::Result>(const_prior_box);
        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{layer_shape});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;

    auto fused = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());
    auto ref = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());

    EXPECT_TRUE(fused != nullptr);
    EXPECT_TRUE(ref != nullptr);
    EXPECT_TRUE(fused->get_vector<float>() == ref->get_vector<float>());
}

TEST(TransformationTests, ConstFoldingPriorBoxClusteredSubgraph) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);

    {
        auto in = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2, 3, 2, 2});
        auto in_2 = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2, 3, 300, 300});
        ngraph::op::PriorBoxClusteredAttrs attrs;
        attrs.widths = {4.0f, 2.0f, 3.2f};
        attrs.heights = {1.0f, 2.0f, 1.1f};

        auto layer_shape = std::make_shared<ngraph::opset3::ShapeOf>(in);
        auto image_shape = std::make_shared<ngraph::opset3::ShapeOf>(in_2);

        auto begin = ngraph::opset3::Constant::create(ngraph::element::i64, ngraph::Shape{1}, {2});
        auto end = ngraph::opset3::Constant::create(ngraph::element::i64, ngraph::Shape{1}, {4});
        auto stride = ngraph::opset3::Constant::create(ngraph::element::i64, ngraph::Shape{1}, {1});
        auto ss_data = std::make_shared<ngraph::opset3::StridedSlice>(layer_shape, begin, end, stride,
            std::vector<int64_t>{0}, std::vector<int64_t>{0});

        auto ss_image = std::make_shared<ngraph::opset3::StridedSlice>(image_shape, begin, end, stride,
            std::vector<int64_t>{0}, std::vector<int64_t>{0});
        auto pb = std::make_shared<ngraph::opset3::PriorBoxClustered>(ss_data, ss_image, attrs);
        auto res = std::make_shared<ngraph::opset3::Result>(pb);
        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{in, in_2});
        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConstantFolding().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto layer_shape = std::make_shared<ngraph::opset3::Parameter>(ngraph::element::i64, ngraph::Shape{2});
        auto const_prior_box = ngraph::opset3::Constant::create<float>(ngraph::element::f32, ngraph::Shape{2, 48},
            { -0.00666667, -0.00166667, 0.00666667, 0.00166667, -0.00333333, -0.00333333, 0.00333333,
              0.00333333, -0.00533333, -0.00183333, 0.00533333, 0.00183333, -0.00333333, -0.00166667,
              0.01, 0.00166667, 0, -0.00333333, 0.00666667, 0.00333333, -0.002, -0.00183333, 0.00866667,
              0.00183333, -0.00666667, 0.00166667, 0.00666667, 0.005, -0.00333333, 0, 0.00333333,
              0.00666667, -0.00533333, 0.0015, 0.00533333, 0.00516667, -0.00333333, 0.00166667, 0.01,
              0.005, 0, 0, 0.00666667, 0.00666667, -0.002, 0.0015, 0.00866667, 0.00516667, 0.1, 0.1,
              0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
            });
        auto res = std::make_shared<ngraph::opset3::Result>(const_prior_box);
        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{res}, ngraph::ParameterVector{layer_shape});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;

    auto fused = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());
    auto ref = std::dynamic_pointer_cast<ngraph::opset3::Constant>(f->get_result()->input_value(0).get_node_shared_ptr());

    EXPECT_TRUE(fused != nullptr);
    EXPECT_TRUE(ref != nullptr);
    EXPECT_TRUE(fused->get_vector<float>() == ref->get_vector<float>());
}

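The *Subgraph tests above fold completely because every input shape is static: `ShapeOf` of a statically shaped parameter is a compile-time constant, `StridedSlice` of a constant is a constant, and so `PriorBox`/`PriorBoxClustered` over those shapes folds too. A minimal sketch of that shape arithmetic (plain Python, illustrative only, not OpenVINO API):

```python
# Illustrative: why ConstFoldingPriorBoxSubgraph folds to a single Constant.
# With static shapes, ShapeOf and StridedSlice are evaluable at compile time.

def shape_of(static_shape):
    # ShapeOf of a statically shaped tensor is known without running the graph.
    return tuple(static_shape)

def strided_slice(shape, begin, end):
    # The tests slice dims [2, 4) out of the 4-D shape (stride 1, no masks).
    return shape[begin:end]

layer_hw = strided_slice(shape_of((2, 3, 1, 1)), 2, 4)
image_hw = strided_slice(shape_of((2, 3, 300, 300)), 2, 4)

# These match the {1, 1} / {300, 300} Constants fed to PriorBox in the
# non-subgraph test, which is why both tests fold to the same values.
assert layer_hw == (1, 1)
assert image_hw == (300, 300)
```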
@@ -0,0 +1,73 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <gtest/gtest.h>

#include <string>
#include <memory>
#include <queue>

#include <ngraph/function.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <transformations/convert_divide.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/utils/utils.hpp>

#include "ngraph_test_utils.hpp"

using namespace testing;

TEST(TransformationTests, ConvertDivide) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);
    {
        auto data = std::make_shared<ngraph::opset1::Parameter>(ngraph::element::f32, ngraph::Shape{3, 1, 2});
        auto divide_constant = ngraph::opset1::Constant::create(ngraph::element::f32, ngraph::Shape{1}, {1.5});
        auto divide = std::make_shared<ngraph::opset1::Divide>(data, divide_constant);

        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{divide}, ngraph::ParameterVector{data});

        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConvertDivide().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto data = std::make_shared<ngraph::opset1::Parameter>(ngraph::element::f32, ngraph::Shape{3, 1, 2});
        auto divide_constant = ngraph::opset1::Constant::create(ngraph::element::f32, ngraph::Shape{1}, {1.5});
        auto pow = std::make_shared<ngraph::opset1::Power>(divide_constant,
            ngraph::opset1::Constant::create(ngraph::element::f32, ngraph::Shape{1}, {-1}));
        auto mul = std::make_shared<ngraph::opset1::Multiply>(data, pow);

        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{mul}, ngraph::ParameterVector{data});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;
}

TEST(TransformationTests, ConvertDivideNegative) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);
    {
        auto data = std::make_shared<ngraph::opset1::Parameter>(ngraph::element::i32, ngraph::Shape{3, 1, 2});
        auto divide_constant = ngraph::opset1::Constant::create(ngraph::element::i32, ngraph::Shape{1}, {2});
        auto divide = std::make_shared<ngraph::opset1::Divide>(data, divide_constant);

        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{divide}, ngraph::ParameterVector{data});

        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConvertDivide().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto data = std::make_shared<ngraph::opset1::Parameter>(ngraph::element::i32, ngraph::Shape{3, 1, 2});
        auto divide_constant = ngraph::opset1::Constant::create(ngraph::element::i32, ngraph::Shape{1}, {2});
        auto divide = std::make_shared<ngraph::opset1::Divide>(data, divide_constant);

        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{divide}, ngraph::ParameterVector{data});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;
}

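The pass exercised above rewrites `Divide(x, c)` into `Multiply(x, Power(c, -1))`, and the negative test shows it is skipped for `i32` inputs, where that rewrite would change the result. A minimal arithmetic sketch of why (plain Python, illustrative only, not part of the test suite):

```python
# Illustrative sketch of the Divide -> Multiply(x, Power(c, -1)) rewrite
# checked by the ConvertDivide tests. Function names are ad hoc.

def divide_direct(x, c):
    return x / c

def divide_rewritten(x, c):
    return x * c ** -1.0  # Multiply(x, Power(c, -1))

# For float (f32-like) data the two forms agree up to rounding error:
assert abs(divide_direct(3.0, 1.5) - divide_rewritten(3.0, 1.5)) < 1e-12

# For integer (i32-like) data they do not: the reciprocal is fractional,
# so the rewrite would replace exact integer division with float math --
# hence ConvertDivideNegative expects the i32 Divide to stay in place.
assert 7 // 2 == 3            # integer division result
assert 7 * 2 ** -1.0 == 3.5   # power-based rewrite gives a different value
```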
@@ -177,6 +177,56 @@ TEST(TransformationTests, ConvertStridedSliceToCropNegative) {
        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{sslice}, ngraph::ParameterVector{input});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;
}

// In this test the Crop would get a 3D input, which is not supported, so the transformation will not be applied
TEST(TransformationTests, ConvertStridedSliceToCropNegative2) {
    std::shared_ptr<ngraph::Function> f(nullptr), f_ref(nullptr);
    {
        auto input = std::make_shared<ngraph::opset1::Parameter>(ngraph::element::f32, ngraph::Shape{128, 1});
        auto slice_begin = ngraph::opset1::Constant::create(ngraph::element::i64, ngraph::Shape{3}, {0, 0, 0});
        auto slice_end = ngraph::opset1::Constant::create(ngraph::element::i64, ngraph::Shape{3}, {0, 0, 0});
        auto slice_stride = ngraph::opset1::Constant::create(ngraph::element::i64, ngraph::Shape{3}, {1, 1, 1});

        std::vector<int64_t> begin_mask = {0, 1, 1};
        std::vector<int64_t> end_mask = {0, 1, 1};
        std::vector<int64_t> new_axis_mask = {1, 0, 0};
        std::vector<int64_t> shrink_axis_mask = {0, 0, 0};
        std::vector<int64_t> ellipsis_mask = {0, 0, 0};

        auto sslice = std::make_shared<ngraph::opset1::StridedSlice>(input, slice_begin, slice_end, slice_stride,
            begin_mask, end_mask,
            new_axis_mask, shrink_axis_mask, ellipsis_mask);
        sslice->set_friendly_name("strided_slice");

        f = std::make_shared<ngraph::Function>(ngraph::NodeVector{sslice}, ngraph::ParameterVector{input});
        ngraph::pass::InitNodeInfo().run_on_function(f);
        ngraph::pass::ConvertStridedSliceToCrop().run_on_function(f);
        ASSERT_NO_THROW(check_rt_info(f));
    }

    {
        auto input = std::make_shared<ngraph::opset1::Parameter>(ngraph::element::f32, ngraph::Shape{128, 1});
        auto slice_begin = ngraph::opset1::Constant::create(ngraph::element::i64, ngraph::Shape{3}, {0, 0, 0});
        auto slice_end = ngraph::opset1::Constant::create(ngraph::element::i64, ngraph::Shape{3}, {0, 0, 0});
        auto slice_stride = ngraph::opset1::Constant::create(ngraph::element::i64, ngraph::Shape{3}, {1, 1, 1});

        std::vector<int64_t> begin_mask = {0, 1, 1};
        std::vector<int64_t> end_mask = {0, 1, 1};
        std::vector<int64_t> new_axis_mask = {1, 0, 0};
        std::vector<int64_t> shrink_axis_mask = {0, 0, 0};
        std::vector<int64_t> ellipsis_mask = {0, 0, 0};

        auto sslice = std::make_shared<ngraph::opset1::StridedSlice>(input, slice_begin, slice_end, slice_stride,
            begin_mask, end_mask,
            new_axis_mask, shrink_axis_mask, ellipsis_mask);
        sslice->set_friendly_name("strided_slice");

        f_ref = std::make_shared<ngraph::Function>(ngraph::NodeVector{sslice}, ngraph::ParameterVector{input});
    }

    auto res = compare_functions(f, f_ref);
    ASSERT_TRUE(res.first) << res.second;
}

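The Negative2 case hinges on `new_axis_mask = {1, 0, 0}` raising the rank of the 2D `{128, 1}` input to 3, which the legacy Crop layer does not support, so the pass must leave the StridedSlice untouched. A small stdlib-only sketch of that shape effect (illustrative only, `apply_new_axis_mask` is a made-up helper, not an OpenVINO function):

```python
# Illustrative: new_axis_mask inserts a size-1 axis at each masked position,
# and the remaining input dims fill the unmasked positions in order.
def apply_new_axis_mask(shape, mask):
    out = []
    dims = iter(shape)
    for m in mask:
        out.append(1 if m else next(dims))
    return tuple(out)

# The {128, 1} input with mask {1, 0, 0} becomes a 3-D {1, 128, 1} tensor,
# a rank the Crop-based rewrite cannot represent.
assert apply_new_axis_mask((128, 1), (1, 0, 0)) == (1, 128, 1)
```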
@@ -157,5 +157,6 @@ TEST(TransformationTests, ConvertTopK3I64Output1) {
    ASSERT_TRUE(res.first) << res.second;

    auto result_node_of_converted_f = f->get_output_op(0);
    auto topk_node = result_node_of_converted_f->input(0).get_source_output().get_node_shared_ptr();
    auto convert_node = result_node_of_converted_f->input(0).get_source_output().get_node_shared_ptr();
    ASSERT_TRUE(convert_node->get_friendly_name() == "topk.1") << "Transformation ConvertTopK3 should keep output names.\n";
}

@@ -11,14 +11,15 @@ std::vector<std::string> disabledTestPatterns() {
    return {
        // TODO: Issue 26264
        R"(.*(MaxPool|AvgPool).*S\(1\.2\).*Rounding=CEIL.*)",
        // TODO: Issue 31839
        R"(.*(QuantConvBackpropData3D).*)",
        // TODO: Issue 31841
        R"(.*(QuantGroupConvBackpropData3D).*)",
        // TODO: Issue 31843
        R"(.*(QuantGroupConvBackpropData2D)*QG=Perchannel.*)",
        // TODO: Issue 32023
        R"(.*(QuantGroupConvBackpropData2D)*QG=Pertensor.*)",
        R"(.*(QuantConvBackpropData3D).*)",
        R"(.*(QuantConvBackpropData2D).*(QG=Perchannel).*)",
        R"(.*(QuantGroupConvBackpropData2D).*(QG=Perchannel).*)",
        // TODO: Issue 33886
        R"(.*(QuantGroupConv2D).*)",
        R"(.*(QuantGroupConv3D).*)",
        // TODO: Issue 31845
        R"(.*(FakeQuantize).*)",
        R"(.*(EltwiseLayerTest).*IS=\(.*\..*\..*\..*\..*\).*secondaryInputType=PARAMETER.*opType=SCALAR.*)",

@@ -19,7 +19,6 @@ const std::vector<InferenceEngine::Precision> netPrecisions = {
const std::vector<size_t> numOutChannels = {16, 32};

const std::vector<size_t> levels = {256};
// FIXME: Perchannel tests fail because of bug in LPT
const std::vector<QuantizationGranularity> granularity = {Pertensor, Perchannel};

/* ============= 2D GroupConvolutionBackpropData ============= */

@@ -0,0 +1,86 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <vector>

#include "subgraph_tests/quantized_group_convolution.hpp"
#include "common_test_utils/test_constants.hpp"

using namespace LayerTestsDefinitions;
using namespace ngraph::helpers;

namespace {

const std::vector<InferenceEngine::Precision> netPrecisions = {
    InferenceEngine::Precision::FP32
};


const std::vector<size_t> numOutChannels = {3, 24, 48};
const std::vector<size_t> numGroups = {3};

const std::vector<size_t> levels = {256};
const std::vector<QuantizationGranularity> granularity = {Pertensor, Perchannel};
const std::vector<bool> quantizeWeights = {false, true};

/* ============= 2D GroupConvolution ============= */
const std::vector<std::vector<size_t>> inputShapes2D = {{1, 3, 10, 10}, {1, 24, 10, 10}};
const std::vector<std::vector<size_t>> kernels2D = {{1, 1}, {3, 3}};
const std::vector<std::vector<size_t>> strides2D = {{1, 1}};
const std::vector<std::vector<ptrdiff_t>> padBegins2D = {{0, 0}};
const std::vector<std::vector<ptrdiff_t>> padEnds2D = {{0, 0}};
const std::vector<std::vector<size_t>> dilations2D = {{1, 1}};


const auto quantGroupConv2DParams = ::testing::Combine(
    ::testing::ValuesIn(kernels2D),
    ::testing::ValuesIn(strides2D),
    ::testing::ValuesIn(padBegins2D),
    ::testing::ValuesIn(padEnds2D),
    ::testing::ValuesIn(dilations2D),
    ::testing::ValuesIn(numOutChannels),
    ::testing::ValuesIn(numGroups),
    ::testing::ValuesIn(levels),
    ::testing::ValuesIn(granularity),
    ::testing::ValuesIn(quantizeWeights)
);

INSTANTIATE_TEST_CASE_P(QuantGroupConv2D, QuantGroupConvLayerTest,
    ::testing::Combine(
        quantGroupConv2DParams,
        ::testing::ValuesIn(netPrecisions),
        ::testing::ValuesIn(inputShapes2D),
        ::testing::Values(CommonTestUtils::DEVICE_CPU)),
    QuantGroupConvLayerTest::getTestCaseName);

/* ============= 3D GroupConvolution ============= */
const std::vector<std::vector<size_t>> inputShapes3D = {{1, 3, 5, 5, 5}, {1, 24, 5, 5, 5}};
const std::vector<std::vector<size_t>> kernels3D = {{3, 3, 3}};
const std::vector<std::vector<size_t>> strides3D = {{1, 1, 1}};
const std::vector<std::vector<ptrdiff_t>> padBegins3D = {{0, 0, 0}};
const std::vector<std::vector<ptrdiff_t>> padEnds3D = {{0, 0, 0}};
const std::vector<std::vector<size_t>> dilations3D = {{1, 1, 1}};

const auto quantGroupConv3DParams = ::testing::Combine(
    ::testing::ValuesIn(kernels3D),
    ::testing::ValuesIn(strides3D),
    ::testing::ValuesIn(padBegins3D),
    ::testing::ValuesIn(padEnds3D),
    ::testing::ValuesIn(dilations3D),
    ::testing::ValuesIn(numOutChannels),
    ::testing::ValuesIn(numGroups),
    ::testing::ValuesIn(levels),
    ::testing::ValuesIn(granularity),
    ::testing::ValuesIn(quantizeWeights)
);

INSTANTIATE_TEST_CASE_P(QuantGroupConv3D, QuantGroupConvLayerTest,
    ::testing::Combine(
        quantGroupConv3DParams,
        ::testing::ValuesIn(netPrecisions),
        ::testing::ValuesIn(inputShapes3D),
        ::testing::Values(CommonTestUtils::DEVICE_CPU)),
    QuantGroupConvLayerTest::getTestCaseName);

}  // namespace
@@ -21,7 +21,7 @@ const std::vector<std::map<std::string, std::string>> configs = {
    }
};

INSTANTIATE_TEST_CASE_P(ConcatQuantization, ConcatQuantization,
INSTANTIATE_TEST_CASE_P(smoke_ConcatQuantization, ConcatQuantization,
    ::testing::Combine(
        ::testing::ValuesIn(netPrecisions),
        ::testing::Values(CommonTestUtils::DEVICE_GNA),

@@ -0,0 +1,39 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
#include <vector>
#include "subgraph_tests/multioutput_eltwise_squeeze_eltwise.hpp"
#include "common_test_utils/test_constants.hpp"

using namespace LayerTestsDefinitions;

namespace {
    std::vector<std::vector<std::vector<size_t>>> inputs{
        {{1, 16}},
        {{2, 16}},
        {{1, 160}},
        {{8, 40}},
        {{3, 8}},
        {{4, 32}},
        {{5, 64}},
        {{6, 128}},
        {{7, 256}},
        {{8, 512}},
        {{8, 1024}}
    };

    std::map<std::string, std::string> additional_config = {
        {"GNA_COMPACT_MODE", "NO"},
    };

    std::vector<InferenceEngine::Precision> netPrecisions = {InferenceEngine::Precision::FP32,
                                                             InferenceEngine::Precision::FP16,
    };

    INSTANTIATE_TEST_CASE_P(multioutput_eltwise_identity, MultioutputEltwiseReshapeEltwise,
        ::testing::Combine(
            ::testing::ValuesIn(inputs),
            ::testing::ValuesIn(netPrecisions),
            ::testing::Values(CommonTestUtils::DEVICE_GNA),
            ::testing::Values(additional_config)),
        MultioutputEltwiseReshapeEltwise::getTestCaseName);
}  // namespace
@@ -9,6 +9,7 @@ using namespace LayerTestsDefinitions;
namespace {
    std::vector<std::vector<std::vector<size_t>>> inputs{
        {{1, 4, 160}, {0, 2, 1}},
        {{1, 160, 4}, {0, 2, 1}},
        {{8, 16}, {1, 0}},
        {{1, 1, 4, 16}, {3, 1, 2, 0}},
        {{1, 8, 200}, {0, 2, 1}},

@@ -0,0 +1,53 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <vector>
#include "subgraph_tests/scaleshift.hpp"
#include "common_test_utils/test_constants.hpp"

using namespace LayerTestsDefinitions;

namespace {

std::vector<std::vector<std::vector<size_t>>> inShapes = {
    {{1, 8}},
    {{2, 16}},
    {{3, 32}},
    {{4, 64}},
    {{5, 128}},
    {{6, 256}},
    {{7, 512}},
    {{8, 1024}}
};

std::vector<std::vector<float>> Scales = {
    {2.0f},
    {3.0f},
    {-1.0f},
    {-2.0f},
    {-3.0f}
};

std::vector<std::vector<float>> Shifts = {
    {1.0f},
    {2.0f},
    {3.0f},
    {-1.0f},
    {-2.0f},
    {-3.0f}
};

std::vector<InferenceEngine::Precision> netPrecisions = {InferenceEngine::Precision::FP32,
                                                         InferenceEngine::Precision::FP16,
};

INSTANTIATE_TEST_CASE_P(scale_shift, ScaleShiftLayerTest,
    ::testing::Combine(
        ::testing::ValuesIn(inShapes),
        ::testing::ValuesIn(netPrecisions),
        ::testing::Values(CommonTestUtils::DEVICE_GNA),
        ::testing::ValuesIn(Scales),
        ::testing::ValuesIn(Shifts)),
    ScaleShiftLayerTest::getTestCaseName);
}  // namespace
@@ -60,8 +60,8 @@ INSTANTIATE_TEST_CASE_P(PriorBoxClustered_Basic, PriorBoxClusteredLayerTest,
    ::testing::Combine(
        layerSpeficParams,
        ::testing::ValuesIn(netPrecisions),
        ::testing::Values(std::vector<size_t>({ 4, 4 })),
        ::testing::Values(std::vector<size_t>({ 50, 50 })),
        ::testing::Values(std::vector<size_t>({ 1, 16, 4, 4 })),
        ::testing::Values(std::vector<size_t>({ 1, 3, 50, 50 })),
        ::testing::Values(CommonTestUtils::DEVICE_GPU)),
    PriorBoxClusteredLayerTest::getTestCaseName
);

Some files were not shown because too many files have changed in this diff.