Es/lpt/lpt to ngraph fixes2 with master (#2671)

* [LPT] Replace creation of dequantization with factory

* [ngraph][LPT] Add ScaleShift replacement for dequantization operations

* [LPT] SubtractMultiplyToMultiplyAdd refactoring

* [LPT] Code style fix

* [LPT] Edit SubtractMultiplyToMultiplyAdd transformation for dequantization

* [LPT] Linux compilation quick fix

* [LPT] [WIP] runtime info applying

* [LPT] Concat transformation functional tests extending

* [LPT] MultiplyToConvolution + Subtract to add fusing + improvements in LowPrecisionTransformer

* [LPT] linux compilation error fix

* [LPT] compilation error fix

* [LPT] MultiplyToGroupConvolution fix: 5D support

* [LPT] Multiply transformation extending: FQ weights support - wip

* [LPT] FQ folding & precision selection

* [LPT] code style fixes

* [LPT] code style fixes

* [LPT] Linux compilation error fix

* [LPT] SubtractMultiplyToMultiplyAdd: refactoring

* [LPT] Tests fixes

* [LPT] MultiplyToGroupConvolution tests

* [LPT] Convert subtract with int inputs to Eltwise sub

* [LPT] Constant folding fix for quant models

* [LPT] 1) Asymmetric quantization improvement 2) tests extending

* [LPT] 2 fixes for se_resnext_50

* [LPT] Add transformation priority branch selection test

* [LPT] AddMultiplyFusion: legacy transformation quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] Fix for eltwise inputs with multiple outputs

* [LPT] Fix for FQ fuse

* [LPT] Reshape by channel, batch temporarily disabled

* [nGraph][LPT] MatMul fix for reading FP16 models

* [LPT] 1) Add (not after Convolution/GroupConvolution/MatMul with Constant) to Subtract 2) precision selection fix: MultiplyToGroupConvolution quick fix

* [LPT] DenseNet improvements: AddTransformation: Add to Subtract + tests

* [LPT] AddTransformation refactoring

* [LPT] AddTransformation tests temporarily disabled

* [LPT] ReshapeTransformation improvements: degradation fix

* [LPT] code style fix

* [LPT] Concat tests temporary disabling

* [LPT] tests unification
1) plugin tests: added test-cases and nGraph-validation for clamp, split and variadic split
2) func tests: added test-cases
3) transformNGraph: added the ability to run additional transformations

* [LPT] split & variadic split merge fix

* [LPT] Clamp: added support for asymmetric quantization

* [LPT] added DequantizationAttr run-time attribute

* [LPT] debug info removal

* [LPT] ConcatTransformation: zero point fix

* [LPT] CNNNetwork ReLU transformation quick fix

* [LPT]
1) Concat fix
2) ConcatMultiChannels fix
3) Added "Concat with Split" test-cases
4) Subgraph fix

* [LPT]
1) Concat fix
2) Added "Concat with different precision on childs" test-case

* [LPT] concat fix Ubuntu18

* [LPT] Concat test fixes

* [LPT] Non-FP32 FQ input support

* [LPT] MatMul Fix + separateInStandaloneBranch Fix

* [LPT] Fix reference input types in mish fusion tests

* [LPT] Fix cpuFuncTests on CentOS building

* [nGraph][LPT] ScaleShift 2d, 3d nGraph conversion enabling

* [LPT] 1) FullyConnected workaround removing 2) validate_nodes_and_infer_types for LPT

* [ngraph] Add check for children for ConvertSubtract

* [LPT] Squeeze/Unsqueeze tests unification

* [LPT] Squeeze/Unsqueeze change signature for getReference/getOriginal

* [LPT] Mul & Add -> ScaleShift quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix

* [LPT] code style fix #2

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix #3

* [LPT] shared plugin tests temporary disabling

* [LPT] cleanup

* [LPT] nGraph unit tests temporary disabling

* [LPT] nGraph unit tests disabling #2

* [LPT] nGraph tests disabling

* [LPT] nGraph tests temporary disabling

* [LPT] WA removing

* [LPT] CentOS compilation fix

* [LPT] KMB WA to avoid compilation error

* [LPT] functional test temporary disabling

* [nGraph] code style fixes

* [LPT] ConcatTransformation: data movement operation as intermediate handling

* [LPT] FuseSubtractToFakeQuantize after VariadicSplit

* [LPT] ConcatWithSplitTransformation functional test temporary disabling

* [LPT] Clamp and ConcatWithDifferentPrecisionsOnChilds: tests fix

* [LPT] MatMul: bert-nv-mlperf-quantized fix

* [LPT] Add to convolution biases fuse fix

* [LPT] GPU plugin tests fixes

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT] CLDNN Plugin FP16 conversion

* [LPT] AvgPool: update precision if there is no FQ after + convolution precision limitation on activations

* [LPT] Convolution fixes

* [LPT] FuseSubtractToFakeQuantize & FuseMultiplyToFakeQuantize improvements

* [LPT] FuseSubtractToFakeQuantize test fix

* [LPT] FuseSubtractToFakeQuantizeTransformation tests

* [LPT] code style fix

* [LPT] AvgPool child recursive extension

* [LPT] AvgPool tests + fix

* [LPT] compilation quick fix

* [LPT] Add to convolution biases fuse fix

* [LPT] Linux issues: MatMulWithOptimizedConstantFakeQuantizeTransformation temporarily disabled

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT]
1) added the ability to create sub without dequantizationAttribute
2) fixed optimizeMulAfter: added copying rt_info
3) Tests Unification: Convolution transformation
4) added cleanRunTimeInfo into Network Helper

* [LPT] Tests Unification: GroupConvolution

* [LPT] removed debug info

* [LPT] functional tests for Convolution & GroupConvolution extending

* [LPT] [MatMul] Quick fix ubuntu error

* [LPT] MatMulTransformation quick test fix: one constant for both intervals

* [nGraph] code style fix

* [LPT] added output_precision to NormalizeIE

* [nGraph] NormalizeIE fix for LPT support

* [LPT] nGraph WA removal

* [LPT] fixed fillSubgraph for concat multi channels

* [LPT] MatMul fix

* [nGraph] WA removal: 1) nGraph tests enabling 2) LPT extending: do not handle FP32

* [LPT] nGraph WA removal: function tests skip config rollback

* [LPT] WA removal: precision propagation fix

* [LPT] ConvertMulOrAddFinally transformation extending

* [nGraph] ConvolutionMultiplyFusion rollback (move from legacy to common)

* [nGraph] ConvertMulAddToScaleShiftOrPower: WA removal

* [nGraph] TypeRelaxed: WA removal

* [nGraph] WA removal: TypeRelaxed

* [LPT] WA removal: ConcatTransformation

* [nGraph] WA removal: Eltwise & ConvertMulOrAddFinally fixes to support LPT

* [nGraph] MulAddConversion fix: 2D & 3D ScaleShift are supported

* [nGraph] VisualizeTree extending

* [LPT] FakeQuantizeDequantization extending: check elementwise dequantization operation

* [LPT] FakeQuantizeDequantization extending: SubtractMultiplyToMultiplyAddTransformation & WeightableLayerTransformation

* [LPT] Convolution + test infrastructure update

* [LPT] GPU compilation error

* [nGraph] BatchNorm plugin tests: input tensor definition

* [LPT] LowPrecisionTransformer::isFunctionQuantized was added

* [nGraph] WA final cleanup

* [nGraph] ScaleShiftIE quick fix

* [LPT] Functional tests: added test-cases "Concat with intermediate with constant"

* [LPT] Transformer::isNetworkQuantized fix

* [LPT] SubtractMultiplyToMultiplyAdd zero Add remove: fix for ssd300 on gpu

* [LPT] MultiplyToGroupConvolution: do not transform on Const

* [LPT] workaround for negative scales

* [LPT] Convert standalone dequantization Mul, Sub, Add to ScaleShift

* [LPT] SubtractMultiplyToMultiplyAdd test fix

* [LPT] Clamp transformation: GPU tests fix

* [LPT] Transformer tests

* [LPT] FakeQuantizePrecisionSelectionTransformation was disabled for GPU

* [LPT] TransformerIsFunctionQuantized refactoring

* [nGraph] code style fix

* [LPT] mobilenet_v2_tf_depthwise test update

* [LPT] TMP: dequantization folding

* [LPT] Elementwise transformation fix: dequantization operations constant folding

* [LPT] cleanup

* [LPT] denormal values fix

* [LPT] FuseFakeQuantize test fixed + negative multiply case

* [LPT] FP32 -> FP16 conversion info

* [LPT] FQ dot interval support + safe division in swapMultiplyAdd

* [LPT] test fix

* [LPT] Tests for dot interval on FQ + tests for addTransformation enabling

* [LPT] Clamp transformation fix

* [LPT] FQ prec selection test fix

* [LPT] Clamp test case

* [LPT] Concat division precision fix

* [LPT] cleanup

* [LPT] merge fix

* [LPT] WIP: MatMul asymmetric quantization fix (BERT)

* [LPT] MatMulWithOptimizedConstantFakeQuantizeTransformation disabled

* [LPT] GPU Plugin set config fix

* [LPT] Fix merge mistakes

* [LPT] Rollback device specific INT8

* [LPT] ReshapeFullyConnected fix: FullyConnected output fix

* [LPT] bert-base-chinese GPU fix

* [ngraph/LPT] Tests for fix convert_mul_or_add_finally with dequantization

[ngraph/LPT] Fix convert_mul_or_add_finally with dequantization

* [LPT] ScaleShift dim < 4 only dequantization conversion

* [LPT] MatMul transformation tests extending

* [LPT] ReshapeFullyConnected legacy transformation: LPT test case addition

* [nGraph] VisualizeTree extending: property names displaying to simplify search

* [LPT] getDequantization extending

* [LPT] MulAddToScaleshiftOrPower: out precision fix & tests

* [LPT] Multiply to ScaleShiftIE: Multiply transformation: remove DEQUANTIZATION if not valid

* [LPT] Concat test case

* [nGraph] try to fix opencv compatibility

* [nGraph] nGraph code style fix

* [LPT] InPlace dequantization folding

* [LPT] Multiply constant folding test

* [LPT] Fix plugin test case for MatMulWithOptimizedConstantFakeQuantize

[LPT] Enable MatMulWithOptimizedConstantFakeQuantize plugin test

* [LPT] Convolution transformation: mulConst shape fix

* [LPT] INT8 Constant folding branch for elementwise ops optimization removal

* [LPT] eltwise for const branch fix

* [LPT] linux fix

* [LPT] Multiply test refactoring

* [LPT] Convert Fuse in Constant + tests

* [LPT] function comparison: runtime info comparison rollback

* [LPT] linux build fix

* [LPT] linux build fix2

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT

* [LPT] Reshape transformation update: don't broadcast by batch

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT - refactoring

* [LPT] MatMul transformation: transpose input tensors fix

* [LPT] checkElementwise for AddTransformation WA: should be moved to getDequantization

* [LPT] merge fix

* [LPT] MatMul fix & tests

* [LPT] AddTransformation tests

* [LPT] Interpolate transformation enabled

* [LPT] constant folding before LPT

* [LPT] WIP: not completed tests

* [LPT] GPU degradation fix

* [LPT] FuseConvert workaround

* [LPT] code cleanup

* [LPT] Interpolate GPU test quick fix

* [LPT] GroupConvolution fix

* [LPT] Fix fusing multiply for non-dequantization layers

* [LPT] GPU pipeline update: enableInt8 initialization place update

* [LPT] tests compilation fix

* [LPT] merge fix

* [LPT] tests enabling

* [LPT] merge issue resolving

* [LPT] LPT CNNNetwork usage macros: part #1: source code

* [LPT] LPT CNNNetwork usage macros: part #2: cmake files update and tests adoption

* [LPT] LPT workaround removing from nGraph core

* [LPT] previous LPT version tests

* [LPT] inference_engine_lp_transformations was returned

* [LPT] replace_node rollback

* [LPT] ConvertSubtract fix

* [LPT] GPU: baselineIsFP16 reuse fix

* [LPT] FakeQuantizeTransformation: GPU workaround: I32 -> FP32 Convert is not fused

* [LPT] AvgPool output precision workaround

* [LPT] Group convolution precision + Subtract to ScaleShift const fix

* [LPT] SubMulToMulAdd & Transpose: action-recognition-0001 fix

* [LPT] Transpose: added test with per-tensor quantization

Co-authored-by: Aleksandr Pertovsky <aleksandr.pertovsky@intel.com>
Co-authored-by: Zinoviev, Vladimir <vladimir.zinoviev@intel.com>
Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
Co-authored-by: Gorokhov Dmitriy <dmitry.gorokhov@intel.com>
Edward Shogulin 2020-10-23 13:22:55 +03:00 committed by GitHub
parent ca95240c91
commit c2271da637
537 changed files with 37312 additions and 2406 deletions

View File

@ -21,9 +21,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
SOURCES ${MAIN_SRC} ${LIBRARY_HEADERS}
VERSION_DEFINES_FOR cldnn_engine.cpp)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
clDNN_lib pugixml inference_engine_transformations)
if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()
set (CLDNN_TOP_FOLDER ${IE_MAIN_SOURCE_DIR}/thirdparty/clDNN)
target_include_directories(${TARGET_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}

View File

@ -34,7 +34,9 @@
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/convert_precision.hpp>
#include <transformations/rt_info/fused_names_attribute.hpp>
#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_transformer.h>
@ -43,6 +45,9 @@
#include "cldnn_executable_network.h"
#include "cldnn_custom_layer.h"
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>
#ifdef __linux__
#include <dlfcn.h>
#endif
@ -73,8 +78,10 @@ cldnn::device_info clDNNEngine::GetDeviceInfo(const std::map<std::string, std::s
return device_info;
}
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const {
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network, CLDNNPlugin::Config config) const {
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool baselineIsFP16 = false;
if (clonedNetwork->getFunction()) {
const auto transformations_callback = [](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
// Reshape->Permute->Reshape pattern in theory can change output rank, so this check is added to be sure
@ -113,6 +120,12 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
return can_use_reduce;
}
if (auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(node)) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}
return std::dynamic_pointer_cast<const ::ngraph::opset2::Gelu>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset3::ShuffleChannels>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset2::BatchToSpace>(node) ||
@ -128,24 +141,64 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
// Disable shape inference (WA for generic operations)
::ngraph::op::GenericIE::DisableReshape noReshape(nGraphFunc);
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
#ifndef USE_CNNNETWORK_LPT
bool enableInt8;
#endif
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
{
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
ngraph::pass::Manager ti_manager;
// Unroll will be called after all conversions
// temporarily switch back to plugin unroller from NGraph unroller until TI output names are corrected
// ti_manager.register_pass<ngraph::pass::UnrollTensorIterator>();
ti_manager.run_passes(nGraphFunc);
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
enableInt8 = config.enableInt8 && ngraph::pass::low_precision::LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
if (enableInt8) {
const auto fp16_callback = [&baselineIsFP16](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
if (!baselineIsFP16 && node->get_output_element_type(0) == ngraph::element::f16) {
baselineIsFP16 = true;
}
return true;
};
ngraph::pass::Manager conversion_manager;
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
conversion_manager.set_callback(fp16_callback);
conversion_manager.run_passes(nGraphFunc);
}
#endif
}
#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (enableInt8) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<MatMulTransformation, ngraph::opset1::MatMul>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));
transformer.transform(nGraphFunc);
}
#endif
{
ngraph::pass::Manager manager = ngraph::pass::Manager();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
}
clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
}
@ -157,6 +210,17 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
transformator.fullTrim();
}
if (baselineIsFP16) {
// [WA part1] Store 'lpt_back_to_fp16' flag to convert FP32 operations to original FP16 after LPT
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
input0.begin()->second->params["lpt_back_to_fp16"];
}
}
return clonedNetwork;
}
@ -259,7 +323,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
context = m_defaultContext;
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), context, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), context, conf);
}
ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network,
@ -283,7 +347,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
conf.max_dynamic_batch = static_cast<int>(network.getBatchSize());
}
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), casted, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), casted, conf);
}
RemoteContext::Ptr clDNNEngine::CreateContext(const ParamMap& params) {
@ -326,7 +390,7 @@ QueryNetworkResult clDNNEngine::QueryNetwork(const ICNNNetwork& network,
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
auto clonedNetwork = CloneAndTransformNetwork(network);
auto clonedNetwork = CloneAndTransformNetwork(network, _impl->m_config);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;
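
Taken together, the cldnn_engine.cpp hunks split the pipeline into three phases: common conversions, the new nGraph LPT pass, and the legacy conversion. A condensed sketch of the LPT phase (the #ifndef USE_CNNNETWORK_LPT branch); the wrapper function and `func` are illustrative, the API calls are the ones used in the diff:

#include <ngraph/pass/manager.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

using namespace ngraph::pass::low_precision;

// Illustrative wrapper mirroring the #ifndef USE_CNNNETWORK_LPT branch above.
void runNGraphLpt(std::shared_ptr<ngraph::Function> func, bool int8FromConfig) {
    // LPT runs only when INT8 is enabled in the config and the function has FakeQuantize ops.
    if (!int8FromConfig || !LowPrecisionTransformer::isFunctionQuantized(func))
        return;

    // [WA part1] Convert a quantized FP16 model to FP32 to avoid overflow and
    // mixed-precision errors (the real code also records baselineIsFP16 in its callback).
    ngraph::pass::Manager conversion_manager;
    conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
    conversion_manager.run_passes(func);

    // Plugin-specific LPT parameters; MatMul is restricted to symmetric quantization.
    auto params = LayerTransformation::Params(
        true,                                                        // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // on activations
        LayerTransformation::QuantizedTensorAlignment::None,         // on weights
        true);                                                       // supportAsymmetricQuantization
    LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
        .add<MatMulTransformation, ngraph::opset1::MatMul>(
            LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));
    transformer.transform(func);
}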

View File

@ -27,7 +27,8 @@ class clDNNEngine : public InferenceEngine::InferencePluginInternal,
CLDNNRemoteCLContext::Ptr m_defaultContext;
cldnn::device_info GetDeviceInfo(const std::map<std::string, std::string> &config) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network,
CLDNNPlugin::Config config) const;
public:
clDNNEngine();

View File

@ -88,9 +88,11 @@
#include <sys/stat.h>
#include <exec_graph_info.hpp>
#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/transformer.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/gemm.hpp"
#endif
#include <iostream>
#include <iomanip>
@ -397,6 +399,41 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
, p_currentOutputs({}) {
InitFormat(network);
bool fqFound = false;
bool baselineIsFP16 = false;
InputsDataMap inputsMap;
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
if (!input0.empty() && (input0.begin()->second->params.count("lpt_back_to_fp16") != 0)) {
baselineIsFP16 = true;
fqFound = true;
}
}
#ifdef USE_CNNNETWORK_LPT
bool allFQareSupported = true;
if (config.enableInt8) {
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}
if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}
if (config.enableInt8) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
@ -413,29 +450,6 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
.add<FullyConnectedTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "FullyConnected")
.add<GemmTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "GEMM");
bool fqFound = false;
bool allFQareSupported = true;
bool baselineIsFP16 = false;
{
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}
if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
if (fqFound && allFQareSupported) {
NetPass::ConvertPrecision(network, Precision::FP16, Precision::FP32);
@ -443,8 +457,11 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
LowPrecisionTransformer transformer(transforms);
transformer.transform(network);
}
#endif
// [WA part2] Try to find non-quantized layers and convert them back to FP16
// [WA part2] Try to find non-quantized layers and convert them back to FP16
if (config.enableInt8) {
if (fqFound && baselineIsFP16 && config.enable_fp16_for_quantized_models) {
auto layersSorted = BFSSort(network);
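
The "lpt_back_to_fp16" key on the first input's consumer layer is the only channel between the two workaround parts: CloneAndTransformNetwork writes it after the nGraph LPT pass, and the Program constructor reads it to re-enable the FP32 -> FP16 conversion of non-quantized layers. A condensed sketch of both sides, using the same legacy InferenceEngine calls as the diff:

// Writer side (after LPT in CloneAndTransformNetwork): mark the network.
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);
if (baselineIsFP16 && !inputsMap.empty()) {
    auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
    input0.begin()->second->params["lpt_back_to_fp16"];  // the key's presence is the flag
}

// Reader side (Program constructor): recover the flags before [WA part2] runs.
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
    auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
    if (!input0.empty() && input0.begin()->second->params.count("lpt_back_to_fp16") != 0) {
        baselineIsFP16 = true;  // lets [WA part2] convert non-quantized layers back to FP16
        fqFound = true;
    }
}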

View File

@ -57,7 +57,12 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_lp_transformations libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)
if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_lp_transformations)
endif()
target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)

View File

@ -22,13 +22,17 @@ public:
Eltwise(const Output<Node>& data1,
const Output<Node>& data2,
const ELTWISE_TYPE eltwise_type);
const ELTWISE_TYPE eltwise_type,
const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
ELTWISE_TYPE eltwise_type;
private:
element::Type m_output_type;
};
} // namespace op

View File

@ -29,17 +29,21 @@ public:
FullyConnected(const Output<Node> & A,
const Output<Node> & B,
const Output<Node> & C,
const Shape & output_shape);
const Shape & output_shape,
const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
size_t get_out_size() { return m_output_size; }
size_t get_out_size() const { return m_output_size; }
element::Type get_output_type() const { return m_output_type; }
private:
size_t m_output_size = 0;
Shape m_output_shape = {};
element::Type m_output_type;
};
} // namespace op

View File

@ -25,7 +25,8 @@ public:
const Output<Node>& weights,
float eps,
bool across_spatial,
bool channel_shared);
bool channel_shared,
const ngraph::element::Type output_type);
float get_eps() const { return m_eps; }
bool get_channel_shared() const { return m_channel_shared;}
@ -39,6 +40,7 @@ protected:
float m_eps;
bool m_across_spatial;
bool m_channel_shared;
ngraph::element::Type m_output_type;
};
} // namespace op

View File

@ -19,13 +19,16 @@ public:
const NodeTypeInfo& get_type_info() const override { return type_info; }
PowerIE(const Output<Node>& data_batch,
const float power, const float scale, const float shift);
const float power, const float scale, const float shift, const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
float scale, power, shift;
private:
element::Type m_output_type;
};
} // namespace op

View File

@ -18,7 +18,7 @@ public:
static constexpr NodeTypeInfo type_info{"ReLUIE", 1};
const NodeTypeInfo& get_type_info() const override { return type_info; }
ReLUIE(const Output<Node> & data, const float & negative_slope);
ReLUIE(const Output<Node> & data, const float & negative_slope, const element::Type output_type);
void validate_and_infer_types() override;
@ -26,8 +26,11 @@ public:
float get_slope() { return m_negative_slope; }
element::Type get_output_type() const { return m_output_type; }
private:
float m_negative_slope;
element::Type m_output_type;
};
} // namespace op

View File

@ -20,11 +20,15 @@ public:
ScaleShiftIE(const Output<Node>& data_batch,
const Output<Node>& weights,
const Output<Node>& bias);
const Output<Node>& bias,
const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
private:
element::Type output_type;
};
} // namespace op
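
The six legacy-op headers above (Eltwise, FullyConnected, NormalizeIE, PowerIE, ReLUIE, ScaleShiftIE) all gain an output_type parameter (defaulted to element::undefined for most of them) so that element types assigned by LPT survive conversion to legacy ops. The recurring validate_and_infer_types change, sketched on a placeholder op (SomeLegacyOp is not a real class in this PR):

void op::SomeLegacyOp::validate_and_infer_types() {
    set_output_type(
        0,
        // element::undefined keeps the old behaviour (inherit the input's type);
        // anything else pins the output to the type LPT selected.
        m_output_type == element::undefined ? get_input_element_type(0) : m_output_type,
        get_input_partial_shape(0));
}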

View File

@ -35,6 +35,7 @@ public:
// This pass finally converts single Multiply and Add operations to ScaleShift or Power operation
ConvertMulOrAddFinally() : GraphRewrite() {
convert_mul_or_add_finally<ngraph::opset1::Add>();
convert_mul_or_add_finally<ngraph::opset1::Subtract>();
convert_mul_or_add_finally<ngraph::opset1::Multiply>();
}
@ -52,11 +53,13 @@ bool convert_to_eltwise(std::shared_ptr<T> & node,
et = ELTWISE_TYPE::Prod;
} else if (std::is_same<T, ngraph::opset1::Add>()) {
et = ELTWISE_TYPE::Sum;
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
et = ELTWISE_TYPE::Sub;
} else {
return false;
}
auto eltwise = std::make_shared<ngraph::op::Eltwise>(data1, data2, et);
auto eltwise = std::make_shared<ngraph::op::Eltwise>(data1, data2, et, node->output(0).get_element_type());
eltwise->set_friendly_name(node->get_friendly_name());
ngraph::copy_runtime_info(node, eltwise);
ngraph::replace_node(node, eltwise);
@ -66,7 +69,7 @@ bool convert_to_eltwise(std::shared_ptr<T> & node,
template <typename T>
ngraph::graph_rewrite_callback get_callback() {
ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher& m) {
static_assert(std::is_same<T, ngraph::opset1::Add>() || std::is_same<T, ngraph::opset1::Multiply>(),
static_assert(std::is_same<T, ngraph::opset1::Add>() || std::is_same<T, ngraph::opset1::Subtract>() || std::is_same<T, ngraph::opset1::Multiply>(),
"Unsupported template parameter. Only Add or Multiply allowed!");
auto lin_op = std::dynamic_pointer_cast<T> (m.get_match_root());
@ -77,7 +80,10 @@ ngraph::graph_rewrite_callback get_callback() {
const auto output_shape = lin_op->output(0).get_partial_shape();
const auto output_shape_rank = output_shape.rank().get_length();
if (!lin_op->get_element_type().is_real()) {
const auto intInputs = !lin_op->get_input_element_type(0).is_real() &&
!lin_op->get_input_element_type(1).is_real();
if (!lin_op->get_element_type().is_real() || intInputs) {
return convert_to_eltwise<T>(lin_op,
lin_op->input(0).get_source_output(),
lin_op->input(1).get_source_output());
@ -147,14 +153,65 @@ ngraph::graph_rewrite_callback get_callback() {
auto res = check_constant(const_node, data_node.get_partial_shape());
if (res == CONVERSION_RESULT::NONE || (res == CONVERSION_RESULT::SCALE_SHIFT && output_shape_rank < 4)) {
auto checkElementwise = [](const std::shared_ptr<ngraph::Node>& elementwise) -> bool {
const ngraph::PartialShape partialShape = elementwise->get_input_partial_shape(0);
if (partialShape.is_dynamic()) {
return false;
}
std::shared_ptr<ngraph::opset1::Constant> constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(elementwise->get_input_node_shared_ptr(1));
if (constant == nullptr) {
constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(elementwise->get_input_node_shared_ptr(0));
}
if (constant == nullptr) {
return false;
}
const ngraph::Shape constShape = constant->get_output_shape(0);
if ((constShape.size() > 5ul)) {
return false;
}
if ((constShape.size() <= 1ul) || (std::all_of(constShape.begin(), constShape.end(), [](const size_t value) { return value == 1ul; }))) {
return true;
}
const ngraph::Shape shape = partialShape.to_shape();
if (constShape.size() == shape.size()) {
if ((constShape[0] != 1ul) || (constShape[1] != shape[1])) {
return false;
}
for (size_t i = 2ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else if (constShape.size() == (shape.size() - 1)) {
if (constShape[0] != shape[1]) {
return false;
}
for (size_t i = 1ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else {
return false;
}
return true;
};
bool is_dequantization = (lin_op->get_rt_info().count("DEQUANTIZATION") != 0) && checkElementwise(lin_op);
if (!is_dequantization && (res == CONVERSION_RESULT::NONE || (res == CONVERSION_RESULT::SCALE_SHIFT && output_shape_rank < 4))) {
return convert_to_eltwise<T>(lin_op,
lin_op->input(0).get_source_output(),
lin_op->input(1).get_source_output());
}
// TODO: if all values in Constant are equal the best way is to convert this Eltwise to Power
if (res == CONVERSION_RESULT::SCALE_SHIFT) {
if (res == CONVERSION_RESULT::SCALE_SHIFT || is_dequantization) {
auto weights_et = const_node->get_element_type();
auto weights_shape = const_node->get_shape();
@ -162,12 +219,49 @@ ngraph::graph_rewrite_callback get_callback() {
std::shared_ptr<ngraph::op::ScaleShiftIE> scaleshift;
if (std::is_same<T, ngraph::opset1::Add>()) {
auto weights = ngraph::opset1::Constant::create(weights_et, weights_shape, {1});
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, ngraph::op::util::normalize_constant(weights, output_shape),
ngraph::op::util::normalize_constant(const_node, output_shape));
} else {
auto weights_in = ngraph::op::util::normalize_constant(weights, output_shape);
auto biases_in = ngraph::op::util::normalize_constant(const_node, output_shape);
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
std::shared_ptr<ngraph::Node> new_const_node = std::make_shared<ngraph::opset1::Multiply>(
ngraph::op::util::normalize_constant(const_node, output_shape),
ngraph::opset1::Constant::create(weights_et, ngraph::Shape{ 1 }, { -1 }));
auto weights = ngraph::opset1::Constant::create(weights_et, weights_shape, {1});
auto weights_in = ngraph::op::util::normalize_constant(weights, output_shape);
auto biases_in = new_const_node;
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else if (std::is_same<T, ngraph::opset1::Multiply>()) {
auto bias = ngraph::opset1::Constant::create(weights_et, weights_shape, {0});
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, ngraph::op::util::normalize_constant(const_node, output_shape),
ngraph::op::util::normalize_constant(bias, output_shape));
auto weights_in = ngraph::op::util::normalize_constant(const_node, output_shape);
auto biases_in = ngraph::op::util::normalize_constant(bias, output_shape);
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else {
return false;
}
scaleshift->set_friendly_name(lin_op->get_friendly_name());
@ -182,9 +276,11 @@ ngraph::graph_rewrite_callback get_callback() {
// In case Add we create fake scale equal to 1, in case of Multiply we create fake shift equal to 0
std::shared_ptr<ngraph::op::PowerIE> power;
if (std::is_same<T, ngraph::opset1::Add>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., value);
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., value, lin_op->get_output_element_type(0));
} else if (std::is_same<T, ngraph::opset1::Multiply>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., value, 0.);
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., value, 0., lin_op->get_output_element_type(0));
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., -value, lin_op->get_output_element_type(0));
} else {
return false;
}
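
The checkElementwise lambda above admits only per-tensor or per-channel constants into the dequantization branch. A few worked cases against hypothetical NCHW data of shape {1, 8, 16, 16}:

// Hypothetical constant shapes vs. data {1, 8, 16, 16}, per the checks above:
//   {1}            -> true   (size <= 1: per-tensor)
//   {1, 1, 1, 1}   -> true   (all ones: per-tensor)
//   {1, 8, 1, 1}   -> true   (same rank: constShape[1] == channel count, rest == 1)
//   {8, 1, 1}      -> true   (rank - 1: constShape[0] == channel count, rest == 1)
//   {1, 4, 1, 1}   -> false  (channel dimension mismatch)
//   {1, 8, 16, 16} -> false  (per-element values are rejected)
// A dynamic input shape or a constant of rank > 5 is also rejected.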

View File

@ -80,7 +80,8 @@ private:
auto new_fc = std::make_shared<op::FullyConnected>(reshape->input_value(0),
fc->input_value(1),
fc->input_value(2),
fc->get_shape());
fc->get_shape(),
fc->output(0).get_element_type());
new_fc->set_friendly_name(fc->get_friendly_name());
ngraph::copy_runtime_info({reshape, fc}, new_fc);

View File

@ -1637,6 +1637,9 @@ CNNLayer::Ptr NodeConverter<ngraph::op::Eltwise>::createLayer(const std::shared_
case ELTWISE_TYPE::Sum:
type = "sum";
break;
case ELTWISE_TYPE::Sub:
type = "sub";
break;
case ELTWISE_TYPE::Prod:
type = "prod";
break;

View File

@ -15,8 +15,8 @@ using namespace ngraph;
constexpr NodeTypeInfo op::Eltwise::type_info;
op::Eltwise::Eltwise(const Output<Node>& data1, const Output<Node>& data2, const ELTWISE_TYPE eltwise_type)
: Op({data1, data2}), eltwise_type(eltwise_type) {
op::Eltwise::Eltwise(const Output<Node>& data1, const Output<Node>& data2, const ELTWISE_TYPE eltwise_type, const element::Type output_type)
: Op({data1, data2}), eltwise_type(eltwise_type), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
@ -25,7 +25,7 @@ std::shared_ptr<Node> op::Eltwise::clone_with_new_inputs(const OutputVector& new
throw ngraph_error("Incorrect number of new arguments");
}
return make_shared<Eltwise>(new_args.at(0), new_args.at(1), eltwise_type);
return make_shared<Eltwise>(new_args.at(0), new_args.at(1), eltwise_type, m_output_type);
}
void op::Eltwise::validate_and_infer_types() {
@ -34,8 +34,12 @@ void op::Eltwise::validate_and_infer_types() {
element::Type data2_et = get_input_element_type(1);
element::Type et_result;
NODE_VALIDATION_CHECK(this, element::Type::merge(et_result, data1_et, data2_et),
"Element types for first and second do not match :", data1_et, " and ", data2_et);
if (m_output_type == element::undefined) {
NODE_VALIDATION_CHECK(this, element::Type::merge(et_result, data1_et, data2_et),
"Element types for first and second do not match :", data1_et, " and ", data2_et);
} else {
et_result = m_output_type;
}
if (get_input_partial_shape(0).rank().is_dynamic() ||
get_input_partial_shape(1).rank().is_dynamic()) {

View File

@ -12,8 +12,13 @@ using namespace ngraph;
constexpr NodeTypeInfo op::FullyConnected::type_info;
op::FullyConnected::FullyConnected(const Output<Node>& A, const Output<Node>& B, const Output<Node>& C, const Shape & output_shape)
: Op({A, B, C}), m_output_shape(output_shape) {
op::FullyConnected::FullyConnected(
const Output<Node>& A,
const Output<Node>& B,
const Output<Node>& C,
const Shape & output_shape,
const element::Type output_type)
: Op({A, B, C}), m_output_shape(output_shape), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
@ -26,5 +31,8 @@ void op::FullyConnected::validate_and_infer_types() {
if (m_output_shape.size() < 2)
throw ngraph_error("FullyConnected shape is incorrect");
m_output_size = m_output_shape.back();
set_output_type(0, input_value(0).get_element_type(), m_output_shape);
set_output_type(
0,
m_output_type == element::undefined ? input_value(0).get_element_type() : m_output_type,
m_output_shape);
}

View File

@ -15,15 +15,14 @@ using namespace ngraph;
constexpr NodeTypeInfo op::NormalizeIE::type_info;
op::NormalizeIE::NormalizeIE(const Output<Node>& data, const Output<Node>& weights, float eps, bool across_spatial,
bool channel_shared)
: Op({data, weights}), m_eps(eps), m_across_spatial(across_spatial), m_channel_shared(channel_shared) {
bool channel_shared, const ngraph::element::Type output_type)
: Op({data, weights}), m_eps(eps), m_across_spatial(across_spatial), m_channel_shared(channel_shared), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
void op::NormalizeIE::validate_and_infer_types() {
element::Type arg_type = get_input_element_type(0);
PartialShape arg_shape = get_input_partial_shape(0);
set_output_type(0, arg_type, arg_shape);
set_output_type(0, m_output_type, arg_shape);
const PartialShape& input_shape = get_input_partial_shape(0);
@ -34,5 +33,5 @@ void op::NormalizeIE::validate_and_infer_types() {
shared_ptr<Node> op::NormalizeIE::clone_with_new_inputs(const OutputVector& new_args) const {
check_new_args_count(this, new_args);
return make_shared<op::NormalizeIE>(new_args.at(0), new_args.at(1), m_eps, m_across_spatial, m_channel_shared);
return make_shared<op::NormalizeIE>(new_args.at(0), new_args.at(1), m_eps, m_across_spatial, m_channel_shared, m_output_type);
}

View File

@ -14,8 +14,8 @@ using namespace ngraph;
constexpr NodeTypeInfo op::PowerIE::type_info;
op::PowerIE::PowerIE(const Output<ngraph::Node>& data_batch, const float power, const float scale, const float shift)
: Op({data_batch}), scale(scale), power(power), shift(shift) {
op::PowerIE::PowerIE(const Output<ngraph::Node>& data_batch, const float power, const float scale, const float shift, const element::Type output_type)
: Op({data_batch}), scale(scale), power(power), shift(shift), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
@ -24,9 +24,9 @@ std::shared_ptr<Node> op::PowerIE::clone_with_new_inputs(const OutputVector& new
throw ngraph_error("Incorrect number of new arguments");
}
return make_shared<PowerIE>(new_args.at(0), this->power, this->scale, this->shift);
return make_shared<PowerIE>(new_args.at(0), this->power, this->scale, this->shift, this->m_output_type);
}
void op::PowerIE::validate_and_infer_types() {
set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
set_output_type(0, m_output_type == element::undefined ? get_input_element_type(0) : m_output_type, get_input_partial_shape(0));
}

View File

@ -15,16 +15,19 @@ using namespace ngraph;
constexpr NodeTypeInfo op::ReLUIE::type_info;
op::ReLUIE::ReLUIE(const Output<Node>& data, const float& negative_slope)
: Op(OutputVector {data}), m_negative_slope(negative_slope) {
op::ReLUIE::ReLUIE(const Output<Node>& data, const float& negative_slope, const element::Type output_type)
: Op(OutputVector {data}), m_negative_slope(negative_slope), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
std::shared_ptr<Node> op::ReLUIE::clone_with_new_inputs(const OutputVector& new_args) const {
check_new_args_count(this, new_args);
return make_shared<ReLUIE>(new_args.at(0), m_negative_slope);
return make_shared<ReLUIE>(new_args.at(0), m_negative_slope, m_output_type);
}
void op::ReLUIE::validate_and_infer_types() {
set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
set_output_type(
0,
m_output_type == element::undefined ? get_input_element_type(0) : m_output_type,
get_input_partial_shape(0));
}

View File

@ -14,8 +14,25 @@ using namespace ngraph;
constexpr NodeTypeInfo op::ScaleShiftIE::type_info;
op::ScaleShiftIE::ScaleShiftIE(const Output<Node>& data_batch, const Output<Node>& weights, const Output<Node>& bias)
: Op({data_batch, weights, bias}) {
element::Type getMaxBitwidth(const std::vector<element::Type>& types) {
if (types.empty()) {
return element::undefined;
}
element::Type maxType = types[0];
for (size_t i = 1; i < types.size(); ++i) {
if (types[i].bitwidth() > maxType.bitwidth()) {
maxType = types[i];
}
}
return maxType;
}
op::ScaleShiftIE::ScaleShiftIE(const Output<Node>& data_batch, const Output<Node>& weights, const Output<Node>& bias, const element::Type output_type)
: Op({data_batch, weights, bias}), output_type(output_type) {
if (this->output_type == element::undefined) {
this->output_type = getMaxBitwidth({ data_batch.get_element_type(), weights.get_element_type(), bias.get_element_type() });
}
constructor_validate_and_infer_types();
}
@ -24,12 +41,12 @@ std::shared_ptr<Node> op::ScaleShiftIE::clone_with_new_inputs(const OutputVector
throw ngraph_error("Incorrect number of new arguments");
}
return make_shared<ScaleShiftIE>(new_args.at(0), new_args.at(1), new_args.at(2));
return make_shared<ScaleShiftIE>(new_args.at(0), new_args.at(1), new_args.at(2), output_type);
}
void op::ScaleShiftIE::validate_and_infer_types() {
// Check that weights and biases has the same type
element::Type data_et = get_input_element_type(0);
element::Type data_et = output_type == element::undefined ? get_input_element_type(0) : output_type;
element::Type weights_et = get_input_element_type(1);
element::Type biases_et = get_input_element_type(2);
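
getMaxBitwidth supplies the new default for ScaleShiftIE's output type: when no explicit output_type is passed, the widest of the data, weights and bias types wins, and ties keep the earlier type because the comparison is strict. Two worked cases:

// Worked examples for the fallback above (element::Type::bitwidth()):
getMaxBitwidth({ element::f16, element::f32, element::f16 });  // -> f32 (32 > 16)
getMaxBitwidth({ element::u8, element::i8 });                  // -> u8 (tie on 8 bits: first wins)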

View File

@ -143,9 +143,9 @@ ngraph::pass::ConvertMatMulToFC::ConvertMatMulToFC() {
// Create FullyConnected
std::vector<float> bias_value(O, 0);
auto fc_bias = opset1::Constant::create(matmul->get_input_element_type(0), Shape {O}, bias_value);
auto fc_bias = opset1::Constant::create(matmul->get_output_element_type(0), Shape {O}, bias_value);
auto fc = std::make_shared<op::FullyConnected>(fc_input_a, fc_input_b, fc_bias, output_shape);
auto fc = std::make_shared<op::FullyConnected>(fc_input_a, fc_input_b, fc_bias, output_shape, matmul->output(0).get_element_type());
fc->set_friendly_name(matmul->get_friendly_name());
new_ops.push_back(fc);
@ -207,7 +207,7 @@ ngraph::pass::ConvertMatMulToGemm::ConvertMatMulToGemm() {
new_ops.push_back(fc_input_b.get_node_shared_ptr());
}
auto gemm = std::make_shared<opset1::MatMul>(fc_input_a, fc_input_b, matmul->get_transpose_a(), matmul->get_transpose_b());
auto gemm = matmul->copy_with_new_inputs({ fc_input_a, fc_input_b });
new_ops.push_back(gemm);
if (gemm->get_shape() != output_shape) {
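
Replacing the fresh opset1::MatMul construction with copy_with_new_inputs is a small but deliberate change: cloning the matched node preserves its concrete C++ type, so a MatMul that LPT wrapped (for example in a TypeRelaxed shell, per the "TypeRelaxed: WA removal" commits above) keeps its relaxed output element type. The distinction in sketch form, with a and b standing for the prepared inputs:

// Drops any derived-type information the matched node carried:
auto fresh = std::make_shared<opset1::MatMul>(a, b, matmul->get_transpose_a(), matmul->get_transpose_b());
// Clones the node itself, so wrapper types and their output element type survive:
auto clone = matmul->copy_with_new_inputs({ a, b });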

View File

@ -87,6 +87,10 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const_bias_node = ngraph::as_type_ptr<ngraph::opset1::Constant>(add_input_0);
}
if (const_bias_node->output(0).get_element_type() != add_node->output(0).get_element_type()) {
return false;
}
auto mul_input_0 = mul_node->input(0).get_source_output().get_node_shared_ptr();
auto mul_input_1 = mul_node->input(1).get_source_output().get_node_shared_ptr();
@ -97,6 +101,10 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const_weights_node = ngraph::as_type_ptr<ngraph::opset1::Constant>(mul_input_0);
}
if (const_weights_node->output(0).get_element_type() != mul_node->output(0).get_element_type()) {
return false;
}
if (add_node->get_output_partial_shape(0).rank().is_dynamic() ||
mul_node->get_output_partial_shape(0).rank().is_dynamic()) {
return false;
@ -137,13 +145,16 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const auto output_shape = add_node->get_output_partial_shape(0);
const auto output_shape_rank = output_shape.rank().get_length();
bool is_dequantization =
(add_node->get_rt_info().count("DEQUANTIZATION") != 0 || mul_node->get_rt_info().count("DEQUANTIZATION") != 0);
if (res1 == CONVERSION_RESULT::NONE || res2 == CONVERSION_RESULT::NONE ||
((res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) && output_shape_rank < 4)) {
((res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) && !is_dequantization && output_shape_rank < 4)) {
return false;
}
// TODO: in case if scale and shift constants has equal values the best way is to convert them to Power
if (res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) {
if (res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT || is_dequantization) {
NodeVector new_ops;
auto weights_in = ngraph::op::util::normalize_constant(const_weights_node, output_shape);
@ -151,16 +162,29 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
new_ops.push_back(weights_in);
new_ops.push_back(biases_in);
if (res1 == CONVERSION_RESULT::POWER) {
if (is_dequantization) {
const Shape data_shape = data_node.get_shape();
Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
new_ops.push_back(weights_in);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
new_ops.push_back(biases_in);
}
if (res1 == CONVERSION_RESULT::POWER && !is_dequantization) {
weights_in = ngraph::op::util::broadcastTo(weights_in, biases_in->get_shape());
new_ops.push_back(weights_in);
}
if (res2 == CONVERSION_RESULT::POWER) {
if (res2 == CONVERSION_RESULT::POWER && !is_dequantization) {
biases_in = ngraph::op::util::broadcastTo(biases_in, weights_in->get_shape());
new_ops.push_back(biases_in);
}
auto scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
auto output_type = m.get_match_root()->get_output_element_type(0);
auto scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in, output_type);
new_ops.push_back(scaleshift);
scaleshift->set_friendly_name(add_node->get_friendly_name());
@ -175,7 +199,8 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
return false;
}
auto power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., scale, shift);
auto output_type = m.get_match_root()->get_output_element_type(0);
auto power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., scale, shift, output_type);
power->set_friendly_name(add_node->get_friendly_name());
ngraph::copy_runtime_info({mul_node, add_node}, power);
ngraph::replace_node(m.get_match_root(), power);
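
Both this pass and convert_mul_or_add_finally key off a "DEQUANTIZATION" entry in the node's runtime info, and both only test for the key's presence. A hedged sketch of how a producer might tag a node; the variant payload here is illustrative, and the PR's own history mentions a dedicated DequantizationAttr run-time attribute for this purpose:

// Presence of the "DEQUANTIZATION" key is all the conversions above check for.
auto& rt_info = mul_node->get_rt_info();
rt_info["DEQUANTIZATION"] = std::make_shared<ngraph::VariantWrapper<std::string>>("");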

View File

@ -62,7 +62,8 @@ ngraph::pass::ConvertNormalizeL2WithMulToNormalizeIE::ConvertNormalizeL2WithMulT
constant->output(0),
normalize->get_eps(),
across_spatial,
channel_shared);
channel_shared,
normalize->get_element_type());
normalize_ie->set_friendly_name(mul->get_friendly_name());
ngraph::copy_runtime_info({normalize, mul}, normalize_ie);
@ -93,13 +94,14 @@ ngraph::pass::ConvertNormalizeL2ToLegacyMatcher::ConvertNormalizeL2ToLegacyMatch
bool across_channels = !(axis.size() == 1 && axis[0] == 1);
bool channel_shared = true;
auto scale = std::make_shared<ngraph::opset1::Constant>(normalize->get_input_element_type(0), Shape{1}, std::vector<float>{1.0});
auto scale = std::make_shared<ngraph::opset1::Constant>(normalize->output(0).get_element_type(), Shape{1}, std::vector<float>{1.0});
auto normalize_ie = std::make_shared<ngraph::op::NormalizeIE> (normalize->input(0).get_source_output(),
scale->output(0),
normalize->get_eps(),
across_channels,
channel_shared);
channel_shared,
normalize->get_element_type());
normalize_ie->set_friendly_name(normalize->get_friendly_name());
ngraph::copy_runtime_info(normalize, normalize_ie);

View File

@ -33,7 +33,7 @@ ngraph::pass::ConvertPowerToPowerIEMatcher::ConvertPowerToPowerIEMatcher() {
return false;
}
auto power_ie = std::make_shared<ngraph::op::PowerIE>(power->input(0).get_source_output(), value, 1, 0);
auto power_ie = std::make_shared<ngraph::op::PowerIE>(power->input(0).get_source_output(), value, 1, 0, power->output(0).get_element_type());
power_ie->set_friendly_name(power->get_friendly_name());
ngraph::copy_runtime_info(power, power_ie);
ngraph::replace_node(power, power_ie);
@ -44,4 +44,4 @@ ngraph::pass::ConvertPowerToPowerIEMatcher::ConvertPowerToPowerIEMatcher() {
auto m = std::make_shared<ngraph::pattern::Matcher>(power, "ConvertPowerToPowerIE");
this->register_matcher(m, callback);
}
}

View File

@ -33,7 +33,7 @@ ngraph::pass::ConvertPReLUToReLUIE::ConvertPReLUToReLUIE() {
return false;
}
auto relu_ie = std::make_shared<ngraph::op::ReLUIE>(prelu->input(0).get_source_output(), value);
auto relu_ie = std::make_shared<ngraph::op::ReLUIE>(prelu->input(0).get_source_output(), value, prelu->output(0).get_element_type());
relu_ie->set_friendly_name(prelu->get_friendly_name());
ngraph::copy_runtime_info(prelu, relu_ie);
ngraph::replace_node(prelu, relu_ie);
@ -44,4 +44,4 @@ ngraph::pass::ConvertPReLUToReLUIE::ConvertPReLUToReLUIE() {
auto m = std::make_shared<ngraph::pattern::Matcher>(prelu, "ConvertPReLUToReLUIE");
this->register_matcher(m, callback);
}
}

View File

@ -25,7 +25,7 @@ ngraph::pass::ConvertSqrtToPowerIEMatcher::ConvertSqrtToPowerIEMatcher() {
if (!sqrt) {
return false;
}
auto power_ie = std::make_shared<ngraph::op::PowerIE>(sqrt->input(0).get_source_output(), 0.5f, 1, 0);
auto power_ie = std::make_shared<ngraph::op::PowerIE>(sqrt->input(0).get_source_output(), 0.5f, 1, 0, sqrt->output(0).get_element_type());
power_ie->set_friendly_name(sqrt->get_friendly_name());
ngraph::copy_runtime_info(sqrt, power_ie);
ngraph::replace_node(sqrt, power_ie);

View File

@ -65,7 +65,8 @@ ngraph::pass::FullyConnectedBiasFusion::FullyConnectedBiasFusion() {
auto new_fc = std::make_shared<op::FullyConnected>(m_fc->input(0).get_source_output(),
m_fc->input(1).get_source_output(),
final_bias,
m_fc->get_shape());
m_fc->get_shape(),
m_fc->get_output_type());
new_ops.push_back(new_fc);
new_fc->set_friendly_name(add->get_friendly_name());

View File

@ -44,6 +44,7 @@ std::shared_ptr<Node> convert(const Output<Node> & data, std::shared_ptr<op::Con
new_dilations,
new_pads_begin,
new_pad_end,
node->get_output_element_type(0),
node->get_group(),
node->get_auto_pad());
} else {
@ -54,6 +55,7 @@ std::shared_ptr<Node> convert(const Output<Node> & data, std::shared_ptr<op::Con
new_dilations,
new_pads_begin,
new_pad_end,
node->get_output_element_type(0),
node->get_group(),
node->get_auto_pad());
}

View File

@ -52,7 +52,8 @@ ngraph::pass::ReshapeFullyConnected::ReshapeFullyConnected() {
auto fc_new = std::make_shared<op::FullyConnected>(reshape,
fc->input_value(1),
fc->input_value(2),
output_shape_new);
output_shape_new,
fc->get_output_type());
new_ops.push_back(fc_new);
if (output_shape != output_shape_new) {
@ -73,4 +74,4 @@ ngraph::pass::ReshapeFullyConnected::ReshapeFullyConnected() {
auto m = std::make_shared<ngraph::pattern::Matcher>(fc, "ReshapeFullyConnected");
this->register_matcher(m, callback);
}
}

View File

@ -51,3 +51,7 @@ install(TARGETS ${TARGET_NAME}
RUNTIME DESTINATION ${IE_CPACK_RUNTIME_PATH} COMPONENT core
ARCHIVE DESTINATION ${IE_CPACK_ARCHIVE_PATH} COMPONENT core
LIBRARY DESTINATION ${IE_CPACK_LIBRARY_PATH} COMPONENT core)
if (USE_CNNNETWORK_LPT)
target_compile_definitions(${TARGET_NAME} PUBLIC USE_CNNNETWORK_LPT)
endif()
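
Because the compile definition is PUBLIC, every target that links this library sees USE_CNNNETWORK_LPT, which is what the conditional blocks in the plugin sources rely on. The compile-time switch, in sketch form:

// The build flag selects exactly one of the two LPT pipelines at compile time.
#ifdef USE_CNNNETWORK_LPT
    // legacy path: CNNNetwork-based LPT (low_precision_transformations/*)
#else
    // default path: nGraph-based LPT (transformations/low_precision/*)
#endif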

View File

@ -103,16 +103,5 @@ void ActivationTransformation::transform(TransformationContext& context, CNNLaye
CNNNetworkHelper::removeLayer(context.network, scaleShift);
context.removeLayer(*scaleShift);
const std::vector<CNNLayerPtr> children = CNNNetworkHelper::getChildren(*activationLayer);
for (const CNNLayerPtr& child : children) {
const std::vector<CNNLayerPtr> dequantizationLayers = CNNNetworkHelper::addScaleShiftBetween(
context,
activationLayer,
child,
DequantizationDetails(scales, shifts));
for (const auto& dequantizationLayer : dequantizationLayers) {
context.dequantizationLayersNames.insert(dequantizationLayer->name);
}
}
addDequantizationLayer(context, *activationLayer, scales, shifts);
}

View File

@ -1332,6 +1332,8 @@ void CNNNetworkHelper::addLayerToCNNNetworkAfterData(
THROW_IE_EXCEPTION << "parent data is absent";
}
netImpl->removeOutput(parent->name);
netImpl->addData(parent->name.c_str(), parentOutData);
netImpl->addData(layer->name.c_str(), newEdgeAfterLayer);
netImpl->addOutput(layer->name);
}

View File

@ -329,7 +329,7 @@ void WeightableLayerTransformation::updateToSupportAsymmetricQuantization(
const PrecisionsInfo& weightsPrecisionsInfo,
std::vector<float>& weightsShifts) const {
const CNNLayerPtr parentOnData = CNNNetworkHelper::getParent(layer, 0ul);
if (parentOnData->type == "ScaleShift") {
if (parentOnData->type == "ScaleShift") { // FIXME: it is always true
const std::shared_ptr<float> dataConvertedInBlob = CNNNetworkHelper::convertFloatData(
dataShifts.data(),
dataShifts.size(),

View File

@ -167,9 +167,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
set_ie_threading_interface_for(${TARGET_NAME})
target_compile_definitions(${TARGET_NAME} PUBLIC -DMKLDNN_THR=${MKLDNN_THR})
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
inference_engine_transformations mkldnn)
if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()
# Cross compiled function
# TODO: The same for proposal, proposalONNX, topk
cross_compiled_file(${TARGET_NAME}

View File

@ -16,17 +16,20 @@
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_tools.hpp>
#include <threading/ie_executor_manager.hpp>
#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/convolution.hpp"
#include "low_precision_transformations/eltwise.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/scaleshift_to_convolution.hpp"
#include "low_precision_transformations/transformer.hpp"
#endif
#include <threading/ie_cpu_streams_executor.hpp>
#include <ie_system_conf.h>
#include <threading/ie_thread_affinity.hpp>
#include <algorithm>
#include <unordered_set>
#include <utility>
#include <cstring>
using namespace MKLDNNPlugin;
using namespace InferenceEngine;
@ -51,6 +54,7 @@ MKLDNNExecNetwork::MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network
// we are cloning network if we have statistics and we can transform network.
_clonedNetwork = cloneNet(network);
#ifdef USE_CNNNETWORK_LPT
if (_cfg.lpTransformsMode == Config::LPTransformsMode::On) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
@ -94,6 +98,7 @@ MKLDNNExecNetwork::MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network
bf16Transformer.convertToFloat(cnnetwork);
}
}
#endif
MKLDNNGraph::ApplyUnrollPasses(static_cast<ICNNNetwork&>(*_clonedNetwork));

View File

@ -32,7 +32,6 @@
#include "precision_utils.h"
#include <ie_plugin_config.hpp>
#include "low_precision_transformations/transformer.hpp"
#include "utils/blob_dump.h"

View File

@@ -256,6 +256,10 @@ void MKLDNNGraphOptimizer::FuseConvolutionAndZeroPoints(MKLDNNGraph &graph) {
if (arg0->getCnnLayer()->outData[0]->getPrecision() != Precision::U8)
return false;
if (parent0->getParentEdgesAtPort(1)[0]->getDims().size() < 2) {
return false;
}
if (parent0->getParentEdgesAtPort(1)[0]->getDims()[1] != 1 &&
parent0->getParentEdgesAtPort(1)[0]->getDims()[1] != IC)
return false;
@@ -495,6 +499,9 @@ void MKLDNNGraphOptimizer::MergeTwoEqualScaleShifts(MKLDNNGraph& graph) {
};
auto isEqualScaleShiftNodes = [](MKLDNNNodePtr node1, MKLDNNNodePtr node2) {
if (node1->getParentEdgeAt(0) != node2->getParentEdgeAt(0))
return false;
auto *depthwiseNode1 = dynamic_cast<MKLDNNDepthwiseNode *>(node1.get());
auto *depthwiseNode2 = dynamic_cast<MKLDNNDepthwiseNode *>(node2.get());

View File

@@ -53,6 +53,12 @@
#include <ngraph/op/util/op_types.hpp>
#include <ngraph/pass/manager.hpp>
#include <transformations/common_optimizations/lin_op_sequence_fusion.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/convolution.hpp>
#include <transformations/low_precision/group_convolution.hpp>
#include <transformations/low_precision/multiply_to_group_convolution.hpp>
#if !defined(__arm__) && !defined(_M_ARM) && !defined(__aarch64__) && !defined(_M_ARM64)
#if defined(_WIN32) || defined(WIN32)
#include <intrin.h>
@@ -76,7 +82,7 @@ Engine::~Engine() {
ExecutorManager::getInstance()->clear("CPUCallbackExecutor");
}
static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf) {
OV_ITT_SCOPED_TASK(MKLDNNPlugin::itt::domains::MKLDNNPlugin, "Transformation");
auto nGraphFunc = clonedNetwork->getFunction();
@@ -104,9 +110,6 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
manager.register_pass<ngraph::pass::ConvertPrecision>(precision.first, precision.second);
}
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
auto pass_config = manager.get_pass_config();
using const_node_ptr = const std::shared_ptr<const ngraph::Node>;
@@ -144,6 +147,47 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (conf.lpTransformsMode == Config::LPTransformsMode::On) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<ConvolutionTransformation, ngraph::opset1::Convolution>(
LayerTransformation::Params(params).setPrecisionsOnActivations({ngraph::element::u8}).setSupportAsymmetricQuantization(true))
.add<GroupConvolutionTransformation, ngraph::opset1::GroupConvolution>(
LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 }).setSupportAsymmetricQuantization(true))
.addStandaloneCleanup<MultiplyToGroupConvolutionTransformation, ngraph::opset1::Multiply>(
LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 })));
transformer.transform(nGraphFunc);
}
#endif
ngraph::pass::Manager legacyManager;
legacyManager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
legacyManager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
auto legacyPassConfig = legacyManager.get_pass_config();
legacyPassConfig->set_callback<ngraph::pass::AddMultiplyFusion>([](const_node_ptr &node) -> bool {
if (auto mul_op = std::dynamic_pointer_cast<const ngraph::opset1::Multiply>(node)) {
auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(mul_op->get_input_node_shared_ptr(0));
auto constant = std::dynamic_pointer_cast<const ngraph::opset1::Constant>(mul_op->get_input_node_shared_ptr(1));
bool is_dequantization = mul_op->get_rt_info().count("DEQUANTIZATION") != 0;
if (add_op && constant && is_dequantization) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}
}
return false;
});
legacyManager.run_passes(nGraphFunc);
clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
// WA: after conversion to CNNNetwork user precision can redefine input/output precisions
@@ -187,7 +231,7 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network, const st
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool is_transformed = false;
if (clonedNetwork->getFunction()) {
Transformation(clonedNetwork);
Transformation(clonedNetwork, conf);
is_transformed = true;
}
auto implNetwork = std::dynamic_pointer_cast<details::CNNNetworkImpl>(clonedNetwork);
@@ -312,8 +356,17 @@ QueryNetworkResult Engine::QueryNetwork(const ICNNNetwork& network, const std::m
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
// TODO: Clarify the behavior of SetConfig method. Skip eng_config or not?
Config conf = engConfig;
conf.readProperties(config);
if (conf.enableDynamicBatch) {
conf.batchLimit = static_cast<int>(network.getBatchSize());
}
auto clonedNetwork = cloneNetwork(network);
Transformation(clonedNetwork);
Transformation(clonedNetwork, conf);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;
for (details::CNNNetworkIterator itLayer{clonedNetwork.get()}; itLayer != details::CNNNetworkIterator(); itLayer++) {

View File

@@ -112,7 +112,10 @@ public:
exec_cast<PrecisionTrait<Precision::U8>::value_type, PrecisionTrait<Precision::I32>::value_type>(inputs[0], outputs[0]);
break;
default:
std::string errorMsg = "Unsupported precisions!";
std::stringstream ss;
ss << "Unsupported precisions: " << inputs[0]->getTensorDesc().getPrecision() << " -> " << outputs[0]->getTensorDesc().getPrecision();
std::string errorMsg = ss.str();
if (resp) {
errorMsg.copy(resp->msg, sizeof(resp->msg)-1);
}

View File

@@ -158,7 +158,7 @@ void MKLDNNGenericNode::execLayer() {
InferenceEngine::ResponseDesc resp;
InferenceEngine::StatusCode rc = impls[0]->execute(inputs, outputs, &resp);
if (rc != InferenceEngine::OK) {
THROW_IE_EXCEPTION << resp.msg;
THROW_IE_EXCEPTION << this->getTypeStr() << ":" << this->getName() << ": " << resp.msg;
}
}

View File

@@ -47,6 +47,7 @@ public:
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
@@ -57,9 +58,32 @@ public:
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
// KMB compilation support
ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
// KMB compilation support
ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Output<Node>& bias,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector & new_args) const override;
@@ -90,6 +114,7 @@ protected:
CoordinateDiff m_pads_end;
PadType m_auto_pad;
size_t m_group;
element::Type m_output_type;
};
} // namespace op

View File

@@ -12,6 +12,7 @@
#include <transformations_visibility.hpp>
#include "ngraph/op/op.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
namespace ngraph {
namespace op {
@@ -190,6 +191,7 @@ void TypeRelaxed<BaseOp>::validate_and_infer_types() {
BaseOp::get_input_tensor(i).set_tensor_type(old_input_types[i], BaseOp::get_input_partial_shape(i));
}
// Override (some) output types
for (size_t i = 0; i < BaseOp::get_output_size(); ++i) {
auto overridden_output_type = get_overridden_output_type(i);

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API AddTransformation : public EltwiseBaseTransformation {
public:
AddTransformation(const Params& params) : EltwiseBaseTransformation(params) {}
~AddTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API AvgPoolTransformation : public LayerTransformation {
public:
AvgPoolTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ClampTransformation : public LayerTransformation {
public:
ClampTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,138 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/check.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations_visibility.hpp"
#include "transformations/rt_info/dequantization_attribute.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
// template<typename BaseOp2>
// class TRANSFORMATIONS_API DequantizationOp : public BaseOp2 {
// public:
// template <typename ... Args>
// DequantizationOp(Args&&... args) : BaseOp2(std::forward<Args>(args)...) {
// init();
// }
//
// std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
// std::shared_ptr<Node> cloned = BaseOp2::clone_with_new_inputs(inputs);
// auto& rtInfo = cloned->get_rt_info();
// rtInfo = get_rt_info();
//
// return cloned;
// }
//
// protected:
// void init() {
// auto& rtInfo = get_rt_info();
// rtInfo["DEQUANTIZATION"] = std::make_shared<ngraph::VariantWrapper<std::string>>("");
// }
// };
//
// using DequantizationConvert = DequantizationOp<ngraph::opset1::Convert>;
// using DequantizationSubtract = DequantizationOp<ngraph::opset1::Subtract>;
// using DequantizationMultiply = DequantizationOp<ngraph::opset1::Multiply>;
namespace {
void initRuntimeInfo(ngraph::Node& operation) {
auto& rtInfo = operation.get_rt_info();
rtInfo["DEQUANTIZATION"] = std::make_shared<VariantWrapper<DequantizationAttr>>(DequantizationAttr());
}
// #include <ngraph/rt_info.hpp>
// ngraph::copy_runtime_info(from, to);
void copyRuntimeInfo(const ngraph::Node& from, ngraph::Node& to) {
const auto& rtInfoFrom = from.get_rt_info();
auto& rtInfoTo = to.get_rt_info();
rtInfoTo = rtInfoFrom;
}
} // namespace
class TRANSFORMATIONS_API DequantizationConvert : public ngraph::opset1::Convert {
public:
DequantizationConvert(const ngraph::Output<Node>& arg, const ngraph::element::Type& destination_type) :
ngraph::opset1::Convert(arg, destination_type) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Convert::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
class TRANSFORMATIONS_API DequantizationSubtract : public ngraph::opset1::Subtract {
public:
DequantizationSubtract(
const ngraph::Output<Node>& arg0,
const ngraph::Output<Node>& arg1,
const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
ngraph::opset1::Subtract(arg0, arg1, auto_broadcast) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Subtract::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
class TRANSFORMATIONS_API DequantizationMultiply : public ngraph::opset1::Multiply {
public:
DequantizationMultiply(
const Output<Node>& arg0,
const Output<Node>& arg1,
const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
ngraph::opset1::Multiply(arg0, arg1, auto_broadcast) {
initRuntimeInfo(*this);
}
DequantizationMultiply(const ngraph::opset1::Multiply& multiply) :
ngraph::opset1::Multiply(multiply) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Multiply::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
class TRANSFORMATIONS_API DequantizationAdd : public ngraph::opset1::Add {
public:
DequantizationAdd(
const ngraph::Output<Node>& arg0,
const ngraph::Output<Node>& arg1,
const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
ngraph::opset1::Add(arg0, arg1, auto_broadcast) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Add::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
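All four wrappers above follow the same scheme: the constructor stamps the DEQUANTIZATION run-time attribute and clone_with_new_inputs preserves run-time info. A minimal sketch of how such a chain could be assembled and later detected; the builder function, the constant values, and the check are illustrative only (the attribute test matches the one used by the AddMultiplyFusion callback in the MKLDNN plugin diff above):
#include <memory>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/dequantization_op.hpp"
using namespace ngraph;
using namespace ngraph::pass::low_precision;
// Convert -> Subtract -> Multiply is the canonical dequantization chain.
std::shared_ptr<Node> buildDequantization(const Output<Node>& quantized) {
    const auto convert = std::make_shared<DequantizationConvert>(quantized, element::f32);
    const auto zeroPoint = opset1::Constant::create(element::f32, Shape{}, { 128.f });
    const auto subtract = std::make_shared<DequantizationSubtract>(convert, zeroPoint);
    const auto scale = opset1::Constant::create(element::f32, Shape{}, { 0.01f });
    return std::make_shared<DequantizationMultiply>(subtract, scale);
}
bool isMarkedAsDequantization(const std::shared_ptr<Node>& node) {
    // each constructor above sets this attribute via initRuntimeInfo()
    return node->get_rt_info().count("DEQUANTIZATION") != 0;
}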

View File

@@ -0,0 +1,41 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <tuple>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
namespace ngraph {
namespace pass {
namespace low_precision {
typedef std::tuple<std::shared_ptr<Node>, std::shared_ptr<Node>> FakeQuantizeDequantizationValues;
class FakeQuantizeDequantization {
public:
FakeQuantizeDequantization();
FakeQuantizeDequantization(
Output<Node> data,
std::shared_ptr<ngraph::opset1::Convert> convert,
std::shared_ptr<ngraph::opset1::Subtract> subtract,
std::shared_ptr<ngraph::opset1::Multiply> multiply);
bool empty() const;
bool isShared() const;
bool isLowPrecision() const;
static bool checkElementwise(const std::shared_ptr<ngraph::Node>& elementwise);
Output<Node> data;
std::shared_ptr<opset1::Convert> convert;
std::shared_ptr<opset1::Subtract> subtract;
std::shared_ptr<opset1::Multiply> multiply;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
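A short sketch of how this descriptor might be used once a dequantization pattern has been matched; the helper function is hypothetical, and the include path under transformations/low_precision/common is an assumption based on the includes shown elsewhere in this PR:
#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
using namespace ngraph;
using namespace ngraph::pass::low_precision;
// `data`, `convert`, `subtract` and `multiply` are assumed to come from a matched subgraph.
bool canHandleDequantization(const Output<Node>& data,
                             const std::shared_ptr<opset1::Convert>& convert,
                             const std::shared_ptr<opset1::Subtract>& subtract,
                             const std::shared_ptr<opset1::Multiply>& multiply) {
    const FakeQuantizeDequantization dequantization(data, convert, subtract, multiply);
    return !dequantization.empty() && dequantization.isLowPrecision();
}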

View File

@@ -0,0 +1,52 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <exception>
#include <memory>
#include <sstream>
#include <string>
#include <ngraph/node.hpp>
#include <transformations_visibility.hpp>
/**
* @def THROW_IE_LPT_EXCEPTION
* @brief A macro used to throw an exception with a detailed description for low precision transformations
*/
#define THROW_IE_LPT_EXCEPTION(node) throw ::ngraph::pass::low_precision::InferenceEngineLptException(__FILE__, __LINE__, node)
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API InferenceEngineException : public std::exception {
std::shared_ptr<std::ostringstream> buffer = std::make_shared<std::ostringstream>();
mutable std::string buffer_str;
public:
template <typename T>
InferenceEngineException& operator<< (const T& x) {
*buffer << x;
return *this;
}
const char* what() const noexcept override {
buffer_str = buffer->str();
return buffer_str.c_str();
}
};
#define THROW_TRANSFORMATION_EXCEPTION throw ::ngraph::pass::low_precision::InferenceEngineException() << __FILE__ << ":" << __LINE__ << " "
class TRANSFORMATIONS_API InferenceEngineLptException : public InferenceEngineException {
public:
InferenceEngineLptException(const std::string& filename, const size_t line, const Node& node) {
*this
<< filename << ":" << line << " Exception during low precision transformation for "
<< node << " node with type '" << node.get_type_name() << "', name '" << node.get_friendly_name() << "'. ";
}
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
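Both macros stream arbitrary context into the thrown object; a usage sketch with a hypothetical checking function (the level check mirrors the one in layer_transformation.hpp below):
#include <memory>
#include <ngraph/node.hpp>
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
void checkQuantization(const std::shared_ptr<ngraph::Node>& layer, const size_t levels) {
    if ((levels != 255) && (levels != 256)) {
        // generic variant: the message is prefixed with file and line
        THROW_TRANSFORMATION_EXCEPTION << "unexpected levels " << levels;
    }
    if (layer->get_output_size() != 1ul) {
        // node-aware variant: also reports the node, its type and friendly name
        THROW_IE_LPT_EXCEPTION(*layer) << "unexpected output count";
    }
}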

View File

@@ -0,0 +1,41 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/check.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "../ilayer_transformations_manager.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class Subgraph {
public:
Subgraph(ngraph::pass::ILayerTransformationsManager* layerTransformationsManager);
bool fillSubgraphForConcat(const std::shared_ptr<ngraph::opset1::Concat>& concat, std::unordered_set<std::string>& handledLayers);
bool empty() const;
std::vector<std::shared_ptr<ngraph::Node>> quantizationLayers;
std::vector<std::shared_ptr<ngraph::opset1::Concat>> concatLayers;
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>> layers;
private:
bool fillSubgraphForQuantization(const std::shared_ptr<ngraph::opset1::FakeQuantize>& fakeQuantize, std::unordered_set<std::string>& handledLayers);
bool fillSubgraphForIntermediate(const std::shared_ptr<ngraph::Node>& intermediate, std::unordered_set<std::string>& handledLayers);
bool fill(const std::shared_ptr<ngraph::Node>& concat, std::unordered_set<std::string>& handledLayers);
const ngraph::pass::ILayerTransformationsManager* layerTransformationsManager;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,56 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include <functional>
#include <memory>
#include <string>
#include <vector>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
#include "common/subgraph.hpp"
#include "common/fake_quantize_dequantization.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConcatTransformation : public LayerTransformation {
public:
ConcatTransformation(const Params& params) : LayerTransformation(params) {}
~ConcatTransformation() override {};
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
protected:
void addDequantizationLayers(
TransformationContext& context,
ngraph::pass::low_precision::Subgraph& subgraph,
std::function<void(
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate)> getLayerDequantizationCallback) const;
static bool isHandled(
const TransformationContext& context,
const std::vector<std::shared_ptr<ngraph::Node>>& quantizationOperations);
private:
size_t getMinQuantizationLevels(
const DataPrecision& dataPrecision,
const float maxOutputInterval,
const std::vector<QuantizationDetails>& quantizationLayersDetails,
const float outputLowValue,
const float outputHighValue) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,47 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <ngraph/ngraph.hpp>
#include "concat.hpp"
#include "common/subgraph.hpp"
#include "common/fake_quantize_dequantization.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConcatMultiChannelsTransformation : public ConcatTransformation {
public:
ConcatMultiChannelsTransformation(const Params& params) : ConcatTransformation(params) {}
~ConcatMultiChannelsTransformation() override {};
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
private:
static void fillDequantization(
std::shared_ptr<ngraph::Node> layer,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate);
static void fillQuantization(const std::shared_ptr<ngraph::Node> layer, std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes);
static void updateDequantizationShapesIfNecessary(
std::shared_ptr<ngraph::Node> layer,
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize);
bool isMultiChannel(const std::vector<std::shared_ptr<ngraph::opset1::Concat>>& concatLayers) const noexcept;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConvertTransformation : public LayerTransformation {
public:
ConvertTransformation(const Params& params) : LayerTransformation(params) {}
~ConvertTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "weightable_layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConvolutionTransformation : public WeightableLayerTransformation {
public:
ConvolutionTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "transparent_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API DepthToSpaceTransformation : public TransparentBaseTransformation {
public:
DepthToSpaceTransformation(const Params& params) : TransparentBaseTransformation(params) {}
~DepthToSpaceTransformation() override {}
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,29 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API EltwiseBaseTransformation : public LayerTransformation {
public:
EltwiseBaseTransformation(const Params& params) : LayerTransformation(params) {}
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
static bool isBroadcasted(const Shape& shape) noexcept;
protected:
int getNotEmpty(const std::shared_ptr<Node>& eltwise) const;
std::pair<int, int> getMultiplyConstBranch(const std::shared_ptr<Node>& eltwise) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,33 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
#include "transformations/low_precision/fuse_fake_quantize.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FakeQuantizeTransformation : public LayerTransformation {
public:
FakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FakeQuantizeTransformation() override {};
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
static bool checkElementwise(const std::shared_ptr<Node>& eltwise);
private:
std::shared_ptr<opset1::FakeQuantize> fuseElementwise(
TransformationContext& context,
const std::shared_ptr<opset1::FakeQuantize>& fakeQuantize) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseConvertTransformation : public LayerTransformation {
public:
FuseConvertTransformation(const Params& params) : LayerTransformation(params) {}
~FuseConvertTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,31 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseFakeQuantizeTransformation : public LayerTransformation {
public:
FuseFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FuseFakeQuantizeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
private:
std::shared_ptr<opset1::FakeQuantize> handle(
TransformationContext& context,
const std::shared_ptr<opset1::FakeQuantize>& fakeQuantize) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseMultiplyToFakeQuantizeTransformation : public LayerTransformation {
public:
FuseMultiplyToFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FuseMultiplyToFakeQuantizeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseSubtractToFakeQuantizeTransformation : public LayerTransformation {
public:
FuseSubtractToFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FuseSubtractToFakeQuantizeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "convolution.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API GroupConvolutionTransformation : public ConvolutionTransformation {
public:
GroupConvolutionTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/node.hpp>
#include "transformations_visibility.hpp"
namespace ngraph {
namespace pass {
/**
* @brief low precision transformation component interface.
*/
class TRANSFORMATIONS_API ILayerTransformationsManager {
public:
virtual bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept = 0;
virtual bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept = 0;
};
} // namespace pass
} // namespace ngraph
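A minimal sketch of an implementer, shown only to illustrate the contract (the class and its heuristic are hypothetical); the LowPrecisionTransformations pass further below implements this same interface:
#include <memory>
#include <ngraph/node.hpp>
#include <transformations/low_precision/ilayer_transformations_manager.hpp>
class NaiveTransformationsManager : public ngraph::pass::ILayerTransformationsManager {
public:
    bool isQuantized(const std::shared_ptr<ngraph::Node>& layer) const noexcept override {
        // illustrative heuristic: nodes stamped by the LPT pipeline count as quantized
        return layer->get_rt_info().count("DEQUANTIZATION") != 0;
    }
    bool isPrecisionPreserved(const std::shared_ptr<ngraph::Node>& layer) const noexcept override {
        return false;  // conservative default
    }
};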

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "transparent_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API InterpolateTransformation : public LayerTransformation {
public:
InterpolateTransformation(const Params& params) : LayerTransformation(params) {}
~InterpolateTransformation() override {}
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <vector>
#include <ngraph/ngraph.hpp>
#include <transformations_visibility.hpp>
namespace ngraph {
namespace pass {
/**
* @brief low precision transformation component interface.
*/
class TRANSFORMATIONS_API IParamsManager {
public:
// TODO FIXME: it is not correct to have a string as a key here, try to use NodeTypeInfo
virtual std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept = 0;
};
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,380 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include <limits>
#include <list>
#include <memory>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/pass/graph_rewrite.hpp>
#include "iparams_manager.hpp"
#include "ilayer_transformations_manager.hpp"
#include "transformation_context.hpp"
#include "quantization_details.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "common/fake_quantize_dequantization.hpp"
/*****************************************************
* Debug capability
* - LPT_ORIGINAL_MODEL_PATH : Specify an existing folder name
* to serialize the original model into it (XML & BIN extensions are added)
* - LPT_TRANSFORMED_MODEL_PATH : Specify an existing folder name
* to serialize the transformed model into it (XML & BIN extensions are added)
* - LPT_PRINT_DEQUANTIZATION_INFO : Define it to enable
* dequantization layers printing
* - LPT_DISPLAY_PRECISION : Define it to display precision info
* during low precision transformations
*
*****************************************************/
// #define LPT_ORIGINAL_MODEL_PATH "/localdisk/orig.model"
// #define LPT_TRANSFORMED_MODEL_PATH "/localdisk/transformed.model"
// #define LPT_PRINT_DEQUANTIZATION_INFO
// #define LPT_DISPLAY_PRECISION
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API DataPrecision {
public:
DataPrecision() : precision(element::undefined), min(0.f), max(0.f), hasZeroPoint(false) {}
DataPrecision(const element::Type precision, const float min, const float max, const bool hasZeroPoint) :
precision(precision),
min(min),
max(max),
hasZeroPoint(hasZeroPoint) {}
static float getMinValue(const element::Type precision, const size_t levels) {
if (precision == element::i8) {
if (levels == 255) {
return static_cast<float>(std::numeric_limits<signed char>::lowest()) + 1.f;
} else if (levels == 256) {
return static_cast<float>(std::numeric_limits<signed char>::lowest());
} else {
NGRAPH_CHECK(false, "unexpected levels ", levels, " for precision ", precision);
}
} else if (precision == element::u8) {
return static_cast<float>(std::numeric_limits<unsigned char>::lowest());
} else if (precision == element::f16) {
return -1.0e15f;
} else if (precision == element::f32) {
return std::numeric_limits<float>::lowest();
} else {
NGRAPH_CHECK(false, "unexpected precision ", precision);
}
}
static float getMaxValue(const element::Type precision, const size_t levels) {
if ((levels != 255ul) && (levels != 256ul)) {
THROW_TRANSFORMATION_EXCEPTION << "unexpected levels " << levels;
}
if (precision == element::i8) {
return static_cast<float>(std::numeric_limits<signed char>::max());
} else if (precision == element::u8) {
return static_cast<float>(std::numeric_limits<unsigned char>::max()) - (256 - levels);
} else if (precision == element::f16) {
return 1.0e15f;
} else if (precision == element::f32) {
return std::numeric_limits<float>::max();
} else {
THROW_TRANSFORMATION_EXCEPTION << "unexpected precision " << precision;
}
}
static bool hasNegativeValues(const std::vector<float>& values) {
for (const float value : values) {
if (value < 0.0) {
return true;
}
}
return false;
}
element::Type precision;
float min;
float max;
bool hasZeroPoint;
static element::Type getPrecision(const std::vector<float>& outputLowValues, const std::vector<float>& outputHighValues) {
return (hasNegativeValues(outputLowValues) || hasNegativeValues(outputHighValues)) ? element::i8 : element::u8;
}
static element::Type getPrecision(const size_t /* quantizationLevels */, const bool signedInterval) {
return signedInterval ? element::i8 : element::u8;
}
static float getMin(const size_t quantizationLevels, const bool signedInterval) {
if (quantizationLevels == 255) {
return signedInterval ? -127.0 : 0.0;
} else if (quantizationLevels == 256) {
return signedInterval ? -128.0 : 0.0;
} else {
// THROW_TRANSFORMATION_EXCEPTION << "quantization level " << quantizationLevels << " is not supported";
// FIXME: not completed
return signedInterval ? -128.0 : 0.0;
}
}
static float getMax(const size_t quantizationLevels, const bool signedInterval) {
if ((quantizationLevels == 255) || (quantizationLevels == 256)) {
return signedInterval ? 127.0 : 255.0;
} else {
// THROW_TRANSFORMATION_EXCEPTION << "quantization level " << quantizationLevels << " is not supported";
// FIXME: not completed
// return quantizationLevels - 1.0;
return signedInterval ? 127.0 : 255.0;
}
}
};
inline bool operator==(const DataPrecision& value1, const DataPrecision& value2) {
return
(value1.precision == value2.precision) &&
(value1.min == value2.min) &&
(value1.max == value2.max);
}
inline bool operator!=(const DataPrecision& value1, const DataPrecision& value2) {
return !(value1 == value2);
}
inline std::ostream &operator << (std::ostream &os, const DataPrecision& value) {
os << value.precision << ", min: " << value.min << ", max: " << value.max;
return os;
}
// Base class for all LP transformations, holds some common data structures
class TRANSFORMATIONS_API LayerTransformation {
public:
enum QuantizedTensorAlignment {
None,
UpdateLevel
};
class Params {
public:
Params(
const bool updatePrecisions = true,
const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations = QuantizedTensorAlignment::UpdateLevel,
const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights = QuantizedTensorAlignment::None,
bool supportAsymmetricQuantization = false,
std::vector<element::Type> precisionsOnActivations = { element::u8, element::i8 },
std::vector<element::Type> precisionsOnWeights = { element::i8 }) :
updatePrecisions(updatePrecisions),
quantizedTensorAlignmentOnActivations(quantizedTensorAlignmentOnActivations),
quantizedTensorAlignmentOnWeights(quantizedTensorAlignmentOnWeights),
supportAsymmetricQuantization(supportAsymmetricQuantization),
precisionsOnActivations(precisionsOnActivations),
precisionsOnWeights(precisionsOnWeights) {
if (precisionsOnActivations.size() == 0ul) {
THROW_TRANSFORMATION_EXCEPTION << "precisions on activations are not specisifed";
}
if (precisionsOnWeights.size() == 0ul) {
THROW_TRANSFORMATION_EXCEPTION << "precisions on weights are not specisifed";
}
}
Params& setUpdatePrecisions(const bool updatePrecisions) {
this->updatePrecisions = updatePrecisions;
return *this;
}
Params& setQuantizedTensorAlignmentOnActivations(const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations) {
this->quantizedTensorAlignmentOnActivations = quantizedTensorAlignmentOnActivations;
return *this;
}
Params& setQuantizedTensorAlignmentOnWeights(const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights) {
this->quantizedTensorAlignmentOnWeights = quantizedTensorAlignmentOnWeights;
return *this;
}
Params& setSupportAsymmetricQuantization(const bool supportAsymmetricQuantization) {
this->supportAsymmetricQuantization = supportAsymmetricQuantization;
return *this;
}
Params& setPrecisionsOnActivations(const std::vector<element::Type>& precisionsOnActivations) {
this->precisionsOnActivations = precisionsOnActivations;
return *this;
}
Params& setPrecisionsOnWeights(const std::vector<element::Type>& precisionsOnWeights) {
this->precisionsOnWeights = precisionsOnWeights;
return *this;
}
bool updatePrecisions;
QuantizedTensorAlignment quantizedTensorAlignmentOnActivations;
QuantizedTensorAlignment quantizedTensorAlignmentOnWeights;
bool supportAsymmetricQuantization;
std::vector<element::Type> precisionsOnActivations;
std::vector<element::Type> precisionsOnWeights;
};
class PrecisionDetails {
public:
PrecisionDetails(const element::Type& precision, const bool hasNegativeOutput, const bool hasZeroPoint) :
precision(precision),
hasNegativeOutput(hasNegativeOutput),
hasZeroPoint(hasZeroPoint) {}
const element::Type precision;
const bool hasNegativeOutput;
const bool hasZeroPoint;
};
LayerTransformation(const Params& params);
virtual ~LayerTransformation() = default;
virtual void registerMatcherIn(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const = 0;
virtual bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const = 0;
void setParamsManager(IParamsManager* paramsManager) noexcept;
void setLayerTransformationsManager(ILayerTransformationsManager* layerTransformationsManager) noexcept;
void setUpdatePrecisions(const bool updatePrecisions);
void setQuantizedTensorAlignmentOnActivations(const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations);
void setQuantizedTensorAlignmentOnWeights(const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights);
void setQuantizationIntervalAsymmetryThreshold(const float value);
void setZeroThreshold(const float value);
void setMinQuantizationLevels(const size_t levels);
const std::vector<element::Type>& getPrecisionsOnActivations() const;
const std::vector<element::Type>& getPrecisionsOnWeights() const;
virtual bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const;
bool canSubtractBeHandled(const std::shared_ptr<Node>& op, const size_t parentIndex = 0ul) const;
bool canSubtractBeHandled(const std::shared_ptr<Node>& op, const FakeQuantizeDequantization& dequantization) const;
PrecisionDetails getPrecisionDetails(const QuantizationDetails& quantizationDetails) const;
// return true if the operation can be quantized and false otherwise
// for example: if convolution operation weights are not quantized, then isQuantized returns false, and true otherwise
// note: dequantization operations on activations are absent during method execution
virtual bool isQuantized(std::shared_ptr<Node> layer) const noexcept;
// return true if the operation preserves the input precision
// note: dequantization operations on activations are absent during method execution
virtual bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept = 0;
DataPrecision getDataPrecision(
std::shared_ptr<Node> layer,
const QuantizationDetails& quantizationDetails,
const bool onWeights) const;
void fillAvailablePrecisions(std::shared_ptr<Node> layer, std::vector<element::Type>& availablePrecisions) const;
std::vector<std::shared_ptr<Node>> getChildrenRecursivelyExceptPrecisionPreserved(const std::shared_ptr<Node>& op) const noexcept;
protected:
#ifdef LPT_PRINT_DEQUANTIZATION_INFO
static void printDequantizationInfo(const std::shared_ptr<Node>& layer);
static void printDequantizationInfo(const DataPrecision& dataPrecision);
static void printDequantizationValues(
const std::vector<float>& dequantizationScales,
const std::vector<float>& dequantizationShifts);
#endif
bool updatePrecisions;
QuantizedTensorAlignment quantizedTensorAlignmentOnActivations;
QuantizedTensorAlignment quantizedTensorAlignmentOnWeights;
bool supportAsymmetricQuantization;
std::vector<element::Type> precisionsOnActivations;
std::vector<element::Type> precisionsOnWeights;
// absolute value, used to determine quantization interval asymmetry
float quantizationIntervalAsymmetryThreshold;
// absolute value, used to determine zero
float zeroThreshold;
size_t minQuantizationLevels;
static const char originalLayerPostfix[];
IParamsManager* paramsManager;
ILayerTransformationsManager* layerTransformationsManager;
protected:
std::shared_ptr<ngraph::Node> separateInStandaloneBranch(std::shared_ptr<ngraph::Node> node) const;
std::shared_ptr<ngraph::Node> moveDequantizationAfter(
TransformationContext &context,
const std::shared_ptr<ngraph::Node>& operation,
const FakeQuantizeDequantization& dequantization,
const bool updatePrecision,
const bool moveSubtract = true) const;
void fuseConvertIfPossible(const std::shared_ptr<ngraph::Node>& operation) const;
void updateOutput(
TransformationContext &context,
std::shared_ptr<ngraph::Node> lastNode,
std::shared_ptr<ngraph::Node> originalNode) const;
void updateOutput(
TransformationContext& context,
std::shared_ptr<ngraph::Node> lastNode,
std::string originalName) const;
void addPattern(ngraph::pass::GraphRewrite& pass, TransformationContext& context, std::shared_ptr<Node> patternRoot) const;
template <typename Operation>
void addSingleNodePattern(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const {
using namespace ngraph;
auto is_op_type = [](std::shared_ptr<Node> n) {
return !!as_type_ptr<Operation>(n);
};
auto p_node = std::make_shared<pattern::op::Label>(element::f32, Shape{}, is_op_type);
addPattern(pass, context, p_node);
}
};
inline std::ostream &operator << (std::ostream &os, const LayerTransformation::QuantizedTensorAlignment& value) {
switch (value) {
case LayerTransformation::QuantizedTensorAlignment::None: {
os << "None";
break;
}
case LayerTransformation::QuantizedTensorAlignment::UpdateLevel: {
os << "UpdateLevel";
break;
}
default: {
os << static_cast<int>(value);
break;
}
}
return os;
}
inline std::ostream &operator << (std::ostream &os, const std::vector<element::Type>& values) {
os << "{";
for (size_t i = 0; i < values.size(); ++i) {
const element::Type& value = values[i];
if (i > 0) {
os << ", " << value;
} else {
os << value;
}
}
os << "}";
return os;
}
typedef std::shared_ptr<LayerTransformation> LayerTransformationPtr;
} // namespace low_precision
} // namespace pass
} // namespace ngraph
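A sketch of how Params is intended to be configured through its fluent setters, mirroring the per-operation overrides in the MKLDNN plugin diff above; the factory function itself is illustrative:
#include "transformations/low_precision/layer_transformation.hpp"
using namespace ngraph::pass::low_precision;
LayerTransformation::Params makeConvolutionParams() {
    return LayerTransformation::Params(
        true,                                                        // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // on activations
        LayerTransformation::QuantizedTensorAlignment::None)         // on weights
        .setPrecisionsOnActivations({ ngraph::element::u8 })
        .setSupportAsymmetricQuantization(true);
}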

View File

@@ -0,0 +1,36 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ie_api.h>
#include <ngraph/ngraph.hpp>
#include <ngraph/pass/graph_rewrite.hpp>
#include <transformations/low_precision/ilayer_transformations_manager.hpp>
#include <transformations/low_precision/iparams_manager.hpp>
namespace ngraph {
namespace pass {
class TRANSFORMATIONS_API LowPrecisionTransformations : public ngraph::pass::GraphRewrite, public IParamsManager, public ILayerTransformationsManager {
public:
bool run_on_function(std::shared_ptr<ngraph::Function> f) override;
// IParamsManager interface implementation
std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept override;
// ILayerTransformationsManager interface implementation
bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept override;
bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept override;
};
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MatMulTransformation : public LayerTransformation {
public:
MatMulTransformation(const Params& params) : LayerTransformation(params) {}
~MatMulTransformation() override {}
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MaxPoolTransformation : public LayerTransformation {
public:
MaxPoolTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MultiplyTransformation : public EltwiseBaseTransformation {
public:
MultiplyTransformation(const Params& params) : EltwiseBaseTransformation(params) {}
~MultiplyTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,33 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MultiplyToGroupConvolutionTransformation : public LayerTransformation {
public:
MultiplyToGroupConvolutionTransformation(const Params& params) : LayerTransformation(params), groupSize(1ul) {}
~MultiplyToGroupConvolutionTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
void setGroupSize(const size_t groupSize);
size_t getGroupSize() const;
private:
size_t groupSize;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MVNTransformation : public LayerTransformation {
public:
MVNTransformation(const Params& params) : LayerTransformation(params) {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,245 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <cmath>
#include <memory>
#include <string>
#include <vector>
#include <unordered_set>
#include <ngraph/ngraph.hpp>
#include <ngraph/pattern/matcher.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "ngraph_ops/type_relaxed.hpp"
#include <ngraph/rt_info.hpp>
#include "transformation_context.hpp"
#include "quantization_details.hpp"
#include "transformations/utils/utils.hpp"
#include "common/fake_quantize_dequantization.hpp"
#include "common/ie_lpt_exception.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
/**
* @brief NetworkHelper class encapsulates manipulations with nGraph function.
*/
class TRANSFORMATIONS_API NetworkHelper {
public:
// Return true if `type` can be cast to at least one of `types`
static bool is_castable_to_one_of(NodeTypeInfo type, const std::unordered_set<NodeTypeInfo>& types);
static std::vector<Input<Node>> consumer_inputs(std::shared_ptr<Node> node);
// Collect and return a vector with all nodes that consume any of the `node` outputs
static std::vector<std::shared_ptr<Node>> consumers(std::shared_ptr<Node> node);
static Shape alignShapeForChannelDim(const Shape& shape, Rank rank);
// return true if at least one child uses the layer as weights
static bool onWeights(std::shared_ptr<Node> layer);
template <typename OperationType>
static std::shared_ptr<Node> setOutDataPrecisionForTypeRelaxed(std::shared_ptr<OperationType> operation, const element::Type& precision);
template <typename OperationType>
static std::shared_ptr<Node> setOutDataPrecision(std::shared_ptr<OperationType> operation, const element::Type& precision);
static size_t getOutputChannelsCount(std::shared_ptr<const Node> layer, bool isOnWeights = false);
static std::vector<std::shared_ptr<Node>> getParentsRecursivelyExceptTypes(
std::shared_ptr<Node> layer,
const std::unordered_set<NodeTypeInfo>& exceptionLayerTypes = {},
const int portIndex = -1);
static size_t getInputChannelsCount(std::shared_ptr<Node> layer);
static size_t getGroupsCount(std::shared_ptr<Node> layer);
// Remove node by connecting its 0th input with 0th output
static void removeLayer(std::shared_ptr<Node> node);
static std::shared_ptr<Node> swapMultiplyAndAdd(std::shared_ptr<opset1::Add> addAfterMultiply, const int multiplyBranch);
static void copyInfo(const std::shared_ptr<Node>& source, const std::shared_ptr<Node>& target);
static void cleanRunTimeInfo(const std::shared_ptr<Node>& layer);
static bool isScalarLike(std::shared_ptr<opset1::Constant> constant);
static bool isZero(std::shared_ptr<opset1::Constant> constant);
static std::shared_ptr<opset1::Constant> toScalar(std::shared_ptr<opset1::Constant> constant);
static std::shared_ptr<Node> getConstantInput(std::shared_ptr<Node> node);
// Optimizes the series of multiplies after a given output port
static std::shared_ptr<ngraph::opset1::Multiply> optimizeMultipliesAfter(std::shared_ptr<Node> multiply);
static std::shared_ptr<opset1::Constant> roundWithTolerance(std::shared_ptr<Node> node, element::Type target_type, float tolerance = 0.1);
static std::tuple<std::shared_ptr<Node>, std::shared_ptr<Node>> decomposeFakeQuantize(
std::shared_ptr<opset1::FakeQuantize> fq,
const element::Type precision,
const float min,
const float max,
const bool hasZeroPoint,
const bool updatePrecision);
static std::shared_ptr<opset1::FakeQuantize> updateFakeQuantize(
std::shared_ptr<opset1::FakeQuantize> fq,
element::Type precision,
float min,
float max);
static FakeQuantizeDequantization makeDequantization(
const float dequantizationMul,
const float dequantizationSub,
const ngraph::element::Type originalPrecision,
const ngraph::Shape dataNodeOutputShape,
element::Type precision,
float min,
float max);
static FakeQuantizeDequantization createDequantizationFromFakeQuantize(
std::shared_ptr<opset1::FakeQuantize> fq,
element::Type precision,
float min,
float max,
const bool hasZeroPoint,
const bool updatePrecision);
static FakeQuantizeDequantization getDequantization(const std::shared_ptr<Node> node, const size_t parentIndex = 0ul, const bool inPlace = false);
static std::shared_ptr<Node> optimizeSubtract(std::shared_ptr<opset1::Subtract> subtract);
class InsertDequantizationResult {
public:
InsertDequantizationResult(
const std::shared_ptr<Node>& newOperation,
const std::shared_ptr<Node>& lastDequantization) : newOperation(newOperation), lastDequantization(lastDequantization) {}
std::shared_ptr<Node> newOperation;
std::shared_ptr<Node> lastDequantization;
};
static InsertDequantizationResult moveDequantizationAfter(
const std::shared_ptr<ngraph::Node>& operation,
const FakeQuantizeDequantization& dequantization,
const bool updatePrecision,
const bool moveSubtract);
// TODO: rename: fuseConvertIfPossible
static void removeConvertIfPossible(
const std::shared_ptr<ngraph::Node>& operation,
const FakeQuantizeDequantization& dequantization);
static bool checkConstantValuePrecision(const element::Type expectedPrecision, const std::shared_ptr<Node>& constant);
static size_t getChildInputIndex(const std::shared_ptr<ngraph::Node>& parent, const std::shared_ptr<ngraph::Node>& child);
static size_t getParentOutputIndex(const std::shared_ptr<ngraph::Node>& parent, const std::shared_ptr<ngraph::Node>& child);
static std::vector<Output<Node>> getInputs(const std::shared_ptr<ngraph::Node>& node);
static FakeQuantizeDequantizationValues createEmptyValues(const FakeQuantizeDequantization& dequantization);
static bool isZeroConst(const std::shared_ptr<Node>& node);
static std::shared_ptr<Node> toScalarIfPossible(std::shared_ptr<Node> node);
static std::shared_ptr<Node> fold_fake_quantize(const std::shared_ptr<opset1::FakeQuantize>& fq);
static std::shared_ptr<Node> fold_fake_quantize(const std::shared_ptr<opset1::FakeQuantize>& fq, const bool roundValues);
// Multi-precision constant folding.
// Handles only a specific case: Constant -> [dequantization operations] -> [node]
static void foldDequantization(std::shared_ptr<Node>& node, const size_t branchIndex, const bool inPlace = false);
private:
static std::shared_ptr<Node> foldFakeQuantize(const std::shared_ptr<opset1::FakeQuantize>& fq, const bool roundValues, const bool roundValuesWasSet);
// 1 - on weights
// 0 - weightable layer was not found
// -1 - on activations
static int onWeightsInDepth(std::shared_ptr<Node> layer);
};
template <typename OperationType>
std::shared_ptr<Node> NetworkHelper::setOutDataPrecisionForTypeRelaxed(std::shared_ptr<OperationType> layer, const element::Type& precision) {
// Check if the node is already an extended (TypeRelaxed) operation
if (auto relaxed_layer = std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(layer)) {
relaxed_layer->set_overridden_output_type(precision);
std::dynamic_pointer_cast<ngraph::Node>(layer)->validate_and_infer_types();
return layer;
} else {
THROW_IE_LPT_EXCEPTION(*layer) << "TypeRelaxed type is expected";
}
}
template <typename OperationType>
std::shared_ptr<Node> NetworkHelper::setOutDataPrecision(std::shared_ptr<OperationType> layer, const element::Type& precision) {
// Check if the node is already an extended (TypeRelaxed) operation
if (auto relaxed_layer = std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(layer)) {
relaxed_layer->set_overridden_output_type(precision);
std::dynamic_pointer_cast<ngraph::Node>(layer)->validate_and_infer_types();
return layer;
} else {
// Make such replacements in advance for all supported polymorphic layer types
// Extend the node with new semantics: overridden output data type
// OperationType should be the real type of the object, otherwise it will lead to undefined behavior
auto replacement = std::make_shared<ngraph::op::TypeRelaxed<OperationType>>(*layer, precision);
copy_runtime_info(layer, replacement);
replace_node(layer, replacement);
return replacement;
}
}
template <typename T>
std::shared_ptr<Node> make_op_pattern(const ngraph::NodeVector& args) {
return std::make_shared<ngraph::pattern::op::Any>(element::undefined, PartialShape{}, [](std::shared_ptr<Node> n) {return !!as_type_ptr<T>(n); }, args);
}
template <typename T>
std::shared_ptr<Node> make_op_label() {
return std::make_shared<ngraph::pattern::op::Label>(
element::undefined,
PartialShape{},
[](std::shared_ptr<Node> n) {return !!as_type_ptr<T>(n); });
}
template <typename T, typename... Args>
std::shared_ptr<Node> fold(Args&&... args) {
auto node = std::make_shared<T>(std::forward<Args>(args)...);
if (node->get_output_size() == 1) {
OutputVector folded(node->get_output_size());
if (node->constant_fold(folded, node->input_values())) {
return folded[0].get_node_shared_ptr();
}
}
return node;
}
template <typename T, typename... Args>
std::shared_ptr<Node> fold_reshape(Args&&... args) {
std::shared_ptr<Node> node = std::make_shared<T>(std::forward<Args>(args)...);
if (node->get_output_size() == 1) {
OutputVector folded;
if (is_type<opset1::Constant>(node->input_value(0).get_node_shared_ptr()) &&
is_type<opset1::Constant>(node->input_value(1).get_node_shared_ptr())) {
return std::make_shared<opset1::Constant>(
node->get_input_element_type(0),
Shape(as_type_ptr<opset1::Constant>(node->input_value(1).get_node_shared_ptr())->template cast_vector<size_t>()),
as_type_ptr<opset1::Constant>(node->input_value(0).get_node_shared_ptr())->get_data_ptr());
}
}
return node;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
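For intuition about `swapMultiplyAndAdd` declared above: it swaps a Multiply -> Add pair into Add -> Multiply by rescaling the Add constant. A minimal standalone sketch of the underlying identity, assuming the scalar (per-tensor) case and using plain doubles instead of nGraph constants:

#include <cassert>
#include <cmath>

int main() {
    const double x = 17.0;  // hypothetical input value
    const double a = 0.25;  // Multiply constant
    const double b = -3.0;  // Add constant

    const double before = x * a + b;       // Multiply -> Add
    const double after = (x + b / a) * a;  // Add -> Multiply, with the Add constant rescaled to b / a

    // The swap is exact up to floating-point rounding.
    assert(std::fabs(before - after) < 1e-9);
    return 0;
}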

View File

@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API NormalizeL2Transformation : public LayerTransformation {
public:
NormalizeL2Transformation(const Params& params) : LayerTransformation(params) {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API PReluTransformation : public LayerTransformation {
public:
PReluTransformation(const Params& params) : LayerTransformation(params) {}
~PReluTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,89 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ostream>
#include <vector>
#include <transformations_visibility.hpp>
#include <ngraph/node.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <ngraph/type.hpp>
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API QuantizationDetails {
public:
QuantizationDetails();
QuantizationDetails(const QuantizationDetails& quantizationDetails);
QuantizationDetails(
const size_t levels,
const std::vector<float>& inputLowValues,
const std::vector<float>& inputHighValues,
const std::vector<float>& outputLowValues,
const std::vector<float>& outputHighValues,
const size_t inputIntervalsCount,
const size_t outputIntervalsCount,
const size_t outputChannelsCount);
static bool outputLayoutIsSupported(std::shared_ptr<opset1::FakeQuantize> quantize);
static void getInputIntervals(
std::shared_ptr<opset1::FakeQuantize> quantize,
std::vector<float>& inputLowValues,
std::vector<float>& inputHighValues,
size_t& inputIntervalsCount);
static void getOutputIntervals(
std::shared_ptr<opset1::FakeQuantize> quantize,
std::vector<float>& outputLowValues,
std::vector<float>& outputHighValues,
size_t& outputIntervalsCount);
static QuantizationDetails getDetails(std::shared_ptr<opset1::FakeQuantize>);
bool hasNegativeOutput() const;
float maxOutput(const size_t channel) const;
float maxInput(const size_t channel) const;
float maxOutputHigh() const;
float minOutputLow() const;
float getInputLowValue(const size_t channel) const;
float getInputHighValue(const size_t channel) const;
float getOutputLowValue(const size_t channel) const;
float getOutputHighValue(const size_t channel) const;
static bool isSupportedLevel(const size_t level);
const size_t levels;
const std::vector<float> inputLowValues;
const std::vector<float> inputHighValues;
const std::vector<float> outputLowValues;
const std::vector<float> outputHighValues;
const size_t inputIntervalsCount;
const size_t outputIntervalsCount;
const size_t outputChannelsCount;
private:
QuantizationDetails &operator=(const QuantizationDetails & /*target*/) { return *this; }
static void validate(std::shared_ptr<Node> constantLayer);
static std::vector<float> getBlobValue(std::shared_ptr<Node> constantLayer);
};
inline std::ostream &operator << (std::ostream &os, const QuantizationDetails& value) {
os << "levels: " << value.levels <<
", input 1/" << value.inputIntervalsCount << ": [" << value.getInputLowValue(0) << " : " << value.getInputHighValue(0) << "], " <<
"output 1/" << value.outputIntervalsCount << ": [" << value.getOutputLowValue(0) << " : " << value.getOutputHighValue(0) << "]";
return os;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ReluTransformation : public LayerTransformation {
public:
ReluTransformation(const Params& params) : LayerTransformation(params) {}
~ReluTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,32 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ReshapeTransformation : public LayerTransformation {
public:
ReshapeTransformation(const Params& params) : LayerTransformation(params) {}
~ReshapeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
static bool canBeTransformed(
const ngraph::Shape& subtractShape,
const ngraph::Shape& multiplyShape,
const ngraph::Shape& inputShape,
const ngraph::Shape& outputShape);
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,39 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <vector>
#include "layer_transformation.hpp"
#include "ngraph/node.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SplitTransformation : public LayerTransformation {
public:
SplitTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
void updateOutputs(
TransformationContext& context,
std::vector<std::shared_ptr<ngraph::Node>> lastNodes,
std::shared_ptr<ngraph::Node> originalNode) const;
protected:
ngraph::Shape getConstSplitShape(
const std::vector<size_t>& constSplitLengths,
const ngraph::Shape& constShape, const size_t axis,
const size_t idx) const;
virtual std::vector<size_t> getConstSplitLengths(
const OutputVector& inputs,
const ngraph::Shape& constShape,
const size_t outputSize) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SqueezeTransformation : public LayerTransformation {
public:
SqueezeTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SubtractTransformation : public LayerTransformation {
public:
SubtractTransformation(const Params& params) : LayerTransformation(params) {}
~SubtractTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SubtractMultiplyToMultiplyAddTransformation : public LayerTransformation {
public:
SubtractMultiplyToMultiplyAddTransformation(const Params& params) : LayerTransformation(params) {}
~SubtractMultiplyToMultiplyAddTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,35 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <string>
#include <unordered_set>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/quantization_details.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API TransformationContext {
public:
explicit TransformationContext(std::shared_ptr<Function> function);
std::shared_ptr<Function> function;
// Used to store handled FakeQuantize operations.
// ConcatTransformation and FakeQuantizeTransformation handle FakeQuantize operations; ConcatTransformation handles a FakeQuantize operation first.
// If the updatePrecisions transformation option is set to false, then there are no FakeQuantize operation attributes which identify that the operation
// has already been handled by ConcatTransformation:
// - output precision is original (FP32),
// - intervals are changed but not equal to precision boundaries,
// - quantization levels may or may not be changed.
// To avoid double handling of a FakeQuantize operation by FakeQuantizeTransformation after ConcatTransformation, FakeQuantizeTransformation
// has to use this member.
std::unordered_set<std::string> quantizedFakeQuantizeNames;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
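The guard described in the comment above boils down to a name-keyed visited set. A minimal standalone sketch of that pattern, assuming friendly names are unique within the function (markHandled is a hypothetical helper, not part of the API):

#include <iostream>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for TransformationContext::quantizedFakeQuantizeNames.
std::unordered_set<std::string> quantizedFakeQuantizeNames;

// Returns true on the first visit, false if this FakeQuantize was already handled.
bool markHandled(const std::string& friendlyName) {
    return quantizedFakeQuantizeNames.insert(friendlyName).second;
}

int main() {
    std::cout << markHandled("fq1") << std::endl;  // 1: first visit, handle it
    std::cout << markHandled("fq1") << std::endl;  // 0: already handled (e.g. by ConcatTransformation), skip
    return 0;
}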

View File

@ -0,0 +1,214 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph_ops/type_relaxed.hpp>
#include "layer_transformation.hpp"
#include "iparams_manager.hpp"
#include "ilayer_transformations_manager.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
struct StandaloneCleanup {
std::string typeName;
std::string typeId;
LayerTransformationPtr transformation;
};
class TRANSFORMATIONS_API LowPrecisionTransformations {
public:
LowPrecisionTransformations() {}
LowPrecisionTransformations(
const std::map<std::string, LayerTransformationPtr>& branchSpecificTransformations,
const std::map<std::string, LayerTransformationPtr>& transformations,
const std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& cleanupTransformations,
const std::vector<StandaloneCleanup>& standaloneCleanupTransformations);
void setUpdatePrecisions(const bool updatePrecisions);
void setQuantizedTensorAlignmentOnActivations(const LayerTransformation::QuantizedTensorAlignment quantizedTensorAlignmentOnActivations);
void setQuantizedTensorAlignmentOnWeights(const LayerTransformation::QuantizedTensorAlignment quantizedTensorAlignmentOnWeights);
LowPrecisionTransformations& remove(const std::string& operationType);
LowPrecisionTransformations& removeBranchSpecificTransformations(const std::string& operationType);
LowPrecisionTransformations& removeTransformations(const std::string& operationType);
LowPrecisionTransformations& removeCleanupTransformations(const std::string& operationType);
/**
* Add branch-specific transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& addBranchSpecific(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const auto it = branchSpecificTransformations.find(typeName);
if (it != branchSpecificTransformations.end()) {
branchSpecificTransformations.erase(it);
}
branchSpecificTransformations.emplace(typeName, std::make_shared<Transformation>(params));
return *this;
}
/**
* Add transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& add(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const auto it = transformations.find(typeName);
if (it != transformations.end()) {
transformations.erase(it);
}
transformations.emplace(typeName, std::make_shared<Transformation>(params));
return *this;
}
/**
* Add cleanup transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& addCleanup(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const std::string typeId = typeid(Transformation).name();
const auto it = cleanupTransformations.find(typeName);
if (it == cleanupTransformations.end()) {
cleanupTransformations.emplace(typeName,
std::vector<std::pair<std::string, LayerTransformationPtr>>{ std::make_pair(typeId, std::make_shared<Transformation>(params)) });
} else {
const auto it1 = std::find_if(it->second.begin(), it->second.end(),
[&](const std::pair<std::string, LayerTransformationPtr>& transformation) {
return transformation.first == typeId;
});
if (it1 != it->second.end()) {
it->second.erase(it1);
}
it->second.emplace_back(std::make_pair(typeId, std::make_shared<Transformation>(params)));
}
return *this;
}
/**
* Add standalone cleanup transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& addStandaloneCleanup(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const std::string typeId = typeid(Transformation).name();
const auto it = std::find_if(standaloneCleanupTransformations.begin(), standaloneCleanupTransformations.end(),
[&](const StandaloneCleanup& transformation) {
return transformation.typeName == typeName && transformation.typeId == typeId;
});
if (it == standaloneCleanupTransformations.end()) {
standaloneCleanupTransformations.emplace_back(StandaloneCleanup{ typeName, typeId, std::make_shared<Transformation>(params) });
} else {
*it = { typeName, typeId, std::make_shared<Transformation>(params) };
}
return *this;
}
template <class Operation>
static std::string getType() {
return Operation::get_type_info_static().name;
}
static std::string getType(const Node& operation) {
return operation.get_type_name();
}
std::vector<LayerTransformationPtr> find(const std::string& transformationName) const;
template <class Operation>
std::vector<LayerTransformationPtr> find() const {
const std::string transformationKey = getType<Operation>();
return find(transformationKey);
}
void setParamsManager(IParamsManager* paramsManager) noexcept;
void setLayerTransformationsManager(ILayerTransformationsManager* layerTransformationsManager) noexcept;
// Key is not a layer type, but just a name of transformation
// Layer type (or a pattern) is defined by transformation itself as an ngraph matcher
std::map<std::string, LayerTransformationPtr> branchSpecificTransformations;
std::map<std::string, LayerTransformationPtr> transformations;
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>> cleanupTransformations;
std::vector<StandaloneCleanup> standaloneCleanupTransformations;
private:
static void setParamsManager(IParamsManager* paramsManager, std::map<std::string, LayerTransformationPtr>& transformations) noexcept;
static void setParamsManager(
IParamsManager* paramsManager,
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& transformations) noexcept;
static void setParamsManager(IParamsManager* paramsManager, std::vector<StandaloneCleanup>& transformations) noexcept;
static void setLayerTransformationsManager(
ILayerTransformationsManager* layerTransformationsManager,
std::map<std::string, LayerTransformationPtr>& transformations) noexcept;
static void setLayerTransformationsManager(
ILayerTransformationsManager* layerTransformationsManager,
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& transformations) noexcept;
static void setLayerTransformationsManager(
ILayerTransformationsManager* layerTransformationsManager,
std::vector<StandaloneCleanup>& transformations) noexcept;
};
/**
* @brief low precision transformation component.
*/
class TRANSFORMATIONS_API LowPrecisionTransformer : public IParamsManager, ILayerTransformationsManager {
public:
static LowPrecisionTransformations getAllTransformations(const LayerTransformation::Params& params = LayerTransformation::Params());
static bool isFunctionQuantized(const std::shared_ptr<Function>& function);
LowPrecisionTransformer();
LowPrecisionTransformer(const LowPrecisionTransformations& transformations);
void transform(std::shared_ptr<Function> network);
// IParamsManager interface implementation
std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept override;
// ILayerTransformationsManager interface implementation
bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept override;
bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept override;
private:
LowPrecisionTransformations transformations;
void registerAllMatchers(
std::map<std::string, LayerTransformationPtr> transformations,
GraphRewrite& pass,
TransformationContext& context);
void registerAllMatchers(
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>> transformations,
GraphRewrite& pass,
TransformationContext& context);
std::vector<element::Type> precisionIntersection(
const std::vector<element::Type>& v1,
const std::vector<element::Type>& v2) const noexcept;
};
class TRANSFORMATIONS_API TypeRelaxedReplacer : public GraphRewrite {
public:
TypeRelaxedReplacer();
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API TransparentBaseTransformation : public LayerTransformation {
public:
TransparentBaseTransformation(const Params& params) : LayerTransformation(params) {}
~TransparentBaseTransformation() override {}
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API TransposeTransformation : public LayerTransformation {
public:
TransposeTransformation(const Params& params) : LayerTransformation(params) {}
~TransposeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API UnsqueezeTransformation : public LayerTransformation {
public:
UnsqueezeTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,28 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <vector>
#include "split.hpp"
#include "ngraph/node.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API VariadicSplitTransformation : public SplitTransformation {
public:
VariadicSplitTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
protected:
std::vector<size_t> getConstSplitLengths(
const OutputVector& inputs,
const ngraph::Shape& constShape,
const size_t outputSize) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,34 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformation_context.hpp"
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API WeightableLayerTransformation : public LayerTransformation {
public:
WeightableLayerTransformation(const Params& params);
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isQuantized(std::shared_ptr<Node> layer, bool isReshape) const noexcept;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
protected:
DataPrecision decomposeFakeQuantizeForWeightsPath(std::shared_ptr<Node> weightableLayer) const;
static bool isGroup(const std::shared_ptr<Node>& node);
static bool isDepthwise(const std::shared_ptr<Node>& node);
std::shared_ptr<opset1::FakeQuantize> getFakeQuantizeOnWeights(const std::shared_ptr<Node>& node) const;
DataPrecision getDataPrecisionOnWeights(const std::shared_ptr<Node>& node) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,75 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
/**
* @brief Defines dequantization attribute
* @file dequantization_attribute.hpp
*/
#include <assert.h>
#include <functional>
#include <memory>
#include <string>
#include <set>
#include <ngraph/node.hpp>
#include <ngraph/variant.hpp>
#include <transformations_visibility.hpp>
namespace ngraph {
/**
* @ingroup ie_runtime_attr_api
* @brief DequantizationAttr class represents a runtime info attribute that indicates
* whether the operation is a dequantization operation
*/
class TRANSFORMATIONS_API DequantizationAttr {
private:
std::string dequantization_attribute;
public:
/**
* A default constructor
*/
DequantizationAttr() = default;
/**
* @brief Constructs a new object consisting of a single name
* @param[in] name The name
*/
explicit DequantizationAttr(const std::string& name) : dequantization_attribute(name) {}
/**
* @brief Returns the string with the dequantization value
*/
std::string getDequantizationAttr() const;
};
extern template class TRANSFORMATIONS_API VariantImpl<DequantizationAttr>;
template<>
class TRANSFORMATIONS_API VariantWrapper<DequantizationAttr> : public VariantImpl<DequantizationAttr> {
public:
static constexpr VariantTypeInfo type_info{"DEQUANTIZATION", 0};
const VariantTypeInfo &get_type_info() const override {
return type_info;
}
VariantWrapper(const value_type &value) : VariantImpl<value_type>(value) {}
std::shared_ptr<ngraph::Variant> merge(const ngraph::NodeVector & nodes) override;
std::shared_ptr<ngraph::Variant> init(const std::shared_ptr<ngraph::Node> & node) override;
};
/**
* @ingroup ie_runtime_attr_api
* @brief getDequantization returns a string with the dequantization value
* @param[in] node The node used to get the DequantizationAttr attribute
*/
TRANSFORMATIONS_API std::string getDequantization(const std::shared_ptr<ngraph::Node>& node);
} // namespace ngraph

View File

@ -22,6 +22,7 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group,
const PadType& auto_pad)
: Op({data_batch, filters})
@ -30,10 +31,53 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group) {
, m_group(group)
, m_output_type(output_type) {
constructor_validate_and_infer_types();
}
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Output<Node>& bias,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group,
const PadType& auto_pad)
: Op({data_batch, filters, bias})
, m_strides(strides)
, m_dilations(dilations)
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group)
, m_output_type(output_type) {
constructor_validate_and_infer_types();
}
// KMB compilation support
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const size_t& group,
const PadType& auto_pad)
: Op({data_batch, filters})
, m_strides(strides)
, m_dilations(dilations)
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group)
, m_output_type(element::undefined) {
constructor_validate_and_infer_types();
}
// KMB compilation support
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Output<Node>& bias,
@ -49,7 +93,8 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group) {
, m_group(group)
, m_output_type(element::undefined) {
constructor_validate_and_infer_types();
}
@ -59,23 +104,12 @@ void op::ConvolutionIE::validate_and_infer_types() {
PartialShape filters_shape = get_input_partial_shape(1);
element::Type filters_et = get_input_element_type(1);
element::Type result_et;
NODE_VALIDATION_CHECK(
this,
element::Type::merge(result_et, data_batch_et, filters_et),
"Element types for data batch and filters do not match (data batch element type: ",
data_batch_et,
", filters element type: ",
filters_et,
").");
PartialShape result_shape{PartialShape::dynamic()};
// If the number of groups is greater than 1 and the channel dimension is dynamic, we can't calculate the output shape
if (m_group > 1) {
if (data_batch_shape.rank().is_dynamic() || data_batch_shape[1].is_dynamic()) {
set_output_type(0, result_et, result_shape);
set_output_type(0, m_output_type, result_shape);
return;
} else {
// Update channel dimension according to groups count
@ -109,7 +143,7 @@ void op::ConvolutionIE::validate_and_infer_types() {
m_strides,
m_dilations);
set_output_type(0, result_et, result_shape);
set_output_type(0, m_output_type, result_shape);
}
shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVector & new_args) const {
@ -120,6 +154,7 @@ shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVe
m_dilations,
m_pads_begin,
m_pads_end,
m_output_type,
m_group,
m_auto_pad);
} else if (new_args.size() == 3) {
@ -130,6 +165,7 @@ shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVe
m_dilations,
m_pads_begin,
m_pads_end,
m_output_type,
m_group,
m_auto_pad);
}

View File

@ -36,6 +36,32 @@ std::pair<std::shared_ptr<A>, std::shared_ptr<B>> parse_eltwise_inputs(std::shar
return {eltwise, constant};
}
template <class Conv>
bool IsConvInLowPrecision(const std::shared_ptr<Conv>& conv) {
if (!ngraph::is_type<ngraph::op::ConvolutionIE>(conv)) {
return false;
}
auto isLowPrecision = [](const std::shared_ptr<ngraph::Node>& node, const size_t index) {
const ngraph::element::Type inputType = node->get_input_element_type(index);
return (inputType == ngraph::element::i8) || (inputType == ngraph::element::u8);
};
// Convolution operation has to be executed in INT8 if ...
if (isLowPrecision(conv, 0) && isLowPrecision(conv, 1)) {
// ... INT8 on activations && INT8 on weights
return true;
}
const std::shared_ptr<ngraph::opset1::Subtract> subtract = ngraph::as_type_ptr<ngraph::opset1::Subtract>(conv->get_input_node_shared_ptr(0));
if (subtract == nullptr) {
return false;
}
// ... INT8 on activations with asymmetric quantization && INT8 on weights
return isLowPrecision(subtract, 0) && isLowPrecision(subtract, 1) && isLowPrecision(conv, 1);
}
template <class Conv>
ngraph::graph_rewrite_callback get_callback() {
ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher &m) {
@ -95,7 +121,8 @@ ngraph::graph_rewrite_callback get_callback() {
new_bias = std::make_shared<ngraph::opset1::Add>(final_const, m_conv->input_value(2));
}
new_conv = m_conv->clone_with_new_inputs({m_conv->input_value(0), m_conv->input_value(1), new_bias});
} else if (std::is_same<Conv, ngraph::op::ConvolutionIE>() && std::dynamic_pointer_cast<ngraph::opset1::Multiply>(eltwise)) {
} else if (std::is_same<Conv, ngraph::op::ConvolutionIE>() && std::dynamic_pointer_cast<ngraph::opset1::Multiply>(eltwise) &&
!IsConvInLowPrecision(m_conv)) {
// Fuse: ConvolutionIE->Mul
auto weights_shape = m_conv->input(1).get_shape();

View File

@ -44,10 +44,18 @@ ngraph::pass::AddMultiplyFusion::AddMultiplyFusion() {
auto mul = label_to_output[m_mul].get_node_shared_ptr();
auto add = label_to_output[m_add].get_node_shared_ptr();
if (m_transformation_callback(mul)) {
return false;
}
Output<Node> input = label_to_output[m_data];
Output<Node> mul_const = label_to_output[m_mul_constant];
Output<Node> add_const = label_to_output[m_add_constant];
if ((input.get_element_type() != mul_const.get_element_type()) || (add_const.get_element_type() != mul_const.get_element_type())) {
return false;
}
// Replace Add->Multiply with Multiply->Add
// As new Multiply can be fused with operation above it we add this Multiply
// to the list of operations that will be used in additional matching.

View File

@ -161,6 +161,7 @@ bool ngraph::pass::ConvertPrecision::run_on_function(std::shared_ptr<ngraph::Fun
// If output type mismatch given type we try to fuse type into this operation
// otherwise we insert Convert operation.
for (auto &node : f->get_ordered_ops()) {
m_transformation_callback(node);
// Recursively apply transformation for sub-graph based operations
if (auto sub_graph_node = std::dynamic_pointer_cast<op::util::SubGraphOp>(node)) {
if (auto sub_graph = sub_graph_node->get_function()) {

View File

@ -0,0 +1,203 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/add.hpp"
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "ngraph_ops/type_relaxed.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
std::shared_ptr<opset1::Subtract> replaceToSubtract(const std::shared_ptr<Node>& op) {
// TODO: separate this part to standalone transformation: AddToSubtractTransformation
// motivation:
// - single responsibility
// - keep AddTransformation and AddToSubtractTransformation transformations independent and optional
const auto add = as_type_ptr<opset1::Add>(op);
if (add == nullptr) {
return nullptr;
}
// TODO: use general way from getDequantization: is eltwise with Constant
const int constBranchIndex = is_type<opset1::Constant>(add->get_input_node_ptr(0)) ?
0 :
(is_type<opset1::Constant>(add->get_input_node_ptr(1)) ? 1 : -1);
if (constBranchIndex == -1) {
return nullptr;
}
const size_t dataBranchIndex = constBranchIndex == 0 ? 1ul : 0;
const auto parent = add->get_input_node_shared_ptr(dataBranchIndex);
if (is_type<opset1::Convolution>(parent) ||
is_type<opset1::GroupConvolution>(parent) ||
(is_type<opset1::MatMul>(parent) &&
(is_type<opset1::Constant>(parent->get_input_node_ptr(0)) || is_type<opset1::Constant>(parent->get_input_node_ptr(1))))) {
return nullptr;
}
auto constant = fold<opset1::Negative>(add->get_input_node_shared_ptr(constBranchIndex));
auto constOutput = constant->output(0);
const auto subtract = std::make_shared<DequantizationSubtract>(
add->get_input_node_shared_ptr(dataBranchIndex),
constOutput,
add->get_autob());
NetworkHelper::copyInfo(add, subtract);
replace_node(add, subtract);
return subtract;
}
std::shared_ptr<opset1::Subtract> fuseWithSubtract(const std::shared_ptr<Node>& op) {
const auto add = as_type_ptr<opset1::Add>(op);
if ((add == nullptr) ||
!is_type<opset1::Subtract>(add->get_input_node_shared_ptr(0)) ||
// TODO: use general way from getDequantization: is eltwise with Constant
!is_type<opset1::Constant>(add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(1))) {
return nullptr;
}
const auto newSubConst = fold<opset1::Subtract>(
add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(1),
add->get_input_node_shared_ptr(1));
const auto newSubtract = std::make_shared<op::TypeRelaxed<DequantizationSubtract>>(
std::vector<element::Type>{element::f32, element::f32},
std::vector<element::Type>{ element::f32 },
ngraph::op::TemporaryReplaceOutputType(add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(0), element::f32).get(),
ngraph::op::TemporaryReplaceOutputType(newSubConst, element::f32).get());
NetworkHelper::copyInfo(add, newSubtract);
replace_node(add, newSubtract);
return newSubtract;
}
void AddTransformation::registerMatcherIn(GraphRewrite &pass, TransformationContext &context) const {
addSingleNodePattern<opset1::Add>(pass, context);
}
bool AddTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
std::shared_ptr<opset1::Add> op = as_type_ptr<opset1::Add>(m.get_match_root());
if (!canBeTransformed(context, op)) {
return false;
}
std::shared_ptr<Node> addNode = separateInStandaloneBranch(op);
std::shared_ptr<opset1::Add> add = as_type_ptr<opset1::Add>(addNode);
const int fullPathIndex = getNotEmpty(add);
std::shared_ptr<Node> newMultiply;
std::shared_ptr<Node> newAddOrSubtract;
if (fullPathIndex == -1) {
// swap constant multiply and add and possibly fuse to subtract
const auto multiplyBranch = getMultiplyConstBranch(add);
if (multiplyBranch.first == -1) {
NetworkHelper::foldDequantization(addNode, 0);
NetworkHelper::foldDequantization(addNode, 1);
return false;
}
newMultiply = NetworkHelper::swapMultiplyAndAdd(add, multiplyBranch.first);
if (is_type<opset1::Add>(newMultiply->get_input_node_shared_ptr(0))) {
newAddOrSubtract = newMultiply->get_input_node_shared_ptr(0);
auto subtract = fuseWithSubtract(newAddOrSubtract);
if (subtract != nullptr) {
newAddOrSubtract = subtract;
}
subtract = replaceToSubtract(newAddOrSubtract);
if (subtract != nullptr) {
newAddOrSubtract = subtract;
}
} else {
newAddOrSubtract = newMultiply;
}
} else {
// dequantizations are on both branches
const int emptyPathIndex = fullPathIndex == 0 ? 1 : 0;
FakeQuantizeDequantization dequantizationEmptyPath = NetworkHelper::getDequantization(add, emptyPathIndex);
if (updatePrecisions && !dequantizationEmptyPath.empty() && !dequantizationEmptyPath.isLowPrecision()) {
return false;
}
std::shared_ptr<Node> subtractEmptyPathValues;
std::shared_ptr<Node> multiplyEmptyPathValues;
std::tie(subtractEmptyPathValues, multiplyEmptyPathValues) = NetworkHelper::createEmptyValues(dequantizationEmptyPath);
FakeQuantizeDequantization dequantizationFullPath = NetworkHelper::getDequantization(add, fullPathIndex);
if (updatePrecisions && !dequantizationFullPath.empty() && !dequantizationFullPath.isLowPrecision()) {
return false;
}
std::shared_ptr<Node> subtractFullPathValues;
std::shared_ptr<Node> multiplyFullPathValues;
std::tie(subtractFullPathValues, multiplyFullPathValues) = NetworkHelper::createEmptyValues(dequantizationFullPath);
// calculation
// before: Y = (SC1 * (X1 - SH1)) + (SC2 * (X2 - SH2))
// after : Y = SC2 * ( SC1' * (X1 - SH1') + X2 ) , where :
// SC1' = SC1 / SC2
// SH1' = SH1 + SC2 * SH2 / SC1
std::shared_ptr<Node> newSubtractFullPathValues = fold<opset1::Add>(
subtractFullPathValues,
fold<opset1::Divide>(
fold<opset1::Multiply>(subtractEmptyPathValues, multiplyEmptyPathValues),
multiplyFullPathValues));
std::shared_ptr<Node> newMultiplyFullPathValues = fold<opset1::Divide>(multiplyFullPathValues, multiplyEmptyPathValues);
if (NetworkHelper::isZeroConst(newSubtractFullPathValues)) {
newSubtractFullPathValues = nullptr;
}
// graph update
std::vector<std::shared_ptr<Node>> inputs{ {}, {} };
auto fullPathInput = dequantizationFullPath.convert == nullptr ? dequantizationFullPath.data : dequantizationFullPath.convert;
inputs[emptyPathIndex] = dequantizationEmptyPath.data.get_node_shared_ptr();
inputs[fullPathIndex] = std::make_shared<DequantizationMultiply>(
newSubtractFullPathValues == nullptr ?
fullPathInput :
std::make_shared<DequantizationSubtract>(fullPathInput, newSubtractFullPathValues),
newMultiplyFullPathValues);
newAddOrSubtract = std::make_shared<op::TypeRelaxed<opset1::Add>>(
std::vector<element::Type>{element::f32, element::f32}, std::vector<element::Type>{ element::f32 },
ngraph::op::TemporaryReplaceOutputType(inputs[0], element::f32).get(),
ngraph::op::TemporaryReplaceOutputType(inputs[1], element::f32).get());
newMultiply = std::make_shared<DequantizationMultiply>(newAddOrSubtract, multiplyEmptyPathValues);
replace_node(add, newMultiply);
NetworkHelper::copyInfo(add, newAddOrSubtract);
}
updateOutput(context, newMultiply, newAddOrSubtract);
if (fullPathIndex != -1) {
std::shared_ptr<Node> node = add;
NetworkHelper::foldDequantization(node, fullPathIndex);
}
return true;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
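As a sanity check of the algebra in the "calculation" comment above, a standalone numeric sketch with plain doubles (no nGraph types; the values are arbitrary):

#include <cassert>
#include <cmath>

int main() {
    // Arbitrary dequantization parameters and inputs.
    const double SC1 = 0.5,  SH1 = 3.0;   // full path scale / shift
    const double SC2 = 0.25, SH2 = -2.0;  // empty path scale / shift
    const double X1 = 11.0, X2 = 7.0;

    const double before = SC1 * (X1 - SH1) + SC2 * (X2 - SH2);

    const double SC1n = SC1 / SC2;              // SC1'
    const double SH1n = SH1 + SC2 * SH2 / SC1;  // SH1'
    const double after = SC2 * (SC1n * (X1 - SH1n) + X2);

    // Both forms compute the same Y, up to floating-point rounding.
    assert(std::fabs(before - after) < 1e-9);
    return 0;
}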

View File

@ -0,0 +1,80 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/avg_pool.hpp"
#include <memory>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
AvgPoolTransformation::AvgPoolTransformation(const Params& params) : LayerTransformation(params) {
}
void AvgPoolTransformation::registerMatcherIn(GraphRewrite &pass, TransformationContext &context) const {
addPattern(
pass,
context,
make_op_pattern<opset1::AvgPool>({ make_op_label<opset1::Multiply>() }));
}
bool AvgPoolTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
if (!canBeTransformed(context, m.get_match_root())) {
return false;
}
const std::shared_ptr<Node> pooling = separateInStandaloneBranch(m.get_match_root());
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(pooling);
bool updatePrecision;
// issue #40768
if ((children.size() == 1ul) && (!this->layerTransformationsManager->isQuantized(children[0]))) {
updatePrecision = false;
} else {
updatePrecision = false;
// NOTE: This check was added for models that don't have FQ after AvgPool
// They will have transparent precision as it was in old LPT.
for (const auto& child : children) {
if (!is_type<opset1::FakeQuantize>(child)) {
updatePrecision = true;
break;
}
}
}
moveDequantizationAfter(context, pooling, NetworkHelper::getDequantization(pooling), updatePrecision);
return true;
}
bool AvgPoolTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> operation) const {
if (!LayerTransformation::canBeTransformed(context, operation)) {
return false;
}
auto dequantization = NetworkHelper::getDequantization(operation);
return !!dequantization.multiply;
}
bool AvgPoolTransformation::isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept {
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(layer);
// NOTE: This check was added for models that don't have FQ after AvgPool
// They will have transparent precision as it was in old LPT.
for (const auto& child : children) {
if (!is_type<opset1::FakeQuantize>(child)) {
return true;
}
}
return false;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,97 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/clamp.hpp"
#include <algorithm>
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
ClampTransformation::ClampTransformation(const Params& params) : LayerTransformation(params) {}
void ClampTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
addPattern(pass,
context,
make_op_pattern<opset1::Clamp>({ make_op_label<opset1::Multiply>() }));
}
bool ClampTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher& m) const {
auto subWithTheSameValues = [](std::shared_ptr<ngraph::opset1::Subtract> sub) {
if (sub == nullptr) {
return false;
}
const auto constant = as_type_ptr<ngraph::opset1::Constant>(sub->get_input_node_shared_ptr(1));
if (constant == nullptr) {
return false;
}
return NetworkHelper::isScalarLike(constant);
};
if (!canBeTransformed(context, m.get_match_root())) {
return false;
}
const std::shared_ptr<Node> clamp = separateInStandaloneBranch(m.get_match_root());
const FakeQuantizeDequantization dequantization = NetworkHelper::getDequantization(clamp);
const bool moveSubtract = subWithTheSameValues(dequantization.subtract);
if (!moveSubtract && !canSubtractBeHandled(clamp, dequantization)) {
return false;
}
const auto newClamp = as_type_ptr<opset1::Clamp>(moveDequantizationAfter(context, clamp, dequantization, false, moveSubtract));
double min = newClamp->get_min();
double max = newClamp->get_max();
if (dequantization.multiply != nullptr) {
double scale = as_type_ptr<opset1::Constant>(dequantization.multiply->get_input_node_shared_ptr(1))->cast_vector<double>()[0];
if (scale < 0.0) {
std::swap(min, max);
}
min /= scale;
max /= scale;
}
if (dequantization.subtract != nullptr && moveSubtract) {
double shift = as_type_ptr<opset1::Constant>(dequantization.subtract->get_input_node_shared_ptr(1))->cast_vector<double>()[0];
min += shift;
max += shift;
}
const std::shared_ptr<ngraph::opset1::Clamp> replacement = std::make_shared<ngraph::opset1::Clamp>(newClamp->get_input_node_shared_ptr(0), min, max);
replace_node(newClamp, replacement);
element::Type outputClampType = dequantization.multiply ?
dequantization.multiply->get_output_element_type(0) :
dequantization.subtract->get_output_element_type(0);
ngraph::pass::low_precision::NetworkHelper::setOutDataPrecision(replacement, outputClampType);
return true;
}
bool ClampTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const {
if (!LayerTransformation::canBeTransformed(context, op)) {
return false;
}
const FakeQuantizeDequantization dequantization = NetworkHelper::getDequantization(op);
const auto mulConst = as_type_ptr<ngraph::opset1::Constant>(dequantization.multiply->get_input_node_shared_ptr(1));
if (mulConst == nullptr) {
return false;
}
return NetworkHelper::isScalarLike(mulConst);
}
bool ClampTransformation::isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept {
return false;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
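The bound adjustment above (divide by the multiply scale, swap on negative scale, then add the subtract shift) preserves the operation's result when Clamp is moved before the dequantization. A standalone numeric sketch for the positive-scale case, assuming per-tensor scalar constants:

#include <algorithm>
#include <cassert>
#include <cmath>

double clampv(double v, double lo, double hi) { return std::min(std::max(v, lo), hi); }

int main() {
    const double scale = 0.5, shift = 10.0;  // dequantization: y = (x - shift) * scale
    const double min = -1.0, max = 1.0;      // original Clamp bounds, applied after dequantization

    // Recomputed bounds when Clamp is moved before the dequantization,
    // mirroring ClampTransformation::transform (positive scale, so no swap).
    const double newMin = min / scale + shift;
    const double newMax = max / scale + shift;

    for (double x = 0.0; x <= 20.0; x += 0.5) {
        const double before = clampv((x - shift) * scale, min, max);
        const double after = (clampv(x, newMin, newMax) - shift) * scale;
        assert(std::fabs(before - after) < 1e-9);
    }
    return 0;
}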

View File

@ -0,0 +1,103 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
FakeQuantizeDequantization::FakeQuantizeDequantization() {}
FakeQuantizeDequantization::FakeQuantizeDequantization(
Output<Node> data,
std::shared_ptr<opset1::Convert> convert,
std::shared_ptr<opset1::Subtract> subtract,
std::shared_ptr<opset1::Multiply> multiply) :
data(data),
convert(convert),
subtract(subtract),
multiply(multiply) {
}
bool FakeQuantizeDequantization::empty() const {
return (convert == nullptr) && (subtract == nullptr) && (multiply == nullptr);
}
bool FakeQuantizeDequantization::isShared() const {
if ((convert != nullptr) && (convert->get_output_target_inputs(0).size() > 1ul)) {
return true;
}
if ((subtract != nullptr) && (subtract->get_output_target_inputs(0).size() > 1ul)) {
return true;
}
if ((multiply != nullptr) && (multiply->get_output_target_inputs(0).size() > 1ul)) {
return true;
}
return false;
}
bool FakeQuantizeDequantization::isLowPrecision() const {
return (data.get_element_type() == element::i8) || (data.get_element_type() == element::u8);
}
bool FakeQuantizeDequantization::checkElementwise(const std::shared_ptr<ngraph::Node>& dequantizationElementwise) {
const ngraph::PartialShape partialShape = dequantizationElementwise->get_input_partial_shape(0);
if (partialShape.is_dynamic()) {
return false;
}
std::shared_ptr<opset1::Constant> constant = as_type_ptr<opset1::Constant>(dequantizationElementwise->get_input_node_shared_ptr(1));
if (constant == nullptr) {
constant = as_type_ptr<opset1::Constant>(dequantizationElementwise->get_input_node_shared_ptr(0));
}
if (constant == nullptr) {
THROW_IE_LPT_EXCEPTION(*dequantizationElementwise) << "unexpected operation type " <<
dequantizationElementwise->get_type_info().name << " on the second branch";
}
const ngraph::Shape constShape = constant->get_output_shape(0);
if ((constShape.size() > 5ul)) {
return false;
}
if ((constShape.size() <= 1ul) || (std::all_of(constShape.begin(), constShape.end(), [](const size_t value) { return value == 1ul; }))) {
return true;
}
const ngraph::Shape shape = partialShape.to_shape();
if (constShape.size() == shape.size()) {
if ((constShape[0] != 1ul) || (constShape[1] != shape[1])) {
return false;
}
for (size_t i = 2ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else if (constShape.size() == (shape.size() - 1)) {
if (constShape[0] != shape[1]) {
return false;
}
for (size_t i = 1ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else {
return false;
}
return true;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
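The shape rules enforced by checkElementwise above can be restated compactly: the dequantization constant must be scalar-like, or per-channel along axis 1 with rank equal to the data rank or one less. A standalone sketch of the same checks on plain shape vectors (isScalarOrPerChannel is an illustrative restatement, not the nGraph code):

#include <cassert>
#include <vector>

bool isScalarOrPerChannel(const std::vector<size_t>& constShape, const std::vector<size_t>& dataShape) {
    if (constShape.size() > 5) return false;
    auto allOnes = [](const std::vector<size_t>& s) {
        for (size_t d : s) if (d != 1) return false;
        return true;
    };
    if (constShape.size() <= 1 || allOnes(constShape)) return true;  // scalar-like
    if (constShape.size() == dataShape.size()) {                     // [1, C, 1, ...]
        if (constShape[0] != 1 || constShape[1] != dataShape[1]) return false;
        for (size_t i = 2; i < constShape.size(); ++i) if (constShape[i] != 1) return false;
        return true;
    }
    if (constShape.size() == dataShape.size() - 1) {                 // [C, 1, ...]
        if (constShape[0] != dataShape[1]) return false;
        for (size_t i = 1; i < constShape.size(); ++i) if (constShape[i] != 1) return false;
        return true;
    }
    return false;
}

int main() {
    const std::vector<size_t> data{1, 16, 32, 32};
    assert(isScalarOrPerChannel({1, 16, 1, 1}, data));  // per-channel, same rank
    assert(isScalarOrPerChannel({16, 1, 1}, data));     // per-channel, rank - 1
    assert(isScalarOrPerChannel({}, data));             // scalar
    assert(!isScalarOrPerChannel({1, 8, 1, 1}, data));  // channel count mismatch
    return 0;
}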

View File

@ -0,0 +1,179 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <transformations/low_precision/common/subgraph.hpp>
#include <algorithm>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>
#include <ngraph/rt_info.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/quantization_details.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
bool isQuantizationPerChannel(const std::shared_ptr<ngraph::Node>& node) {
if (node->outputs().size() > 1ul) {
return false;
}
const auto inputs = ngraph::pass::low_precision::NetworkHelper::getInputs(node);
for (const auto& input : inputs) {
if (ngraph::is_type<opset1::Constant>(input.get_node())) {
continue;
}
const Shape& in = input.get_shape();
const Shape& out = node->output(0).get_shape();
for (size_t i = 0; i < 2; ++i) {
if (in[i] != out[i]) {
return false;
}
}
}
return true;
}
Subgraph::Subgraph(ngraph::pass::ILayerTransformationsManager* layerTransformationsManager) : layerTransformationsManager(layerTransformationsManager) {
}
bool Subgraph::fillSubgraphForQuantization(
const std::shared_ptr<ngraph::opset1::FakeQuantize>& fakeQuantize,
std::unordered_set<std::string>& handledLayers) {
quantizationLayers.push_back(fakeQuantize);
handledLayers.insert(fakeQuantize->get_friendly_name());
layers.emplace(fakeQuantize->get_friendly_name(), fakeQuantize);
for (size_t index = 0; index < fakeQuantize->get_output_size(); ++index) {
const auto childInputs = fakeQuantize->get_output_target_inputs(index);
for (const auto childInput : childInputs) {
const std::shared_ptr<ngraph::Node> child = childInput.get_node()->shared_from_this();
if (handledLayers.find(child->get_friendly_name()) != handledLayers.end()) {
continue;
}
const std::shared_ptr<ngraph::opset1::Concat> concatChild = ngraph::as_type_ptr<ngraph::opset1::Concat>(child);
if (concatChild != nullptr) {
if (!fillSubgraphForConcat(concatChild, handledLayers)) {
return false;
}
} else {
const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeChild = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(child);
if (fakeQuantizeChild != nullptr) {
//
} else {
if (layerTransformationsManager->isPrecisionPreserved(child) && isQuantizationPerChannel(child)) {
if (!fillSubgraphForIntermediate(child, handledLayers)) {
return false;
}
}
}
}
}
}
return true;
}
bool Subgraph::fill(const std::shared_ptr<ngraph::Node>& layer, std::unordered_set<std::string>& handledLayers) {
// if at least one parent is handled incorrectly, then the subgraph is not in low precision
for (size_t index = 0; index < layer->get_input_size(); ++index) {
const std::shared_ptr<ngraph::Node> parent = layer->get_input_node_shared_ptr(index);
if (handledLayers.find(parent->get_friendly_name()) != handledLayers.end()) {
continue;
}
const std::shared_ptr<ngraph::opset1::Concat> concatParent = ngraph::as_type_ptr<ngraph::opset1::Concat>(parent);
if (concatParent != nullptr) {
if (!fillSubgraphForConcat(concatParent, handledLayers)) {
return false;
}
} else {
const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeParent = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(parent);
if (fakeQuantizeParent != nullptr) {
if (!fillSubgraphForQuantization(fakeQuantizeParent, handledLayers)) {
//
}
} else {
const std::shared_ptr<ngraph::opset1::Constant> constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(parent);
if (constant != nullptr) {
//
} else {
if (layerTransformationsManager->isPrecisionPreserved(parent) && isQuantizationPerChannel(parent)) {
if (!fillSubgraphForIntermediate(parent, handledLayers)) {
return false;
}
} else {
return false;
}
}
}
}
}
// TODO: if at least one child was handled correctly, then the subgraph is in low precision
for (size_t index = 0; index < layer->get_output_size(); ++index) {
const auto childInputs = layer->get_output_target_inputs(index);
for (const auto childInput : childInputs) {
const std::shared_ptr<ngraph::Node> child = childInput.get_node()->shared_from_this();
if (handledLayers.find(child->get_friendly_name()) != handledLayers.end()) {
continue;
}
const std::shared_ptr<ngraph::opset1::Concat> concatChild = ngraph::as_type_ptr<ngraph::opset1::Concat>(child);
if (concatChild != nullptr) {
if (!fillSubgraphForConcat(concatChild, handledLayers)) {
return false;
}
} else {
const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeChild = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(child);
if (fakeQuantizeChild != nullptr) {
//
} else if (layerTransformationsManager->isPrecisionPreserved(child) && isQuantizationPerChannel(child)) {
if (!fillSubgraphForIntermediate(child, handledLayers)) {
return false;
}
}
}
}
}
return true;
}
bool Subgraph::fillSubgraphForIntermediate(const std::shared_ptr<ngraph::Node>& intermediate, std::unordered_set<std::string>& handledLayers) {
handledLayers.insert(intermediate->get_friendly_name());
layers.emplace(intermediate->get_friendly_name(), intermediate);
return fill(intermediate, handledLayers);
}
bool Subgraph::empty() const {
return quantizationLayers.empty();
}
bool Subgraph::fillSubgraphForConcat(const std::shared_ptr<ngraph::opset1::Concat>& concat, std::unordered_set<std::string>& handledLayers) {
concatLayers.push_back(concat);
handledLayers.insert(concat->get_friendly_name());
layers.emplace(concat->get_friendly_name(), concat);
std::shared_ptr<ngraph::Node> node = concat;
return fill(node, handledLayers);
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,428 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/concat.hpp"
#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/subgraph.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
void ConcatTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
addSingleNodePattern<opset1::Concat>(pass, context);
}
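// The transformation aligns all FakeQuantize operations feeding the concat subgraph to one
// common output interval and moves a single shared dequantization below the subgraph.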
bool ConcatTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
std::shared_ptr<ngraph::opset1::Concat> concat = ngraph::as_type_ptr<ngraph::opset1::Concat>(m.get_match_root());
if (!canBeTransformed(context, concat)) {
return false;
}
ngraph::pass::low_precision::Subgraph subgraph(layerTransformationsManager);
std::unordered_set<std::string> handledLayers;
if (!subgraph.fillSubgraphForConcat(concat, handledLayers)) {
return false;
}
if (subgraph.quantizationLayers.empty() || isHandled(context, subgraph.quantizationLayers)) {
return false;
}
// precisions can be different
ngraph::Node& quantizationLayer = *subgraph.quantizationLayers[0];
std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(quantizationLayer.shared_from_this());
DataPrecision dataPrecision = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
if (dataPrecision.precision == ngraph::element::undefined) {
return false;
}
std::unordered_map<std::string, ngraph::pass::low_precision::FakeQuantizeDequantization> dequantizations;
std::vector<QuantizationDetails> quantizationLayersDetails;
for (size_t i = 0; i < subgraph.quantizationLayers.size(); ++i) {
const std::shared_ptr<ngraph::Node> fakeQuantizeLayer = subgraph.quantizationLayers[i];
const ngraph::Shape shape = fakeQuantizeLayer->get_output_shape(0);
if (shape.size() < 4ul) {
return false;
}
const std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(fakeQuantizeLayer->shared_from_this());
if (fq == nullptr) {
return false;
}
const QuantizationDetails& quantizationDetails = QuantizationDetails::getDetails(fq);
quantizationLayersDetails.push_back(quantizationDetails);
const DataPrecision dataPrecision2 = getDataPrecision(subgraph.quantizationLayers[i]->shared_from_this(), quantizationDetails, false);
if (dataPrecision2.precision == ngraph::element::undefined) {
return false;
}
if (dataPrecision.precision != dataPrecision2.precision) {
// quantization levels are the same, the difference can be in sign
// the wider interval (precision) is preferable: use signed if at least one interval is signed
dataPrecision = dataPrecision.precision.is_signed() ? dataPrecision : dataPrecision2;
}
}
if (dataPrecision.precision == ngraph::element::undefined) {
return false;
}
// only per-tensor scale is supported
if (quantizationLayersDetails.empty() || (quantizationLayersDetails[0].inputHighValues.size() != 1ul)) {
return false;
}
FakeQuantizeDequantization dequantization;
if (quantizationLayersDetails[0].inputHighValues.size() == 1ul) {
float outputLowValue = quantizationLayersDetails[0].outputLowValues[0];
float outputHighValue = quantizationLayersDetails[0].outputHighValues[0];
for (size_t index = 0lu; index < subgraph.quantizationLayers.size(); index++) {
const QuantizationDetails& quantizationDetails = quantizationLayersDetails[index];
if (outputLowValue > quantizationDetails.outputLowValues[0]) {
outputLowValue = quantizationDetails.outputLowValues[0];
}
if (outputHighValue < quantizationDetails.outputHighValues[0]) {
outputHighValue = quantizationDetails.outputHighValues[0];
}
}
if ((outputLowValue == 0.f) && (outputHighValue == 0.f)) {
return false;
}
const float maxOutputInterval = outputHighValue - outputLowValue;
if (quantizedTensorAlignmentOnActivations == QuantizedTensorAlignment::UpdateLevel) {
const size_t minLevels = getMinQuantizationLevels(
dataPrecision,
maxOutputInterval,
quantizationLayersDetails,
outputLowValue,
outputHighValue);
if (minLevels < this->minQuantizationLevels) {
return false;
}
}
// FQ -> SUB_quantization -> MUL_quantization -[INT8]-> SUB_dequantization -> MUL_dequantization ->
const float quantizationMul = (dataPrecision.max - dataPrecision.min) / maxOutputInterval;
const float dequantizationMul = maxOutputInterval / (dataPrecision.max - dataPrecision.min);
// FQ outputLowValue = dataPrecision.min * dequantizationMul + quantizationSub
const float quantizationSub = outputLowValue - dataPrecision.min * dequantizationMul;
const float dequantizationSub = std::round(-quantizationSub * quantizationMul);
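// Worked example with hypothetical values: for i8 (dataPrecision.min = -128, dataPrecision.max = 127)
// and outputLowValue = -1.0, outputHighValue = 1.55: maxOutputInterval = 2.55,
// quantizationMul = 255 / 2.55 = 100, dequantizationMul = 2.55 / 255 = 0.01,
// quantizationSub = -1.0 - (-128 * 0.01) = 0.28, dequantizationSub = round(-0.28 * 100) = -28;
// dequantization then computes (x - (-28)) * 0.01, restoring [-1.0, 1.55] from [-128, 127].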
// 1. get data for dequantization. Dequantization data will be used several times later.
dequantization = ngraph::pass::low_precision::NetworkHelper::makeDequantization(
dequantizationMul,
dequantizationSub,
subgraph.quantizationLayers[0]->get_output_element_type(0),
subgraph.quantizationLayers[0]->get_output_shape(0),
dataPrecision.precision,
dataPrecision.min,
dataPrecision.max);
for (size_t index = 0; index < subgraph.quantizationLayers.size(); index++) {
std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeLayer = as_type_ptr<ngraph::opset1::FakeQuantize>(
subgraph.quantizationLayers[index]->shared_from_this());
const QuantizationDetails& quantizationDetails = quantizationLayersDetails[index];
switch (quantizedTensorAlignmentOnActivations) {
case QuantizedTensorAlignment::None: {
THROW_TRANSFORMATION_EXCEPTION << "not implemented: " << quantizedTensorAlignmentOnActivations;
}
case QuantizedTensorAlignment::UpdateLevel: {
const float updatedOutputLowValue = (quantizationDetails.outputLowValues[0] - quantizationSub) * quantizationMul;
const float updatedOutputHighValue = (quantizationDetails.outputHighValues[0] - quantizationSub) * quantizationMul;
// 2. update FakeQuantize - one time action
std::shared_ptr<opset1::FakeQuantize> newFakeQuantizeLayer = ngraph::pass::low_precision::NetworkHelper::updateFakeQuantize(
fakeQuantizeLayer,
updatePrecisions ? dataPrecision.precision : fakeQuantizeLayer->get_output_element_type(0),
roundf(updatedOutputLowValue),
roundf(updatedOutputHighValue));
const size_t levels = static_cast<size_t>(fabs(roundf(updatedOutputHighValue) - roundf(updatedOutputLowValue)) + 1.0);
newFakeQuantizeLayer->set_levels(levels);
subgraph.quantizationLayers[index] = newFakeQuantizeLayer;
subgraph.layers[fakeQuantizeLayer->get_friendly_name()] = newFakeQuantizeLayer;
break;
}
default: {
THROW_TRANSFORMATION_EXCEPTION << "unexpected value " << quantizedTensorAlignmentOnActivations;
}
}
}
} else {
return false;
}
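// the whole subgraph shares one dequantization, so the callback returns the same scale and shift for every layer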
auto dequantizationValuesCallback = [&](
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
dequantizationsToConcatenate.push_back(dequantization);
};
addDequantizationLayers(context, subgraph, dequantizationValuesCallback);
if (updatePrecisions) {
for (const auto& it : subgraph.layers) {
const std::shared_ptr<ngraph::Node>& node = it.second;
if (std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(node) != nullptr) {
ngraph::pass::low_precision::NetworkHelper::setOutDataPrecisionForTypeRelaxed(node->shared_from_this(), dataPrecision.precision);
} else {
// set the precision explicitly to have the updated precision during the transformation
for (size_t i = 0; i < node->get_output_size(); ++i) {
node->set_output_type(i, dataPrecision.precision, node->get_output_partial_shape(i));
}
}
}
}
for (const std::shared_ptr<ngraph::Node>& quantizationLayer : subgraph.quantizationLayers) {
context.quantizedFakeQuantizeNames.insert(quantizationLayer->get_friendly_name());
}
return true;
}
bool ConcatTransformation::isPrecisionPreserved(std::shared_ptr<Node>) const noexcept {
return true;
}
bool ConcatTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const {
const std::shared_ptr<opset1::Concat> concat = as_type_ptr<opset1::Concat>(layer);
// only concatenation along the channel axis (axis == 1) is supported
return (concat != nullptr) && (concat->get_axis() == 1ul);
}
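// For every subgraph layer that feeds a node outside the subgraph, dequantization operations
// (Convert, Subtract, Multiply) are inserted between the layer and the external consumer;
// for network outputs the original friendly name is moved to the last inserted node.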
void ConcatTransformation::addDequantizationLayers(
TransformationContext& context,
ngraph::pass::low_precision::Subgraph& subgraph,
std::function<void(
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate)> getLayerDequantizationCallback) const {
std::unordered_map<std::string, ngraph::Node*> outputs;
for (size_t i = 0; i < context.function->get_output_size(); ++i) {
ngraph::Node* node = context.function->get_output_op(i).get();
if (node->get_input_size() != 1ul) {
THROW_IE_LPT_EXCEPTION(*node) << "unexpected input count for the result node";
}
outputs.emplace(node->get_input_node_shared_ptr(0)->get_friendly_name(), node);
}
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>> notHandledSubgraphLayers = subgraph.layers;
while (notHandledSubgraphLayers.size() != 0ul) {
const auto layerIt = notHandledSubgraphLayers.begin();
std::shared_ptr<ngraph::Node> layer = layerIt->second;
notHandledSubgraphLayers.erase(layerIt);
std::vector<FakeQuantizeDequantization> layerDequantizations;
for (size_t i = 0; i < layer->get_output_size(); ++i) {
const auto childInputs = layer->get_output_target_inputs(i);
for (const auto childInput : childInputs) {
ngraph::Node& child = *childInput.get_node();
if (subgraph.layers.find(child.get_friendly_name()) == subgraph.layers.end()) {
if (layerDequantizations.size() == 0ul) {
getLayerDequantizationCallback(layer, layer->get_friendly_name(), layerDequantizations);
}
std::shared_ptr<ngraph::Node> source = layer->shared_from_this();
{
std::vector<std::shared_ptr<ngraph::Node>> convertNodes;
std::vector<std::shared_ptr<ngraph::Node>> subtractNodes;
std::vector<std::shared_ptr<ngraph::Node>> multiplyNodes;
if (layerDequantizations.size() > 1ul) {
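// Broadcast each branch constant to a per-channel target shape so that subtract and multiply
// constants from all branches can be concatenated along the channel axis (axis 1).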
auto broadcastElementWiseConst = [](
std::shared_ptr<ngraph::opset1::Constant> operation,
const ngraph::Shape targetShape) -> std::shared_ptr<Node> {
auto unsqueeze = ngraph::pass::low_precision::fold<ngraph::opset1::Unsqueeze>(
operation->shared_from_this(),
std::make_shared<ngraph::opset1::Constant>(element::i64, ngraph::Shape{ 4 }, std::vector<size_t>{ 0, 1, 2, 3 }));
auto targetShapeConst = std::make_shared<ngraph::opset1::Constant>(
element::i64, ngraph::Shape{ targetShape.size() },
targetShape);
auto broadcast = ngraph::pass::low_precision::fold<ngraph::opset1::Broadcast>(
unsqueeze,
targetShapeConst,
ngraph::op::AutoBroadcastType::NUMPY);
return broadcast;
};
bool allDequantizationShiftAreZero = true;
bool allDequantizationMultiplyAreZero = true;
for (const FakeQuantizeDequantization& dequantization : layerDequantizations) {
if (dequantization.subtract != nullptr) {
allDequantizationShiftAreZero = false;
}
if (dequantization.multiply != nullptr) {
allDequantizationMultiplyAreZero = false;
}
}
for (size_t i = 0; i < layerDequantizations.size(); ++i) {
const auto& dequantization = layerDequantizations[i];
convertNodes.push_back(dequantization.convert);
const ngraph::element::Type precision = dequantization.data.get_element_type();
ngraph::Shape targetShape = dequantization.data.get_shape();
targetShape[0] = 1ul;
for (size_t dim = 2; dim < targetShape.size(); ++dim) {
targetShape[dim] = 1ul;
}
if (!allDequantizationShiftAreZero) {
subtractNodes.push_back(dequantization.subtract == nullptr ?
std::make_shared<ngraph::opset1::Constant>(precision, targetShape, std::vector<float>({ 0.f })) :
broadcastElementWiseConst(
as_type_ptr<ngraph::opset1::Constant>(dequantization.subtract->input_value(1).get_node_shared_ptr()),
targetShape));
}
if (!allDequantizationMultiplyAreZero) {
multiplyNodes.push_back(dequantization.multiply == nullptr ?
std::make_shared<ngraph::opset1::Constant>(precision, targetShape, std::vector<float>({ 1.0f })) :
broadcastElementWiseConst(
as_type_ptr<ngraph::opset1::Constant>(dequantization.multiply->input_value(1).get_node_shared_ptr()),
targetShape));
}
}
} else {
// TODO: check constant shapes here - has to be scalar
if (layerDequantizations[0].convert != nullptr) {
convertNodes.push_back(layerDequantizations[0].convert);
}
if (layerDequantizations[0].subtract != nullptr) {
subtractNodes.push_back(layerDequantizations[0].subtract->input_value(1).get_node_shared_ptr());
}
if (layerDequantizations[0].multiply != nullptr) {
multiplyNodes.push_back(layerDequantizations[0].multiply->input_value(1).get_node_shared_ptr());
}
}
// TODO: the second place (first is FQ decomposition) where dequantization operations are inserted
const std::shared_ptr<ngraph::Node> destination = child.shared_from_this();
if (!convertNodes.empty()) {
const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
std::shared_ptr<ngraph::Node> convert =
convertNodes[0]->clone_with_new_inputs({ destination->get_input_source_output(sourceOutputIdx) });
insert_new_node_between(source, destination, convert);
source = convert;
}
// concatenation axis is 1
if (!subtractNodes.empty()) {
const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
std::shared_ptr<ngraph::opset1::Subtract> subtract = std::make_shared<DequantizationSubtract>(
destination->get_input_source_output(sourceOutputIdx),
NetworkHelper::toScalarIfPossible(subtractNodes.size() == 1ul ?
subtractNodes[0] :
ngraph::pass::low_precision::fold<ngraph::opset1::Concat>(subtractNodes, 1)));
insert_new_node_between(source, destination, subtract);
source = subtract;
}
if (!multiplyNodes.empty()) {
const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
std::shared_ptr<ngraph::opset1::Multiply> multiply = std::make_shared<DequantizationMultiply>(
destination->get_input_source_output(sourceOutputIdx),
NetworkHelper::toScalarIfPossible(multiplyNodes.size() == 1ul ?
multiplyNodes[0] :
ngraph::pass::low_precision::fold<ngraph::opset1::Concat>(multiplyNodes, 1)));
insert_new_node_between(source, destination, multiply);
source = multiply;
}
}
// the element type of the first input is used
const ngraph::element::Type precision = layerDequantizations[0].data.get_element_type();
layer->set_output_type(0, precision, layer->get_output_partial_shape(0));
const auto it = outputs.find(layer->get_friendly_name());
if (it != outputs.end()) {
const std::string originalName = layer->get_friendly_name();
const std::string newName = layer->get_friendly_name() + LayerTransformation::originalLayerPostfix;
layer->set_friendly_name(newName);
source->set_friendly_name(originalName);
subgraph.layers[layer->get_friendly_name()] = layer;
}
}
}
}
}
}
bool ConcatTransformation::isHandled(const TransformationContext& context, const std::vector<std::shared_ptr<ngraph::Node>>& quantizationOperations) {
for (const std::shared_ptr<ngraph::Node>& quantizationLayer : quantizationOperations) {
if (context.quantizedFakeQuantizeNames.find(quantizationLayer->get_friendly_name()) != context.quantizedFakeQuantizeNames.end()) {
return true;
}
}
return false;
}
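// Estimates how many quantization levels each FakeQuantize keeps after its own output interval
// is proportionally mapped onto the common [dataPrecision.min, dataPrecision.max] interval:
// a narrow per-layer interval covers only a fraction of the available levels.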
size_t ConcatTransformation::getMinQuantizationLevels(
const DataPrecision& dataPrecision,
const float maxOutputInterval,
const std::vector<QuantizationDetails>& quantizationLayersDetails,
const float outputLowValue,
const float outputHighValue) const {
size_t minLevels = std::numeric_limits<std::size_t>::max();
for (const QuantizationDetails& quantizationDetails : quantizationLayersDetails) {
// if there is a negative part, the calculation is based on `outputLowValue`; otherwise only on `outputHighValue`
const float updatedOutputLowValue = outputLowValue != 0.f ?
(quantizationDetails.outputLowValues[0] / outputLowValue) * dataPrecision.min :
(quantizationDetails.outputLowValues[0] / outputHighValue) * dataPrecision.max;
// if there is a positive part, the calculation is based on `outputHighValue`; otherwise only on `outputLowValue`
const float updatedOutputHighValue = outputHighValue != 0.f ?
(quantizationDetails.outputHighValues[0] / outputHighValue) * dataPrecision.max :
(quantizationDetails.outputHighValues[0] / outputLowValue) * dataPrecision.min;
const size_t levels = static_cast<size_t>(fabs(roundf(updatedOutputHighValue) - roundf(updatedOutputLowValue)) + 1.0);
if (minLevels > levels) {
minLevels = levels;
}
}
return minLevels;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,232 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/concat_multi_channels.hpp"
#include <cmath>
#include <memory>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/subgraph.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
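// Per-channel handling falls back to the base per-tensor ConcatTransformation
// when any consumer of a concat in the subgraph is a Convolution.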
bool ConcatMultiChannelsTransformation::isMultiChannel(const std::vector<std::shared_ptr<ngraph::opset1::Concat>>& concatLayers) const noexcept {
for (const std::shared_ptr<ngraph::opset1::Concat>& concat : concatLayers) {
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(concat);
for (const std::shared_ptr<ngraph::Node>& child : children) {
if (is_type<ngraph::opset1::Convolution>(child.get())) {
return false;
}
}
}
return true;
}
void ConcatMultiChannelsTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
addSingleNodePattern<opset1::Concat>(pass, context);
}
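// Unlike the base transformation, every FakeQuantize keeps its own dequantization;
// per-branch scales and shifts are concatenated along the channel axis below the subgraph.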
bool ConcatMultiChannelsTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
std::shared_ptr<ngraph::opset1::Concat> concat = ngraph::as_type_ptr<ngraph::opset1::Concat>(m.get_match_root());
if (!canBeTransformed(context, concat)) {
return false;
}
ngraph::pass::low_precision::Subgraph subgraph(layerTransformationsManager);
std::unordered_set<std::string> handledLayers;
if (!subgraph.fillSubgraphForConcat(concat, handledLayers)) {
return false;
}
if (subgraph.quantizationLayers.empty() || isHandled(context, subgraph.quantizationLayers)) {
return false;
}
if (!isMultiChannel(subgraph.concatLayers)) {
ConcatTransformation::transform(context, m);
return false;
}
DataPrecision dataPrecision;
{
for (auto quantizationLayer : subgraph.quantizationLayers) {
std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(quantizationLayer->shared_from_this());
const DataPrecision tmp = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
if (dataPrecision.precision == ngraph::element::undefined) {
dataPrecision = tmp;
continue;
}
// if precisions differ across the FakeQuantize operations, prefer u8
if ((tmp.precision != dataPrecision.precision) && (tmp.precision == ngraph::element::u8)) {
dataPrecision = tmp;
}
}
}
std::unordered_map<std::string, ngraph::pass::low_precision::FakeQuantizeDequantization> dequantizations;
for (size_t i = 0; i < subgraph.quantizationLayers.size(); ++i) {
const std::shared_ptr<ngraph::Node>& fakeQuantizeLayer = subgraph.quantizationLayers[i];
const ngraph::Shape shape = fakeQuantizeLayer->get_output_shape(0);
if (shape.size() < 4ul) {
return false;
}
const std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(fakeQuantizeLayer->shared_from_this());
if (fq == nullptr) {
return false;
}
const DataPrecision currentDataPrecision = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
const QuantizationDetails quantizationDetails = QuantizationDetails::getDetails(fq);
// 1. get data for dequantization. Dequantization data will be used several times later.
const FakeQuantizeDequantization fakeQuantizeDequantization = ngraph::pass::low_precision::NetworkHelper::createDequantizationFromFakeQuantize(
fq,
dataPrecision.precision,
dataPrecision.min,
dataPrecision.max,
dataPrecision.precision == currentDataPrecision.precision ? currentDataPrecision.hasZeroPoint : true,
updatePrecisions);
dequantizations[fakeQuantizeLayer->get_friendly_name()] = fakeQuantizeDequantization;
// 2. update FakeQuantize - one time action
const std::shared_ptr<opset1::FakeQuantize> newFakeQuantizeLayer = ngraph::pass::low_precision::NetworkHelper::updateFakeQuantize(
fq,
updatePrecisions ? dataPrecision.precision : fakeQuantizeLayer->get_output_element_type(0),
roundf(dataPrecision.min),
roundf(dataPrecision.max));
subgraph.quantizationLayers[i] = newFakeQuantizeLayer;
subgraph.layers[fakeQuantizeLayer->get_friendly_name()] = newFakeQuantizeLayer;
}
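// the callback re-keys the dequantization map when addDequantizationLayers renames a layer
// and collects the per-branch dequantizations to concatenate for the given layer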
auto dequantizationValuesCallback = [&](
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
if (layer->get_friendly_name() != originalLayerName) {
const auto update = [](
const std::string& originalLayerName,
const std::string& newLayerName,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationLayers) {
auto it = dequantizationLayers.find(originalLayerName);
if (it != dequantizationLayers.end()) {
dequantizationLayers.emplace(newLayerName, it->second);
dequantizationLayers.erase(it);
}
};
update(originalLayerName, layer->get_friendly_name(), dequantizations);
}
fillDequantization(
layer,
dequantizations,
dequantizationsToConcatenate);
};
addDequantizationLayers(context, subgraph, dequantizationValuesCallback);
if (updatePrecisions) {
for (const auto& it : subgraph.layers) {
const std::shared_ptr<ngraph::Node> node = it.second;
if (std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(node)) {
ngraph::pass::low_precision::NetworkHelper::setOutDataPrecisionForTypeRelaxed(node->shared_from_this(), dataPrecision.precision);
} else {
// set the precision explicitly to have the updated precision during the transformation
for (size_t i = 0; i < node->get_output_size(); ++i) {
node->set_output_type(i, dataPrecision.precision, node->get_output_partial_shape(i));
}
}
}
}
for (const std::shared_ptr<ngraph::Node>& quantizationLayer : subgraph.quantizationLayers) {
context.quantizedFakeQuantizeNames.insert(quantizationLayer->get_friendly_name());
}
return true;
}
bool ConcatMultiChannelsTransformation::isPrecisionPreserved(std::shared_ptr<Node>) const noexcept {
return true;
}
// fill the dequantizationsToConcatenate collection for the layer using dequantizationByFakeQuantize
void ConcatMultiChannelsTransformation::fillDequantization(
std::shared_ptr<ngraph::Node> layer,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>> fakeQuantizes;
std::shared_ptr<ngraph::opset1::FakeQuantize> currentFakeQuantize = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(layer);
if (currentFakeQuantize != nullptr) {
fakeQuantizes.push_back(currentFakeQuantize);
} else {
fillQuantization(layer, fakeQuantizes);
if (fakeQuantizes.size() == layer->get_input_size()) {
updateDequantizationShapesIfNecessary(layer, fakeQuantizes, dequantizationByFakeQuantize);
}
}
for (const auto& fakeQuantize : fakeQuantizes) {
const auto it = dequantizationByFakeQuantize.find(fakeQuantize->get_friendly_name());
if (it == dequantizationByFakeQuantize.end()) {
THROW_IE_LPT_EXCEPTION(*fakeQuantize) << "dequantization scale values are not found";
}
const FakeQuantizeDequantization& fakeQuantizeDequantization = it->second;
dequantizationsToConcatenate.push_back(fakeQuantizeDequantization);
}
}
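// If a FakeQuantize output channel count differs from the consumer input channel count,
// rebuild the dequantization with the same scalar scale and shift but the actual input shape.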
void ConcatMultiChannelsTransformation::updateDequantizationShapesIfNecessary(
std::shared_ptr<ngraph::Node> layer,
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize) {
for (size_t i = 0; i < fakeQuantizes.size(); ++i) {
ngraph::Shape inputShape = layer->get_input_shape(i);
ngraph::Shape dequantizationShape = fakeQuantizes[i]->get_shape();
if (inputShape[1] != dequantizationShape[1]) {
FakeQuantizeDequantization replacedDequantization = dequantizationByFakeQuantize[fakeQuantizes[i]->get_friendly_name()];
const float scale = as_type_ptr<ngraph::opset1::Constant>(replacedDequantization.multiply->get_input_node_shared_ptr(1))->cast_vector<float>()[0];
const float shift = replacedDequantization.subtract ?
as_type_ptr<ngraph::opset1::Constant>(replacedDequantization.subtract->get_input_node_shared_ptr(1))->cast_vector<float>()[0] : 0.f;
const auto precisionBefore = replacedDequantization.data.get_element_type();
const auto precisionAfter = replacedDequantization.multiply->get_element_type();
auto newDequantization = ngraph::pass::low_precision::NetworkHelper::makeDequantization(
scale, shift, precisionBefore, inputShape, precisionAfter, 0.f, 5.f);
dequantizationByFakeQuantize[fakeQuantizes[i]->get_friendly_name()] = newDequantization;
}
}
}
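// Recursively collects the nearest FakeQuantize ancestors for every input of the layer.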
void ConcatMultiChannelsTransformation::fillQuantization(
const std::shared_ptr<ngraph::Node> layer,
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes) {
for (size_t i = 0; i < layer->get_input_size(); ++i) {
std::shared_ptr<ngraph::Node> parent = layer->get_input_node_shared_ptr(i);
std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantize = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(parent);
if (fakeQuantize != nullptr) {
fakeQuantizes.push_back(fakeQuantize);
} else {
fillQuantization(parent, fakeQuantizes);
}
}
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

Some files were not shown because too many files have changed in this diff.