Es/lpt/lpt to ngraph fixes2 with master (#2671)

* [LPT] Replace creation of dequantization with factory

* [ngraph][LPT] Add ScaleShift replacement for dequantization operations

* [LPT] SubtractMultiplyToMultiplyAdd refactoring

* [LPT] Code style fix

* [LPT] Edit SubtractMultiplyToMultiplyAdd transformation for dequantization

* [LPT] Linux compilation quick fix

* [LPT] [WIP] runtime info applying

* [LPT] Concat transformation functional tests extending

* [LPT] MultiplyToConvolution + Subtract to add fusing + improvements in LowPrecisionTransformer

* [LPT] linux compilation error fix

* [LPT] compilation error fix

* [LPT] MultiplyToGroupConvolution fix: 5D support

* [LPT] Multiply transformation extending: FQ weights support - wip

* [LPT] FQ folding & precision selection

* [LPT] code style fixes

* [LPT] code style fixes

* [LPT] Linux compilation error fix

* [LPT] SubtractMultiplyToMultiplyAdd: refactoring

* [LPT] Tests fixes

* [LPT] MultiplyToGroupConvolution tests

* [LPT] Convert subtract with int inputs to Eltwise sub

* [LPT] Constant folding fix for quant models

* [LPT] 1) Asymmetric quantization improvement 2) tests extending

* [LPT] 2 fixes for se_resnext_50

* [LPT] Add transformation priority branch selection test

* [LPT] AddMultiplyFusion: legacy transformation quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] Fix for eltwise inputs with multiple outputs

* [LPT] Fix for FQ fuse

* [LPT] Reshape by channel, batch temporarily disabled

* [nGraph][LPT] MatMul fix for reading FP16 models

* [LPT] 1) Add (not after Convolution/GroupConvolution/MatMul with Constant) to Subtract 2) precision selection fix: MultiplyToGroupConvolution quick fix

* [LPT] DenseNet improvements: AddTransformation: Add to Subtract + tests

* [LPT] AddTransformation refactoring

* [LPT] AddTransformation tests temporarily disabled

* [LPT] ReshapeTransformation improvements: degradation fix

* [LPT] code style fix

* [LPT] Concat tests temporary disabling

* [LPT] tests unification
1) plugin tests: added test-cases and nGraph-validation for clamp, split and variadic split
2) func tests: added test-cases
3) transformNGraph: added the ability to run additional transformations

* [LPT] split & variadic split merge fix

* [LPT] Clamp: added support for asymmetric quantization

* [LPT] added DequantizationAttr run-time attribute

* [LPT] debug info removal

* [LPT] ConcatTransformation: zero point fix

* [LPT] CNNNetwork ReLU transformation quick fix

* [LPT]
1) Concat fix
2) ConcatMultiChannels fix
3) Added "Concat with Split" test-cases
4) Subgraph fix

* [LPT]
1) Concat fix
2) Added "Concat with different precision on childs" test-case

* [LPT] concat fix Ubuntu18

* [LPT] Concat test fixes

* [LPT] Non-FP32 FQ input support

* [LPT] MatMul Fix + separateInStandaloneBranch Fix

* [LPT] Fix reference input types in mish fusion tests

* [LPT] Fix cpuFuncTests on CentOS building

* [nGraph][LPT] ScaleShift 2d, 3d nGraph conversion enabling

* [LPT] 1) FullyConnected workaround removing 2) validate_nodes_and_infer_types for LPT

* [ngraph] Add check for children for ConvertSubtract

* [LPT] Squeeze/Unsqueeze tests unification

* [LPT] Squeeze/Unsqueeze change signature for getReference/getOriginal

* [LPT] Mul & Add -> ScaleShift quick fix

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix

* [LPT] code style fix #2

* [LPT] nGraph tests temporary disabling

* [LPT] code style fix #3

* [LPT] shared plugin tests temporary disabling

* [LPT] cleanup

* [LPT] nGraph unit tests temporary disabling

* [LPT] nGraph unit tests disabling #2

* [LPT] nGraph tests disabling

* [LPT] nGraph tests temporary disabling

* [LPT] WA removing

* [LPT] CentOS compilation fix

* [LPT] KMB WA to avoid compilation error

* [LPT] functional test temporary disabling

* [nGraph] code style fixes

* [LPT] ConcatTransformation: data movement operation as intermediate handling

* [LPT] FuseSubtractToFakeQuantize after VariadicSplit

* [LPT] ConcatWithSplitTransformation functional test temporary disabling

* [LPT] Clamp and ConcatWithDifferentPrecisionsOnChilds: tests fix

* [LPT] MatMul: bert-nv-mlperf-quantized fix

* [LPT] Add to convolution biases fuse fix

* [LPT] GPU plugin tests fixes

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT] CLDNN Plugin FP16 conversion

* [LPT] AvgPool: update precision if there is no FQ after + convolution precision limitation on activations

* [LPT] Convolution fixes

* [LPT] FuseSubtractToFakeQuantize & FuseMultiplyToFakeQuantize improvements

* [LPT] FuseSubtractToFakeQuantize test fix

* [LPT] FuseSubtractToFakeQuantizeTransformation tests

* [LPT] code style fix

* [LPT] AvgPool child recursive extension

* [LPT] AvgPool tests + fix

* [LPT] compilation quick fix

* [LPT] Add to convolution biases fuse fix

* [LPT] Linux issues: MatMulWithOptimizedConstantFakeQuantizeTransformation temporarily disabled

* [LPT] Normalize GPU plugin tests fix

* [LPT] test-commit

* [LPT]
1) added the ability to create sub without dequantizationAttribute
2) fixed optimizeMulAfter: added copying rt_info
3) Tests Unification: Convolution transformation
4) added cleanRunTimeInfo into Network Helper

* [LPT] Tests Unification: GroupConvolution

* [LPT] removed debug info

* [LPT] functional tests for Convolution & GroupConvolution extending

* [LPT] [MatMul] Quick fix ubuntu error

* [LPT] MatMulTransformation quick test fix: one constant for both intervals

* [nGraph] code style fix

* [LPT] added output_precision to NormalizeIE

* [nGraph] NormalizeIE fix for LPT support

* [LPT] nGraph WA removal

* [LPT] fixed fillSubgraph for concat multi channels

* [LPT] MatMul fix

* [nGraph] WA removal: 1) nGraph tests enabling 2) LPT extending: do not handle FP32

* [LPT] nGraph WA removal: function tests skip config rollback

* [LPT] WA removal: precision propagation fix

* [LPT] ConvertMulOrAddFinally transformation extending

* [nGraph] ConvolutionMultiplyFusion rollback (move from legacy to common)

* [nGraph] ConvertMulAddToScaleShiftOrPower: WA removal

* [nGraph] TypeRelaxed: WA removal

* [nGraph] WA removal: TypeRelaxed

* [LPT] WA removal: ConcatTransformation

* [nGraph] WA removal: Eltwise & ConvertMulOrAddFinally fixes to support LPT

* [nGraph] MulAddConversion fix: 2D & 3D ScaleShift are supported

* [nGraph] VisualizeTree extending

* [LPT] FakeQuantizeDequantization extending: check elementwise dequantization operation

* [LPT] FakeQuantizeDequantization extending: SubtractMultiplyToMultiplyAddTransformation & WeightableLayerTransformation

* [LPT] Convolution + test infrastructure update

* [LPT] GPU compilation error

* [nGraph] BatchNorm plugin tests: input tensor definition

* [LPT] LowPrecisionTransformer::isFunctionQuantized was added

* [nGraph] WA final cleanup

* [nGraph] ScaleShiftIE quick fix

* [LPT] Functional tests: added test-cases "Concat with intermediate with constant"

* [LPT] Transformer::isNetworkQuantized fix

* [LPT] SubtractMultiplyToMultiplyAdd zero Add remove: fix for ssd300 on gpu

* [LPT] MultiplyToGroupConvolution: do not transform on Const

* [LPT] workaround for negative scales

* [LPT] Convert standalone dequantization Mul, Sub, Add to ScaleShift

* [LPT] SubtractMultiplyToMultiplyAdd test fix

* [LPT] Clamp transformation: GPU tests fix

* [LPT] Transformer tests

* [LPT] FakeQuantizePrecisionSelectionTransformation was disabled for GPU

* [LPT] TransformerIsFunctionQuantized refactoring

* [nGraph] code style fix

* [LPT] mobilenet_v2_tf_depthwise test update

* [LPT] TMP: dequantization folding

* [LPT] Elementwise transformation fix: dequantization operations constant folding

* [LPT] cleanup

* [LPT] denormal values fix

* [LPT] FuseFakeQuantize test fixed + negative multiply case

* [LPT] FP32 -> FP16 conversion info

* [LPT] FQ dot interval support + safe division in swapMultiplyAdd

* [LPT] test fix

* [LPT] Tests for dot interval on FQ + tests for addTransformation enabling

* [LPT] Clamp transformation fix

* [LPT] FQ prec selection test fix

* [LPT] Clamp test case

* [LPT] Concat division precision fix

* [LPT] cleanup

* [LPT] merge fix

* [LPT] WIP: MatMul asymmetric quantization fix (BERT)

* [LPT] MatMulWithOptimizedConstantFakeQuantizeTransformation disabled

* [LPT] GPU Plugin set config fix

* [LPT] Fix merge mistakes

* [LPT] Rollback device specific INT8

* [LPT] ReshapeFullyConnected fix: FullyConnected output fix

* [LPT] bert-base-chinese GPU fix

* [ngraph/LPT] Tests for fix convert_mul_or_add_finally with dequantization

[ngraph/LPT] Fix convert_mul_or_add_finally with dequantization

* [LPT] ScaleShift dim < 4 only dequantization conversion

* [LPT] MatMul transformation tests extending

* [LPT] ReshapeFullyConnected legacy transformation: LPT test case addition

* [nGraph] VisualizeTree extending: property names displaying to simplify search

* [LPT] getDequantization extending

* [LPT] MulAddToScaleshiftOrPower: out precision fix & tests

* [LPT] Multiply to ScaleShiftIE: Multiply transformation: remove DEQUANTIZATION if not valid

* [LPT] Concat test case

* [nGraph] try to fix opencv compatibility

* [nGraph] nGraph code style fix

* [LPT] InPlace dequantization folding

* [LPT] Multiply constant folding test

* [LPT] Fix plugin test case for MatMulWithOptimizedConstantFakeQuantize

[LPT] Enable MatMulWithOptimizedConstantFakeQuantize plugin test

* [LPT] Convolution transformation: mulConst shape fix

* [LPT] INT8 Constant folding branch for elementwise ops optimization removal

* [LPT] eltwise for const branch fix

* [LPT] linux fix

* [LPT] Multiply test refactoring

* [LPT] Convert Fuse in Constant + tests

* [LPT] function comparison: runtime info comparison rollback

* [LPT] linux build fix

* [LPT] linux build fix2

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT

* [LPT] Reshape transformation update: don't broadcast by batch

* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT - refactoring

* [LPT] MatMul transformation: transpose input tensors fix

* [LPT] checkElementwise for AddTransformation WA: should be moved to getDequantization

* [LPT] merge fix

* [LPT] MatMul fix & tests

* [LPT] AddTransformation tests

* [LPT] Interpolate transformation enabled

* [LPT] constant folding before LPT

* [LPT] WIP: not completed tests

* [LPT] GPU degradation fix

* [LPT] FuseConvert workaround

* [LPT] code cleanup

* [LPT] Interpolate GPU test quick fix

* [LPT] GroupConvolution fix

* [LPT] Fix fusing multiply for non-dequantization layers

* [LPT] GPU pipeline update: enableInt8 initialization place update

* [LPT] tests compilation fix

* [LPT] merge fix

* [LPT] tests enabling

* [LPT] merge issue resolving

* [LPT] LPT CNNNetwork usage macros: part #1: source code

* [LPT] LPT CNNNetwork usage macros: part #2: cmake files update and tests adoption

* [LPT] LPT workaround removing from nGraph core

* [LPT] previous LPT version tests

* [LPT] inference_engine_lp_transformations was returned

* [LPT] replace_node rollback

* [LPT] ConvertSubtract fix

* [LPT] GPU: baselineIsFP16 reuse fix

* [LPT] FakeQuantizeTransformation: GPU workaround: I32 -> FP32 Convert is not fused

* [LPT] AvgPool output precision workaround

* [LPT] Group convolution precision + Subtract to ScaleShift const fix

* [LPT] SubMulToMulAdd & Transpose: action-recognition-0001 fix

* [LPT] Transpose: added test with per-tensor quantization

Co-authored-by: Aleksandr Pertovsky <aleksandr.pertovsky@intel.com>
Co-authored-by: Zinoviev, Vladimir <vladimir.zinoviev@intel.com>
Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
Co-authored-by: Gorokhov Dmitriy <dmitry.gorokhov@intel.com>
Edward Shogulin 2020-10-23 13:22:55 +03:00 committed by GitHub
parent ca95240c91
commit c2271da637
537 changed files with 37312 additions and 2406 deletions

View File

@ -21,9 +21,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
SOURCES ${MAIN_SRC} ${LIBRARY_HEADERS}
VERSION_DEFINES_FOR cldnn_engine.cpp)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
clDNN_lib pugixml inference_engine_transformations)
if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()
set (CLDNN_TOP_FOLDER ${IE_MAIN_SOURCE_DIR}/thirdparty/clDNN)
target_include_directories(${TARGET_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}

View File

@ -34,7 +34,9 @@
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/convert_precision.hpp>
#include <transformations/rt_info/fused_names_attribute.hpp>
#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_transformer.h>
@ -43,6 +45,9 @@
#include "cldnn_executable_network.h"
#include "cldnn_custom_layer.h"
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>
#ifdef __linux__
#include <dlfcn.h>
#endif
@ -73,8 +78,10 @@ cldnn::device_info clDNNEngine::GetDeviceInfo(const std::map<std::string, std::s
return device_info;
}
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const {
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network, CLDNNPlugin::Config config) const {
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool baselineIsFP16 = false;
if (clonedNetwork->getFunction()) {
const auto transformations_callback = [](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
// Reshape->Permute->Reshape pattern in theory can change output rank, so this check is added to be sure
@ -113,6 +120,12 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
return can_use_reduce;
}
if (auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(node)) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}
return std::dynamic_pointer_cast<const ::ngraph::opset2::Gelu>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset3::ShuffleChannels>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset2::BatchToSpace>(node) ||
@ -128,24 +141,64 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
// Disable shape inference (WA for generic operations)
::ngraph::op::GenericIE::DisableReshape noReshape(nGraphFunc);
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
#ifndef USE_CNNNETWORK_LPT
bool enableInt8;
#endif
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
{
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
// WA: ConvertPriorBox must be executed before the 1st ConstantFolding pass
manager.register_pass<ngraph::pass::ConvertPriorBox>();
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
ngraph::pass::Manager ti_manager;
// Unroll will be called after all conversions
// temporarily switch back to plugin unroller from NGraph unroller until TI output names are corrected
// ti_manager.register_pass<ngraph::pass::UnrollTensorIterator>();
ti_manager.run_passes(nGraphFunc);
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
enableInt8 = config.enableInt8 && ngraph::pass::low_precision::LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
if (enableInt8) {
const auto fp16_callback = [&baselineIsFP16](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
if (!baselineIsFP16 && node->get_output_element_type(0) == ngraph::element::f16) {
baselineIsFP16 = true;
}
return true;
};
ngraph::pass::Manager conversion_manager;
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
conversion_manager.set_callback(fp16_callback);
conversion_manager.run_passes(nGraphFunc);
}
#endif
}
#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (enableInt8) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<MatMulTransformation, ngraph::opset1::MatMul>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));
transformer.transform(nGraphFunc);
}
#endif
{
ngraph::pass::Manager manager = ngraph::pass::Manager();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
}
clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
}
@ -157,6 +210,17 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
transformator.fullTrim();
}
if (baselineIsFP16) {
// [WA part1] Store 'lpt_back_to_fp16' flag to convert FP32 operations to original FP16 after LPT
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
input0.begin()->second->params["lpt_back_to_fp16"];
}
}
return clonedNetwork;
}
@ -259,7 +323,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
context = m_defaultContext;
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), context, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), context, conf);
}
ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network,
@ -283,7 +347,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
conf.max_dynamic_batch = static_cast<int>(network.getBatchSize());
}
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), casted, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), casted, conf);
}
RemoteContext::Ptr clDNNEngine::CreateContext(const ParamMap& params) {
@ -326,7 +390,7 @@ QueryNetworkResult clDNNEngine::QueryNetwork(const ICNNNetwork& network,
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
auto clonedNetwork = CloneAndTransformNetwork(network);
auto clonedNetwork = CloneAndTransformNetwork(network, _impl->m_config);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;
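
Taken together, the cldnn_engine.cpp hunks split the pipeline into three phases: common conversions, the new nGraph LPT pass, and the legacy conversion. A condensed sketch of the LPT phase (the #ifndef USE_CNNNETWORK_LPT branch); the wrapper function and `func` are illustrative, the API calls are the ones used in the diff:

#include <ngraph/pass/manager.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

using namespace ngraph::pass::low_precision;

// Illustrative wrapper mirroring the #ifndef USE_CNNNETWORK_LPT branch above.
void runNGraphLpt(std::shared_ptr<ngraph::Function> func, bool int8FromConfig) {
    // LPT runs only when INT8 is enabled in the config and the function has FakeQuantize ops.
    if (!int8FromConfig || !LowPrecisionTransformer::isFunctionQuantized(func))
        return;

    // [WA part1] Convert a quantized FP16 model to FP32 to avoid overflow and
    // mixed-precision errors (the real code also records baselineIsFP16 in its callback).
    ngraph::pass::Manager conversion_manager;
    conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
    conversion_manager.run_passes(func);

    // Plugin-specific LPT parameters; MatMul is restricted to symmetric quantization.
    auto params = LayerTransformation::Params(
        true,                                                        // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // on activations
        LayerTransformation::QuantizedTensorAlignment::None,         // on weights
        true);                                                       // supportAsymmetricQuantization
    LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
        .add<MatMulTransformation, ngraph::opset1::MatMul>(
            LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));
    transformer.transform(func);
}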

View File

@ -27,7 +27,8 @@ class clDNNEngine : public InferenceEngine::InferencePluginInternal,
CLDNNRemoteCLContext::Ptr m_defaultContext;
cldnn::device_info GetDeviceInfo(const std::map<std::string, std::string> &config) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network,
CLDNNPlugin::Config config) const;
public:
clDNNEngine();

View File

@ -88,9 +88,11 @@
#include <sys/stat.h>
#include <exec_graph_info.hpp>
#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/transformer.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/gemm.hpp"
#endif
#include <iostream>
#include <iomanip>
@ -397,6 +399,41 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
, p_currentOutputs({}) {
InitFormat(network);
bool fqFound = false;
bool baselineIsFP16 = false;
InputsDataMap inputsMap;
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
if (!input0.empty() && (input0.begin()->second->params.count("lpt_back_to_fp16") != 0)) {
baselineIsFP16 = true;
fqFound = true;
}
}
#ifdef USE_CNNNETWORK_LPT
bool allFQareSupported = true;
if (config.enableInt8) {
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}
if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}
if (config.enableInt8) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
@ -413,29 +450,6 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
.add<FullyConnectedTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "FullyConnected")
.add<GemmTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "GEMM");
bool fqFound = false;
bool allFQareSupported = true;
bool baselineIsFP16 = false;
{
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
auto& layer = *it;
if (layer->precision == Precision::FP16) {
baselineIsFP16 = true;
}
if (CaselessEq<std::string>()(layer->type, "FakeQuantize")) {
fqFound = true;
auto levels = layer->GetParamAsUInt("levels");
if (levels != 255 && levels != 256) {
allFQareSupported = false;
}
}
it++;
}
}
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
if (fqFound && allFQareSupported) {
NetPass::ConvertPrecision(network, Precision::FP16, Precision::FP32);
@ -443,8 +457,11 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
LowPrecisionTransformer transformer(transforms);
transformer.transform(network);
}
#endif
// [WA part2] Try to find non-quantized layers and convert them back to FP16
// [WA part2] Try to find non-quantized layers and convert them back to FP16
if (config.enableInt8) {
if (fqFound && baselineIsFP16 && config.enable_fp16_for_quantized_models) {
auto layersSorted = BFSSort(network);
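
The "lpt_back_to_fp16" key on the first input's consumer layer is the only channel between the two workaround parts: CloneAndTransformNetwork writes it after the nGraph LPT pass, and the Program constructor reads it to re-enable the FP32 -> FP16 conversion of non-quantized layers. A condensed sketch of both sides, using the same legacy InferenceEngine calls as the diff:

// Writer side (after LPT in CloneAndTransformNetwork): mark the network.
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);
if (baselineIsFP16 && !inputsMap.empty()) {
    auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
    input0.begin()->second->params["lpt_back_to_fp16"];  // the key's presence is the flag
}

// Reader side (Program constructor): recover the flags before [WA part2] runs.
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
    auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
    if (!input0.empty() && input0.begin()->second->params.count("lpt_back_to_fp16") != 0) {
        baselineIsFP16 = true;  // lets [WA part2] convert non-quantized layers back to FP16
        fqFound = true;
    }
}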

View File

@ -57,7 +57,12 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_lp_transformations libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)
if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_lp_transformations)
endif()
target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)

View File

@ -22,13 +22,17 @@ public:
Eltwise(const Output<Node>& data1,
const Output<Node>& data2,
const ELTWISE_TYPE eltwise_type);
const ELTWISE_TYPE eltwise_type,
const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
ELTWISE_TYPE eltwise_type;
private:
element::Type m_output_type;
};
} // namespace op

View File

@ -29,17 +29,21 @@ public:
FullyConnected(const Output<Node> & A,
const Output<Node> & B,
const Output<Node> & C,
const Shape & output_shape);
const Shape & output_shape,
const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
size_t get_out_size() { return m_output_size; }
size_t get_out_size() const { return m_output_size; }
element::Type get_output_type() const { return m_output_type; }
private:
size_t m_output_size = 0;
Shape m_output_shape = {};
element::Type m_output_type;
};
} // namespace op

View File

@ -25,7 +25,8 @@ public:
const Output<Node>& weights,
float eps,
bool across_spatial,
bool channel_shared);
bool channel_shared,
const ngraph::element::Type output_type);
float get_eps() const { return m_eps; }
bool get_channel_shared() const { return m_channel_shared;}
@ -39,6 +40,7 @@ protected:
float m_eps;
bool m_across_spatial;
bool m_channel_shared;
ngraph::element::Type m_output_type;
};
} // namespace op

View File

@ -19,13 +19,16 @@ public:
const NodeTypeInfo& get_type_info() const override { return type_info; }
PowerIE(const Output<Node>& data_batch,
const float power, const float scale, const float shift);
const float power, const float scale, const float shift, const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
float scale, power, shift;
private:
element::Type m_output_type;
};
} // namespace op

View File

@ -18,7 +18,7 @@ public:
static constexpr NodeTypeInfo type_info{"ReLUIE", 1};
const NodeTypeInfo& get_type_info() const override { return type_info; }
ReLUIE(const Output<Node> & data, const float & negative_slope);
ReLUIE(const Output<Node> & data, const float & negative_slope, const element::Type output_type);
void validate_and_infer_types() override;
@ -26,8 +26,11 @@ public:
float get_slope() { return m_negative_slope; }
element::Type get_output_type() const { return m_output_type; }
private:
float m_negative_slope;
element::Type m_output_type;
};
} // namespace op

View File

@ -20,11 +20,15 @@ public:
ScaleShiftIE(const Output<Node>& data_batch,
const Output<Node>& weights,
const Output<Node>& bias);
const Output<Node>& bias,
const element::Type output_type = element::undefined);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;
private:
element::Type output_type;
};
} // namespace op
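
The six legacy-op headers above (Eltwise, FullyConnected, NormalizeIE, PowerIE, ReLUIE, ScaleShiftIE) all gain an output_type parameter (defaulted to element::undefined for most of them) so that element types assigned by LPT survive conversion to legacy ops. The recurring validate_and_infer_types change, sketched on a placeholder op (SomeLegacyOp is not a real class in this PR):

void op::SomeLegacyOp::validate_and_infer_types() {
    set_output_type(
        0,
        // element::undefined keeps the old behaviour (inherit the input's type);
        // anything else pins the output to the type LPT selected.
        m_output_type == element::undefined ? get_input_element_type(0) : m_output_type,
        get_input_partial_shape(0));
}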

View File

@ -35,6 +35,7 @@ public:
// This pass finally converts single Multiply and Add operations to ScaleShift or Power operation
ConvertMulOrAddFinally() : GraphRewrite() {
convert_mul_or_add_finally<ngraph::opset1::Add>();
convert_mul_or_add_finally<ngraph::opset1::Subtract>();
convert_mul_or_add_finally<ngraph::opset1::Multiply>();
}
@ -52,11 +53,13 @@ bool convert_to_eltwise(std::shared_ptr<T> & node,
et = ELTWISE_TYPE::Prod;
} else if (std::is_same<T, ngraph::opset1::Add>()) {
et = ELTWISE_TYPE::Sum;
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
et = ELTWISE_TYPE::Sub;
} else {
return false;
}
auto eltwise = std::make_shared<ngraph::op::Eltwise>(data1, data2, et);
auto eltwise = std::make_shared<ngraph::op::Eltwise>(data1, data2, et, node->output(0).get_element_type());
eltwise->set_friendly_name(node->get_friendly_name());
ngraph::copy_runtime_info(node, eltwise);
ngraph::replace_node(node, eltwise);
@ -66,7 +69,7 @@ bool convert_to_eltwise(std::shared_ptr<T> & node,
template <typename T>
ngraph::graph_rewrite_callback get_callback() {
ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher& m) {
static_assert(std::is_same<T, ngraph::opset1::Add>() || std::is_same<T, ngraph::opset1::Multiply>(),
static_assert(std::is_same<T, ngraph::opset1::Add>() || std::is_same<T, ngraph::opset1::Subtract>() || std::is_same<T, ngraph::opset1::Multiply>(),
"Unsupported template parameter. Only Add or Multiply allowed!");
auto lin_op = std::dynamic_pointer_cast<T> (m.get_match_root());
@ -77,7 +80,10 @@ ngraph::graph_rewrite_callback get_callback() {
const auto output_shape = lin_op->output(0).get_partial_shape();
const auto output_shape_rank = output_shape.rank().get_length();
if (!lin_op->get_element_type().is_real()) {
const auto intInputs = !lin_op->get_input_element_type(0).is_real() &&
!lin_op->get_input_element_type(1).is_real();
if (!lin_op->get_element_type().is_real() || intInputs) {
return convert_to_eltwise<T>(lin_op,
lin_op->input(0).get_source_output(),
lin_op->input(1).get_source_output());
@ -147,14 +153,65 @@ ngraph::graph_rewrite_callback get_callback() {
auto res = check_constant(const_node, data_node.get_partial_shape());
if (res == CONVERSION_RESULT::NONE || (res == CONVERSION_RESULT::SCALE_SHIFT && output_shape_rank < 4)) {
auto checkElementwise = [](const std::shared_ptr<ngraph::Node>& elementwise) -> bool {
const ngraph::PartialShape partialShape = elementwise->get_input_partial_shape(0);
if (partialShape.is_dynamic()) {
return false;
}
std::shared_ptr<ngraph::opset1::Constant> constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(elementwise->get_input_node_shared_ptr(1));
if (constant == nullptr) {
constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(elementwise->get_input_node_shared_ptr(0));
}
if (constant == nullptr) {
return false;
}
const ngraph::Shape constShape = constant->get_output_shape(0);
if ((constShape.size() > 5ul)) {
return false;
}
if ((constShape.size() <= 1ul) || (std::all_of(constShape.begin(), constShape.end(), [](const size_t value) { return value == 1ul; }))) {
return true;
}
const ngraph::Shape shape = partialShape.to_shape();
if (constShape.size() == shape.size()) {
if ((constShape[0] != 1ul) || (constShape[1] != shape[1])) {
return false;
}
for (size_t i = 2ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else if (constShape.size() == (shape.size() - 1)) {
if (constShape[0] != shape[1]) {
return false;
}
for (size_t i = 1ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else {
return false;
}
return true;
};
bool is_dequantization = (lin_op->get_rt_info().count("DEQUANTIZATION") != 0) && checkElementwise(lin_op);
if (!is_dequantization && (res == CONVERSION_RESULT::NONE || (res == CONVERSION_RESULT::SCALE_SHIFT && output_shape_rank < 4))) {
return convert_to_eltwise<T>(lin_op,
lin_op->input(0).get_source_output(),
lin_op->input(1).get_source_output());
}
// TODO: if all values in Constant are equal the best way is to convert this Eltwise to Power
if (res == CONVERSION_RESULT::SCALE_SHIFT) {
if (res == CONVERSION_RESULT::SCALE_SHIFT || is_dequantization) {
auto weights_et = const_node->get_element_type();
auto weights_shape = const_node->get_shape();
@ -162,12 +219,49 @@ ngraph::graph_rewrite_callback get_callback() {
std::shared_ptr<ngraph::op::ScaleShiftIE> scaleshift;
if (std::is_same<T, ngraph::opset1::Add>()) {
auto weights = ngraph::opset1::Constant::create(weights_et, weights_shape, {1});
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, ngraph::op::util::normalize_constant(weights, output_shape),
ngraph::op::util::normalize_constant(const_node, output_shape));
} else {
auto weights_in = ngraph::op::util::normalize_constant(weights, output_shape);
auto biases_in = ngraph::op::util::normalize_constant(const_node, output_shape);
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
std::shared_ptr<ngraph::Node> new_const_node = std::make_shared<ngraph::opset1::Multiply>(
ngraph::op::util::normalize_constant(const_node, output_shape),
ngraph::opset1::Constant::create(weights_et, ngraph::Shape{ 1 }, { -1 }));
auto weights = ngraph::opset1::Constant::create(weights_et, weights_shape, {1});
auto weights_in = ngraph::op::util::normalize_constant(weights, output_shape);
auto biases_in = new_const_node;
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else if (std::is_same<T, ngraph::opset1::Multiply>()) {
auto bias = ngraph::opset1::Constant::create(weights_et, weights_shape, {0});
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, ngraph::op::util::normalize_constant(const_node, output_shape),
ngraph::op::util::normalize_constant(bias, output_shape));
auto weights_in = ngraph::op::util::normalize_constant(const_node, output_shape);
auto biases_in = ngraph::op::util::normalize_constant(bias, output_shape);
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else {
return false;
}
scaleshift->set_friendly_name(lin_op->get_friendly_name());
@ -182,9 +276,11 @@ ngraph::graph_rewrite_callback get_callback() {
// In case Add we create fake scale equal to 1, in case of Multiply we create fake shift equal to 0
std::shared_ptr<ngraph::op::PowerIE> power;
if (std::is_same<T, ngraph::opset1::Add>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., value);
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., value, lin_op->get_output_element_type(0));
} else if (std::is_same<T, ngraph::opset1::Multiply>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., value, 0.);
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., value, 0., lin_op->get_output_element_type(0));
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., -value, lin_op->get_output_element_type(0));
} else {
return false;
}
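
The checkElementwise lambda above admits only per-tensor or per-channel constants into the dequantization branch. A few worked cases against hypothetical NCHW data of shape {1, 8, 16, 16}:

// Hypothetical constant shapes vs. data {1, 8, 16, 16}, per the checks above:
//   {1}            -> true   (size <= 1: per-tensor)
//   {1, 1, 1, 1}   -> true   (all ones: per-tensor)
//   {1, 8, 1, 1}   -> true   (same rank: constShape[1] == channel count, rest == 1)
//   {8, 1, 1}      -> true   (rank - 1: constShape[0] == channel count, rest == 1)
//   {1, 4, 1, 1}   -> false  (channel dimension mismatch)
//   {1, 8, 16, 16} -> false  (per-element values are rejected)
// A dynamic input shape or a constant of rank > 5 is also rejected.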

View File

@ -80,7 +80,8 @@ private:
auto new_fc = std::make_shared<op::FullyConnected>(reshape->input_value(0),
fc->input_value(1),
fc->input_value(2),
fc->get_shape());
fc->get_shape(),
fc->output(0).get_element_type());
new_fc->set_friendly_name(fc->get_friendly_name());
ngraph::copy_runtime_info({reshape, fc}, new_fc);

View File

@ -1637,6 +1637,9 @@ CNNLayer::Ptr NodeConverter<ngraph::op::Eltwise>::createLayer(const std::shared_
case ELTWISE_TYPE::Sum:
type = "sum";
break;
case ELTWISE_TYPE::Sub:
type = "sub";
break;
case ELTWISE_TYPE::Prod:
type = "prod";
break;

View File

@ -15,8 +15,8 @@ using namespace ngraph;
constexpr NodeTypeInfo op::Eltwise::type_info;
op::Eltwise::Eltwise(const Output<Node>& data1, const Output<Node>& data2, const ELTWISE_TYPE eltwise_type)
: Op({data1, data2}), eltwise_type(eltwise_type) {
op::Eltwise::Eltwise(const Output<Node>& data1, const Output<Node>& data2, const ELTWISE_TYPE eltwise_type, const element::Type output_type)
: Op({data1, data2}), eltwise_type(eltwise_type), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
@ -25,7 +25,7 @@ std::shared_ptr<Node> op::Eltwise::clone_with_new_inputs(const OutputVector& new
throw ngraph_error("Incorrect number of new arguments");
}
return make_shared<Eltwise>(new_args.at(0), new_args.at(1), eltwise_type);
return make_shared<Eltwise>(new_args.at(0), new_args.at(1), eltwise_type, m_output_type);
}
void op::Eltwise::validate_and_infer_types() {
@ -34,8 +34,12 @@ void op::Eltwise::validate_and_infer_types() {
element::Type data2_et = get_input_element_type(1);
element::Type et_result;
NODE_VALIDATION_CHECK(this, element::Type::merge(et_result, data1_et, data2_et),
"Element types for first and second do not match :", data1_et, " and ", data2_et);
if (m_output_type == element::undefined) {
NODE_VALIDATION_CHECK(this, element::Type::merge(et_result, data1_et, data2_et),
"Element types for first and second do not match :", data1_et, " and ", data2_et);
} else {
et_result = m_output_type;
}
if (get_input_partial_shape(0).rank().is_dynamic() ||
get_input_partial_shape(1).rank().is_dynamic()) {

View File

@ -12,8 +12,13 @@ using namespace ngraph;
constexpr NodeTypeInfo op::FullyConnected::type_info;
op::FullyConnected::FullyConnected(const Output<Node>& A, const Output<Node>& B, const Output<Node>& C, const Shape & output_shape)
: Op({A, B, C}), m_output_shape(output_shape) {
op::FullyConnected::FullyConnected(
const Output<Node>& A,
const Output<Node>& B,
const Output<Node>& C,
const Shape & output_shape,
const element::Type output_type)
: Op({A, B, C}), m_output_shape(output_shape), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
@ -26,5 +31,8 @@ void op::FullyConnected::validate_and_infer_types() {
if (m_output_shape.size() < 2)
throw ngraph_error("FullyConnected shape is incorrect");
m_output_size = m_output_shape.back();
set_output_type(0, input_value(0).get_element_type(), m_output_shape);
set_output_type(
0,
m_output_type == element::undefined ? input_value(0).get_element_type() : m_output_type,
m_output_shape);
}

View File

@ -15,15 +15,14 @@ using namespace ngraph;
constexpr NodeTypeInfo op::NormalizeIE::type_info;
op::NormalizeIE::NormalizeIE(const Output<Node>& data, const Output<Node>& weights, float eps, bool across_spatial,
bool channel_shared)
: Op({data, weights}), m_eps(eps), m_across_spatial(across_spatial), m_channel_shared(channel_shared) {
bool channel_shared, const ngraph::element::Type output_type)
: Op({data, weights}), m_eps(eps), m_across_spatial(across_spatial), m_channel_shared(channel_shared), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
void op::NormalizeIE::validate_and_infer_types() {
element::Type arg_type = get_input_element_type(0);
PartialShape arg_shape = get_input_partial_shape(0);
set_output_type(0, arg_type, arg_shape);
set_output_type(0, m_output_type, arg_shape);
const PartialShape& input_shape = get_input_partial_shape(0);
@ -34,5 +33,5 @@ void op::NormalizeIE::validate_and_infer_types() {
shared_ptr<Node> op::NormalizeIE::clone_with_new_inputs(const OutputVector& new_args) const {
check_new_args_count(this, new_args);
return make_shared<op::NormalizeIE>(new_args.at(0), new_args.at(1), m_eps, m_across_spatial, m_channel_shared);
return make_shared<op::NormalizeIE>(new_args.at(0), new_args.at(1), m_eps, m_across_spatial, m_channel_shared, m_output_type);
}

View File

@ -14,8 +14,8 @@ using namespace ngraph;
constexpr NodeTypeInfo op::PowerIE::type_info;
op::PowerIE::PowerIE(const Output<ngraph::Node>& data_batch, const float power, const float scale, const float shift)
: Op({data_batch}), scale(scale), power(power), shift(shift) {
op::PowerIE::PowerIE(const Output<ngraph::Node>& data_batch, const float power, const float scale, const float shift, const element::Type output_type)
: Op({data_batch}), scale(scale), power(power), shift(shift), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
@ -24,9 +24,9 @@ std::shared_ptr<Node> op::PowerIE::clone_with_new_inputs(const OutputVector& new
throw ngraph_error("Incorrect number of new arguments");
}
return make_shared<PowerIE>(new_args.at(0), this->power, this->scale, this->shift);
return make_shared<PowerIE>(new_args.at(0), this->power, this->scale, this->shift, this->m_output_type);
}
void op::PowerIE::validate_and_infer_types() {
set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
set_output_type(0, m_output_type == element::undefined ? get_input_element_type(0) : m_output_type, get_input_partial_shape(0));
}

View File

@ -15,16 +15,19 @@ using namespace ngraph;
constexpr NodeTypeInfo op::ReLUIE::type_info;
op::ReLUIE::ReLUIE(const Output<Node>& data, const float& negative_slope)
: Op(OutputVector {data}), m_negative_slope(negative_slope) {
op::ReLUIE::ReLUIE(const Output<Node>& data, const float& negative_slope, const element::Type output_type)
: Op(OutputVector {data}), m_negative_slope(negative_slope), m_output_type(output_type) {
constructor_validate_and_infer_types();
}
std::shared_ptr<Node> op::ReLUIE::clone_with_new_inputs(const OutputVector& new_args) const {
check_new_args_count(this, new_args);
return make_shared<ReLUIE>(new_args.at(0), m_negative_slope);
return make_shared<ReLUIE>(new_args.at(0), m_negative_slope, m_output_type);
}
void op::ReLUIE::validate_and_infer_types() {
set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
set_output_type(
0,
m_output_type == element::undefined ? get_input_element_type(0) : m_output_type,
get_input_partial_shape(0));
}

View File

@ -14,8 +14,25 @@ using namespace ngraph;
constexpr NodeTypeInfo op::ScaleShiftIE::type_info;
op::ScaleShiftIE::ScaleShiftIE(const Output<Node>& data_batch, const Output<Node>& weights, const Output<Node>& bias)
: Op({data_batch, weights, bias}) {
element::Type getMaxBitwidth(const std::vector<element::Type>& types) {
if (types.empty()) {
return element::undefined;
}
element::Type maxType = types[0];
for (size_t i = 1; i < types.size(); ++i) {
if (types[i].bitwidth() > maxType.bitwidth()) {
maxType = types[i];
}
}
return maxType;
}
op::ScaleShiftIE::ScaleShiftIE(const Output<Node>& data_batch, const Output<Node>& weights, const Output<Node>& bias, const element::Type output_type)
: Op({data_batch, weights, bias}), output_type(output_type) {
if (this->output_type == element::undefined) {
this->output_type = getMaxBitwidth({ data_batch.get_element_type(), weights.get_element_type(), bias.get_element_type() });
}
constructor_validate_and_infer_types();
}
@ -24,12 +41,12 @@ std::shared_ptr<Node> op::ScaleShiftIE::clone_with_new_inputs(const OutputVector
throw ngraph_error("Incorrect number of new arguments");
}
return make_shared<ScaleShiftIE>(new_args.at(0), new_args.at(1), new_args.at(2));
return make_shared<ScaleShiftIE>(new_args.at(0), new_args.at(1), new_args.at(2), output_type);
}
void op::ScaleShiftIE::validate_and_infer_types() {
// Check that weights and biases has the same type
element::Type data_et = get_input_element_type(0);
element::Type data_et = output_type == element::undefined ? get_input_element_type(0) : output_type;
element::Type weights_et = get_input_element_type(1);
element::Type biases_et = get_input_element_type(2);
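
getMaxBitwidth supplies the new default for ScaleShiftIE's output type: when no explicit output_type is passed, the widest of the data, weights and bias types wins, and ties keep the earlier type because the comparison is strict. Two worked cases:

// Worked examples for the fallback above (element::Type::bitwidth()):
getMaxBitwidth({ element::f16, element::f32, element::f16 });  // -> f32 (32 > 16)
getMaxBitwidth({ element::u8, element::i8 });                  // -> u8 (tie on 8 bits: first wins)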

View File

@ -143,9 +143,9 @@ ngraph::pass::ConvertMatMulToFC::ConvertMatMulToFC() {
// Create FullyConnected
std::vector<float> bias_value(O, 0);
auto fc_bias = opset1::Constant::create(matmul->get_input_element_type(0), Shape {O}, bias_value);
auto fc_bias = opset1::Constant::create(matmul->get_output_element_type(0), Shape {O}, bias_value);
auto fc = std::make_shared<op::FullyConnected>(fc_input_a, fc_input_b, fc_bias, output_shape);
auto fc = std::make_shared<op::FullyConnected>(fc_input_a, fc_input_b, fc_bias, output_shape, matmul->output(0).get_element_type());
fc->set_friendly_name(matmul->get_friendly_name());
new_ops.push_back(fc);
@ -207,7 +207,7 @@ ngraph::pass::ConvertMatMulToGemm::ConvertMatMulToGemm() {
new_ops.push_back(fc_input_b.get_node_shared_ptr());
}
auto gemm = std::make_shared<opset1::MatMul>(fc_input_a, fc_input_b, matmul->get_transpose_a(), matmul->get_transpose_b());
auto gemm = matmul->copy_with_new_inputs({ fc_input_a, fc_input_b });
new_ops.push_back(gemm);
if (gemm->get_shape() != output_shape) {
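
Replacing the fresh opset1::MatMul construction with copy_with_new_inputs is a small but deliberate change: cloning the matched node preserves its concrete C++ type, so a MatMul that LPT wrapped (for example in a TypeRelaxed shell, per the "TypeRelaxed: WA removal" commits above) keeps its relaxed output element type. The distinction in sketch form, with a and b standing for the prepared inputs:

// Drops any derived-type information the matched node carried:
auto fresh = std::make_shared<opset1::MatMul>(a, b, matmul->get_transpose_a(), matmul->get_transpose_b());
// Clones the node itself, so wrapper types and their output element type survive:
auto clone = matmul->copy_with_new_inputs({ a, b });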

View File

@ -87,6 +87,10 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const_bias_node = ngraph::as_type_ptr<ngraph::opset1::Constant>(add_input_0);
}
if (const_bias_node->output(0).get_element_type() != add_node->output(0).get_element_type()) {
return false;
}
auto mul_input_0 = mul_node->input(0).get_source_output().get_node_shared_ptr();
auto mul_input_1 = mul_node->input(1).get_source_output().get_node_shared_ptr();
@ -97,6 +101,10 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const_weights_node = ngraph::as_type_ptr<ngraph::opset1::Constant>(mul_input_0);
}
if (const_weights_node->output(0).get_element_type() != mul_node->output(0).get_element_type()) {
return false;
}
if (add_node->get_output_partial_shape(0).rank().is_dynamic() ||
mul_node->get_output_partial_shape(0).rank().is_dynamic()) {
return false;
@ -137,13 +145,16 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const auto output_shape = add_node->get_output_partial_shape(0);
const auto output_shape_rank = output_shape.rank().get_length();
bool is_dequantization =
(add_node->get_rt_info().count("DEQUANTIZATION") != 0 || mul_node->get_rt_info().count("DEQUANTIZATION") != 0);
if (res1 == CONVERSION_RESULT::NONE || res2 == CONVERSION_RESULT::NONE ||
((res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) && output_shape_rank < 4)) {
((res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) && !is_dequantization && output_shape_rank < 4)) {
return false;
}
// TODO: in case if scale and shift constants has equal values the best way is to convert them to Power
if (res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) {
if (res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT || is_dequantization) {
NodeVector new_ops;
auto weights_in = ngraph::op::util::normalize_constant(const_weights_node, output_shape);
@ -151,16 +162,29 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
new_ops.push_back(weights_in);
new_ops.push_back(biases_in);
if (res1 == CONVERSION_RESULT::POWER) {
if (is_dequantization) {
const Shape data_shape = data_node.get_shape();
Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];
weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
new_ops.push_back(weights_in);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
new_ops.push_back(biases_in);
}
if (res1 == CONVERSION_RESULT::POWER && !is_dequantization) {
weights_in = ngraph::op::util::broadcastTo(weights_in, biases_in->get_shape());
new_ops.push_back(weights_in);
}
if (res2 == CONVERSION_RESULT::POWER) {
if (res2 == CONVERSION_RESULT::POWER && !is_dequantization) {
biases_in = ngraph::op::util::broadcastTo(biases_in, weights_in->get_shape());
new_ops.push_back(biases_in);
}
auto scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
auto output_type = m.get_match_root()->get_output_element_type(0);
auto scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in, output_type);
new_ops.push_back(scaleshift);
scaleshift->set_friendly_name(add_node->get_friendly_name());
@ -175,7 +199,8 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
return false;
}
auto power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., scale, shift);
auto output_type = m.get_match_root()->get_output_element_type(0);
auto power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., scale, shift, output_type);
power->set_friendly_name(add_node->get_friendly_name());
ngraph::copy_runtime_info({mul_node, add_node}, power);
ngraph::replace_node(m.get_match_root(), power);
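
Both this pass and convert_mul_or_add_finally key off a "DEQUANTIZATION" entry in the node's runtime info, and both only test for the key's presence. A hedged sketch of how a producer might tag a node; the variant payload here is illustrative, and the PR's own history mentions a dedicated DequantizationAttr run-time attribute for this purpose:

// Presence of the "DEQUANTIZATION" key is all the conversions above check for.
auto& rt_info = mul_node->get_rt_info();
rt_info["DEQUANTIZATION"] = std::make_shared<ngraph::VariantWrapper<std::string>>("");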

View File

@ -62,7 +62,8 @@ ngraph::pass::ConvertNormalizeL2WithMulToNormalizeIE::ConvertNormalizeL2WithMulT
constant->output(0),
normalize->get_eps(),
across_spatial,
channel_shared);
channel_shared,
normalize->get_element_type());
normalize_ie->set_friendly_name(mul->get_friendly_name());
ngraph::copy_runtime_info({normalize, mul}, normalize_ie);
@ -93,13 +94,14 @@ ngraph::pass::ConvertNormalizeL2ToLegacyMatcher::ConvertNormalizeL2ToLegacyMatch
bool across_channels = !(axis.size() == 1 && axis[0] == 1);
bool channel_shared = true;
auto scale = std::make_shared<ngraph::opset1::Constant>(normalize->get_input_element_type(0), Shape{1}, std::vector<float>{1.0});
auto scale = std::make_shared<ngraph::opset1::Constant>(normalize->output(0).get_element_type(), Shape{1}, std::vector<float>{1.0});
auto normalize_ie = std::make_shared<ngraph::op::NormalizeIE> (normalize->input(0).get_source_output(),
scale->output(0),
normalize->get_eps(),
across_channels,
channel_shared);
channel_shared,
normalize->get_element_type());
normalize_ie->set_friendly_name(normalize->get_friendly_name());
ngraph::copy_runtime_info(normalize, normalize_ie);

View File

@ -33,7 +33,7 @@ ngraph::pass::ConvertPowerToPowerIEMatcher::ConvertPowerToPowerIEMatcher() {
return false;
}
auto power_ie = std::make_shared<ngraph::op::PowerIE>(power->input(0).get_source_output(), value, 1, 0);
auto power_ie = std::make_shared<ngraph::op::PowerIE>(power->input(0).get_source_output(), value, 1, 0, power->output(0).get_element_type());
power_ie->set_friendly_name(power->get_friendly_name());
ngraph::copy_runtime_info(power, power_ie);
ngraph::replace_node(power, power_ie);
@ -44,4 +44,4 @@ ngraph::pass::ConvertPowerToPowerIEMatcher::ConvertPowerToPowerIEMatcher() {
auto m = std::make_shared<ngraph::pattern::Matcher>(power, "ConvertPowerToPowerIE");
this->register_matcher(m, callback);
}
}

View File

@ -33,7 +33,7 @@ ngraph::pass::ConvertPReLUToReLUIE::ConvertPReLUToReLUIE() {
return false;
}
auto relu_ie = std::make_shared<ngraph::op::ReLUIE>(prelu->input(0).get_source_output(), value);
auto relu_ie = std::make_shared<ngraph::op::ReLUIE>(prelu->input(0).get_source_output(), value, prelu->output(0).get_element_type());
relu_ie->set_friendly_name(prelu->get_friendly_name());
ngraph::copy_runtime_info(prelu, relu_ie);
ngraph::replace_node(prelu, relu_ie);
@ -44,4 +44,4 @@ ngraph::pass::ConvertPReLUToReLUIE::ConvertPReLUToReLUIE() {
auto m = std::make_shared<ngraph::pattern::Matcher>(prelu, "ConvertPReLUToReLUIE");
this->register_matcher(m, callback);
}
}

View File

@ -25,7 +25,7 @@ ngraph::pass::ConvertSqrtToPowerIEMatcher::ConvertSqrtToPowerIEMatcher() {
if (!sqrt) {
return false;
}
auto power_ie = std::make_shared<ngraph::op::PowerIE>(sqrt->input(0).get_source_output(), 0.5f, 1, 0);
auto power_ie = std::make_shared<ngraph::op::PowerIE>(sqrt->input(0).get_source_output(), 0.5f, 1, 0, sqrt->output(0).get_element_type());
power_ie->set_friendly_name(sqrt->get_friendly_name());
ngraph::copy_runtime_info(sqrt, power_ie);
ngraph::replace_node(sqrt, power_ie);

View File

@ -65,7 +65,8 @@ ngraph::pass::FullyConnectedBiasFusion::FullyConnectedBiasFusion() {
auto new_fc = std::make_shared<op::FullyConnected>(m_fc->input(0).get_source_output(),
m_fc->input(1).get_source_output(),
final_bias,
m_fc->get_shape());
m_fc->get_shape(),
m_fc->get_output_type());
new_ops.push_back(new_fc);
new_fc->set_friendly_name(add->get_friendly_name());

View File

@ -44,6 +44,7 @@ std::shared_ptr<Node> convert(const Output<Node> & data, std::shared_ptr<op::Con
new_dilations,
new_pads_begin,
new_pad_end,
node->get_output_element_type(0),
node->get_group(),
node->get_auto_pad());
} else {
@ -54,6 +55,7 @@ std::shared_ptr<Node> convert(const Output<Node> & data, std::shared_ptr<op::Con
new_dilations,
new_pads_begin,
new_pad_end,
node->get_output_element_type(0),
node->get_group(),
node->get_auto_pad());
}

View File

@ -52,7 +52,8 @@ ngraph::pass::ReshapeFullyConnected::ReshapeFullyConnected() {
auto fc_new = std::make_shared<op::FullyConnected>(reshape,
fc->input_value(1),
fc->input_value(2),
output_shape_new);
output_shape_new,
fc->get_output_type());
new_ops.push_back(fc_new);
if (output_shape != output_shape_new) {
@ -73,4 +74,4 @@ ngraph::pass::ReshapeFullyConnected::ReshapeFullyConnected() {
auto m = std::make_shared<ngraph::pattern::Matcher>(fc, "ReshapeFullyConnected");
this->register_matcher(m, callback);
}
}

View File

@ -51,3 +51,7 @@ install(TARGETS ${TARGET_NAME}
RUNTIME DESTINATION ${IE_CPACK_RUNTIME_PATH} COMPONENT core
ARCHIVE DESTINATION ${IE_CPACK_ARCHIVE_PATH} COMPONENT core
LIBRARY DESTINATION ${IE_CPACK_LIBRARY_PATH} COMPONENT core)
if (USE_CNNNETWORK_LPT)
target_compile_definitions(${TARGET_NAME} PUBLIC USE_CNNNETWORK_LPT)
endif()
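
Because the compile definition is PUBLIC, every target that links this library sees USE_CNNNETWORK_LPT, which is what the conditional blocks in the plugin sources rely on. The compile-time switch, in sketch form:

// The build flag selects exactly one of the two LPT pipelines at compile time.
#ifdef USE_CNNNETWORK_LPT
    // legacy path: CNNNetwork-based LPT (low_precision_transformations/*)
#else
    // default path: nGraph-based LPT (transformations/low_precision/*)
#endif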

View File

@ -103,16 +103,5 @@ void ActivationTransformation::transform(TransformationContext& context, CNNLaye
CNNNetworkHelper::removeLayer(context.network, scaleShift);
context.removeLayer(*scaleShift);
const std::vector<CNNLayerPtr> children = CNNNetworkHelper::getChildren(*activationLayer);
for (const CNNLayerPtr& child : children) {
const std::vector<CNNLayerPtr> dequantizationLayers = CNNNetworkHelper::addScaleShiftBetween(
context,
activationLayer,
child,
DequantizationDetails(scales, shifts));
for (const auto& dequantizationLayer : dequantizationLayers) {
context.dequantizationLayersNames.insert(dequantizationLayer->name);
}
}
addDequantizationLayer(context, *activationLayer, scales, shifts);
}

View File

@ -1332,6 +1332,8 @@ void CNNNetworkHelper::addLayerToCNNNetworkAfterData(
THROW_IE_EXCEPTION << "parent data is absent";
}
netImpl->removeOutput(parent->name);
netImpl->addData(parent->name.c_str(), parentOutData);
netImpl->addData(layer->name.c_str(), newEdgeAfterLayer);
netImpl->addOutput(layer->name);
}

View File

@ -329,7 +329,7 @@ void WeightableLayerTransformation::updateToSupportAsymmetricQuantization(
const PrecisionsInfo& weightsPrecisionsInfo,
std::vector<float>& weightsShifts) const {
const CNNLayerPtr parentOnData = CNNNetworkHelper::getParent(layer, 0ul);
if (parentOnData->type == "ScaleShift") {
if (parentOnData->type == "ScaleShift") { // FIXME: it is always true
const std::shared_ptr<float> dataConvertedInBlob = CNNNetworkHelper::convertFloatData(
dataShifts.data(),
dataShifts.size(),

View File

@ -167,9 +167,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
set_ie_threading_interface_for(${TARGET_NAME})
target_compile_definitions(${TARGET_NAME} PUBLIC -DMKLDNN_THR=${MKLDNN_THR})
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
inference_engine_transformations mkldnn)
if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()
# Cross compiled function
# TODO: The same for proposal, proposalONNX, topk
cross_compiled_file(${TARGET_NAME}

View File

@ -16,17 +16,20 @@
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_tools.hpp>
#include <threading/ie_executor_manager.hpp>
#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/convolution.hpp"
#include "low_precision_transformations/eltwise.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/scaleshift_to_convolution.hpp"
#include "low_precision_transformations/transformer.hpp"
#endif
#include <threading/ie_cpu_streams_executor.hpp>
#include <ie_system_conf.h>
#include <threading/ie_thread_affinity.hpp>
#include <algorithm>
#include <unordered_set>
#include <utility>
#include <cstring>
using namespace MKLDNNPlugin;
using namespace InferenceEngine;
@ -51,6 +54,7 @@ MKLDNNExecNetwork::MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network
// we are cloning network if we have statistics and we can transform network.
_clonedNetwork = cloneNet(network);
#ifdef USE_CNNNETWORK_LPT
if (_cfg.lpTransformsMode == Config::LPTransformsMode::On) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
@ -94,6 +98,7 @@ MKLDNNExecNetwork::MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network
bf16Transformer.convertToFloat(cnnetwork);
}
}
#endif
MKLDNNGraph::ApplyUnrollPasses(static_cast<ICNNNetwork&>(*_clonedNetwork));

View File

@ -32,7 +32,6 @@
#include "precision_utils.h"
#include <ie_plugin_config.hpp>
#include "low_precision_transformations/transformer.hpp"
#include "utils/blob_dump.h"

View File

@@ -256,6 +256,10 @@ void MKLDNNGraphOptimizer::FuseConvolutionAndZeroPoints(MKLDNNGraph &graph) {
if (arg0->getCnnLayer()->outData[0]->getPrecision() != Precision::U8)
return false;
if (parent0->getParentEdgesAtPort(1)[0]->getDims().size() < 2) {
return false;
}
if (parent0->getParentEdgesAtPort(1)[0]->getDims()[1] != 1 &&
parent0->getParentEdgesAtPort(1)[0]->getDims()[1] != IC)
return false;
@@ -495,6 +499,9 @@ void MKLDNNGraphOptimizer::MergeTwoEqualScaleShifts(MKLDNNGraph& graph) {
};
auto isEqualScaleShiftNodes = [](MKLDNNNodePtr node1, MKLDNNNodePtr node2) {
if (node1->getParentEdgeAt(0) != node2->getParentEdgeAt(0))
return false;
auto *depthwiseNode1 = dynamic_cast<MKLDNNDepthwiseNode *>(node1.get());
auto *depthwiseNode2 = dynamic_cast<MKLDNNDepthwiseNode *>(node2.get());

View File

@@ -53,6 +53,12 @@
#include <ngraph/op/util/op_types.hpp>
#include <ngraph/pass/manager.hpp>
#include <transformations/common_optimizations/lin_op_sequence_fusion.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/convolution.hpp>
#include <transformations/low_precision/group_convolution.hpp>
#include <transformations/low_precision/multiply_to_group_convolution.hpp>
#if !defined(__arm__) && !defined(_M_ARM) && !defined(__aarch64__) && !defined(_M_ARM64)
#if defined(_WIN32) || defined(WIN32)
#include <intrin.h>
@@ -76,7 +82,7 @@ Engine::~Engine() {
ExecutorManager::getInstance()->clear("CPUCallbackExecutor");
}
static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf) {
OV_ITT_SCOPED_TASK(MKLDNNPlugin::itt::domains::MKLDNNPlugin, "Transformation");
auto nGraphFunc = clonedNetwork->getFunction();
@@ -104,9 +110,6 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
manager.register_pass<ngraph::pass::ConvertPrecision>(precision.first, precision.second);
}
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
auto pass_config = manager.get_pass_config();
using const_node_ptr = const std::shared_ptr<const ngraph::Node>;
@@ -144,6 +147,47 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (conf.lpTransformsMode == Config::LPTransformsMode::On) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<ConvolutionTransformation, ngraph::opset1::Convolution>(
LayerTransformation::Params(params).setPrecisionsOnActivations({ngraph::element::u8}).setSupportAsymmetricQuantization(true))
.add<GroupConvolutionTransformation, ngraph::opset1::GroupConvolution>(
LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 }).setSupportAsymmetricQuantization(true))
.addStandaloneCleanup<MultiplyToGroupConvolutionTransformation, ngraph::opset1::Multiply>(
LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 })));
transformer.transform(nGraphFunc);
}
#endif
ngraph::pass::Manager legacyManager;
legacyManager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
legacyManager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
auto legacyPassConfig = legacyManager.get_pass_config();
legacyPassConfig->set_callback<ngraph::pass::AddMultiplyFusion>([](const_node_ptr &node) -> bool {
if (auto mul_op = std::dynamic_pointer_cast<const ngraph::opset1::Multiply>(node)) {
auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(mul_op->get_input_node_shared_ptr(0));
auto constant = std::dynamic_pointer_cast<const ngraph::opset1::Constant>(mul_op->get_input_node_shared_ptr(1));
bool is_dequantization = mul_op->get_rt_info().count("DEQUANTIZATION") != 0;
if (add_op && constant && is_dequantization) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}
}
return false;
});
legacyManager.run_passes(nGraphFunc);
clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
// WA: after conversion to CNNNetwork user precision can redefine input/output precisions
@@ -187,7 +231,7 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network, const st
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool is_transformed = false;
if (clonedNetwork->getFunction()) {
Transformation(clonedNetwork);
Transformation(clonedNetwork, conf);
is_transformed = true;
}
auto implNetwork = std::dynamic_pointer_cast<details::CNNNetworkImpl>(clonedNetwork);
@@ -312,8 +356,17 @@ QueryNetworkResult Engine::QueryNetwork(const ICNNNetwork& network, const std::m
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
// TODO: Clarify the behavior of SetConfig method. Skip eng_config or not?
Config conf = engConfig;
conf.readProperties(config);
if (conf.enableDynamicBatch) {
conf.batchLimit = static_cast<int>(network.getBatchSize());
}
auto clonedNetwork = cloneNetwork(network);
Transformation(clonedNetwork);
Transformation(clonedNetwork, conf);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;
for (details::CNNNetworkIterator itLayer{clonedNetwork.get()}; itLayer != details::CNNNetworkIterator(); itLayer++) {

View File

@@ -112,7 +112,10 @@ public:
exec_cast<PrecisionTrait<Precision::U8>::value_type, PrecisionTrait<Precision::I32>::value_type>(inputs[0], outputs[0]);
break;
default:
std::string errorMsg = "Unsupported precisions!";
std::stringstream ss;
ss << "Unsupported precisions: " << inputs[0]->getTensorDesc().getPrecision() << " -> " << outputs[0]->getTensorDesc().getPrecision();
std::string errorMsg = ss.str();
if (resp) {
errorMsg.copy(resp->msg, sizeof(resp->msg)-1);
}

View File

@@ -158,7 +158,7 @@ void MKLDNNGenericNode::execLayer() {
InferenceEngine::ResponseDesc resp;
InferenceEngine::StatusCode rc = impls[0]->execute(inputs, outputs, &resp);
if (rc != InferenceEngine::OK) {
THROW_IE_EXCEPTION << resp.msg;
THROW_IE_EXCEPTION << this->getTypeStr() << ":" << this->getName() << ": " << resp.msg;
}
}

View File

@@ -47,6 +47,7 @@ public:
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
@@ -57,9 +58,32 @@ public:
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
// KMB compilation support
ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
// KMB compilation support
ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Output<Node>& bias,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const size_t& group = 1,
const PadType& auto_pad = PadType::EXPLICIT);
void validate_and_infer_types() override;
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector & new_args) const override;
@@ -90,6 +114,7 @@ protected:
CoordinateDiff m_pads_end;
PadType m_auto_pad;
size_t m_group;
element::Type m_output_type;
};
} // namespace op

View File

@@ -12,6 +12,7 @@
#include <transformations_visibility.hpp>
#include "ngraph/op/op.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
namespace ngraph {
namespace op {
@@ -190,6 +191,7 @@ void TypeRelaxed<BaseOp>::validate_and_infer_types() {
BaseOp::get_input_tensor(i).set_tensor_type(old_input_types[i], BaseOp::get_input_partial_shape(i));
}
// Override (some) output types
for (size_t i = 0; i < BaseOp::get_output_size(); ++i) {
auto overridden_output_type = get_overridden_output_type(i);

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API AddTransformation : public EltwiseBaseTransformation {
public:
AddTransformation(const Params& params) : EltwiseBaseTransformation(params) {}
~AddTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API AvgPoolTransformation : public LayerTransformation {
public:
AvgPoolTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ClampTransformation : public LayerTransformation {
public:
ClampTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,138 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/check.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations_visibility.hpp"
#include "transformations/rt_info/dequantization_attribute.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
// template<typename BaseOp2>
// class TRANSFORMATIONS_API DequantizationOp : public BaseOp2 {
// public:
// template <typename ... Args>
// DequantizationOp(Args&&... args) : BaseOp2(std::forward<Args>(args)...) {
// init();
// }
//
// std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
// std::shared_ptr<Node> cloned = BaseOp2::clone_with_new_inputs(inputs);
// auto& rtInfo = cloned->get_rt_info();
// rtInfo = get_rt_info();
//
// return cloned;
// }
//
// protected:
// void init() {
// auto& rtInfo = get_rt_info();
// rtInfo["DEQUANTIZATION"] = std::make_shared<ngraph::VariantWrapper<std::string>>("");
// }
// };
//
// using DequantizationConvert = DequantizationOp<ngraph::opset1::Convert>;
// using DequantizationSubtract = DequantizationOp<ngraph::opset1::Subtract>;
// using DequantizationMultiply = DequantizationOp<ngraph::opset1::Multiply>;
namespace {
void initRuntimeInfo(ngraph::Node& operation) {
auto& rtInfo = operation.get_rt_info();
rtInfo["DEQUANTIZATION"] = std::make_shared<VariantWrapper<DequantizationAttr>>(DequantizationAttr());
}
// #include <ngraph/rt_info.hpp>
// ngraph::copy_runtime_info(from, to);
void copyRuntimeInfo(const ngraph::Node& from, ngraph::Node& to) {
const auto& rtInfoFrom = from.get_rt_info();
auto& rtInfoTo = to.get_rt_info();
rtInfoTo = rtInfoFrom;
}
} // namespace
class TRANSFORMATIONS_API DequantizationConvert : public ngraph::opset1::Convert {
public:
DequantizationConvert(const ngraph::Output<Node>& arg, const ngraph::element::Type& destination_type) :
ngraph::opset1::Convert(arg, destination_type) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Convert::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
class TRANSFORMATIONS_API DequantizationSubtract : public ngraph::opset1::Subtract {
public:
DequantizationSubtract(
const ngraph::Output<Node>& arg0,
const ngraph::Output<Node>& arg1,
const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
ngraph::opset1::Subtract(arg0, arg1, auto_broadcast) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Subtract::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
class TRANSFORMATIONS_API DequantizationMultiply : public ngraph::opset1::Multiply {
public:
DequantizationMultiply(
const Output<Node>& arg0,
const Output<Node>& arg1,
const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
ngraph::opset1::Multiply(arg0, arg1, auto_broadcast) {
initRuntimeInfo(*this);
}
DequantizationMultiply(const ngraph::opset1::Multiply& multiply) :
ngraph::opset1::Multiply(multiply) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Multiply::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
class TRANSFORMATIONS_API DequantizationAdd : public ngraph::opset1::Add {
public:
DequantizationAdd(
const ngraph::Output<Node>& arg0,
const ngraph::Output<Node>& arg1,
const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
ngraph::opset1::Add(arg0, arg1, auto_broadcast) {
initRuntimeInfo(*this);
}
std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
std::shared_ptr<Node> cloned = ngraph::opset1::Add::clone_with_new_inputs(inputs);
copyRuntimeInfo(*this, *cloned);
return cloned;
}
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
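All four wrappers above follow the same scheme: the constructor stamps the DEQUANTIZATION run-time attribute and clone_with_new_inputs preserves run-time info. A minimal sketch of how such a chain could be assembled and later detected; the builder function, the constant values, and the check are illustrative only (the attribute test matches the one used by the AddMultiplyFusion callback in the MKLDNN plugin diff above):
#include <memory>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/dequantization_op.hpp"
using namespace ngraph;
using namespace ngraph::pass::low_precision;
// Convert -> Subtract -> Multiply is the canonical dequantization chain.
std::shared_ptr<Node> buildDequantization(const Output<Node>& quantized) {
    const auto convert = std::make_shared<DequantizationConvert>(quantized, element::f32);
    const auto zeroPoint = opset1::Constant::create(element::f32, Shape{}, { 128.f });
    const auto subtract = std::make_shared<DequantizationSubtract>(convert, zeroPoint);
    const auto scale = opset1::Constant::create(element::f32, Shape{}, { 0.01f });
    return std::make_shared<DequantizationMultiply>(subtract, scale);
}
bool isMarkedAsDequantization(const std::shared_ptr<Node>& node) {
    // each constructor above sets this attribute via initRuntimeInfo()
    return node->get_rt_info().count("DEQUANTIZATION") != 0;
}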

View File

@@ -0,0 +1,41 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <tuple>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
namespace ngraph {
namespace pass {
namespace low_precision {
typedef std::tuple<std::shared_ptr<Node>, std::shared_ptr<Node>> FakeQuantizeDequantizationValues;
class FakeQuantizeDequantization {
public:
FakeQuantizeDequantization();
FakeQuantizeDequantization(
Output<Node> data,
std::shared_ptr<ngraph::opset1::Convert> convert,
std::shared_ptr<ngraph::opset1::Subtract> subtract,
std::shared_ptr<ngraph::opset1::Multiply> multiply);
bool empty() const;
bool isShared() const;
bool isLowPrecision() const;
static bool checkElementwise(const std::shared_ptr<ngraph::Node>& elementwise);
Output<Node> data;
std::shared_ptr<opset1::Convert> convert;
std::shared_ptr<opset1::Subtract> subtract;
std::shared_ptr<opset1::Multiply> multiply;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
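A short sketch of how this descriptor might be used once a dequantization pattern has been matched; the helper function is hypothetical, and the include path under transformations/low_precision/common is an assumption based on the includes shown elsewhere in this PR:
#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
using namespace ngraph;
using namespace ngraph::pass::low_precision;
// `data`, `convert`, `subtract` and `multiply` are assumed to come from a matched subgraph.
bool canHandleDequantization(const Output<Node>& data,
                             const std::shared_ptr<opset1::Convert>& convert,
                             const std::shared_ptr<opset1::Subtract>& subtract,
                             const std::shared_ptr<opset1::Multiply>& multiply) {
    const FakeQuantizeDequantization dequantization(data, convert, subtract, multiply);
    return !dequantization.empty() && dequantization.isLowPrecision();
}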

View File

@@ -0,0 +1,52 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <exception>
#include <memory>
#include <sstream>
#include <string>
#include <ngraph/node.hpp>
#include <transformations_visibility.hpp>
/**
* @def THROW_IE_LPT_EXCEPTION
* @brief A macro used to throw an exception with a detailed description for low precision transformations
*/
#define THROW_IE_LPT_EXCEPTION(node) throw ::ngraph::pass::low_precision::InferenceEngineLptException(__FILE__, __LINE__, node)
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API InferenceEngineException : public std::exception {
std::shared_ptr<std::ostringstream> buffer = std::make_shared<std::ostringstream>();
mutable std::string buffer_str;
public:
template <typename T>
InferenceEngineException& operator<< (const T& x) {
*buffer << x;
return *this;
}
const char* what() const noexcept override {
buffer_str = buffer->str();
return buffer_str.c_str();
}
};
#define THROW_TRANSFORMATION_EXCEPTION throw ::ngraph::pass::low_precision::InferenceEngineException() << __FILE__ << ":" << __LINE__ << " "
class TRANSFORMATIONS_API InferenceEngineLptException : public InferenceEngineException {
public:
InferenceEngineLptException(const std::string& filename, const size_t line, const Node& node) {
*this
<< filename << ":" << line << " Exception during low precision transformation for "
<< node << " node with type '" << node.get_type_name() << "', name '" << node.get_friendly_name() << "'. ";
}
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
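Both macros stream arbitrary context into the thrown object; a usage sketch with a hypothetical checking function (the level check mirrors the one in layer_transformation.hpp below):
#include <memory>
#include <ngraph/node.hpp>
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
void checkQuantization(const std::shared_ptr<ngraph::Node>& layer, const size_t levels) {
    if ((levels != 255) && (levels != 256)) {
        // generic variant: the message is prefixed with file and line
        THROW_TRANSFORMATION_EXCEPTION << "unexpected levels " << levels;
    }
    if (layer->get_output_size() != 1ul) {
        // node-aware variant: also reports the node, its type and friendly name
        THROW_IE_LPT_EXCEPTION(*layer) << "unexpected output count";
    }
}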

View File

@@ -0,0 +1,41 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/check.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "../ilayer_transformations_manager.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class Subgraph {
public:
Subgraph(ngraph::pass::ILayerTransformationsManager* layerTransformationsManager);
bool fillSubgraphForConcat(const std::shared_ptr<ngraph::opset1::Concat>& concat, std::unordered_set<std::string>& handledLayers);
bool empty() const;
std::vector<std::shared_ptr<ngraph::Node>> quantizationLayers;
std::vector<std::shared_ptr<ngraph::opset1::Concat>> concatLayers;
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>> layers;
private:
bool fillSubgraphForQuantization(const std::shared_ptr<ngraph::opset1::FakeQuantize>& fakeQuantize, std::unordered_set<std::string>& handledLayers);
bool fillSubgraphForIntermediate(const std::shared_ptr<ngraph::Node>& intermediate, std::unordered_set<std::string>& handledLayers);
bool fill(const std::shared_ptr<ngraph::Node>& concat, std::unordered_set<std::string>& handledLayers);
const ngraph::pass::ILayerTransformationsManager* layerTransformationsManager;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,56 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include <functional>
#include <memory>
#include <string>
#include <vector>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
#include "common/subgraph.hpp"
#include "common/fake_quantize_dequantization.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConcatTransformation : public LayerTransformation {
public:
ConcatTransformation(const Params& params) : LayerTransformation(params) {}
~ConcatTransformation() override {};
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
protected:
void addDequantizationLayers(
TransformationContext& context,
ngraph::pass::low_precision::Subgraph& subgraph,
std::function<void(
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate)> getLayerDequantizationCallback) const;
static bool isHandled(
const TransformationContext& context,
const std::vector<std::shared_ptr<ngraph::Node>>& quantizationOperations);
private:
size_t getMinQuantizationLevels(
const DataPrecision& dataPrecision,
const float maxOutputInterval,
const std::vector<QuantizationDetails>& quantizationLayersDetails,
const float outputLowValue,
const float outputHighValue) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,47 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include <ngraph/ngraph.hpp>
#include "concat.hpp"
#include "common/subgraph.hpp"
#include "common/fake_quantize_dequantization.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConcatMultiChannelsTransformation : public ConcatTransformation {
public:
ConcatMultiChannelsTransformation(const Params& params) : ConcatTransformation(params) {}
~ConcatMultiChannelsTransformation() override {};
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
private:
static void fillDequantization(
std::shared_ptr<ngraph::Node> layer,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate);
static void fillQuantization(const std::shared_ptr<ngraph::Node> layer, std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes);
static void updateDequantizationShapesIfNecessary(
std::shared_ptr<ngraph::Node> layer,
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize);
bool isMultiChannel(const std::vector<std::shared_ptr<ngraph::opset1::Concat>>& concatLayers) const noexcept;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConvertTransformation : public LayerTransformation {
public:
ConvertTransformation(const Params& params) : LayerTransformation(params) {}
~ConvertTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "weightable_layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ConvolutionTransformation : public WeightableLayerTransformation {
public:
ConvolutionTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "transparent_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API DepthToSpaceTransformation : public TransparentBaseTransformation {
public:
DepthToSpaceTransformation(const Params& params) : TransparentBaseTransformation(params) {}
~DepthToSpaceTransformation() override {}
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,29 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API EltwiseBaseTransformation : public LayerTransformation {
public:
EltwiseBaseTransformation(const Params& params) : LayerTransformation(params) {}
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
static bool isBroadcasted(const Shape& shape) noexcept;
protected:
int getNotEmpty(const std::shared_ptr<Node>& eltwise) const;
std::pair<int, int> getMultiplyConstBranch(const std::shared_ptr<Node>& eltwise) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,33 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
#include "transformations/low_precision/fuse_fake_quantize.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FakeQuantizeTransformation : public LayerTransformation {
public:
FakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FakeQuantizeTransformation() override {};
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
static bool checkElementwise(const std::shared_ptr<Node>& eltwise);
private:
std::shared_ptr<opset1::FakeQuantize> fuseElementwise(
TransformationContext& context,
const std::shared_ptr<opset1::FakeQuantize>& fakeQuantize) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseConvertTransformation : public LayerTransformation {
public:
FuseConvertTransformation(const Params& params) : LayerTransformation(params) {}
~FuseConvertTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,31 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseFakeQuantizeTransformation : public LayerTransformation {
public:
FuseFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FuseFakeQuantizeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
private:
std::shared_ptr<opset1::FakeQuantize> handle(
TransformationContext& context,
const std::shared_ptr<opset1::FakeQuantize>& fakeQuantize) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseMultiplyToFakeQuantizeTransformation : public LayerTransformation {
public:
FuseMultiplyToFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FuseMultiplyToFakeQuantizeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API FuseSubtractToFakeQuantizeTransformation : public LayerTransformation {
public:
FuseSubtractToFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
~FuseSubtractToFakeQuantizeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "convolution.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API GroupConvolutionTransformation : public ConvolutionTransformation {
public:
GroupConvolutionTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/node.hpp>
#include "transformations_visibility.hpp"
namespace ngraph {
namespace pass {
/**
* @brief low precision transformation component interface.
*/
class TRANSFORMATIONS_API ILayerTransformationsManager {
public:
virtual bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept = 0;
virtual bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept = 0;
};
} // namespace pass
} // namespace ngraph
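A minimal sketch of an implementer, shown only to illustrate the contract (the class and its heuristic are hypothetical); the LowPrecisionTransformations pass further below implements this same interface:
#include <memory>
#include <ngraph/node.hpp>
#include <transformations/low_precision/ilayer_transformations_manager.hpp>
class NaiveTransformationsManager : public ngraph::pass::ILayerTransformationsManager {
public:
    bool isQuantized(const std::shared_ptr<ngraph::Node>& layer) const noexcept override {
        // illustrative heuristic: nodes stamped by the LPT pipeline count as quantized
        return layer->get_rt_info().count("DEQUANTIZATION") != 0;
    }
    bool isPrecisionPreserved(const std::shared_ptr<ngraph::Node>& layer) const noexcept override {
        return false;  // conservative default
    }
};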

View File

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "transparent_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API InterpolateTransformation : public LayerTransformation {
public:
InterpolateTransformation(const Params& params) : LayerTransformation(params) {}
~InterpolateTransformation() override {}
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <vector>
#include <ngraph/ngraph.hpp>
#include <transformations_visibility.hpp>
namespace ngraph {
namespace pass {
/**
* @brief low precision transformation component interface.
*/
class TRANSFORMATIONS_API IParamsManager {
public:
// TODO FIXME: it is not correct to have a string as a key here, try to use NodeTypeInfo
virtual std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept = 0;
};
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,380 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include <limits>
#include <list>
#include <memory>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/pass/graph_rewrite.hpp>
#include "iparams_manager.hpp"
#include "ilayer_transformations_manager.hpp"
#include "transformation_context.hpp"
#include "quantization_details.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "common/fake_quantize_dequantization.hpp"
/*****************************************************
* Debug capability
* - LPT_ORIGINAL_MODEL_PATH : Specify an existing folder name
* to serialize the original model into it (XML & BIN extensions are added)
* - LPT_TRANSFORMED_MODEL_PATH : Specify an existing folder name
* to serialize the transformed model into it (XML & BIN extensions are added)
* - LPT_PRINT_DEQUANTIZATION_INFO : Define it to enable
* dequantization layers printing
* - LPT_DISPLAY_PRECISION : Define it to display precision info
* during low precision transformations
*
*****************************************************/
// #define LPT_ORIGINAL_MODEL_PATH "/localdisk/orig.model"
// #define LPT_TRANSFORMED_MODEL_PATH "/localdisk/transformed.model"
// #define LPT_PRINT_DEQUANTIZATION_INFO
// #define LPT_DISPLAY_PRECISION
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API DataPrecision {
public:
DataPrecision() : precision(element::undefined), min(0.f), max(0.f), hasZeroPoint(false) {}
DataPrecision(const element::Type precision, const float min, const float max, const bool hasZeroPoint) :
precision(precision),
min(min),
max(max),
hasZeroPoint(hasZeroPoint) {}
static float getMinValue(const element::Type precision, const size_t levels) {
if (precision == element::i8) {
if (levels == 255) {
return static_cast<float>(std::numeric_limits<signed char>::lowest()) + 1.f;
} else if (levels == 256) {
return static_cast<float>(std::numeric_limits<signed char>::lowest());
} else {
NGRAPH_CHECK(false, "unexpected levels ", levels, " for precision ", precision);
}
} else if (precision == element::u8) {
return static_cast<float>(std::numeric_limits<unsigned char>::lowest());
} else if (precision == element::f16) {
return -1.0e15f;
} else if (precision == element::f32) {
return std::numeric_limits<float>::lowest();
} else {
NGRAPH_CHECK(false, "unexpected precision ", precision);
}
}
static float getMaxValue(const element::Type precision, const size_t levels) {
if ((levels != 255ul) && (levels != 256ul)) {
THROW_TRANSFORMATION_EXCEPTION << "unexpected levels " << levels;
}
if (precision == element::i8) {
return static_cast<float>(std::numeric_limits<signed char>::max());
} else if (precision == element::u8) {
return static_cast<float>(std::numeric_limits<unsigned char>::max()) - (256 - levels);
} else if (precision == element::f16) {
return 1.0e15f;
} else if (precision == element::f32) {
return std::numeric_limits<float>::max();
} else {
THROW_TRANSFORMATION_EXCEPTION << "unexpected precision " << precision;
}
}
static bool hasNegativeValues(const std::vector<float>& values) {
for (const float value : values) {
if (value < 0.0) {
return true;
}
}
return false;
}
element::Type precision;
float min;
float max;
bool hasZeroPoint;
static element::Type getPrecision(const std::vector<float>& outputLowValues, const std::vector<float>& outputHighValues) {
return (hasNegativeValues(outputLowValues) || hasNegativeValues(outputHighValues)) ? element::i8 : element::u8;
}
static element::Type getPrecision(const size_t /* quantizationLevels */, const bool signedInterval) {
return signedInterval ? element::i8 : element::u8;
}
static float getMin(const size_t quantizationLevels, const bool signedInterval) {
if (quantizationLevels == 255) {
return signedInterval ? -127.0 : 0.0;
} else if (quantizationLevels == 256) {
return signedInterval ? -128.0 : 0.0;
} else {
// THROW_TRANSFORMATION_EXCEPTION << "quantization level " << quantizationLevels << " is not supported";
// FIXME: not completed
return signedInterval ? -128.0 : 0.0;
}
}
static float getMax(const size_t quantizationLevels, const bool signedInterval) {
if ((quantizationLevels == 255) || (quantizationLevels == 256)) {
return signedInterval ? 127.0 : 255.0;
} else {
// THROW_TRANSFORMATION_EXCEPTION << "quantization level " << quantizationLevels << " is not supported";
// FIXME: not completed
// return quantizationLevels - 1.0;
return signedInterval ? 127.0 : 255.0;
}
}
};
inline bool operator==(const DataPrecision& value1, const DataPrecision& value2) {
return
(value1.precision == value2.precision) &&
(value1.min == value2.min) &&
(value1.max == value2.max);
}
inline bool operator!=(const DataPrecision& value1, const DataPrecision& value2) {
return !(value1 == value2);
}
inline std::ostream &operator << (std::ostream &os, const DataPrecision& value) {
os << value.precision << ", min: " << value.min << ", max: " << value.max;
return os;
}
// Base class for all LP transformations, holds some common data structures
class TRANSFORMATIONS_API LayerTransformation {
public:
enum QuantizedTensorAlignment {
None,
UpdateLevel
};
class Params {
public:
Params(
const bool updatePrecisions = true,
const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations = QuantizedTensorAlignment::UpdateLevel,
const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights = QuantizedTensorAlignment::None,
bool supportAsymmetricQuantization = false,
std::vector<element::Type> precisionsOnActivations = { element::u8, element::i8 },
std::vector<element::Type> precisionsOnWeights = { element::i8 }) :
updatePrecisions(updatePrecisions),
quantizedTensorAlignmentOnActivations(quantizedTensorAlignmentOnActivations),
quantizedTensorAlignmentOnWeights(quantizedTensorAlignmentOnWeights),
supportAsymmetricQuantization(supportAsymmetricQuantization),
precisionsOnActivations(precisionsOnActivations),
precisionsOnWeights(precisionsOnWeights) {
if (precisionsOnActivations.size() == 0ul) {
THROW_TRANSFORMATION_EXCEPTION << "precisions on activations are not specisifed";
}
if (precisionsOnWeights.size() == 0ul) {
THROW_TRANSFORMATION_EXCEPTION << "precisions on weights are not specisifed";
}
}
Params& setUpdatePrecisions(const bool updatePrecisions) {
this->updatePrecisions = updatePrecisions;
return *this;
}
Params& setQuantizedTensorAlignmentOnActivations(const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations) {
this->quantizedTensorAlignmentOnActivations = quantizedTensorAlignmentOnActivations;
return *this;
}
Params& setQuantizedTensorAlignmentOnWeights(const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights) {
this->quantizedTensorAlignmentOnWeights = quantizedTensorAlignmentOnWeights;
return *this;
}
Params& setSupportAsymmetricQuantization(const bool supportAsymmetricQuantization) {
this->supportAsymmetricQuantization = supportAsymmetricQuantization;
return *this;
}
Params& setPrecisionsOnActivations(const std::vector<element::Type>& precisionsOnActivations) {
this->precisionsOnActivations = precisionsOnActivations;
return *this;
}
Params& setPrecisionsOnWeights(const std::vector<element::Type>& precisionsOnWeights) {
this->precisionsOnWeights = precisionsOnWeights;
return *this;
}
bool updatePrecisions;
QuantizedTensorAlignment quantizedTensorAlignmentOnActivations;
QuantizedTensorAlignment quantizedTensorAlignmentOnWeights;
bool supportAsymmetricQuantization;
std::vector<element::Type> precisionsOnActivations;
std::vector<element::Type> precisionsOnWeights;
};
class PrecisionDetails {
public:
PrecisionDetails(const element::Type& precision, const bool hasNegativeOutput, const bool hasZeroPoint) :
precision(precision),
hasNegativeOutput(hasNegativeOutput),
hasZeroPoint(hasZeroPoint) {}
const element::Type precision;
const bool hasNegativeOutput;
const bool hasZeroPoint;
};
LayerTransformation(const Params& params);
virtual ~LayerTransformation() = default;
virtual void registerMatcherIn(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const = 0;
virtual bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const = 0;
void setParamsManager(IParamsManager* paramsManager) noexcept;
void setLayerTransformationsManager(ILayerTransformationsManager* layerTransformationsManager) noexcept;
void setUpdatePrecisions(const bool updatePrecisions);
void setQuantizedTensorAlignmentOnActivations(const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations);
void setQuantizedTensorAlignmentOnWeights(const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights);
void setQuantizationIntervalAsymmetryThreshold(const float value);
void setZeroThreshold(const float value);
void setMinQuantizationLevels(const size_t levels);
const std::vector<element::Type>& getPrecisionsOnActivations() const;
const std::vector<element::Type>& getPrecisionsOnWeights() const;
virtual bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const;
bool canSubtractBeHandled(const std::shared_ptr<Node>& op, const size_t parentIndex = 0ul) const;
bool canSubtractBeHandled(const std::shared_ptr<Node>& op, const FakeQuantizeDequantization& dequantization) const;
PrecisionDetails getPrecisionDetails(const QuantizationDetails& quantizationDetails) const;
// return true if the operation can be quantized and false otherwise
// for example: if convolution operation weights are not quantized, then isQuantized returns false, and true otherwise
// note: dequantization operations on activations are absent during method execution
virtual bool isQuantized(std::shared_ptr<Node> layer) const noexcept;
// return true if the operation preserves the input precision
// note: dequantization operations on activations are absent during method execution
virtual bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept = 0;
DataPrecision getDataPrecision(
std::shared_ptr<Node> layer,
const QuantizationDetails& quantizationDetails,
const bool onWeights) const;
void fillAvailablePrecisions(std::shared_ptr<Node> layer, std::vector<element::Type>& availablePrecisions) const;
std::vector<std::shared_ptr<Node>> getChildrenRecursivelyExceptPrecisionPreserved(const std::shared_ptr<Node>& op) const noexcept;
protected:
#ifdef LPT_PRINT_DEQUANTIZATION_INFO
static void printDequantizationInfo(const std::shared_ptr<Node>& layer);
static void printDequantizationInfo(const DataPrecision& dataPrecision);
static void printDequantizationValues(
const std::vector<float>& dequantizationScales,
const std::vector<float>& dequantizationShifts);
#endif
bool updatePrecisions;
QuantizedTensorAlignment quantizedTensorAlignmentOnActivations;
QuantizedTensorAlignment quantizedTensorAlignmentOnWeights;
bool supportAsymmetricQuantization;
std::vector<element::Type> precisionsOnActivations;
std::vector<element::Type> precisionsOnWeights;
// absolute value, used to determine quantization interval asymmetry
float quantizationIntervalAsymmetryThreshold;
// absolute value, used to determine zero
float zeroThreshold;
size_t minQuantizationLevels;
static const char originalLayerPostfix[];
IParamsManager* paramsManager;
ILayerTransformationsManager* layerTransformationsManager;
protected:
std::shared_ptr<ngraph::Node> separateInStandaloneBranch(std::shared_ptr<ngraph::Node> node) const;
std::shared_ptr<ngraph::Node> moveDequantizationAfter(
TransformationContext &context,
const std::shared_ptr<ngraph::Node>& operation,
const FakeQuantizeDequantization& dequantization,
const bool updatePrecision,
const bool moveSubtract = true) const;
void fuseConvertIfPossible(const std::shared_ptr<ngraph::Node>& operation) const;
void updateOutput(
TransformationContext &context,
std::shared_ptr<ngraph::Node> lastNode,
std::shared_ptr<ngraph::Node> originalNode) const;
void updateOutput(
TransformationContext& context,
std::shared_ptr<ngraph::Node> lastNode,
std::string originalName) const;
void addPattern(ngraph::pass::GraphRewrite& pass, TransformationContext& context, std::shared_ptr<Node> patternRoot) const;
template <typename Operation>
void addSingleNodePattern(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const {
using namespace ngraph;
auto is_op_type = [](std::shared_ptr<Node> n) {
return !!as_type_ptr<Operation>(n);
};
auto p_node = std::make_shared<pattern::op::Label>(element::f32, Shape{}, is_op_type);
addPattern(pass, context, p_node);
}
};
inline std::ostream &operator << (std::ostream &os, const LayerTransformation::QuantizedTensorAlignment& value) {
switch (value) {
case LayerTransformation::QuantizedTensorAlignment::None: {
os << "None";
break;
}
case LayerTransformation::QuantizedTensorAlignment::UpdateLevel: {
os << "UpdateLevel";
break;
}
default: {
os << static_cast<int>(value);
break;
}
}
return os;
}
inline std::ostream &operator << (std::ostream &os, const std::vector<element::Type>& values) {
os << "{";
for (size_t i = 0; i < values.size(); ++i) {
const element::Type& value = values[i];
if (i > 0) {
os << ", " << value;
} else {
os << value;
}
}
os << "}";
return os;
}
typedef std::shared_ptr<LayerTransformation> LayerTransformationPtr;
} // namespace low_precision
} // namespace pass
} // namespace ngraph
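A sketch of how Params is intended to be configured through its fluent setters, mirroring the per-operation overrides in the MKLDNN plugin diff above; the factory function itself is illustrative:
#include "transformations/low_precision/layer_transformation.hpp"
using namespace ngraph::pass::low_precision;
LayerTransformation::Params makeConvolutionParams() {
    return LayerTransformation::Params(
        true,                                                        // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // on activations
        LayerTransformation::QuantizedTensorAlignment::None)         // on weights
        .setPrecisionsOnActivations({ ngraph::element::u8 })
        .setSupportAsymmetricQuantization(true);
}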

View File

@@ -0,0 +1,36 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ie_api.h>
#include <ngraph/ngraph.hpp>
#include <ngraph/pass/graph_rewrite.hpp>
#include <transformations/low_precision/ilayer_transformations_manager.hpp>
#include <transformations/low_precision/iparams_manager.hpp>
namespace ngraph {
namespace pass {
class TRANSFORMATIONS_API LowPrecisionTransformations : public ngraph::pass::GraphRewrite, public IParamsManager, public ILayerTransformationsManager {
public:
bool run_on_function(std::shared_ptr<ngraph::Function> f) override;
// IParamsManager interface implementation
std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept override;
// ILayerTransformationsManager interface implementation
bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept override;
bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept override;
};
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MatMulTransformation : public LayerTransformation {
public:
MatMulTransformation(const Params& params) : LayerTransformation(params) {}
~MatMulTransformation() override {}
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MaxPoolTransformation : public LayerTransformation {
public:
MaxPoolTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MultiplyTransformation : public EltwiseBaseTransformation {
public:
MultiplyTransformation(const Params& params) : EltwiseBaseTransformation(params) {}
~MultiplyTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,33 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MultiplyToGroupConvolutionTransformation : public LayerTransformation {
public:
MultiplyToGroupConvolutionTransformation(const Params& params) : LayerTransformation(params), groupSize(1ul) {}
~MultiplyToGroupConvolutionTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
void setGroupSize(const size_t groupSize);
size_t getGroupSize() const;
private:
size_t groupSize;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API MVNTransformation : public LayerTransformation {
public:
MVNTransformation(const Params& params) : LayerTransformation(params) {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@@ -0,0 +1,245 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <cmath>
#include <memory>
#include <string>
#include <vector>
#include <unordered_set>
#include <ngraph/ngraph.hpp>
#include <ngraph/pattern/matcher.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "ngraph_ops/type_relaxed.hpp"
#include <ngraph/rt_info.hpp>
#include "transformation_context.hpp"
#include "quantization_details.hpp"
#include "transformations/utils/utils.hpp"
#include "common/fake_quantize_dequantization.hpp"
#include "common/ie_lpt_exception.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
/**
* @brief NetworkHelper class encapsulates manipulations with nGraph function.
*/
class TRANSFORMATIONS_API NetworkHelper {
public:
// Return true if `type` can be cast to at least one of `types`
static bool is_castable_to_one_of(NodeTypeInfo type, const std::unordered_set<NodeTypeInfo>& types);
static std::vector<Input<Node>> consumer_inputs(std::shared_ptr<Node> node);
// Collect and return a vector with all nodes that consume any of the `node` outputs
static std::vector<std::shared_ptr<Node>> consumers(std::shared_ptr<Node> node);
static Shape alignShapeForChannelDim(const Shape& shape, Rank rank);
// return true if at least one child uses the layer as weights
static bool onWeights(std::shared_ptr<Node> layer);
template <typename OperationType>
static std::shared_ptr<Node> setOutDataPrecisionForTypeRelaxed(std::shared_ptr<OperationType> operation, const element::Type& precision);
template <typename OperationType>
static std::shared_ptr<Node> setOutDataPrecision(std::shared_ptr<OperationType> operation, const element::Type& precision);
static size_t getOutputChannelsCount(std::shared_ptr<const Node> layer, bool isOnWeights = false);
static std::vector<std::shared_ptr<Node>> getParentsRecursivelyExceptTypes(
std::shared_ptr<Node> layer,
const std::unordered_set<NodeTypeInfo>& exceptionLayerTypes = {},
const int portIndex = -1);
static size_t getInputChannelsCount(std::shared_ptr<Node> layer);
static size_t getGroupsCount(std::shared_ptr<Node> layer);
// Remove node by connecting its 0th input with 0th output
static void removeLayer(std::shared_ptr<Node> node);
static std::shared_ptr<Node> swapMultiplyAndAdd(std::shared_ptr<opset1::Add> addAfterMultiply, const int multiplyBranch);
static void copyInfo(const std::shared_ptr<Node>& source, const std::shared_ptr<Node>& target);
static void cleanRunTimeInfo(const std::shared_ptr<Node>& layer);
static bool isScalarLike(std::shared_ptr<opset1::Constant> constant);
static bool isZero(std::shared_ptr<opset1::Constant> constant);
static std::shared_ptr<opset1::Constant> toScalar(std::shared_ptr<opset1::Constant> constant);
static std::shared_ptr<Node> getConstantInput(std::shared_ptr<Node> node);
// Optimizes the series of multiplies after a given output port
static std::shared_ptr<ngraph::opset1::Multiply> optimizeMultipliesAfter(std::shared_ptr<Node> multiply);
static std::shared_ptr<opset1::Constant> roundWithTolerance(std::shared_ptr<Node> node, element::Type target_type, float tolerance = 0.1);
static std::tuple<std::shared_ptr<Node>, std::shared_ptr<Node>> decomposeFakeQuantize(
std::shared_ptr<opset1::FakeQuantize> fq,
const element::Type precision,
const float min,
const float max,
const bool hasZeroPoint,
const bool updatePrecision);
static std::shared_ptr<opset1::FakeQuantize> updateFakeQuantize(
std::shared_ptr<opset1::FakeQuantize> fq,
element::Type precision,
float min,
float max);
static FakeQuantizeDequantization makeDequantization(
const float dequantizationMul,
const float dequantizationSub,
const ngraph::element::Type originalPrecision,
const ngraph::Shape dataNodeOutputShape,
element::Type precision,
float min,
float max);
static FakeQuantizeDequantization createDequantizationFromFakeQuantize(
std::shared_ptr<opset1::FakeQuantize> fq,
element::Type precision,
float min,
float max,
const bool hasZeroPoint,
const bool updatePrecision);
static FakeQuantizeDequantization getDequantization(const std::shared_ptr<Node> node, const size_t parentIndex = 0ul, const bool inPlace = false);
static std::shared_ptr<Node> optimizeSubtract(std::shared_ptr<opset1::Subtract> subtract);
class InsertDequantizationResult {
public:
InsertDequantizationResult(
const std::shared_ptr<Node>& newOperation,
const std::shared_ptr<Node>& lastDequantization) : newOperation(newOperation), lastDequantization(lastDequantization) {}
std::shared_ptr<Node> newOperation;
std::shared_ptr<Node> lastDequantization;
};
static InsertDequantizationResult moveDequantizationAfter(
const std::shared_ptr<ngraph::Node>& operation,
const FakeQuantizeDequantization& dequantization,
const bool updatePrecision,
const bool moveSubtract);
// TODO: rename: fuseConvertIfPossible
static void removeConvertIfPossible(
const std::shared_ptr<ngraph::Node>& operation,
const FakeQuantizeDequantization& dequantization);
static bool checkConstantValuePrecision(const element::Type expectedPrecision, const std::shared_ptr<Node>& constant);
static size_t getChildInputIndex(const std::shared_ptr<ngraph::Node>& parent, const std::shared_ptr<ngraph::Node>& child);
static size_t getParentOutputIndex(const std::shared_ptr<ngraph::Node>& parent, const std::shared_ptr<ngraph::Node>& child);
static std::vector<Output<Node>> getInputs(const std::shared_ptr<ngraph::Node>& node);
static FakeQuantizeDequantizationValues createEmptyValues(const FakeQuantizeDequantization& dequantization);
static bool isZeroConst(const std::shared_ptr<Node>& node);
static std::shared_ptr<Node> toScalarIfPossible(std::shared_ptr<Node> node);
static std::shared_ptr<Node> fold_fake_quantize(const std::shared_ptr<opset1::FakeQuantize>& fq);
static std::shared_ptr<Node> fold_fake_quantize(const std::shared_ptr<opset1::FakeQuantize>& fq, const bool roundValues);
// Multi-precision constant folding.
// Handles only a specific case: Constant -> [dequantization operations] -> [node]
static void foldDequantization(std::shared_ptr<Node>& node, const size_t branchIndex, const bool inPlace = false);
private:
static std::shared_ptr<Node> foldFakeQuantize(const std::shared_ptr<opset1::FakeQuantize>& fq, const bool roundValues, const bool roundValuesWasSet);
// 1 - on weights
// 0 - weightable layer was not found
// -1 - on activations
static int onWeightsInDepth(std::shared_ptr<Node> layer);
};
template <typename OperationType>
std::shared_ptr<Node> NetworkHelper::setOutDataPrecisionForTypeRelaxed(std::shared_ptr<OperationType> layer, const element::Type& precision) {
// Check if the node is already an extended (TypeRelaxed) operation
if (auto relaxed_layer = std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(layer)) {
relaxed_layer->set_overridden_output_type(precision);
std::dynamic_pointer_cast<ngraph::Node>(layer)->validate_and_infer_types();
return layer;
} else {
THROW_IE_LPT_EXCEPTION(*layer) << "TypeRelaxed type is expected";
}
}
template <typename OperationType>
std::shared_ptr<Node> NetworkHelper::setOutDataPrecision(std::shared_ptr<OperationType> layer, const element::Type& precision) {
// Check if the node is already an extended (TypeRelaxed) operation
if (auto relaxed_layer = std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(layer)) {
relaxed_layer->set_overridden_output_type(precision);
std::dynamic_pointer_cast<ngraph::Node>(layer)->validate_and_infer_types();
return layer;
} else {
// Make such replacements in advance for all supported polymorphic layer types
// Extend the node with new semantics: overridden output data type
// OperationType should be the real type of the object, otherwise it will lead to undefined behavior
auto replacement = std::make_shared<ngraph::op::TypeRelaxed<OperationType>>(*layer, precision);
copy_runtime_info(layer, replacement);
replace_node(layer, replacement);
return replacement;
}
}
template <typename T>
std::shared_ptr<Node> make_op_pattern(const ngraph::NodeVector& args) {
return std::make_shared<ngraph::pattern::op::Any>(element::undefined, PartialShape{}, [](std::shared_ptr<Node> n) {return !!as_type_ptr<T>(n); }, args);
}
template <typename T>
std::shared_ptr<Node> make_op_label() {
return std::make_shared<ngraph::pattern::op::Label>(
element::undefined,
PartialShape{},
[](std::shared_ptr<Node> n) {return !!as_type_ptr<T>(n); });
}
template <typename T, typename... Args>
std::shared_ptr<Node> fold(Args&&... args) {
auto node = std::make_shared<T>(std::forward<Args>(args)...);
if (node->get_output_size() == 1) {
OutputVector folded(node->get_output_size());
if (node->constant_fold(folded, node->input_values())) {
return folded[0].get_node_shared_ptr();
}
}
return node;
}
template <typename T, typename... Args>
std::shared_ptr<Node> fold_reshape(Args&&... args) {
std::shared_ptr<Node> node = std::make_shared<T>(std::forward<Args>(args)...);
if (node->get_output_size() == 1) {
OutputVector folded;
if (is_type<opset1::Constant>(node->input_value(0).get_node_shared_ptr()) &&
is_type<opset1::Constant>(node->input_value(1).get_node_shared_ptr())) {
return std::make_shared<opset1::Constant>(
node->get_input_element_type(0),
Shape(as_type_ptr<opset1::Constant>(node->input_value(1).get_node_shared_ptr())->template cast_vector<size_t>()),
as_type_ptr<opset1::Constant>(node->input_value(0).get_node_shared_ptr())->get_data_ptr());
}
}
return node;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
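For intuition about `swapMultiplyAndAdd` declared above: it swaps a Multiply -> Add pair into Add -> Multiply by rescaling the Add constant. A minimal standalone sketch of the underlying identity, assuming the scalar (per-tensor) case and using plain doubles instead of nGraph constants:

#include <cassert>
#include <cmath>

int main() {
    const double x = 17.0;  // hypothetical input value
    const double a = 0.25;  // Multiply constant
    const double b = -3.0;  // Add constant

    const double before = x * a + b;       // Multiply -> Add
    const double after = (x + b / a) * a;  // Add -> Multiply, with the Add constant rescaled to b / a

    // The swap is exact up to floating-point rounding.
    assert(std::fabs(before - after) < 1e-9);
    return 0;
}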

View File

@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API NormalizeL2Transformation : public LayerTransformation {
public:
NormalizeL2Transformation(const Params& params) : LayerTransformation(params) {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API PReluTransformation : public LayerTransformation {
public:
PReluTransformation(const Params& params) : LayerTransformation(params) {}
~PReluTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,89 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ostream>
#include <vector>
#include <transformations_visibility.hpp>
#include <ngraph/node.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <ngraph/type.hpp>
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API QuantizationDetails {
public:
QuantizationDetails();
QuantizationDetails(const QuantizationDetails& quantizationDetails);
QuantizationDetails(
const size_t levels,
const std::vector<float>& inputLowValues,
const std::vector<float>& inputHighValues,
const std::vector<float>& outputLowValues,
const std::vector<float>& outputHighValues,
const size_t inputIntervalsCount,
const size_t outputIntervalsCount,
const size_t outputChannelsCount);
static bool outputLayoutIsSupported(std::shared_ptr<opset1::FakeQuantize> quantize);
static void getInputIntervals(
std::shared_ptr<opset1::FakeQuantize> quantize,
std::vector<float>& inputLowValues,
std::vector<float>& inputHighValues,
size_t& inputIntervalsCount);
static void getOutputIntervals(
std::shared_ptr<opset1::FakeQuantize> quantize,
std::vector<float>& outputLowValues,
std::vector<float>& outputHighValues,
size_t& outputIntervalsCount);
static QuantizationDetails getDetails(std::shared_ptr<opset1::FakeQuantize>);
bool hasNegativeOutput() const;
float maxOutput(const size_t channel) const;
float maxInput(const size_t channel) const;
float maxOutputHigh() const;
float minOutputLow() const;
float getInputLowValue(const size_t channel) const;
float getInputHighValue(const size_t channel) const;
float getOutputLowValue(const size_t channel) const;
float getOutputHighValue(const size_t channel) const;
static bool isSupportedLevel(const size_t level);
const size_t levels;
const std::vector<float> inputLowValues;
const std::vector<float> inputHighValues;
const std::vector<float> outputLowValues;
const std::vector<float> outputHighValues;
const size_t inputIntervalsCount;
const size_t outputIntervalsCount;
const size_t outputChannelsCount;
private:
QuantizationDetails &operator=(const QuantizationDetails & /*target*/) { return *this; }
static void validate(std::shared_ptr<Node> constantLayer);
static std::vector<float> getBlobValue(std::shared_ptr<Node> constantLayer);
};
inline std::ostream &operator << (std::ostream &os, const QuantizationDetails& value) {
os << "levels: " << value.levels <<
", input 1/" << value.inputIntervalsCount << ": [" << value.getInputLowValue(0) << " : " << value.getInputHighValue(0) << "], " <<
"output 1/" << value.outputIntervalsCount << ": [" << value.getOutputLowValue(0) << " : " << value.getOutputHighValue(0) << "]";
return os;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ReluTransformation : public LayerTransformation {
public:
ReluTransformation(const Params& params) : LayerTransformation(params) {}
~ReluTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,32 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API ReshapeTransformation : public LayerTransformation {
public:
ReshapeTransformation(const Params& params) : LayerTransformation(params) {}
~ReshapeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
static bool canBeTransformed(
const ngraph::Shape& subtractShape,
const ngraph::Shape& multiplyShape,
const ngraph::Shape& inputShape,
const ngraph::Shape& outputShape);
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,39 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <vector>
#include "layer_transformation.hpp"
#include "ngraph/node.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SplitTransformation : public LayerTransformation {
public:
SplitTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
void updateOutputs(
TransformationContext& context,
std::vector<std::shared_ptr<ngraph::Node>> lastNodes,
std::shared_ptr<ngraph::Node> originalNode) const;
protected:
ngraph::Shape getConstSplitShape(
const std::vector<size_t>& constSplitLengths,
const ngraph::Shape& constShape, const size_t axis,
const size_t idx) const;
virtual std::vector<size_t> getConstSplitLengths(
const OutputVector& inputs,
const ngraph::Shape& constShape,
const size_t outputSize) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SqueezeTransformation : public LayerTransformation {
public:
SqueezeTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SubtractTransformation : public LayerTransformation {
public:
SubtractTransformation(const Params& params) : LayerTransformation(params) {}
~SubtractTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
#include "transformations/low_precision/eltwise_base_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API SubtractMultiplyToMultiplyAddTransformation : public LayerTransformation {
public:
SubtractMultiplyToMultiplyAddTransformation(const Params& params) : LayerTransformation(params) {}
~SubtractMultiplyToMultiplyAddTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,35 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <string>
#include <unordered_set>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/quantization_details.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API TransformationContext {
public:
explicit TransformationContext(std::shared_ptr<Function> function);
std::shared_ptr<Function> function;
// Used to store handled FakeQuantize operations.
// ConcatTransformation and FakeQuantizeTransformation handle FakeQuantize operations; ConcatTransformation handles a FakeQuantize operation first.
// If the updatePrecisions transformation option is set to false, then there are no FakeQuantize operation attributes which identify that the operation
// has already been handled by ConcatTransformation:
// - output precision is original (FP32),
// - intervals are changed but not equal to precision boundaries,
// - quantization levels may or may not be changed.
// To avoid double handling of a FakeQuantize operation by FakeQuantizeTransformation after ConcatTransformation, FakeQuantizeTransformation
// has to use this member.
std::unordered_set<std::string> quantizedFakeQuantizeNames;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph
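The guard described in the comment above boils down to a name-keyed visited set. A minimal standalone sketch of that pattern, assuming friendly names are unique within the function (markHandled is a hypothetical helper, not part of the API):

#include <iostream>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for TransformationContext::quantizedFakeQuantizeNames.
std::unordered_set<std::string> quantizedFakeQuantizeNames;

// Returns true on the first visit, false if this FakeQuantize was already handled.
bool markHandled(const std::string& friendlyName) {
    return quantizedFakeQuantizeNames.insert(friendlyName).second;
}

int main() {
    std::cout << markHandled("fq1") << std::endl;  // 1: first visit, handle it
    std::cout << markHandled("fq1") << std::endl;  // 0: already handled (e.g. by ConcatTransformation), skip
    return 0;
}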

View File

@ -0,0 +1,214 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph_ops/type_relaxed.hpp>
#include "layer_transformation.hpp"
#include "iparams_manager.hpp"
#include "ilayer_transformations_manager.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
struct StandaloneCleanup {
std::string typeName;
std::string typeId;
LayerTransformationPtr transformation;
};
class TRANSFORMATIONS_API LowPrecisionTransformations {
public:
LowPrecisionTransformations() {}
LowPrecisionTransformations(
const std::map<std::string, LayerTransformationPtr>& branchSpecificTransformations,
const std::map<std::string, LayerTransformationPtr>& transformations,
const std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& cleanupTransformations,
const std::vector<StandaloneCleanup>& standaloneCleanupTransformations);
void setUpdatePrecisions(const bool updatePrecisions);
void setQuantizedTensorAlignmentOnActivations(const LayerTransformation::QuantizedTensorAlignment quantizedTensorAlignmentOnActivations);
void setQuantizedTensorAlignmentOnWeights(const LayerTransformation::QuantizedTensorAlignment quantizedTensorAlignmentOnWeights);
LowPrecisionTransformations& remove(const std::string& operationType);
LowPrecisionTransformations& removeBranchSpecificTransformations(const std::string& operationType);
LowPrecisionTransformations& removeTransformations(const std::string& operationType);
LowPrecisionTransformations& removeCleanupTransformations(const std::string& operationType);
/**
* Add branch-specific transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& addBranchSpecific(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const auto it = branchSpecificTransformations.find(typeName);
if (it != branchSpecificTransformations.end()) {
branchSpecificTransformations.erase(it);
}
branchSpecificTransformations.emplace(typeName, std::make_shared<Transformation>(params));
return *this;
}
/**
* Add transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& add(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const auto it = transformations.find(typeName);
if (it != transformations.end()) {
transformations.erase(it);
}
transformations.emplace(typeName, std::make_shared<Transformation>(params));
return *this;
}
/**
* Add cleanup transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& addCleanup(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const std::string typeId = typeid(Transformation).name();
const auto it = cleanupTransformations.find(typeName);
if (it == cleanupTransformations.end()) {
cleanupTransformations.emplace(typeName,
std::vector<std::pair<std::string, LayerTransformationPtr>>{ std::make_pair(typeId, std::make_shared<Transformation>(params)) });
} else {
const auto it1 = std::find_if(it->second.begin(), it->second.end(),
[&](const std::pair<std::string, LayerTransformationPtr>& transformation) {
return transformation.first == typeId;
});
if (it1 != it->second.end()) {
it->second.erase(it1);
}
it->second.emplace_back(std::make_pair(typeId, std::make_shared<Transformation>(params)));
}
return *this;
}
/**
* Add standalone cleanup transformation. Transformation type and operation type are required.
* Operation type is used to find transformation by operation during precision definition.
*/
template <class Transformation, class Operation>
LowPrecisionTransformations& addStandaloneCleanup(const LayerTransformation::Params& params) {
const std::string typeName = getType<Operation>();
const std::string typeId = typeid(Transformation).name();
const auto it = std::find_if(standaloneCleanupTransformations.begin(), standaloneCleanupTransformations.end(),
[&](const StandaloneCleanup& transformation) {
return transformation.typeName == typeName && transformation.typeId == typeId;
});
if (it == standaloneCleanupTransformations.end()) {
standaloneCleanupTransformations.emplace_back(StandaloneCleanup{ typeName, typeId, std::make_shared<Transformation>(params) });
} else {
*it = { typeName, typeId, std::make_shared<Transformation>(params) };
}
return *this;
}
template <class Operation>
static std::string getType() {
return Operation::get_type_info_static().name;
}
static std::string getType(const Node& operation) {
return operation.get_type_name();
}
std::vector<LayerTransformationPtr> find(const std::string& transformationName) const;
template <class Operation>
std::vector<LayerTransformationPtr> find() const {
const std::string transformationKey = getType<Operation>();
return find(transformationKey);
}
void setParamsManager(IParamsManager* paramsManager) noexcept;
void setLayerTransformationsManager(ILayerTransformationsManager* layerTransformationsManager) noexcept;
// Key is not a layer type, but just a name of transformation
// Layer type (or a pattern) is defined by transformation itself as an ngraph matcher
std::map<std::string, LayerTransformationPtr> branchSpecificTransformations;
std::map<std::string, LayerTransformationPtr> transformations;
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>> cleanupTransformations;
std::vector<StandaloneCleanup> standaloneCleanupTransformations;
private:
static void setParamsManager(IParamsManager* paramsManager, std::map<std::string, LayerTransformationPtr>& transformations) noexcept;
static void setParamsManager(
IParamsManager* paramsManager,
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& transformations) noexcept;
static void setParamsManager(IParamsManager* paramsManager, std::vector<StandaloneCleanup>& transformations) noexcept;
static void setLayerTransformationsManager(
ILayerTransformationsManager* layerTransformationsManager,
std::map<std::string, LayerTransformationPtr>& transformations) noexcept;
static void setLayerTransformationsManager(
ILayerTransformationsManager* layerTransformationsManager,
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& transformations) noexcept;
static void setLayerTransformationsManager(
ILayerTransformationsManager* layerTransformationsManager,
std::vector<StandaloneCleanup>& transformations) noexcept;
};
/**
* @brief low precision transformation component.
*/
class TRANSFORMATIONS_API LowPrecisionTransformer : public IParamsManager, ILayerTransformationsManager {
public:
static LowPrecisionTransformations getAllTransformations(const LayerTransformation::Params& params = LayerTransformation::Params());
static bool isFunctionQuantized(const std::shared_ptr<Function>& function);
LowPrecisionTransformer();
LowPrecisionTransformer(const LowPrecisionTransformations& transformations);
void transform(std::shared_ptr<Function> network);
// IParamsManager interface implementation
std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept override;
// ILayerTransformationsManager interface implementation
bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept override;
bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept override;
private:
LowPrecisionTransformations transformations;
void registerAllMatchers(
std::map<std::string, LayerTransformationPtr> transformations,
GraphRewrite& pass,
TransformationContext& context);
void registerAllMatchers(
std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>> transformations,
GraphRewrite& pass,
TransformationContext& context);
std::vector<element::Type> precisionIntersection(
const std::vector<element::Type>& v1,
const std::vector<element::Type>& v2) const noexcept;
};
class TRANSFORMATIONS_API TypeRelaxedReplacer : public GraphRewrite {
public:
TypeRelaxedReplacer();
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API TransparentBaseTransformation : public LayerTransformation {
public:
TransparentBaseTransformation(const Params& params) : LayerTransformation(params) {}
~TransparentBaseTransformation() override {}
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API TransposeTransformation : public LayerTransformation {
public:
TransposeTransformation(const Params& params) : LayerTransformation(params) {}
~TransposeTransformation() override {}
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API UnsqueezeTransformation : public LayerTransformation {
public:
UnsqueezeTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,28 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <vector>
#include "split.hpp"
#include "ngraph/node.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API VariadicSplitTransformation : public SplitTransformation {
public:
VariadicSplitTransformation(const Params& params);
void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
protected:
std::vector<size_t> getConstSplitLengths(
const OutputVector& inputs,
const ngraph::Shape& constShape,
const size_t outputSize) const override;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,34 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformation_context.hpp"
#include "layer_transformation.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
class TRANSFORMATIONS_API WeightableLayerTransformation : public LayerTransformation {
public:
WeightableLayerTransformation(const Params& params);
bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
bool isQuantized(std::shared_ptr<Node> layer, bool isReshape) const noexcept;
bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
protected:
DataPrecision decomposeFakeQuantizeForWeightsPath(std::shared_ptr<Node> weightableLayer) const;
static bool isGroup(const std::shared_ptr<Node>& node);
static bool isDepthwise(const std::shared_ptr<Node>& node);
std::shared_ptr<opset1::FakeQuantize> getFakeQuantizeOnWeights(const std::shared_ptr<Node>& node) const;
DataPrecision getDataPrecisionOnWeights(const std::shared_ptr<Node>& node) const;
};
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,75 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
/**
* @brief Defines dequantization attribute
* @file dequantization_attribute.hpp
*/
#include <assert.h>
#include <functional>
#include <memory>
#include <string>
#include <set>
#include <ngraph/node.hpp>
#include <ngraph/variant.hpp>
#include <transformations_visibility.hpp>
namespace ngraph {
/**
* @ingroup ie_runtime_attr_api
* @brief DequantizationAttr class represents a runtime info attribute that indicates
* whether the operation is a dequantization operation
*/
class TRANSFORMATIONS_API DequantizationAttr {
private:
std::string dequantization_attribute;
public:
/**
* A default constructor
*/
DequantizationAttr() = default;
/**
* @brief Constructs a new object consisting of a single name
* @param[in] name The name
*/
explicit DequantizationAttr(const std::string& name) : dequantization_attribute(name) {}
/**
* @brief Returns the string with the dequantization value
*/
std::string getDequantizationAttr() const;
};
extern template class TRANSFORMATIONS_API VariantImpl<DequantizationAttr>;
template<>
class TRANSFORMATIONS_API VariantWrapper<DequantizationAttr> : public VariantImpl<DequantizationAttr> {
public:
static constexpr VariantTypeInfo type_info{"DEQUANTIZATION", 0};
const VariantTypeInfo &get_type_info() const override {
return type_info;
}
VariantWrapper(const value_type &value) : VariantImpl<value_type>(value) {}
std::shared_ptr<ngraph::Variant> merge(const ngraph::NodeVector & nodes) override;
std::shared_ptr<ngraph::Variant> init(const std::shared_ptr<ngraph::Node> & node) override;
};
/**
* @ingroup ie_runtime_attr_api
* @brief getDequantization returns a string with the dequantization value
* @param[in] node The node used to get the DequantizationAttr attribute
*/
TRANSFORMATIONS_API std::string getDequantization(const std::shared_ptr<ngraph::Node>& node);
} // namespace ngraph

View File

@ -22,6 +22,7 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group,
const PadType& auto_pad)
: Op({data_batch, filters})
@ -30,10 +31,53 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group) {
, m_group(group)
, m_output_type(output_type) {
constructor_validate_and_infer_types();
}
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Output<Node>& bias,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const element::Type output_type,
const size_t& group,
const PadType& auto_pad)
: Op({data_batch, filters, bias})
, m_strides(strides)
, m_dilations(dilations)
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group)
, m_output_type(output_type) {
constructor_validate_and_infer_types();
}
// KMB compilation support
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Strides& strides,
const Strides& dilations,
const CoordinateDiff& pads_begin,
const CoordinateDiff& pads_end,
const size_t& group,
const PadType& auto_pad)
: Op({data_batch, filters})
, m_strides(strides)
, m_dilations(dilations)
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group)
, m_output_type(element::undefined) {
constructor_validate_and_infer_types();
}
// KMB compilation support
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
const Output<Node>& filters,
const Output<Node>& bias,
@ -49,7 +93,8 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
, m_pads_begin(pads_begin)
, m_pads_end(pads_end)
, m_auto_pad(auto_pad)
, m_group(group) {
, m_group(group)
, m_output_type(element::undefined) {
constructor_validate_and_infer_types();
}
@ -59,23 +104,12 @@ void op::ConvolutionIE::validate_and_infer_types() {
PartialShape filters_shape = get_input_partial_shape(1);
element::Type filters_et = get_input_element_type(1);
element::Type result_et;
NODE_VALIDATION_CHECK(
this,
element::Type::merge(result_et, data_batch_et, filters_et),
"Element types for data batch and filters do not match (data batch element type: ",
data_batch_et,
", filters element type: ",
filters_et,
").");
PartialShape result_shape{PartialShape::dynamic()};
// If the number of groups is greater than 1 and the channel dimension is dynamic, we can't calculate the output shape
if (m_group > 1) {
if (data_batch_shape.rank().is_dynamic() || data_batch_shape[1].is_dynamic()) {
set_output_type(0, result_et, result_shape);
set_output_type(0, m_output_type, result_shape);
return;
} else {
// Update channel dimension according to groups count
@ -109,7 +143,7 @@ void op::ConvolutionIE::validate_and_infer_types() {
m_strides,
m_dilations);
set_output_type(0, result_et, result_shape);
set_output_type(0, m_output_type, result_shape);
}
shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVector & new_args) const {
@ -120,6 +154,7 @@ shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVe
m_dilations,
m_pads_begin,
m_pads_end,
m_output_type,
m_group,
m_auto_pad);
} else if (new_args.size() == 3) {
@ -130,6 +165,7 @@ shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVe
m_dilations,
m_pads_begin,
m_pads_end,
m_output_type,
m_group,
m_auto_pad);
}

View File

@ -36,6 +36,32 @@ std::pair<std::shared_ptr<A>, std::shared_ptr<B>> parse_eltwise_inputs(std::shar
return {eltwise, constant};
}
template <class Conv>
bool IsConvInLowPrecision(const std::shared_ptr<Conv>& conv) {
if (!ngraph::is_type<ngraph::op::ConvolutionIE>(conv)) {
return false;
}
auto isLowPrecision = [](const std::shared_ptr<ngraph::Node>& node, const size_t index) {
const ngraph::element::Type inputType = node->get_input_element_type(index);
return (inputType == ngraph::element::i8) || (inputType == ngraph::element::u8);
};
// Convolution operation has to be executed in INT8 if ...
if (isLowPrecision(conv, 0) && isLowPrecision(conv, 1)) {
// ... INT8 on activations && INT8 on weights
return true;
}
const std::shared_ptr<ngraph::opset1::Subtract> subtract = ngraph::as_type_ptr<ngraph::opset1::Subtract>(conv->get_input_node_shared_ptr(0));
if (subtract == nullptr) {
return false;
}
// ... INT8 on activations with asymmetric quantization && INT8 on weights
return isLowPrecision(subtract, 0) && isLowPrecision(subtract, 1) && isLowPrecision(conv, 1);
}
template <class Conv>
ngraph::graph_rewrite_callback get_callback() {
ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher &m) {
@ -95,7 +121,8 @@ ngraph::graph_rewrite_callback get_callback() {
new_bias = std::make_shared<ngraph::opset1::Add>(final_const, m_conv->input_value(2));
}
new_conv = m_conv->clone_with_new_inputs({m_conv->input_value(0), m_conv->input_value(1), new_bias});
} else if (std::is_same<Conv, ngraph::op::ConvolutionIE>() && std::dynamic_pointer_cast<ngraph::opset1::Multiply>(eltwise)) {
} else if (std::is_same<Conv, ngraph::op::ConvolutionIE>() && std::dynamic_pointer_cast<ngraph::opset1::Multiply>(eltwise) &&
!IsConvInLowPrecision(m_conv)) {
// Fuse: ConvolutionIE->Mul
auto weights_shape = m_conv->input(1).get_shape();

View File

@ -44,10 +44,18 @@ ngraph::pass::AddMultiplyFusion::AddMultiplyFusion() {
auto mul = label_to_output[m_mul].get_node_shared_ptr();
auto add = label_to_output[m_add].get_node_shared_ptr();
if (m_transformation_callback(mul)) {
return false;
}
Output<Node> input = label_to_output[m_data];
Output<Node> mul_const = label_to_output[m_mul_constant];
Output<Node> add_const = label_to_output[m_add_constant];
if ((input.get_element_type() != mul_const.get_element_type()) || (add_const.get_element_type() != mul_const.get_element_type())) {
return false;
}
// Replace Add->Multiply with Multiply->Add
// As new Multiply can be fused with operation above it we add this Multiply
// to the list of operations that will be used in additional matching.

View File

@ -161,6 +161,7 @@ bool ngraph::pass::ConvertPrecision::run_on_function(std::shared_ptr<ngraph::Fun
// If output type mismatch given type we try to fuse type into this operation
// otherwise we insert Convert operation.
for (auto &node : f->get_ordered_ops()) {
m_transformation_callback(node);
// Recursively apply transformation for sub-graph based operations
if (auto sub_graph_node = std::dynamic_pointer_cast<op::util::SubGraphOp>(node)) {
if (auto sub_graph = sub_graph_node->get_function()) {

View File

@ -0,0 +1,203 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/add.hpp"
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>
#include "ngraph_ops/type_relaxed.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
std::shared_ptr<opset1::Subtract> replaceToSubtract(const std::shared_ptr<Node>& op) {
// TODO: separate this part to standalone transformation: AddToSubtractTransformation
// motivation:
// - single responsibility
// - keep AddTransformation and AddToSubtractTransformation transformations independent and optional
const auto add = as_type_ptr<opset1::Add>(op);
if (add == nullptr) {
return nullptr;
}
// TODO: use general way from getDequantization: is eltwise with Constant
const int constBranchIndex = is_type<opset1::Constant>(add->get_input_node_ptr(0)) ?
0 :
(is_type<opset1::Constant>(add->get_input_node_ptr(1)) ? 1 : -1);
if (constBranchIndex == -1) {
return nullptr;
}
const size_t dataBranchIndex = constBranchIndex == 0 ? 1ul : 0;
const auto parent = add->get_input_node_shared_ptr(dataBranchIndex);
if (is_type<opset1::Convolution>(parent) ||
is_type<opset1::GroupConvolution>(parent) ||
(is_type<opset1::MatMul>(parent) &&
(is_type<opset1::Constant>(parent->get_input_node_ptr(0)) || is_type<opset1::Constant>(parent->get_input_node_ptr(1))))) {
return nullptr;
}
auto constant = fold<opset1::Negative>(add->get_input_node_shared_ptr(constBranchIndex));
auto constOutput = constant->output(0);
const auto subtract = std::make_shared<DequantizationSubtract>(
add->get_input_node_shared_ptr(dataBranchIndex),
constOutput,
add->get_autob());
NetworkHelper::copyInfo(add, subtract);
replace_node(add, subtract);
return subtract;
}
std::shared_ptr<opset1::Subtract> fuseWithSubtract(const std::shared_ptr<Node>& op) {
const auto add = as_type_ptr<opset1::Add>(op);
if ((add == nullptr) ||
!is_type<opset1::Subtract>(add->get_input_node_shared_ptr(0)) ||
// TODO: use general way from getDequantization: is eltwise with Constant
!is_type<opset1::Constant>(add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(1))) {
return nullptr;
}
const auto newSubConst = fold<opset1::Subtract>(
add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(1),
add->get_input_node_shared_ptr(1));
const auto newSubtract = std::make_shared<op::TypeRelaxed<DequantizationSubtract>>(
std::vector<element::Type>{element::f32, element::f32},
std::vector<element::Type>{ element::f32 },
ngraph::op::TemporaryReplaceOutputType(add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(0), element::f32).get(),
ngraph::op::TemporaryReplaceOutputType(newSubConst, element::f32).get());
NetworkHelper::copyInfo(add, newSubtract);
replace_node(add, newSubtract);
return newSubtract;
}
void AddTransformation::registerMatcherIn(GraphRewrite &pass, TransformationContext &context) const {
addSingleNodePattern<opset1::Add>(pass, context);
}
bool AddTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
std::shared_ptr<opset1::Add> op = as_type_ptr<opset1::Add>(m.get_match_root());
if (!canBeTransformed(context, op)) {
return false;
}
std::shared_ptr<Node> addNode = separateInStandaloneBranch(op);
std::shared_ptr<opset1::Add> add = as_type_ptr<opset1::Add>(addNode);
const int fullPathIndex = getNotEmpty(add);
std::shared_ptr<Node> newMultiply;
std::shared_ptr<Node> newAddOrSubtract;
if (fullPathIndex == -1) {
// swap constant multiply and add and possibly fuse to subtract
const auto multiplyBranch = getMultiplyConstBranch(add);
if (multiplyBranch.first == -1) {
NetworkHelper::foldDequantization(addNode, 0);
NetworkHelper::foldDequantization(addNode, 1);
return false;
}
newMultiply = NetworkHelper::swapMultiplyAndAdd(add, multiplyBranch.first);
if (is_type<opset1::Add>(newMultiply->get_input_node_shared_ptr(0))) {
newAddOrSubtract = newMultiply->get_input_node_shared_ptr(0);
auto subtract = fuseWithSubtract(newAddOrSubtract);
if (subtract != nullptr) {
newAddOrSubtract = subtract;
}
subtract = replaceToSubtract(newAddOrSubtract);
if (subtract != nullptr) {
newAddOrSubtract = subtract;
}
} else {
newAddOrSubtract = newMultiply;
}
} else {
// dequantizations are on both branches
const int emptyPathIndex = fullPathIndex == 0 ? 1 : 0;
FakeQuantizeDequantization dequantizationEmptyPath = NetworkHelper::getDequantization(add, emptyPathIndex);
if (updatePrecisions && !dequantizationEmptyPath.empty() && !dequantizationEmptyPath.isLowPrecision()) {
return false;
}
std::shared_ptr<Node> subtractEmptyPathValues;
std::shared_ptr<Node> multiplyEmptyPathValues;
std::tie(subtractEmptyPathValues, multiplyEmptyPathValues) = NetworkHelper::createEmptyValues(dequantizationEmptyPath);
FakeQuantizeDequantization dequantizationFullPath = NetworkHelper::getDequantization(add, fullPathIndex);
if (updatePrecisions && !dequantizationFullPath.empty() && !dequantizationFullPath.isLowPrecision()) {
return false;
}
std::shared_ptr<Node> subtractFullPathValues;
std::shared_ptr<Node> multiplyFullPathValues;
std::tie(subtractFullPathValues, multiplyFullPathValues) = NetworkHelper::createEmptyValues(dequantizationFullPath);
// calculation
// before: Y = (SC1 * (X1 - SH1)) + (SC2 * (X2 - SH2))
// after : Y = SC2 * ( SC1' * (X1 - SH1') + X2 ) , where :
// SC1' = SC1 / SC2
// SH1' = SH1 + SC2 * SH2 / SC1
std::shared_ptr<Node> newSubtractFullPathValues = fold<opset1::Add>(
subtractFullPathValues,
fold<opset1::Divide>(
fold<opset1::Multiply>(subtractEmptyPathValues, multiplyEmptyPathValues),
multiplyFullPathValues));
std::shared_ptr<Node> newMultiplyFullPathValues = fold<opset1::Divide>(multiplyFullPathValues, multiplyEmptyPathValues);
if (NetworkHelper::isZeroConst(newSubtractFullPathValues)) {
newSubtractFullPathValues = nullptr;
}
// graph update
std::vector<std::shared_ptr<Node>> inputs{ {}, {} };
auto fullPathInput = dequantizationFullPath.convert == nullptr ? dequantizationFullPath.data : dequantizationFullPath.convert;
inputs[emptyPathIndex] = dequantizationEmptyPath.data.get_node_shared_ptr();
inputs[fullPathIndex] = std::make_shared<DequantizationMultiply>(
newSubtractFullPathValues == nullptr ?
fullPathInput :
std::make_shared<DequantizationSubtract>(fullPathInput, newSubtractFullPathValues),
newMultiplyFullPathValues);
newAddOrSubtract = std::make_shared<op::TypeRelaxed<opset1::Add>>(
std::vector<element::Type>{element::f32, element::f32}, std::vector<element::Type>{ element::f32 },
ngraph::op::TemporaryReplaceOutputType(inputs[0], element::f32).get(),
ngraph::op::TemporaryReplaceOutputType(inputs[1], element::f32).get());
newMultiply = std::make_shared<DequantizationMultiply>(newAddOrSubtract, multiplyEmptyPathValues);
replace_node(add, newMultiply);
NetworkHelper::copyInfo(add, newAddOrSubtract);
}
updateOutput(context, newMultiply, newAddOrSubtract);
if (fullPathIndex != -1) {
std::shared_ptr<Node> node = add;
NetworkHelper::foldDequantization(node, fullPathIndex);
}
return true;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
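As a sanity check of the algebra in the "calculation" comment above, a standalone numeric sketch with plain doubles (no nGraph types; the values are arbitrary):

#include <cassert>
#include <cmath>

int main() {
    // Arbitrary dequantization parameters and inputs.
    const double SC1 = 0.5,  SH1 = 3.0;   // full path scale / shift
    const double SC2 = 0.25, SH2 = -2.0;  // empty path scale / shift
    const double X1 = 11.0, X2 = 7.0;

    const double before = SC1 * (X1 - SH1) + SC2 * (X2 - SH2);

    const double SC1n = SC1 / SC2;              // SC1'
    const double SH1n = SH1 + SC2 * SH2 / SC1;  // SH1'
    const double after = SC2 * (SC1n * (X1 - SH1n) + X2);

    // Both forms compute the same Y, up to floating-point rounding.
    assert(std::fabs(before - after) < 1e-9);
    return 0;
}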

View File

@ -0,0 +1,80 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/avg_pool.hpp"
#include <memory>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
AvgPoolTransformation::AvgPoolTransformation(const Params& params) : LayerTransformation(params) {
}
void AvgPoolTransformation::registerMatcherIn(GraphRewrite &pass, TransformationContext &context) const {
addPattern(
pass,
context,
make_op_pattern<opset1::AvgPool>({ make_op_label<opset1::Multiply>() }));
}
bool AvgPoolTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
if (!canBeTransformed(context, m.get_match_root())) {
return false;
}
const std::shared_ptr<Node> pooling = separateInStandaloneBranch(m.get_match_root());
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(pooling);
bool updatePrecision;
// issue #40768
if ((children.size() == 1ul) && (!this->layerTransformationsManager->isQuantized(children[0]))) {
updatePrecision = false;
} else {
updatePrecision = false;
// NOTE: This check was added for models that don't have FQ after AvgPool
// They will have transparent precision as it was in old LPT.
for (const auto& child : children) {
if (!is_type<opset1::FakeQuantize>(child)) {
updatePrecision = true;
break;
}
}
}
moveDequantizationAfter(context, pooling, NetworkHelper::getDequantization(pooling), updatePrecision);
return true;
}
bool AvgPoolTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> operation) const {
if (!LayerTransformation::canBeTransformed(context, operation)) {
return false;
}
auto dequantization = NetworkHelper::getDequantization(operation);
return !!dequantization.multiply;
}
bool AvgPoolTransformation::isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept {
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(layer);
// NOTE: This check was added for models that don't have FQ after AvgPool
// They will have transparent precision as it was in old LPT.
for (const auto& child : children) {
if (!is_type<opset1::FakeQuantize>(child)) {
return true;
}
}
return false;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

View File

@ -0,0 +1,97 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/clamp.hpp"
#include <algorithm>
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
ClampTransformation::ClampTransformation(const Params& params) : LayerTransformation(params) {}
void ClampTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
addPattern(pass,
context,
make_op_pattern<opset1::Clamp>({ make_op_label<opset1::Multiply>() }));
}
bool ClampTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher& m) const {
auto subWithTheSameValues = [](std::shared_ptr<ngraph::opset1::Subtract> sub) {
if (sub == nullptr) {
return false;
}
const auto constant = as_type_ptr<ngraph::opset1::Constant>(sub->get_input_node_shared_ptr(1));
if (constant == nullptr) {
return false;
}
return NetworkHelper::isScalarLike(constant);
};
if (!canBeTransformed(context, m.get_match_root())) {
return false;
}
const std::shared_ptr<Node> clamp = separateInStandaloneBranch(m.get_match_root());
const FakeQuantizeDequantization dequantization = NetworkHelper::getDequantization(clamp);
const bool moveSubtract = subWithTheSameValues(dequantization.subtract);
if (!moveSubtract && !canSubtractBeHandled(clamp, dequantization)) {
return false;
}
const auto newClamp = as_type_ptr<opset1::Clamp>(moveDequantizationAfter(context, clamp, dequantization, false, moveSubtract));
double min = newClamp->get_min();
double max = newClamp->get_max();
if (dequantization.multiply != nullptr) {
double scale = as_type_ptr<opset1::Constant>(dequantization.multiply->get_input_node_shared_ptr(1))->cast_vector<double>()[0];
if (scale < 0.0) {
std::swap(min, max);
}
min /= scale;
max /= scale;
}
if (dequantization.subtract != nullptr && moveSubtract) {
double shift = as_type_ptr<opset1::Constant>(dequantization.subtract->get_input_node_shared_ptr(1))->cast_vector<double>()[0];
min += shift;
max += shift;
}
const std::shared_ptr<ngraph::opset1::Clamp> replacement = std::make_shared<ngraph::opset1::Clamp>(newClamp->get_input_node_shared_ptr(0), min, max);
replace_node(newClamp, replacement);
element::Type outputClampType = dequantization.multiply ?
dequantization.multiply->get_output_element_type(0) :
dequantization.subtract->get_output_element_type(0);
ngraph::pass::low_precision::NetworkHelper::setOutDataPrecision(replacement, outputClampType);
return true;
}
bool ClampTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const {
if (!LayerTransformation::canBeTransformed(context, op)) {
return false;
}
const FakeQuantizeDequantization dequantization = NetworkHelper::getDequantization(op);
const auto mulConst = as_type_ptr<ngraph::opset1::Constant>(dequantization.multiply->get_input_node_shared_ptr(1));
if (mulConst == nullptr) {
return false;
}
return NetworkHelper::isScalarLike(mulConst);
}
bool ClampTransformation::isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept {
return false;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
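The bound adjustment above (divide by the multiply scale, swap on negative scale, then add the subtract shift) preserves the operation's result when Clamp is moved before the dequantization. A standalone numeric sketch for the positive-scale case, assuming per-tensor scalar constants:

#include <algorithm>
#include <cassert>
#include <cmath>

double clampv(double v, double lo, double hi) { return std::min(std::max(v, lo), hi); }

int main() {
    const double scale = 0.5, shift = 10.0;  // dequantization: y = (x - shift) * scale
    const double min = -1.0, max = 1.0;      // original Clamp bounds, applied after dequantization

    // Recomputed bounds when Clamp is moved before the dequantization,
    // mirroring ClampTransformation::transform (positive scale, so no swap).
    const double newMin = min / scale + shift;
    const double newMax = max / scale + shift;

    for (double x = 0.0; x <= 20.0; x += 0.5) {
        const double before = clampv((x - shift) * scale, min, max);
        const double after = (clampv(x, newMin, newMax) - shift) * scale;
        assert(std::fabs(before - after) < 1e-9);
    }
    return 0;
}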

View File

@ -0,0 +1,103 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
FakeQuantizeDequantization::FakeQuantizeDequantization() {}
FakeQuantizeDequantization::FakeQuantizeDequantization(
Output<Node> data,
std::shared_ptr<opset1::Convert> convert,
std::shared_ptr<opset1::Subtract> subtract,
std::shared_ptr<opset1::Multiply> multiply) :
data(data),
convert(convert),
subtract(subtract),
multiply(multiply) {
}
bool FakeQuantizeDequantization::empty() const {
return (convert == nullptr) && (subtract == nullptr) && (multiply == nullptr);
}
bool FakeQuantizeDequantization::isShared() const {
if ((convert != nullptr) && (convert->get_output_target_inputs(0).size() > 1ul)) {
return true;
}
if ((subtract != nullptr) && (subtract->get_output_target_inputs(0).size() > 1ul)) {
return true;
}
if ((multiply != nullptr) && (multiply->get_output_target_inputs(0).size() > 1ul)) {
return true;
}
return false;
}
bool FakeQuantizeDequantization::isLowPrecision() const {
return (data.get_element_type() == element::i8) || (data.get_element_type() == element::u8);
}
bool FakeQuantizeDequantization::checkElementwise(const std::shared_ptr<ngraph::Node>& dequantizationElementwise) {
const ngraph::PartialShape partialShape = dequantizationElementwise->get_input_partial_shape(0);
if (partialShape.is_dynamic()) {
return false;
}
std::shared_ptr<opset1::Constant> constant = as_type_ptr<opset1::Constant>(dequantizationElementwise->get_input_node_shared_ptr(1));
if (constant == nullptr) {
constant = as_type_ptr<opset1::Constant>(dequantizationElementwise->get_input_node_shared_ptr(0));
}
if (constant == nullptr) {
THROW_IE_LPT_EXCEPTION(*dequantizationElementwise) << "unexpected operation type " <<
dequantizationElementwise->get_type_info().name << " on the second branch";
}
const ngraph::Shape constShape = constant->get_output_shape(0);
if ((constShape.size() > 5ul)) {
return false;
}
if ((constShape.size() <= 1ul) || (std::all_of(constShape.begin(), constShape.end(), [](const size_t value) { return value == 1ul; }))) {
return true;
}
const ngraph::Shape shape = partialShape.to_shape();
if (constShape.size() == shape.size()) {
if ((constShape[0] != 1ul) || (constShape[1] != shape[1])) {
return false;
}
for (size_t i = 2ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else if (constShape.size() == (shape.size() - 1)) {
if (constShape[0] != shape[1]) {
return false;
}
for (size_t i = 1ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else {
return false;
}
return true;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph
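The shape rules enforced by checkElementwise above can be restated compactly: the dequantization constant must be scalar-like, or per-channel along axis 1 with rank equal to the data rank or one less. A standalone sketch of the same checks on plain shape vectors (isScalarOrPerChannel is an illustrative restatement, not the nGraph code):

#include <cassert>
#include <vector>

bool isScalarOrPerChannel(const std::vector<size_t>& constShape, const std::vector<size_t>& dataShape) {
    if (constShape.size() > 5) return false;
    auto allOnes = [](const std::vector<size_t>& s) {
        for (size_t d : s) if (d != 1) return false;
        return true;
    };
    if (constShape.size() <= 1 || allOnes(constShape)) return true;  // scalar-like
    if (constShape.size() == dataShape.size()) {                     // [1, C, 1, ...]
        if (constShape[0] != 1 || constShape[1] != dataShape[1]) return false;
        for (size_t i = 2; i < constShape.size(); ++i) if (constShape[i] != 1) return false;
        return true;
    }
    if (constShape.size() == dataShape.size() - 1) {                 // [C, 1, ...]
        if (constShape[0] != dataShape[1]) return false;
        for (size_t i = 1; i < constShape.size(); ++i) if (constShape[i] != 1) return false;
        return true;
    }
    return false;
}

int main() {
    const std::vector<size_t> data{1, 16, 32, 32};
    assert(isScalarOrPerChannel({1, 16, 1, 1}, data));  // per-channel, same rank
    assert(isScalarOrPerChannel({16, 1, 1}, data));     // per-channel, rank - 1
    assert(isScalarOrPerChannel({}, data));             // scalar
    assert(!isScalarOrPerChannel({1, 8, 1, 1}, data));  // channel count mismatch
    return 0;
}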

View File

@ -0,0 +1,179 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include <transformations/low_precision/common/subgraph.hpp>
#include <algorithm>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>
#include <ngraph/rt_info.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/quantization_details.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
bool isQuantizationPerChannel(const std::shared_ptr<ngraph::Node>& node) {
if (node->outputs().size() > 1ul) {
return false;
}
const auto inputs = ngraph::pass::low_precision::NetworkHelper::getInputs(node);
for (const auto& input : inputs) {
if (ngraph::is_type<opset1::Constant>(input.get_node())) {
continue;
}
const Shape& in = input.get_shape();
const Shape& out = node->output(0).get_shape();
for (size_t i = 0; i < 2; ++i) {
if (in[i] != out[i]) {
return false;
}
}
}
return true;
}
Subgraph::Subgraph(ngraph::pass::ILayerTransformationsManager* layerTransformationsManager) : layerTransformationsManager(layerTransformationsManager) {
}
bool Subgraph::fillSubgraphForQuantization(
const std::shared_ptr<ngraph::opset1::FakeQuantize>& fakeQuantize,
std::unordered_set<std::string>& handledLayers) {
quantizationLayers.push_back(fakeQuantize);
handledLayers.insert(fakeQuantize->get_friendly_name());
layers.emplace(fakeQuantize->get_friendly_name(), fakeQuantize);
for (size_t index = 0; index < fakeQuantize->get_output_size(); ++index) {
const auto childInputs = fakeQuantize->get_output_target_inputs(index);
for (const auto childInput : childInputs) {
const std::shared_ptr<ngraph::Node> child = childInput.get_node()->shared_from_this();
if (handledLayers.find(child->get_friendly_name()) != handledLayers.end()) {
continue;
}
const std::shared_ptr<ngraph::opset1::Concat> concatChild = ngraph::as_type_ptr<ngraph::opset1::Concat>(child);
if (concatChild != nullptr) {
if (!fillSubgraphForConcat(concatChild, handledLayers)) {
return false;
}
} else {
const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeChild = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(child);
if (fakeQuantizeChild != nullptr) {
//
} else {
if (layerTransformationsManager->isPrecisionPreserved(child) && isQuantizationPerChannel(child)) {
if (!fillSubgraphForIntermediate(child, handledLayers)) {
return false;
}
}
}
}
}
}
return true;
}
bool Subgraph::fill(const std::shared_ptr<ngraph::Node>& layer, std::unordered_set<std::string>& handledLayers) {
// if at least one parent is handled incorrectly, then the subgraph is not in low precision
for (size_t index = 0; index < layer->get_input_size(); ++index) {
const std::shared_ptr<ngraph::Node> parent = layer->get_input_node_shared_ptr(index);
if (handledLayers.find(parent->get_friendly_name()) != handledLayers.end()) {
continue;
}
const std::shared_ptr<ngraph::opset1::Concat> concatParent = ngraph::as_type_ptr<ngraph::opset1::Concat>(parent);
if (concatParent != nullptr) {
if (!fillSubgraphForConcat(concatParent, handledLayers)) {
return false;
}
} else {
const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeParent = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(parent);
if (fakeQuantizeParent != nullptr) {
if (!fillSubgraphForQuantization(fakeQuantizeParent, handledLayers)) {
//
}
} else {
const std::shared_ptr<ngraph::opset1::Constant> constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(parent);
if (constant != nullptr) {
//
} else {
if (layerTransformationsManager->isPrecisionPreserved(parent) && isQuantizationPerChannel(parent)) {
if (!fillSubgraphForIntermediate(parent, handledLayers)) {
return false;
}
} else {
return false;
}
}
}
}
}
// TODO: if at least one child was handled correctly, then the subgraph is in low precision
for (size_t index = 0; index < layer->get_output_size(); ++index) {
const auto childInputs = layer->get_output_target_inputs(index);
for (const auto childInput : childInputs) {
const std::shared_ptr<ngraph::Node> child = childInput.get_node()->shared_from_this();
if (handledLayers.find(child->get_friendly_name()) != handledLayers.end()) {
continue;
}
const std::shared_ptr<ngraph::opset1::Concat> concatChild = ngraph::as_type_ptr<ngraph::opset1::Concat>(child);
if (concatChild != nullptr) {
if (!fillSubgraphForConcat(concatChild, handledLayers)) {
return false;
}
} else {
const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeChild = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(child);
if (fakeQuantizeChild != nullptr) {
//
} else if (layerTransformationsManager->isPrecisionPreserved(child) && isQuantizationPerChannel(child)) {
if (!fillSubgraphForIntermediate(child, handledLayers)) {
return false;
}
}
}
}
}
return true;
}
bool Subgraph::fillSubgraphForIntermediate(const std::shared_ptr<ngraph::Node>& intermediate, std::unordered_set<std::string>& handledLayers) {
handledLayers.insert(intermediate->get_friendly_name());
layers.emplace(intermediate->get_friendly_name(), intermediate);
return fill(intermediate, handledLayers);
}
bool Subgraph::empty() const {
return quantizationLayers.empty();
}
bool Subgraph::fillSubgraphForConcat(const std::shared_ptr<ngraph::opset1::Concat>& concat, std::unordered_set<std::string>& handledLayers) {
concatLayers.push_back(concat);
handledLayers.insert(concat->get_friendly_name());
layers.emplace(concat->get_friendly_name(), concat);
std::shared_ptr<ngraph::Node> node = concat;
return fill(node, handledLayers);
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,428 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/concat.hpp"
#include <algorithm>
#include <cmath>
#include <functional>
#include <limits>
#include <map>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/subgraph.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
void ConcatTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
addSingleNodePattern<opset1::Concat>(pass, context);
}
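// The transformation aligns all FakeQuantize operations feeding the concat subgraph to one
// common output interval and moves a single shared dequantization below the subgraph.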
bool ConcatTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
std::shared_ptr<ngraph::opset1::Concat> concat = ngraph::as_type_ptr<ngraph::opset1::Concat>(m.get_match_root());
if (!canBeTransformed(context, concat)) {
return false;
}
ngraph::pass::low_precision::Subgraph subgraph(layerTransformationsManager);
std::unordered_set<std::string> handledLayers;
if (!subgraph.fillSubgraphForConcat(concat, handledLayers)) {
return false;
}
if (subgraph.quantizationLayers.empty() || isHandled(context, subgraph.quantizationLayers)) {
return false;
}
// precisions can be different
ngraph::Node& quantizationLayer = *subgraph.quantizationLayers[0];
std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(quantizationLayer.shared_from_this());
DataPrecision dataPrecision = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
if (dataPrecision.precision == ngraph::element::undefined) {
return false;
}
std::unordered_map<std::string, ngraph::pass::low_precision::FakeQuantizeDequantization> dequantizations;
std::vector<QuantizationDetails> quantizationLayersDetails;
for (size_t i = 0; i < subgraph.quantizationLayers.size(); ++i) {
const std::shared_ptr<ngraph::Node> fakeQuantizeLayer = subgraph.quantizationLayers[i];
const ngraph::Shape shape = fakeQuantizeLayer->get_output_shape(0);
if (shape.size() < 4ul) {
return false;
}
const std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(fakeQuantizeLayer->shared_from_this());
if (fq == nullptr) {
return false;
}
const QuantizationDetails& quantizationDetails = QuantizationDetails::getDetails(fq);
quantizationLayersDetails.push_back(quantizationDetails);
const DataPrecision dataPrecision2 = getDataPrecision(subgraph.quantizationLayers[i]->shared_from_this(), quantizationDetails, false);
if (dataPrecision2.precision == ngraph::element::undefined) {
return false;
}
if (dataPrecision.precision != dataPrecision2.precision) {
// quantization levels are the same, the difference can be in sign
// the wider interval (precision) is preferable: use signed if at least one interval is signed
dataPrecision = dataPrecision.precision.is_signed() ? dataPrecision : dataPrecision2;
}
}
if (dataPrecision.precision == ngraph::element::undefined) {
return false;
}
// only per-tensor scale is supported
if (quantizationLayersDetails.empty() || (quantizationLayersDetails[0].inputHighValues.size() != 1ul)) {
return false;
}
FakeQuantizeDequantization dequantization;
if (quantizationLayersDetails[0].inputHighValues.size() == 1ul) {
float outputLowValue = quantizationLayersDetails[0].outputLowValues[0];
float outputHighValue = quantizationLayersDetails[0].outputHighValues[0];
for (size_t index = 0lu; index < subgraph.quantizationLayers.size(); index++) {
const QuantizationDetails& quantizationDetails = quantizationLayersDetails[index];
if (outputLowValue > quantizationDetails.outputLowValues[0]) {
outputLowValue = quantizationDetails.outputLowValues[0];
}
if (outputHighValue < quantizationDetails.outputHighValues[0]) {
outputHighValue = quantizationDetails.outputHighValues[0];
}
}
if ((outputLowValue == 0.f) && (outputHighValue == 0.f)) {
return false;
}
const float maxOutputInterval = outputHighValue - outputLowValue;
if (quantizedTensorAlignmentOnActivations == QuantizedTensorAlignment::UpdateLevel) {
const size_t minLevels = getMinQuantizationLevels(
dataPrecision,
maxOutputInterval,
quantizationLayersDetails,
outputLowValue,
outputHighValue);
if (minLevels < this->minQuantizationLevels) {
return false;
}
}
// FQ -> SUB_quantization -> MUL_quantization -[INT8]-> SUB_dequantization -> MUL_dequantization ->
const float quantizationMul = (dataPrecision.max - dataPrecision.min) / maxOutputInterval;
const float dequantizationMul = maxOutputInterval / (dataPrecision.max - dataPrecision.min);
// FQ outputLowValue = dataPrecision.min * dequantizationMul + quantizationSub
const float quantizationSub = outputLowValue - dataPrecision.min * dequantizationMul;
const float dequantizationSub = std::round(-quantizationSub * quantizationMul);
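// Worked example with hypothetical values: for i8 (dataPrecision.min = -128, dataPrecision.max = 127)
// and outputLowValue = -1.0, outputHighValue = 1.55: maxOutputInterval = 2.55,
// quantizationMul = 255 / 2.55 = 100, dequantizationMul = 2.55 / 255 = 0.01,
// quantizationSub = -1.0 - (-128 * 0.01) = 0.28, dequantizationSub = round(-0.28 * 100) = -28;
// dequantization then computes (x - (-28)) * 0.01, restoring [-1.0, 1.55] from [-128, 127].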
// 1. get data for dequantization. Dequantization data will be used several times later.
dequantization = ngraph::pass::low_precision::NetworkHelper::makeDequantization(
dequantizationMul,
dequantizationSub,
subgraph.quantizationLayers[0]->get_output_element_type(0),
subgraph.quantizationLayers[0]->get_output_shape(0),
dataPrecision.precision,
dataPrecision.min,
dataPrecision.max);
for (size_t index = 0; index < subgraph.quantizationLayers.size(); index++) {
std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeLayer = as_type_ptr<ngraph::opset1::FakeQuantize>(
subgraph.quantizationLayers[index]->shared_from_this());
const QuantizationDetails& quantizationDetails = quantizationLayersDetails[index];
switch (quantizedTensorAlignmentOnActivations) {
case QuantizedTensorAlignment::None: {
THROW_TRANSFORMATION_EXCEPTION << "not implemented: " << quantizedTensorAlignmentOnActivations;
}
case QuantizedTensorAlignment::UpdateLevel: {
const float updatedOutputLowValue = (quantizationDetails.outputLowValues[0] - quantizationSub) * quantizationMul;
const float updatedOutputHighValue = (quantizationDetails.outputHighValues[0] - quantizationSub) * quantizationMul;
// 2. update FakeQuantize - one time action
std::shared_ptr<opset1::FakeQuantize> newFakeQuantizeLayer = ngraph::pass::low_precision::NetworkHelper::updateFakeQuantize(
fakeQuantizeLayer,
updatePrecisions ? dataPrecision.precision : fakeQuantizeLayer->get_output_element_type(0),
roundf(updatedOutputLowValue),
roundf(updatedOutputHighValue));
const size_t levels = static_cast<size_t>(fabs(roundf(updatedOutputHighValue) - roundf(updatedOutputLowValue)) + 1.0);
newFakeQuantizeLayer->set_levels(levels);
subgraph.quantizationLayers[index] = newFakeQuantizeLayer;
subgraph.layers[fakeQuantizeLayer->get_friendly_name()] = newFakeQuantizeLayer;
break;
}
default: {
THROW_TRANSFORMATION_EXCEPTION << "unexpected value " << quantizedTensorAlignmentOnActivations;
}
}
}
} else {
return false;
}
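// the whole subgraph shares one dequantization, so the callback returns the same scale and shift for every layer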
auto dequantizationValuesCallback = [&](
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
dequantizationsToConcatenate.push_back(dequantization);
};
addDequantizationLayers(context, subgraph, dequantizationValuesCallback);
if (updatePrecisions) {
for (const auto& it : subgraph.layers) {
const std::shared_ptr<ngraph::Node>& node = it.second;
if (std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(node) != nullptr) {
ngraph::pass::low_precision::NetworkHelper::setOutDataPrecisionForTypeRelaxed(node->shared_from_this(), dataPrecision.precision);
} else {
// set the precision explicitly to have the updated precision during the transformation
for (size_t i = 0; i < node->get_output_size(); ++i) {
node->set_output_type(i, dataPrecision.precision, node->get_output_partial_shape(i));
}
}
}
}
for (const std::shared_ptr<ngraph::Node>& quantizationLayer : subgraph.quantizationLayers) {
context.quantizedFakeQuantizeNames.insert(quantizationLayer->get_friendly_name());
}
return true;
}
bool ConcatTransformation::isPrecisionPreserved(std::shared_ptr<Node>) const noexcept {
return true;
}
bool ConcatTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const {
const std::shared_ptr<opset1::Concat> concat = as_type_ptr<opset1::Concat>(layer);
// only concatenation along the channel axis (axis == 1) is supported
return (concat != nullptr) && (concat->get_axis() == 1ul);
}
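// For every subgraph layer that feeds a node outside the subgraph, dequantization operations
// (Convert, Subtract, Multiply) are inserted between the layer and the external consumer;
// for network outputs the original friendly name is moved to the last inserted node.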
void ConcatTransformation::addDequantizationLayers(
TransformationContext& context,
ngraph::pass::low_precision::Subgraph& subgraph,
std::function<void(
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate)> getLayerDequantizationCallback) const {
std::unordered_map<std::string, ngraph::Node*> outputs;
for (size_t i = 0; i < context.function->get_output_size(); ++i) {
ngraph::Node* node = context.function->get_output_op(i).get();
if (node->get_input_size() != 1ul) {
THROW_IE_LPT_EXCEPTION(*node) << "unexpected input count for the result node";
}
outputs.emplace(node->get_input_node_shared_ptr(0)->get_friendly_name(), node);
}
std::unordered_map<std::string, std::shared_ptr<ngraph::Node>> notHandledSubgraphLayers = subgraph.layers;
while (notHandledSubgraphLayers.size() != 0ul) {
const auto layerIt = notHandledSubgraphLayers.begin();
std::shared_ptr<ngraph::Node> layer = layerIt->second;
notHandledSubgraphLayers.erase(layerIt);
std::vector<FakeQuantizeDequantization> layerDequantizations;
for (size_t i = 0; i < layer->get_output_size(); ++i) {
const auto childInputs = layer->get_output_target_inputs(i);
for (const auto childInput : childInputs) {
ngraph::Node& child = *childInput.get_node();
if (subgraph.layers.find(child.get_friendly_name()) == subgraph.layers.end()) {
if (layerDequantizations.size() == 0ul) {
getLayerDequantizationCallback(layer, layer->get_friendly_name(), layerDequantizations);
}
std::shared_ptr<ngraph::Node> source = layer->shared_from_this();
{
std::vector<std::shared_ptr<ngraph::Node>> convertNodes;
std::vector<std::shared_ptr<ngraph::Node>> subtractNodes;
std::vector<std::shared_ptr<ngraph::Node>> multiplyNodes;
if (layerDequantizations.size() > 1ul) {
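// Broadcast each branch constant to a per-channel target shape so that subtract and multiply
// constants from all branches can be concatenated along the channel axis (axis 1).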
auto broadcastElementWiseConst = [](
std::shared_ptr<ngraph::opset1::Constant> operation,
const ngraph::Shape targetShape) -> std::shared_ptr<Node> {
auto unsqueeze = ngraph::pass::low_precision::fold<ngraph::opset1::Unsqueeze>(
operation->shared_from_this(),
std::make_shared<ngraph::opset1::Constant>(element::i64, ngraph::Shape{ 4 }, std::vector<size_t>{ 0, 1, 2, 3 }));
auto targetShapeConst = std::make_shared<ngraph::opset1::Constant>(
element::i64, ngraph::Shape{ targetShape.size() },
targetShape);
auto broadcast = ngraph::pass::low_precision::fold<ngraph::opset1::Broadcast>(
unsqueeze,
targetShapeConst,
ngraph::op::AutoBroadcastType::NUMPY);
return broadcast;
};
bool allDequantizationShiftAreZero = true;
bool allDequantizationMultiplyAreZero = true;
for (const FakeQuantizeDequantization& dequantization : layerDequantizations) {
if (dequantization.subtract != nullptr) {
allDequantizationShiftAreZero = false;
}
if (dequantization.multiply != nullptr) {
allDequantizationMultiplyAreZero = false;
}
}
for (size_t i = 0; i < layerDequantizations.size(); ++i) {
const auto& dequantization = layerDequantizations[i];
convertNodes.push_back(dequantization.convert);
const ngraph::element::Type precision = dequantization.data.get_element_type();
ngraph::Shape targetShape = dequantization.data.get_shape();
targetShape[0] = 1ul;
for (size_t dim = 2; dim < targetShape.size(); ++dim) {
targetShape[dim] = 1ul;
}
if (!allDequantizationShiftAreZero) {
subtractNodes.push_back(dequantization.subtract == nullptr ?
std::make_shared<ngraph::opset1::Constant>(precision, targetShape, std::vector<float>({ 0.f })) :
broadcastElementWiseConst(
as_type_ptr<ngraph::opset1::Constant>(dequantization.subtract->input_value(1).get_node_shared_ptr()),
targetShape));
}
if (!allDequantizationMultiplyAreZero) {
multiplyNodes.push_back(dequantization.multiply == nullptr ?
std::make_shared<ngraph::opset1::Constant>(precision, targetShape, std::vector<float>({ 1.0f })) :
broadcastElementWiseConst(
as_type_ptr<ngraph::opset1::Constant>(dequantization.multiply->input_value(1).get_node_shared_ptr()),
targetShape));
}
}
} else {
// TODO: check constant shapes here - has to be scalar
if (layerDequantizations[0].convert != nullptr) {
convertNodes.push_back(layerDequantizations[0].convert);
}
if (layerDequantizations[0].subtract != nullptr) {
subtractNodes.push_back(layerDequantizations[0].subtract->input_value(1).get_node_shared_ptr());
}
if (layerDequantizations[0].multiply != nullptr) {
multiplyNodes.push_back(layerDequantizations[0].multiply->input_value(1).get_node_shared_ptr());
}
}
// TODO: the second place (first is FQ decomposition) where dequantization operations are inserted
const std::shared_ptr<ngraph::Node> destination = child.shared_from_this();
if (!convertNodes.empty()) {
const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
std::shared_ptr<ngraph::Node> convert =
convertNodes[0]->clone_with_new_inputs({ destination->get_input_source_output(sourceOutputIdx) });
insert_new_node_between(source, destination, convert);
source = convert;
}
// concatenation axis is 1
if (!subtractNodes.empty()) {
const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
std::shared_ptr<ngraph::opset1::Subtract> subtract = std::make_shared<DequantizationSubtract>(
destination->get_input_source_output(sourceOutputIdx),
NetworkHelper::toScalarIfPossible(subtractNodes.size() == 1ul ?
subtractNodes[0] :
ngraph::pass::low_precision::fold<ngraph::opset1::Concat>(subtractNodes, 1)));
insert_new_node_between(source, destination, subtract);
source = subtract;
}
if (!multiplyNodes.empty()) {
const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
std::shared_ptr<ngraph::opset1::Multiply> multiply = std::make_shared<DequantizationMultiply>(
destination->get_input_source_output(sourceOutputIdx),
NetworkHelper::toScalarIfPossible(multiplyNodes.size() == 1ul ?
multiplyNodes[0] :
ngraph::pass::low_precision::fold<ngraph::opset1::Concat>(multiplyNodes, 1)));
insert_new_node_between(source, destination, multiply);
source = multiply;
}
}
// the element type of the first input is used
const ngraph::element::Type precision = layerDequantizations[0].data.get_element_type();
layer->set_output_type(0, precision, layer->get_output_partial_shape(0));
const auto it = outputs.find(layer->get_friendly_name());
if (it != outputs.end()) {
const std::string originalName = layer->get_friendly_name();
const std::string newName = layer->get_friendly_name() + LayerTransformation::originalLayerPostfix;
layer->set_friendly_name(newName);
source->set_friendly_name(originalName);
subgraph.layers[layer->get_friendly_name()] = layer;
}
}
}
}
}
}
bool ConcatTransformation::isHandled(const TransformationContext& context, const std::vector<std::shared_ptr<ngraph::Node>>& quantizationOperations) {
for (const std::shared_ptr<ngraph::Node>& quantizationLayer : quantizationOperations) {
if (context.quantizedFakeQuantizeNames.find(quantizationLayer->get_friendly_name()) != context.quantizedFakeQuantizeNames.end()) {
return true;
}
}
return false;
}
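// Estimates how many quantization levels each FakeQuantize keeps after its own output interval
// is proportionally mapped onto the common [dataPrecision.min, dataPrecision.max] interval:
// a narrow per-layer interval covers only a fraction of the available levels.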
size_t ConcatTransformation::getMinQuantizationLevels(
const DataPrecision& dataPrecision,
const float maxOutputInterval,
const std::vector<QuantizationDetails>& quantizationLayersDetails,
const float outputLowValue,
const float outputHighValue) const {
size_t minLevels = std::numeric_limits<std::size_t>::max();
for (const QuantizationDetails& quantizationDetails : quantizationLayersDetails) {
// if there is a negative part, the calculation is based on `outputLowValue`; otherwise only on `outputHighValue`
const float updatedOutputLowValue = outputLowValue != 0.f ?
(quantizationDetails.outputLowValues[0] / outputLowValue) * dataPrecision.min :
(quantizationDetails.outputLowValues[0] / outputHighValue) * dataPrecision.max;
// if there is a positive part, the calculation is based on `outputHighValue`; otherwise only on `outputLowValue`
const float updatedOutputHighValue = outputHighValue != 0.f ?
(quantizationDetails.outputHighValues[0] / outputHighValue) * dataPrecision.max :
(quantizationDetails.outputHighValues[0] / outputLowValue) * dataPrecision.min;
const size_t levels = static_cast<size_t>(fabs(roundf(updatedOutputHighValue) - roundf(updatedOutputLowValue)) + 1.0);
if (minLevels > levels) {
minLevels = levels;
}
}
return minLevels;
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,232 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#include "transformations/low_precision/concat_multi_channels.hpp"
#include <cmath>
#include <memory>
#include <queue>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/subgraph.hpp"
#include "transformations/low_precision/network_helper.hpp"
namespace ngraph {
namespace pass {
namespace low_precision {
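// Per-channel handling falls back to the base per-tensor ConcatTransformation
// when any consumer of a concat in the subgraph is a Convolution.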
bool ConcatMultiChannelsTransformation::isMultiChannel(const std::vector<std::shared_ptr<ngraph::opset1::Concat>>& concatLayers) const noexcept {
for (const std::shared_ptr<ngraph::opset1::Concat>& concat : concatLayers) {
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(concat);
for (const std::shared_ptr<ngraph::Node>& child : children) {
if (is_type<ngraph::opset1::Convolution>(child.get())) {
return false;
}
}
}
return true;
}
void ConcatMultiChannelsTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
addSingleNodePattern<opset1::Concat>(pass, context);
}
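// Unlike the base transformation, every FakeQuantize keeps its own dequantization;
// per-branch scales and shifts are concatenated along the channel axis below the subgraph.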
bool ConcatMultiChannelsTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
std::shared_ptr<ngraph::opset1::Concat> concat = ngraph::as_type_ptr<ngraph::opset1::Concat>(m.get_match_root());
if (!canBeTransformed(context, concat)) {
return false;
}
ngraph::pass::low_precision::Subgraph subgraph(layerTransformationsManager);
std::unordered_set<std::string> handledLayers;
if (!subgraph.fillSubgraphForConcat(concat, handledLayers)) {
return false;
}
if (subgraph.quantizationLayers.empty() || isHandled(context, subgraph.quantizationLayers)) {
return false;
}
if (!isMultiChannel(subgraph.concatLayers)) {
ConcatTransformation::transform(context, m);
return false;
}
DataPrecision dataPrecision;
{
for (auto quantizationLayer : subgraph.quantizationLayers) {
std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(quantizationLayer->shared_from_this());
const DataPrecision tmp = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
if (dataPrecision.precision == ngraph::element::undefined) {
dataPrecision = tmp;
continue;
}
// if precisions differ across the FakeQuantize operations, prefer u8
if ((tmp.precision != dataPrecision.precision) && (tmp.precision == ngraph::element::u8)) {
dataPrecision = tmp;
}
}
}
std::unordered_map<std::string, ngraph::pass::low_precision::FakeQuantizeDequantization> dequantizations;
for (size_t i = 0; i < subgraph.quantizationLayers.size(); ++i) {
const std::shared_ptr<ngraph::Node>& fakeQuantizeLayer = subgraph.quantizationLayers[i];
const ngraph::Shape shape = fakeQuantizeLayer->get_output_shape(0);
if (shape.size() < 4ul) {
return false;
}
const std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(fakeQuantizeLayer->shared_from_this());
if (fq == nullptr) {
return false;
}
const DataPrecision currentDataPrecision = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
const QuantizationDetails quantizationDetails = QuantizationDetails::getDetails(fq);
// 1. get data for dequantization. Dequantization data will be used several times later.
const FakeQuantizeDequantization fakeQuantizeDequantization = ngraph::pass::low_precision::NetworkHelper::createDequantizationFromFakeQuantize(
fq,
dataPrecision.precision,
dataPrecision.min,
dataPrecision.max,
dataPrecision.precision == currentDataPrecision.precision ? currentDataPrecision.hasZeroPoint : true,
updatePrecisions);
dequantizations[fakeQuantizeLayer->get_friendly_name()] = fakeQuantizeDequantization;
// 2. update FakeQuantize - one time action
const std::shared_ptr<opset1::FakeQuantize> newFakeQuantizeLayer = ngraph::pass::low_precision::NetworkHelper::updateFakeQuantize(
fq,
updatePrecisions ? dataPrecision.precision : fakeQuantizeLayer->get_output_element_type(0),
roundf(dataPrecision.min),
roundf(dataPrecision.max));
subgraph.quantizationLayers[i] = newFakeQuantizeLayer;
subgraph.layers[fakeQuantizeLayer->get_friendly_name()] = newFakeQuantizeLayer;
}
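// the callback re-keys the dequantization map when addDequantizationLayers renames a layer
// and collects the per-branch dequantizations to concatenate for the given layer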
auto dequantizationValuesCallback = [&](
std::shared_ptr<ngraph::Node> layer,
const std::string originalLayerName,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
if (layer->get_friendly_name() != originalLayerName) {
const auto update = [](
const std::string& originalLayerName,
const std::string& newLayerName,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationLayers) {
auto it = dequantizationLayers.find(originalLayerName);
if (it != dequantizationLayers.end()) {
dequantizationLayers.emplace(newLayerName, it->second);
dequantizationLayers.erase(it);
}
};
update(originalLayerName, layer->get_friendly_name(), dequantizations);
}
fillDequantization(
layer,
dequantizations,
dequantizationsToConcatenate);
};
addDequantizationLayers(context, subgraph, dequantizationValuesCallback);
if (updatePrecisions) {
for (const auto& it : subgraph.layers) {
const std::shared_ptr<ngraph::Node> node = it.second;
if (std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(node)) {
ngraph::pass::low_precision::NetworkHelper::setOutDataPrecisionForTypeRelaxed(node->shared_from_this(), dataPrecision.precision);
} else {
// set the precision explicitly to have the updated precision during the transformation
for (size_t i = 0; i < node->get_output_size(); ++i) {
node->set_output_type(i, dataPrecision.precision, node->get_output_partial_shape(i));
}
}
}
}
for (const std::shared_ptr<ngraph::Node>& quantizationLayer : subgraph.quantizationLayers) {
context.quantizedFakeQuantizeNames.insert(quantizationLayer->get_friendly_name());
}
return true;
}
bool ConcatMultiChannelsTransformation::isPrecisionPreserved(std::shared_ptr<Node>) const noexcept {
return true;
}
// fill the dequantizationsToConcatenate collection for the layer using dequantizationByFakeQuantize
void ConcatMultiChannelsTransformation::fillDequantization(
std::shared_ptr<ngraph::Node> layer,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize,
std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>> fakeQuantizes;
std::shared_ptr<ngraph::opset1::FakeQuantize> currentFakeQuantize = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(layer);
if (currentFakeQuantize != nullptr) {
fakeQuantizes.push_back(currentFakeQuantize);
} else {
fillQuantization(layer, fakeQuantizes);
if (fakeQuantizes.size() == layer->get_input_size()) {
updateDequantizationShapesIfNecessary(layer, fakeQuantizes, dequantizationByFakeQuantize);
}
}
for (const auto& fakeQuantize : fakeQuantizes) {
const auto it = dequantizationByFakeQuantize.find(fakeQuantize->get_friendly_name());
if (it == dequantizationByFakeQuantize.end()) {
THROW_IE_LPT_EXCEPTION(*fakeQuantize) << "dequantization scale values are not found";
}
const FakeQuantizeDequantization& fakeQuantizeDequantization = it->second;
dequantizationsToConcatenate.push_back(fakeQuantizeDequantization);
}
}
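// If a FakeQuantize output channel count differs from the consumer input channel count,
// rebuild the dequantization with the same scalar scale and shift but the actual input shape.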
void ConcatMultiChannelsTransformation::updateDequantizationShapesIfNecessary(
std::shared_ptr<ngraph::Node> layer,
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes,
std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize) {
for (size_t i = 0; i < fakeQuantizes.size(); ++i) {
ngraph::Shape inputShape = layer->get_input_shape(i);
ngraph::Shape dequantizationShape = fakeQuantizes[i]->get_shape();
if (inputShape[1] != dequantizationShape[1]) {
FakeQuantizeDequantization replacedDequantization = dequantizationByFakeQuantize[fakeQuantizes[i]->get_friendly_name()];
const float scale = as_type_ptr<ngraph::opset1::Constant>(replacedDequantization.multiply->get_input_node_shared_ptr(1))->cast_vector<float>()[0];
const float shift = replacedDequantization.subtract ?
as_type_ptr<ngraph::opset1::Constant>(replacedDequantization.subtract->get_input_node_shared_ptr(1))->cast_vector<float>()[0] : 0.f;
const auto precisionBefore = replacedDequantization.data.get_element_type();
const auto precisionAfter = replacedDequantization.multiply->get_element_type();
auto newDequantization = ngraph::pass::low_precision::NetworkHelper::makeDequantization(
scale, shift, precisionBefore, inputShape, precisionAfter, 0.f, 5.f);
dequantizationByFakeQuantize[fakeQuantizes[i]->get_friendly_name()] = newDequantization;
}
}
}
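// Recursively collects the nearest FakeQuantize ancestors for every input of the layer.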
void ConcatMultiChannelsTransformation::fillQuantization(
const std::shared_ptr<ngraph::Node> layer,
std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes) {
for (size_t i = 0; i < layer->get_input_size(); ++i) {
std::shared_ptr<ngraph::Node> parent = layer->get_input_node_shared_ptr(i);
std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantize = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(parent);
if (fakeQuantize != nullptr) {
fakeQuantizes.push_back(fakeQuantize);
} else {
fillQuantization(parent, fakeQuantizes);
}
}
}
} // namespace low_precision
} // namespace pass
} // namespace ngraph

Some files were not shown because too many files have changed in this diff.