Es/lpt/lpt to ngraph fixes2 with master (#2671)
* [LPT] Replace creation of dequantization with factory
* [ngraph][LPT] Add ScaleShift replace for dequantization operations
* [LPT] SubtractMultiplyToMultiplyAdd refactoring
* [LPT] Code style fix
* [LPT] Edit SubtractMultiplyToMultiplyAdd transformation for dequantization
* [LPT] Linux compilation quick fix
* [LPT] [WIP] runtime info applying
* [LPT] Concat transformation functional tests extending
* [LPT] MultiplyToConvolution + Subtract to add fusing + improvements in LowPrecisionTransformer
* [LPT] Linux compilation error fix
* [LPT] compilation error
* [LPT] MultiplyToGroupConvolution fix: 5D support
* [LPT] Multiply transformation extending: FQ weights support - WIP
* [LPT] FQ folding & precision selection
* [LPT] code style fixes
* [LPT] code style fixes
* [LPT] Linux compilation error fix
* [LPT] SubtractMultiplyToMultiplyAdd: refactoring
* [LPT] Tests fixes
* [LPT] MultiplyToGroupConvolution tests
* [LPT] Convert subtract with int inputs to Eltwise sub
* [LPT] Constant folding fix for quant models
* [LPT] 1) Asymmetric quantization improvement 2) tests extending
* [LPT] 2 fixes for se_resnext_50
* [LPT] Add transformation priority branch selection test
* [LPT] AddMultiplyFusion: legacy transformation quick fix
* [LPT] nGraph tests temporary disabling
* [LPT] Fix for eltwise inputs with multiple outputs
* [LPT] Fix for FQ fuse
* [LPT] Reshape by channel, batch temporarily disabled
* [nGraph][LPT] MatMul fix for reading FP16 models
* [LPT] 1) Add (not after Convolution/GroupConvolution/MatMul with Constant) to Subtract 2) precision selection fix: MultiplyToGroupConvolution quick fix
* [LPT] DenseNet improvements: AddTransformation: Add to Subtract + tests
* [LPT] AddTransformation refactoring
* [LPT] AddTransformation tests temporarily disabled
* [LPT] ReshapeTransformation improvements: degradation fix
* [LPT] code style fix
* [LPT] Concat tests temporary disabling
* [LPT] tests unification: 1) plugin tests: added test cases and nGraph validation for Clamp, Split and VariadicSplit 2) func tests: added test cases 3) transformNGraph: added the ability to run additional transformations
* [LPT] split & variadic split merge fix
* [LPT] Clamp: added support for asymmetric quantization
* [LPT] added DequantizationAttr run-time attribute
* [LPT] debug info removal
* [LPT] ConcatTransformation: zero point fix
* [LPT] CNNNetwork ReLU transformation quick fix
* [LPT] 1) Concat fix 2) ConcatMultiChannels fix 3) Added "Concat with Split" test cases 4) Subgraph fix
* [LPT] 1) Concat fix 2) Added "Concat with different precision on childs" test case
* [LPT] concat fix Ubuntu18
* [LPT] Concat test fixes
* [LPT] Not fp32 FQ input support
* [LPT] MatMul fix + separateInStandaloneBranch fix
* [LPT] Fix reference input types in mish fusion tests
* [LPT] Fix cpuFuncTests on CentOS building
* [nGraph][LPT] ScaleShift 2d, 3d nGraph conversion enabling
* [LPT] 1) FullyConnected workaround removing 2) validate_nodes_and_infer_types for LPT
* [ngraph] Add check for children for ConvertSubtract
* [LPT] Squeeze/Unsqueeze tests unification
* [LPT] Squeeze/Unsqueeze change signature for getReference/getOriginal
* [LPT] Mul & Add -> ScaleShift quick fix
* [LPT] nGraph tests temporary disabling
* [LPT] code style fix
* [LPT] code style fix #2
* [LPT] nGraph tests temporary disabling
* [LPT] code style fix #3
* [LPT] shared plugin tests temporary disabling
* [LPT] cleanup
* [LPT] nGraph unit tests temporary disabling
* [LPT] nGraph unit tests disabling #2
* [LPT] nGraph tests disabling
* [LPT] nGraph tests temporary disabling
* [LPT] WA removing
* [LPT] CentOS compilation fix
* [LPT] KMB WA to avoid compilation error
* [LPT] functional test temporary disabling
* [nGraph] code style fixes
* [LPT] ConcatTransformation: data movement operation as intermediate handling
* [LPT] FuseSubtractToFakeQuantize after VariadicSplit
* [LPT] ConcatWithSplitTransformation functional test temporary disabling
* [LPT] Clamp and ConcatWithDifferentPrecisionsOnChilds: tests fix
* [LPT] MatMul: bert-nv-mlperf-quantized fix
* [LPT] Add to convolution biases fuse fix
* [LPT] GPU plugin tests fixes
* [LPT] Normalize GPU plugin tests fix
* [LPT] test-commit
* [LPT] CLDNN Plugin FP16 conversion
* [LPT] AvgPool: update precision if there is no FQ after + convolution precision limitation on activation
* [LPT] Convolution fixes
* [LPT] FuseSubtractToFakeQuantize & FuseMultiplyToFakeQuantize improvement
* [LPT] FuseSubtractToFakeQuantize test fix
* [LPT] FuseSubtractToFakeQuantizeTransformation tests
* [LPT] code style fix
* [LPT] AvgPool child recursive extend
* [LPT] AvgPool tests + fix
* [LPT] compilation quick fix
* [LPT] Add to convolution biases fuse fix
* [LPT] Linux issues: MatMulWithOptimizedConstantFakeQuantizeTransformation temporarily disabled
* [LPT] Normalize GPU plugin tests fix
* [LPT] test-commit
* [LPT] 1) added the ability to create Sub without dequantizationAttribute 2) fixed optimizeMulAfter: added copying rt_info 3) Tests Unification: Convolution transformation 4) added cleanRunTimeInfo into NetworkHelper
* [LPT] Tests Unification: GroupConvolution
* [LPT] removed debug info
* [LPT] functional tests for Convolution & GroupConvolution extending
* [LPT] [MatMul] Quick fix Ubuntu error
* [LPT] MatMulTransformation quick test fix: one constant for both intervals
* [nGraph] code style fix
* [LPT] added output_precision to NormalizeIE
* [nGraph] NormalizeIE fix for LPT support
* [LPT] nGraph WA removal
* [LPT] fixed fillSubgraph for concat multi channels
* [LPT] MatMul fix
* [nGraph] WA removal: 1) nGraph tests enabling 2) LPT extending: not handle in FP32
* [LPT] nGraph WA removal: function tests skip config rollback
* [LPT] WA removal: precision propagation fix
* [LPT] ConvertMulOrAddFinally transformation extending
* [nGraph] ConvolutionMultiplyFusion rollback (move from legacy to common)
* [nGraph] ConvertMulAddToScaleShiftOrPower: WA removal
* [nGraph] TypeRelaxed: WA removal
* [nGraph] WA removal: TypeRelaxed
* [LPT] WA removal: ConcatTransformation
* [nGraph] WA removal: Eltwise & ConvertMulOrAddFinally fixes to support LPT
* [nGraph] MulAddConversion fix: 2D & 3D ScaleShift are supported
* [nGraph] VisualizeTree extending
* [LPT] FakeQuantizeDequantization extending: check element-wise dequantization operation
* [LPT] FakeQuantizeDequantization extending: SubtractMultiplyToMultiplyAddTransformation & WeightableLayerTransformation
* [LPT] Convolution + test infrastructure update
* [LPT] GPU compilation error
* [nGraph] BatchNorm plugin tests: input tensor definition
* [LPT] LowPrecisionTransformer::isFunctionQuantized was added
* [nGraph] WA final cleanup
* [nGraph] ScaleShiftIE quick fix
* [LPT] Functional tests: added test cases "Concat with intermediate with constant"
* [LPT] Transformer::isNetworkquantized fix
* [LPT] SubtractMultiplyToMultiplyAdd zero Add remove: fix for ssd300 on GPU
* [LPT] MultiplyToGroupConvolution: do not transform on Const
* [LPT] workaround for negative scales
* [LPT] Convert standalone dequantization Mul, Sub, Add to ScaleShift
* [LPT] SubtractMultiplyToMultiplyAdd test fix
* [LPT] Clamp transformation: GPU tests fix
* [LPT] Transformer tests
* [LPT] FakeQuantizePrecisionSelectionTransformation was disabled for GPU
* [LPT] TransformerIsFunctionQuantized refactoring
* [nGraph] code style fix
* [LPT] mobilenet_v2_tf_depthwise test update
* [LPT] TMP: dequantization folding
* [LPT] Elementwise transformation fix: dequantization operations constant folding
* [LPT] cleanup
* [LPT] denormal values fix
* [LPT] FuseFakeQuantize test fixed + negative multiply case
* [LPT] FP32 -> FP16 conversion info
* [LPT] FQ dot interval support + swapMultiplyAdd safe division
* [LPT] test fix
* [LPT] Tests for dot interval on FQ + tests for AddTransformation enabling
* [LPT] Clamp transformation fix
* [LPT] FQ prec selection test fix
* [LPT] Clamp test case
* [LPT] Concat division precision fix
* [LPT] cleanup
* [LPT] merge fix
* [LPT] WIP: MatMul asymmetric quantization fix (BERT)
* [LPT] MatMulWithOptimizedConstantFakeQuantizeTransformation disabled
* [LPT] GPU Plugin set config fix
* [LPT] Fix merge mistakes
* [LPT] Rollback device specific INT8
* [LPT] ReshapeFullyConnected fix: FullyConnected output fix
* [LPT] bert-base-chinese GPU fix
* [ngraph/LPT] Tests for fix convert_mul_or_add_finally with dequantization
* [ngraph/LPT] Fix convert_mul_or_add_finally with dequantization
* [LPT] ScaleShift dim < 4 only dequantization conversion
* [LPT] MatMul transformation tests extending
* [LPT] ReshapeFullyConnected legacy transformation: LPT test case addition
* [nGraph] VisualizeTree extending: property names displaying to simplify search
* [LPT] getDequantization extending
* [LPT] MulAddToScaleshiftOrPower: out precision fix & tests
* [LPT] Multiply to ScaleShiftIE: Multiply transformation: remove DEQUANTIZATION if not valid
* [LPT] Concat test case
* [nGraph] try to fix OpenCV compatibility
* [nGraph] nGraph code style fix
* [LPT] InPlace dequantization folding
* [LPT] Multiply constant folding test
* [LPT] Fix plugin test case for MatMulWithOptimizedConstantFakeQuantize
* [LPT] Enable MatMulWithOptimizedConstantFakeQuantize plugin test
* [LPT] Convolution transformation: mulConst shape fix
* [LPT] INT8 Constant folding branch for elementwise ops optimization removal
* [LPT] eltwise for const branch fix
* [LPT] linux fix
* [LPT] Multiply test refactoring
* [LPT] Convert fuse in Constant + tests
* [LPT] function comparison: runtime info comparison rollback
* [LPT] linux build fix
* [LPT] linux build fix2
* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT
* [LPT] Reshape transformation update: don't broadcast by batch
* [LPT] MatMul transformation limitation was added to be similar as CNNNetwork LPT - refactoring
* [LPT] MatMul transformation: transpose input tensors fix
* [LPT] checkElementwise for AddTransformation WA: should be moved to getDequantization
* [LPT] merge fix
* [LPT] MatMul fix & tests
* [LPT] AddTransformation tests
* [LPT] Interpolate transformation enabled
* [LPT] constant folding before LPT
* [LPT] WIP: not completed tests
* [LPT] GPU degradation fix
* [LPT] FuseConvert workaround
* [LPT] code cleanup
* [LPT] Interpolate GPU test quick fix
* [LPT] GroupConvolution fix
* [LPT] Fix fusing multiply for non-dequantization layers
* [LPT] GPU pipeline update: enableInt8 initialization place update
* [LPT] tests compilation fix
* [LPT] merge fix
* [LPT] tests enabling
* [LPT] merge issue resolving
* [LPT] LPT CNNNetwork usage macros: part #1: source code
* [LPT] LPT CNNNetwork usage macros: part #2: cmake files update and tests adoption
* [LPT] LPT workaround from nGraph core removing
* [LPT] previous LPT version tests
* [LPT] inference_engine_lp_transformations was returned back
* [LPT] replace_node rollback
* [LPT] ConvertSubtract fix
* [LPT] GPU: baselineIsFP16 reuse fix
* [LPT] FakeQuantizeTransformation: GPU workaround: I32 -> FP32 Convert is not fused
* [LPT] AvgPool output precision workaround
* [LPT] Group convolution precision + Subtract to ScaleShift const fix
* [LPT] SubMulToMulAdd & Transpose: action-recognition-0001 fix
* [LPT] Transpose: added test with per-tensor quantization

Co-authored-by: Aleksandr Pertovsky <aleksandr.pertovsky@intel.com>
Co-authored-by: Zinoviev, Vladimir <vladimir.zinoviev@intel.com>
Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
Co-authored-by: Gorokhov Dmitriy <dmitry.gorokhov@intel.com>
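For context on the dequantization handling that most of these commits touch: in the nGraph LPT design, a quantized subgraph is decomposed so that each operation consumes low-precision tensors and is followed by a dequantization chain Convert -> Subtract (zero point) -> Multiply (scale), marked with the DEQUANTIZATION runtime attribute. A minimal sketch of that pattern (illustrative only; the shapes and values are made up, this is not code from the PR):

// U8 tensor -> Convert(f32) -> Subtract(zero point) -> Multiply(scale)
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>

std::shared_ptr<ngraph::Function> makeDequantizationExample() {
    using namespace ngraph;
    auto input = std::make_shared<opset1::Parameter>(element::u8, Shape{1, 3, 16, 16});
    // Convert the quantized integer tensor to the execution precision
    auto convert = std::make_shared<opset1::Convert>(input, element::f32);
    // Subtract the zero point (asymmetric quantization); omitted when symmetric
    auto zeroPoint = opset1::Constant::create(element::f32, Shape{1, 3, 1, 1}, {128.f, 128.f, 128.f});
    auto subtract = std::make_shared<opset1::Subtract>(convert, zeroPoint);
    // Multiply by the per-channel scale
    auto scale = opset1::Constant::create(element::f32, Shape{1, 3, 1, 1}, {0.1f, 0.2f, 0.3f});
    auto multiply = std::make_shared<opset1::Multiply>(subtract, scale);
    return std::make_shared<Function>(NodeVector{multiply}, ParameterVector{input});
}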
parent ca95240c91
commit c2271da637
@@ -21,9 +21,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
SOURCES ${MAIN_SRC} ${LIBRARY_HEADERS}
VERSION_DEFINES_FOR cldnn_engine.cpp)

target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
clDNN_lib pugixml inference_engine_transformations)

if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
endif()

set (CLDNN_TOP_FOLDER ${IE_MAIN_SOURCE_DIR}/thirdparty/clDNN)
target_include_directories(${TARGET_NAME} PRIVATE
${CMAKE_CURRENT_SOURCE_DIR}
@@ -34,7 +34,9 @@
#include <transformations/opset_conversions/convert_opset2_to_opset1.hpp>
#include <transformations/opset_conversions/convert_opset3_to_opset2.hpp>
#include <transformations/init_node_info.hpp>
#include <transformations/convert_precision.hpp>
#include <transformations/rt_info/fused_names_attribute.hpp>

#include <legacy/convert_function_to_cnn_network.hpp>
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_transformer.h>
@@ -43,6 +45,9 @@
#include "cldnn_executable_network.h"
#include "cldnn_custom_layer.h"

#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

#ifdef __linux__
#include <dlfcn.h>
#endif
@@ -73,8 +78,10 @@ cldnn::device_info clDNNEngine::GetDeviceInfo(const std::map<std::string, std::s
return device_info;
}

InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const {
InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network, CLDNNPlugin::Config config) const {
std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
bool baselineIsFP16 = false;

if (clonedNetwork->getFunction()) {
const auto transformations_callback = [](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
// Reshape->Permute->Reshape pattern in theory can change output rank, so this check is added to be sure
@@ -113,6 +120,12 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
return can_use_reduce;
}

if (auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(node)) {
return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
}

return std::dynamic_pointer_cast<const ::ngraph::opset2::Gelu>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset3::ShuffleChannels>(node) ||
std::dynamic_pointer_cast<const ::ngraph::opset2::BatchToSpace>(node) ||
@@ -128,6 +141,11 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
// Disable shape inference (WA for generic operations)
::ngraph::op::GenericIE::DisableReshape noReshape(nGraphFunc);

#ifndef USE_CNNNETWORK_LPT
bool enableInt8;
#endif

{
// Note: instead of running all Conversion Transformations you can make up your own transformation pipeline
ngraph::pass::Manager manager;
manager.register_pass<ngraph::pass::InitNodeInfo>();
@@ -136,16 +154,51 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
manager.register_pass<ngraph::pass::CommonOptimizations>();
manager.register_pass<ngraph::pass::ConvertOpSet3ToOpSet2>();
manager.register_pass<ngraph::pass::ConvertOpSet2ToOpSet1>();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();

manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);

ngraph::pass::Manager ti_manager;
// Unroll will be called after all conversions
// temporarily switch back to plugin unroller from NGraph unroller until TI output names are corrected
// ti_manager.register_pass<ngraph::pass::UnrollTensorIterator>();
ti_manager.run_passes(nGraphFunc);
#ifndef USE_CNNNETWORK_LPT
enableInt8 = config.enableInt8 && ngraph::pass::low_precision::LowPrecisionTransformer::isFunctionQuantized(nGraphFunc);
if (enableInt8) {
const auto fp16_callback = [&baselineIsFP16](const std::shared_ptr<const ::ngraph::Node> &node) -> bool {
if (!baselineIsFP16 && node->get_output_element_type(0) == ngraph::element::f16) {
baselineIsFP16 = true;
}

return true;
};

ngraph::pass::Manager conversion_manager;
// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
conversion_manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::f16, ngraph::element::f32);
conversion_manager.set_callback(fp16_callback);
conversion_manager.run_passes(nGraphFunc);
}
#endif
}

#ifndef USE_CNNNETWORK_LPT
using namespace ngraph::pass::low_precision;
if (enableInt8) {
auto params = LayerTransformation::Params(
true, // updatePrecisions
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true); // supportAsymmetricQuantization
LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
.add<MatMulTransformation, ngraph::opset1::MatMul>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));

transformer.transform(nGraphFunc);
}
#endif

{
ngraph::pass::Manager manager = ngraph::pass::Manager();
manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
manager.set_callback(transformations_callback);
manager.run_passes(nGraphFunc);
}

clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);
}
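The block above is the core of the new flow: when USE_CNNNETWORK_LPT is not defined, the nGraph-based LPT runs between the opset conversions and ConvertOpSet1ToLegacy. A condensed sketch of the same invocation, reusing only the API visible in this diff (assumes the low_precision headers added above are available):

#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/mat_mul.hpp>

void runLpt(std::shared_ptr<ngraph::Function> nGraphFunc) {
    using namespace ngraph::pass::low_precision;
    if (!LowPrecisionTransformer::isFunctionQuantized(nGraphFunc))
        return;  // nothing to do: no FakeQuantize subgraphs in the function
    auto params = LayerTransformation::Params(
        true,  // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,
        LayerTransformation::QuantizedTensorAlignment::None,
        true); // supportAsymmetricQuantization
    // MatMul keeps symmetric-only support, as in the plugin code above
    LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
        .add<MatMulTransformation, ngraph::opset1::MatMul>(
            LayerTransformation::Params(params).setSupportAsymmetricQuantization(false)));
    transformer.transform(nGraphFunc);
}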
@@ -157,6 +210,17 @@ InferenceEngine::ICNNNetwork::Ptr clDNNEngine::CloneAndTransformNetwork(const In
transformator.fullTrim();
}

if (baselineIsFP16) {
// [WA part1] Store 'lpt_back_to_fp16' flag to convert FP32 operations to original FP16 after LPT
InputsDataMap inputsMap;
clonedNetwork->getInputsInfo(inputsMap);

if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
input0.begin()->second->params["lpt_back_to_fp16"];
}
}

return clonedNetwork;
}
@@ -259,7 +323,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn

context = m_defaultContext;

return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), context, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), context, conf);
}

ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network,
@@ -283,7 +347,7 @@ ExecutableNetworkInternal::Ptr clDNNEngine::LoadExeNetworkImpl(const InferenceEn
conf.max_dynamic_batch = static_cast<int>(network.getBatchSize());
}

return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network), casted, conf);
return std::make_shared<CLDNNExecNetwork>(*CloneAndTransformNetwork(network, conf), casted, conf);
}

RemoteContext::Ptr clDNNEngine::CreateContext(const ParamMap& params) {
@@ -326,7 +390,7 @@ QueryNetworkResult clDNNEngine::QueryNetwork(const ICNNNetwork& network,
for (auto&& node : function->get_ops()) {
originalOps.emplace(node->get_friendly_name());
}
auto clonedNetwork = CloneAndTransformNetwork(network);
auto clonedNetwork = CloneAndTransformNetwork(network, _impl->m_config);
std::unordered_set<std::string> supported;
std::unordered_set<std::string> unsupported;
@@ -27,7 +27,8 @@ class clDNNEngine : public InferenceEngine::InferencePluginInternal,
CLDNNRemoteCLContext::Ptr m_defaultContext;

cldnn::device_info GetDeviceInfo(const std::map<std::string, std::string> &config) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network) const;
InferenceEngine::ICNNNetwork::Ptr CloneAndTransformNetwork(const InferenceEngine::ICNNNetwork& network,
CLDNNPlugin::Config config) const;
public:
clDNNEngine();
@@ -88,9 +88,11 @@
#include <sys/stat.h>
#include <exec_graph_info.hpp>

#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/transformer.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/gemm.hpp"
#endif

#include <iostream>
#include <iomanip>
@@ -397,26 +399,22 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
, p_currentOutputs({}) {
InitFormat(network);

if (config.enableInt8) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
true, // weightsToConst
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true, // roundQuantizedValues
true, // updateBiases
true, // supportAsymmetricQuantization
{Precision::U8, Precision::I8}, // Precision on activations
{Precision::I8}); // Precision on weights

auto transforms = LowPrecisionTransformer::getAllTransformations(params)
.add<FullyConnectedTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "FullyConnected")
.add<GemmTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "GEMM");

bool fqFound = false;
bool allFQareSupported = true;

bool baselineIsFP16 = false;
{
InputsDataMap inputsMap;
network.getInputsInfo(inputsMap);
if (!inputsMap.empty()) {
auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
if (!input0.empty() && (input0.begin()->second->params.count("lpt_back_to_fp16") != 0)) {
baselineIsFP16 = true;
fqFound = true;
}
}

#ifdef USE_CNNNETWORK_LPT
bool allFQareSupported = true;
if (config.enableInt8) {
auto it = details::CNNNetworkIterator(&network);
auto end = details::CNNNetworkIterator();
while (it != end) {
@@ -436,6 +434,22 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl
}
}

if (config.enableInt8) {
auto params = LayerTransformation::Params(true, // updatePrecisions
true, // quantizeOutputs
true, // weightsToConst
LayerTransformation::QuantizedTensorAlignment::UpdateLevel, // quantizedTensorAlignmentOnActivations
LayerTransformation::QuantizedTensorAlignment::None, // quantizedTensorAlignmentOnWeights
true, // roundQuantizedValues
true, // updateBiases
true, // supportAsymmetricQuantization
{Precision::U8, Precision::I8}, // Precision on activations
{Precision::I8}); // Precision on weights

auto transforms = LowPrecisionTransformer::getAllTransformations(params)
.add<FullyConnectedTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "FullyConnected")
.add<GemmTransformation>(LayerTransformation::Params(params).setSupportAsymmetricQuantization(false), "GEMM");

// [WA part1] Convert quantized FP16 model to FP32 to avoid possible overflow and mixed precision errors
if (fqFound && allFQareSupported) {
NetPass::ConvertPrecision(network, Precision::FP16, Precision::FP32);
@@ -443,8 +457,11 @@ Program::Program(InferenceEngine::ICNNNetwork& network, std::shared_ptr<const cl

LowPrecisionTransformer transformer(transforms);
transformer.transform(network);
}
#endif

// [WA part2] Try to find non-quantized layers and convert them back to FP16
if (config.enableInt8) {
if (fqFound && baselineIsFP16 && config.enable_fp16_for_quantized_models) {
auto layersSorted = BFSSort(network);
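This is the consumer side of the two-part FP16 workaround: part 1 (in the engine code above) converts a quantized FP16 function to FP32 before LPT and stores the 'lpt_back_to_fp16' flag in the params of the first layer after the input; part 2 here converts the layers that stayed non-quantized back to FP16. A minimal sketch of the flag check, using only the legacy CNNNetwork calls that appear in this diff (error handling omitted):

bool lptRequestsFp16Restore(InferenceEngine::ICNNNetwork& network) {
    InferenceEngine::InputsDataMap inputsMap;
    network.getInputsInfo(inputsMap);
    if (inputsMap.empty())
        return false;
    // Part 1 stored "lpt_back_to_fp16" in the params of the first layer after the input
    auto input0 = getInputTo(inputsMap.begin()->second->getInputData());
    return !input0.empty() && input0.begin()->second->params.count("lpt_back_to_fp16") != 0;
}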
@@ -57,7 +57,12 @@ target_compile_definitions(${TARGET_NAME}_test_static
INTEGER_LOW_P
USE_STATIC_IE)

target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s inference_engine_lp_transformations libGNA::API)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_preproc_s libGNA::API)

if (USE_CNNNETWORK_LPT)
target_link_libraries(${TARGET_NAME}_test_static PUBLIC inference_engine_lp_transformations)
endif()

target_include_directories(${TARGET_NAME}_test_static PUBLIC ${CMAKE_CURRENT_SOURCE_DIR})
set_target_properties(${TARGET_NAME}_test_static PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME}_test_static)
@@ -22,13 +22,17 @@ public:

Eltwise(const Output<Node>& data1,
const Output<Node>& data2,
const ELTWISE_TYPE eltwise_type);
const ELTWISE_TYPE eltwise_type,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

ELTWISE_TYPE eltwise_type;

private:
element::Type m_output_type;
};

} // namespace op
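This header shows the pattern repeated for FullyConnected, NormalizeIE, PowerIE, ReLUIE and ScaleShiftIE below: each legacy op gains an optional output_type that overrides the inferred output element type, so a dequantization Subtract or Multiply with integer inputs can still produce the original (float) output precision. A self-contained sketch of the pattern (ExampleIE is hypothetical, not an op from the repository; real ops also define type_info out of class in their .cpp, as the sources later in this diff do):

#include <ngraph/op/op.hpp>

class ExampleIE : public ngraph::op::Op {
public:
    static constexpr ngraph::NodeTypeInfo type_info{"ExampleIE", 1};
    const ngraph::NodeTypeInfo& get_type_info() const override { return type_info; }

    ExampleIE(const ngraph::Output<ngraph::Node>& data,
              const ngraph::element::Type output_type = ngraph::element::undefined)
        : Op({data}), m_output_type(output_type) {
        constructor_validate_and_infer_types();
    }

    void validate_and_infer_types() override {
        // Keep the input precision unless an explicit output type was requested
        set_output_type(
            0,
            m_output_type == ngraph::element::undefined ? get_input_element_type(0) : m_output_type,
            get_input_partial_shape(0));
    }

    std::shared_ptr<ngraph::Node> clone_with_new_inputs(const ngraph::OutputVector& new_args) const override {
        check_new_args_count(this, new_args);
        return std::make_shared<ExampleIE>(new_args.at(0), m_output_type);  // propagate the override
    }

private:
    ngraph::element::Type m_output_type;
};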
@@ -29,17 +29,21 @@ public:
FullyConnected(const Output<Node> & A,
const Output<Node> & B,
const Output<Node> & C,
const Shape & output_shape);
const Shape & output_shape,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

size_t get_out_size() { return m_output_size; }
size_t get_out_size() const { return m_output_size; }

element::Type get_output_type() const { return m_output_type; }

private:
size_t m_output_size = 0;
Shape m_output_shape = {};
element::Type m_output_type;
};

} // namespace op
@@ -25,7 +25,8 @@ public:
const Output<Node>& weights,
float eps,
bool across_spatial,
bool channel_shared);
bool channel_shared,
const ngraph::element::Type output_type);

float get_eps() const { return m_eps; }
bool get_channel_shared() const { return m_channel_shared;}
@@ -39,6 +40,7 @@ protected:
float m_eps;
bool m_across_spatial;
bool m_channel_shared;
ngraph::element::Type m_output_type;
};

} // namespace op
@@ -19,13 +19,16 @@ public:
const NodeTypeInfo& get_type_info() const override { return type_info; }

PowerIE(const Output<Node>& data_batch,
const float power, const float scale, const float shift);
const float power, const float scale, const float shift, const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

float scale, power, shift;

private:
element::Type m_output_type;
};

} // namespace op
@@ -18,7 +18,7 @@ public:
static constexpr NodeTypeInfo type_info{"ReLUIE", 1};
const NodeTypeInfo& get_type_info() const override { return type_info; }

ReLUIE(const Output<Node> & data, const float & negative_slope);
ReLUIE(const Output<Node> & data, const float & negative_slope, const element::Type output_type);

void validate_and_infer_types() override;

@@ -26,8 +26,11 @@ public:

float get_slope() { return m_negative_slope; }

element::Type get_output_type() const { return m_output_type; }

private:
float m_negative_slope;
element::Type m_output_type;
};

} // namespace op
@@ -20,11 +20,15 @@ public:

ScaleShiftIE(const Output<Node>& data_batch,
const Output<Node>& weights,
const Output<Node>& bias);
const Output<Node>& bias,
const element::Type output_type = element::undefined);

void validate_and_infer_types() override;

std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& new_args) const override;

private:
element::Type output_type;
};

} // namespace op

inference-engine/src/legacy_api/include/legacy/transformations/convert_opset1_to_legacy/convert_mul_or_add_finally.hpp (120 lines changed; Normal file → Executable file)
@@ -35,6 +35,7 @@ public:
// This pass finally converts single Multiply and Add operations to ScaleShift or Power operation
ConvertMulOrAddFinally() : GraphRewrite() {
convert_mul_or_add_finally<ngraph::opset1::Add>();
convert_mul_or_add_finally<ngraph::opset1::Subtract>();
convert_mul_or_add_finally<ngraph::opset1::Multiply>();
}

@@ -52,11 +53,13 @@ bool convert_to_eltwise(std::shared_ptr<T> & node,
et = ELTWISE_TYPE::Prod;
} else if (std::is_same<T, ngraph::opset1::Add>()) {
et = ELTWISE_TYPE::Sum;
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
et = ELTWISE_TYPE::Sub;
} else {
return false;
}

auto eltwise = std::make_shared<ngraph::op::Eltwise>(data1, data2, et);
auto eltwise = std::make_shared<ngraph::op::Eltwise>(data1, data2, et, node->output(0).get_element_type());
eltwise->set_friendly_name(node->get_friendly_name());
ngraph::copy_runtime_info(node, eltwise);
ngraph::replace_node(node, eltwise);
@@ -66,7 +69,7 @@ bool convert_to_eltwise(std::shared_ptr<T> & node,
template <typename T>
ngraph::graph_rewrite_callback get_callback() {
ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher& m) {
static_assert(std::is_same<T, ngraph::opset1::Add>() || std::is_same<T, ngraph::opset1::Multiply>(),
static_assert(std::is_same<T, ngraph::opset1::Add>() || std::is_same<T, ngraph::opset1::Subtract>() || std::is_same<T, ngraph::opset1::Multiply>(),
"Unsupported template parameter. Only Add or Multiply allowed!");

auto lin_op = std::dynamic_pointer_cast<T> (m.get_match_root());
@@ -77,7 +80,10 @@ ngraph::graph_rewrite_callback get_callback() {
const auto output_shape = lin_op->output(0).get_partial_shape();
const auto output_shape_rank = output_shape.rank().get_length();

if (!lin_op->get_element_type().is_real()) {
const auto intInputs = !lin_op->get_input_element_type(0).is_real() &&
!lin_op->get_input_element_type(1).is_real();

if (!lin_op->get_element_type().is_real() || intInputs) {
return convert_to_eltwise<T>(lin_op,
lin_op->input(0).get_source_output(),
lin_op->input(1).get_source_output());
@@ -147,14 +153,65 @@ ngraph::graph_rewrite_callback get_callback() {

auto res = check_constant(const_node, data_node.get_partial_shape());

if (res == CONVERSION_RESULT::NONE || (res == CONVERSION_RESULT::SCALE_SHIFT && output_shape_rank < 4)) {
auto checkElementwise = [](const std::shared_ptr<ngraph::Node>& elementwise) -> bool {
const ngraph::PartialShape partialShape = elementwise->get_input_partial_shape(0);
if (partialShape.is_dynamic()) {
return false;
}

std::shared_ptr<ngraph::opset1::Constant> constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(elementwise->get_input_node_shared_ptr(1));
if (constant == nullptr) {
constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(elementwise->get_input_node_shared_ptr(0));
}
if (constant == nullptr) {
return false;
}

const ngraph::Shape constShape = constant->get_output_shape(0);
if ((constShape.size() > 5ul)) {
return false;
}

if ((constShape.size() <= 1ul) || (std::all_of(constShape.begin(), constShape.end(), [](const size_t value) { return value == 1ul; }))) {
return true;
}

const ngraph::Shape shape = partialShape.to_shape();
if (constShape.size() == shape.size()) {
if ((constShape[0] != 1ul) || (constShape[1] != shape[1])) {
return false;
}
for (size_t i = 2ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else if (constShape.size() == (shape.size() - 1)) {
if (constShape[0] != shape[1]) {
return false;
}
for (size_t i = 1ul; i < constShape.size(); ++i) {
if (constShape[i] != 1ul) {
return false;
}
}
} else {
return false;
}

return true;
};

bool is_dequantization = (lin_op->get_rt_info().count("DEQUANTIZATION") != 0) && checkElementwise(lin_op);

if (!is_dequantization && (res == CONVERSION_RESULT::NONE || (res == CONVERSION_RESULT::SCALE_SHIFT && output_shape_rank < 4))) {
return convert_to_eltwise<T>(lin_op,
lin_op->input(0).get_source_output(),
lin_op->input(1).get_source_output());
}

// TODO: if all values in Constant are equal the best way is to convert this Eltwise to Power
if (res == CONVERSION_RESULT::SCALE_SHIFT) {
if (res == CONVERSION_RESULT::SCALE_SHIFT || is_dequantization) {
auto weights_et = const_node->get_element_type();
auto weights_shape = const_node->get_shape();
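The checkElementwise lambda above accepts a constant only when it is per-tensor (scalar or all ones) or per-channel (the channel dimension matches the data's, every other dimension is 1, and the rank is either equal to the data rank or one less). A standalone restatement with a few examples, assuming plain std::vector shapes instead of ngraph::Shape:

#include <cassert>
#include <vector>

// Mirrors the per-tensor / per-channel test in checkElementwise above.
bool isPerTensorOrPerChannel(const std::vector<size_t>& constShape, const std::vector<size_t>& dataShape) {
    if (constShape.size() > 5) return false;
    bool allOnes = true;
    for (size_t d : constShape) allOnes &= (d == 1);
    if (constShape.size() <= 1 || allOnes) return true;          // per-tensor
    if (constShape.size() == dataShape.size()) {                 // same rank: {1, C, 1, ...}
        if (constShape[0] != 1 || constShape[1] != dataShape[1]) return false;
        for (size_t i = 2; i < constShape.size(); ++i)
            if (constShape[i] != 1) return false;
        return true;
    }
    if (constShape.size() == dataShape.size() - 1) {             // rank - 1: {C, 1, ...}
        if (constShape[0] != dataShape[1]) return false;
        for (size_t i = 1; i < constShape.size(); ++i)
            if (constShape[i] != 1) return false;
        return true;
    }
    return false;
}

int main() {
    assert(isPerTensorOrPerChannel({1, 3, 1, 1}, {1, 3, 16, 16}));   // per-channel, same rank
    assert(isPerTensorOrPerChannel({3, 1, 1}, {1, 3, 16, 16}));      // per-channel, rank - 1
    assert(!isPerTensorOrPerChannel({1, 3, 16, 1}, {1, 3, 16, 16})); // spatial broadcast rejected
}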
@@ -162,12 +219,49 @@ ngraph::graph_rewrite_callback get_callback() {
std::shared_ptr<ngraph::op::ScaleShiftIE> scaleshift;
if (std::is_same<T, ngraph::opset1::Add>()) {
auto weights = ngraph::opset1::Constant::create(weights_et, weights_shape, {1});
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, ngraph::op::util::normalize_constant(weights, output_shape),
ngraph::op::util::normalize_constant(const_node, output_shape));
} else {
auto weights_in = ngraph::op::util::normalize_constant(weights, output_shape);
auto biases_in = ngraph::op::util::normalize_constant(const_node, output_shape);
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];

weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
std::shared_ptr<ngraph::Node> new_const_node = std::make_shared<ngraph::opset1::Multiply>(
ngraph::op::util::normalize_constant(const_node, output_shape),
ngraph::opset1::Constant::create(weights_et, ngraph::Shape{ 1 }, { -1 }));

auto weights = ngraph::opset1::Constant::create(weights_et, weights_shape, {1});
auto weights_in = ngraph::op::util::normalize_constant(weights, output_shape);
auto biases_in = new_const_node;
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];

weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else if (std::is_same<T, ngraph::opset1::Multiply>()) {
auto bias = ngraph::opset1::Constant::create(weights_et, weights_shape, {0});
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, ngraph::op::util::normalize_constant(const_node, output_shape),
ngraph::op::util::normalize_constant(bias, output_shape));
auto weights_in = ngraph::op::util::normalize_constant(const_node, output_shape);
auto biases_in = ngraph::op::util::normalize_constant(bias, output_shape);
if (is_dequantization) {
const ngraph::Shape data_shape = data_node.get_shape();
ngraph::Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];

weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
}
scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
} else {
return false;
}

scaleshift->set_friendly_name(lin_op->get_friendly_name());
@@ -182,9 +276,11 @@ ngraph::graph_rewrite_callback get_callback() {
// In case Add we create fake scale equal to 1, in case of Multiply we create fake shift equal to 0
std::shared_ptr<ngraph::op::PowerIE> power;
if (std::is_same<T, ngraph::opset1::Add>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., value);
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., value, lin_op->get_output_element_type(0));
} else if (std::is_same<T, ngraph::opset1::Multiply>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., value, 0.);
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., value, 0., lin_op->get_output_element_type(0));
} else if (std::is_same<T, ngraph::opset1::Subtract>()) {
power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., 1., -value, lin_op->get_output_element_type(0));
} else {
return false;
}
@@ -80,7 +80,8 @@ private:
auto new_fc = std::make_shared<op::FullyConnected>(reshape->input_value(0),
fc->input_value(1),
fc->input_value(2),
fc->get_shape());
fc->get_shape(),
fc->output(0).get_element_type());

new_fc->set_friendly_name(fc->get_friendly_name());
ngraph::copy_runtime_info({reshape, fc}, new_fc);

@@ -1637,6 +1637,9 @@ CNNLayer::Ptr NodeConverter<ngraph::op::Eltwise>::createLayer(const std::shared_
case ELTWISE_TYPE::Sum:
type = "sum";
break;
case ELTWISE_TYPE::Sub:
type = "sub";
break;
case ELTWISE_TYPE::Prod:
type = "prod";
break;
@@ -15,8 +15,8 @@ using namespace ngraph;

constexpr NodeTypeInfo op::Eltwise::type_info;

op::Eltwise::Eltwise(const Output<Node>& data1, const Output<Node>& data2, const ELTWISE_TYPE eltwise_type)
: Op({data1, data2}), eltwise_type(eltwise_type) {
op::Eltwise::Eltwise(const Output<Node>& data1, const Output<Node>& data2, const ELTWISE_TYPE eltwise_type, const element::Type output_type)
: Op({data1, data2}), eltwise_type(eltwise_type), m_output_type(output_type) {
constructor_validate_and_infer_types();
}

@@ -25,7 +25,7 @@ std::shared_ptr<Node> op::Eltwise::clone_with_new_inputs(const OutputVector& new
throw ngraph_error("Incorrect number of new arguments");
}

return make_shared<Eltwise>(new_args.at(0), new_args.at(1), eltwise_type);
return make_shared<Eltwise>(new_args.at(0), new_args.at(1), eltwise_type, m_output_type);
}

void op::Eltwise::validate_and_infer_types() {
@@ -34,8 +34,12 @@ void op::Eltwise::validate_and_infer_types() {
element::Type data2_et = get_input_element_type(1);

element::Type et_result;
if (m_output_type == element::undefined) {
NODE_VALIDATION_CHECK(this, element::Type::merge(et_result, data1_et, data2_et),
"Element types for first and second do not match :", data1_et, " and ", data2_et);
} else {
et_result = m_output_type;
}

if (get_input_partial_shape(0).rank().is_dynamic() ||
get_input_partial_shape(1).rank().is_dynamic()) {
@@ -12,8 +12,13 @@ using namespace ngraph;

constexpr NodeTypeInfo op::FullyConnected::type_info;

op::FullyConnected::FullyConnected(const Output<Node>& A, const Output<Node>& B, const Output<Node>& C, const Shape & output_shape)
: Op({A, B, C}), m_output_shape(output_shape) {
op::FullyConnected::FullyConnected(
const Output<Node>& A,
const Output<Node>& B,
const Output<Node>& C,
const Shape & output_shape,
const element::Type output_type)
: Op({A, B, C}), m_output_shape(output_shape), m_output_type(output_type) {
constructor_validate_and_infer_types();
}

@@ -26,5 +31,8 @@ void op::FullyConnected::validate_and_infer_types() {
if (m_output_shape.size() < 2)
throw ngraph_error("FullyConnected shape is incorrect");
m_output_size = m_output_shape.back();
set_output_type(0, input_value(0).get_element_type(), m_output_shape);
set_output_type(
0,
m_output_type == element::undefined ? input_value(0).get_element_type() : m_output_type,
m_output_shape);
}
@@ -15,15 +15,14 @@ using namespace ngraph;
constexpr NodeTypeInfo op::NormalizeIE::type_info;

op::NormalizeIE::NormalizeIE(const Output<Node>& data, const Output<Node>& weights, float eps, bool across_spatial,
bool channel_shared)
: Op({data, weights}), m_eps(eps), m_across_spatial(across_spatial), m_channel_shared(channel_shared) {
bool channel_shared, const ngraph::element::Type output_type)
: Op({data, weights}), m_eps(eps), m_across_spatial(across_spatial), m_channel_shared(channel_shared), m_output_type(output_type) {
constructor_validate_and_infer_types();
}

void op::NormalizeIE::validate_and_infer_types() {
element::Type arg_type = get_input_element_type(0);
PartialShape arg_shape = get_input_partial_shape(0);
set_output_type(0, arg_type, arg_shape);
set_output_type(0, m_output_type, arg_shape);

const PartialShape& input_shape = get_input_partial_shape(0);

@@ -34,5 +33,5 @@ void op::NormalizeIE::validate_and_infer_types() {

shared_ptr<Node> op::NormalizeIE::clone_with_new_inputs(const OutputVector& new_args) const {
check_new_args_count(this, new_args);
return make_shared<op::NormalizeIE>(new_args.at(0), new_args.at(1), m_eps, m_across_spatial, m_channel_shared);
return make_shared<op::NormalizeIE>(new_args.at(0), new_args.at(1), m_eps, m_across_spatial, m_channel_shared, m_output_type);
}
@@ -14,8 +14,8 @@ using namespace ngraph;

constexpr NodeTypeInfo op::PowerIE::type_info;

op::PowerIE::PowerIE(const Output<ngraph::Node>& data_batch, const float power, const float scale, const float shift)
: Op({data_batch}), scale(scale), power(power), shift(shift) {
op::PowerIE::PowerIE(const Output<ngraph::Node>& data_batch, const float power, const float scale, const float shift, const element::Type output_type)
: Op({data_batch}), scale(scale), power(power), shift(shift), m_output_type(output_type) {
constructor_validate_and_infer_types();
}

@@ -24,9 +24,9 @@ std::shared_ptr<Node> op::PowerIE::clone_with_new_inputs(const OutputVector& new
throw ngraph_error("Incorrect number of new arguments");
}

return make_shared<PowerIE>(new_args.at(0), this->power, this->scale, this->shift);
return make_shared<PowerIE>(new_args.at(0), this->power, this->scale, this->shift, this->m_output_type);
}

void op::PowerIE::validate_and_infer_types() {
set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
set_output_type(0, m_output_type == element::undefined ? get_input_element_type(0) : m_output_type, get_input_partial_shape(0));
}
@@ -15,16 +15,19 @@ using namespace ngraph;

constexpr NodeTypeInfo op::ReLUIE::type_info;

op::ReLUIE::ReLUIE(const Output<Node>& data, const float& negative_slope)
: Op(OutputVector {data}), m_negative_slope(negative_slope) {
op::ReLUIE::ReLUIE(const Output<Node>& data, const float& negative_slope, const element::Type output_type)
: Op(OutputVector {data}), m_negative_slope(negative_slope), m_output_type(output_type) {
constructor_validate_and_infer_types();
}

std::shared_ptr<Node> op::ReLUIE::clone_with_new_inputs(const OutputVector& new_args) const {
check_new_args_count(this, new_args);
return make_shared<ReLUIE>(new_args.at(0), m_negative_slope);
return make_shared<ReLUIE>(new_args.at(0), m_negative_slope, m_output_type);
}

void op::ReLUIE::validate_and_infer_types() {
set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
set_output_type(
0,
m_output_type == element::undefined ? get_input_element_type(0) : m_output_type,
get_input_partial_shape(0));
}
@@ -14,8 +14,25 @@ using namespace ngraph;

constexpr NodeTypeInfo op::ScaleShiftIE::type_info;

op::ScaleShiftIE::ScaleShiftIE(const Output<Node>& data_batch, const Output<Node>& weights, const Output<Node>& bias)
: Op({data_batch, weights, bias}) {
element::Type getMaxBitwidth(const std::vector<element::Type>& types) {
if (types.empty()) {
return element::undefined;
}

element::Type maxType = types[0];
for (size_t i = 1; i < types.size(); ++i) {
if (types[i].bitwidth() > maxType.bitwidth()) {
maxType = types[i];
}
}
return maxType;
}

op::ScaleShiftIE::ScaleShiftIE(const Output<Node>& data_batch, const Output<Node>& weights, const Output<Node>& bias, const element::Type output_type)
: Op({data_batch, weights, bias}), output_type(output_type) {
if (this->output_type == element::undefined) {
this->output_type = getMaxBitwidth({ data_batch.get_element_type(), weights.get_element_type(), bias.get_element_type() });
}
constructor_validate_and_infer_types();
}
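The getMaxBitwidth helper above picks the default output type for ScaleShiftIE when none is passed explicitly: the input type with the largest bitwidth wins, and ties keep the earlier entry. A few illustrative calls (assuming the helper above is in scope):

#include <ngraph/type/element_type.hpp>

void getMaxBitwidthExamples() {
    using namespace ngraph;
    element::Type a = getMaxBitwidth({element::u8, element::f32});  // f32: 32 bits beat 8
    element::Type b = getMaxBitwidth({element::i8, element::u8});   // i8: equal widths keep the first
    element::Type c = getMaxBitwidth({});                           // element::undefined
    (void)a; (void)b; (void)c;
}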
@@ -24,12 +41,12 @@ std::shared_ptr<Node> op::ScaleShiftIE::clone_with_new_inputs(const OutputVector
throw ngraph_error("Incorrect number of new arguments");
}

return make_shared<ScaleShiftIE>(new_args.at(0), new_args.at(1), new_args.at(2));
return make_shared<ScaleShiftIE>(new_args.at(0), new_args.at(1), new_args.at(2), output_type);
}

void op::ScaleShiftIE::validate_and_infer_types() {
// Check that weights and biases has the same type
element::Type data_et = get_input_element_type(0);
element::Type data_et = output_type == element::undefined ? get_input_element_type(0) : output_type;
element::Type weights_et = get_input_element_type(1);
element::Type biases_et = get_input_element_type(2);
@@ -143,9 +143,9 @@ ngraph::pass::ConvertMatMulToFC::ConvertMatMulToFC() {

// Create FullyConnected
std::vector<float> bias_value(O, 0);
auto fc_bias = opset1::Constant::create(matmul->get_input_element_type(0), Shape {O}, bias_value);
auto fc_bias = opset1::Constant::create(matmul->get_output_element_type(0), Shape {O}, bias_value);

auto fc = std::make_shared<op::FullyConnected>(fc_input_a, fc_input_b, fc_bias, output_shape);
auto fc = std::make_shared<op::FullyConnected>(fc_input_a, fc_input_b, fc_bias, output_shape, matmul->output(0).get_element_type());
fc->set_friendly_name(matmul->get_friendly_name());
new_ops.push_back(fc);

@@ -207,7 +207,7 @@ ngraph::pass::ConvertMatMulToGemm::ConvertMatMulToGemm() {
new_ops.push_back(fc_input_b.get_node_shared_ptr());
}

auto gemm = std::make_shared<opset1::MatMul>(fc_input_a, fc_input_b, matmul->get_transpose_a(), matmul->get_transpose_b());
auto gemm = matmul->copy_with_new_inputs({ fc_input_a, fc_input_b });
new_ops.push_back(gemm);

if (gemm->get_shape() != output_shape) {
@@ -87,6 +87,10 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const_bias_node = ngraph::as_type_ptr<ngraph::opset1::Constant>(add_input_0);
}

if (const_bias_node->output(0).get_element_type() != add_node->output(0).get_element_type()) {
return false;
}

auto mul_input_0 = mul_node->input(0).get_source_output().get_node_shared_ptr();
auto mul_input_1 = mul_node->input(1).get_source_output().get_node_shared_ptr();

@@ -97,6 +101,10 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const_weights_node = ngraph::as_type_ptr<ngraph::opset1::Constant>(mul_input_0);
}

if (const_weights_node->output(0).get_element_type() != mul_node->output(0).get_element_type()) {
return false;
}

if (add_node->get_output_partial_shape(0).rank().is_dynamic() ||
mul_node->get_output_partial_shape(0).rank().is_dynamic()) {
return false;
@@ -137,13 +145,16 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
const auto output_shape = add_node->get_output_partial_shape(0);
const auto output_shape_rank = output_shape.rank().get_length();

bool is_dequantization =
(add_node->get_rt_info().count("DEQUANTIZATION") != 0 || mul_node->get_rt_info().count("DEQUANTIZATION") != 0);

if (res1 == CONVERSION_RESULT::NONE || res2 == CONVERSION_RESULT::NONE ||
((res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) && output_shape_rank < 4)) {
((res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) && !is_dequantization && output_shape_rank < 4)) {
return false;
}

// TODO: in case if scale and shift constants has equal values the best way is to convert them to Power
if (res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT) {
if (res1 == CONVERSION_RESULT::SCALE_SHIFT || res2 == CONVERSION_RESULT::SCALE_SHIFT || is_dequantization) {
NodeVector new_ops;

auto weights_in = ngraph::op::util::normalize_constant(const_weights_node, output_shape);
@@ -151,16 +162,29 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
new_ops.push_back(weights_in);
new_ops.push_back(biases_in);

if (res1 == CONVERSION_RESULT::POWER) {
if (is_dequantization) {
const Shape data_shape = data_node.get_shape();
Shape broadcasted_shape = std::vector<size_t>(data_shape.size(), 1ul);
broadcasted_shape[1] = data_shape[1];

weights_in = ngraph::op::util::broadcastTo(weights_in, broadcasted_shape);
new_ops.push_back(weights_in);

biases_in = ngraph::op::util::broadcastTo(biases_in, broadcasted_shape);
new_ops.push_back(biases_in);
}

if (res1 == CONVERSION_RESULT::POWER && !is_dequantization) {
weights_in = ngraph::op::util::broadcastTo(weights_in, biases_in->get_shape());
new_ops.push_back(weights_in);
}
if (res2 == CONVERSION_RESULT::POWER) {
if (res2 == CONVERSION_RESULT::POWER && !is_dequantization) {
biases_in = ngraph::op::util::broadcastTo(biases_in, weights_in->get_shape());
new_ops.push_back(biases_in);
}

auto scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in);
auto output_type = m.get_match_root()->get_output_element_type(0);
auto scaleshift = std::make_shared<ngraph::op::ScaleShiftIE>(data_node, weights_in, biases_in, output_type);
new_ops.push_back(scaleshift);

scaleshift->set_friendly_name(add_node->get_friendly_name());
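For the dequantization case above, the scale and shift constants are broadcast to an explicit shape that is 1 everywhere except the channel dimension; that is what lets ScaleShiftIE handle rank-2/3 tensors which the generic SCALE_SHIFT path (rank >= 4) would otherwise reject. A sketch of just the shape computation (hypothetical helper, NCHW-style layout assumed):

#include <cstddef>
#include <vector>

// All ones except the channel dimension, which copies the data shape.
std::vector<size_t> dequantizationBroadcastShape(const std::vector<size_t>& dataShape) {
    std::vector<size_t> broadcasted(dataShape.size(), 1);
    broadcasted[1] = dataShape[1];  // channel dimension
    return broadcasted;
}
// e.g. dataShape {2, 8, 10} -> {1, 8, 1}; the constants are then broadcastTo'd to it.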
@@ -175,7 +199,8 @@ void ngraph::pass::ConvertMulAddToScaleShiftOrPower::convert_mul_add_to_scaleshi
return false;
}

auto power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., scale, shift);
auto output_type = m.get_match_root()->get_output_element_type(0);
auto power = std::make_shared<ngraph::op::PowerIE>(data_node, 1., scale, shift, output_type);
power->set_friendly_name(add_node->get_friendly_name());
ngraph::copy_runtime_info({mul_node, add_node}, power);
ngraph::replace_node(m.get_match_root(), power);

@@ -62,7 +62,8 @@ ngraph::pass::ConvertNormalizeL2WithMulToNormalizeIE::ConvertNormalizeL2WithMulT
constant->output(0),
normalize->get_eps(),
across_spatial,
channel_shared);
channel_shared,
normalize->get_element_type());

normalize_ie->set_friendly_name(mul->get_friendly_name());
ngraph::copy_runtime_info({normalize, mul}, normalize_ie);
@@ -93,13 +94,14 @@ ngraph::pass::ConvertNormalizeL2ToLegacyMatcher::ConvertNormalizeL2ToLegacyMatch
bool across_channels = !(axis.size() == 1 && axis[0] == 1);
bool channel_shared = true;

auto scale = std::make_shared<ngraph::opset1::Constant>(normalize->get_input_element_type(0), Shape{1}, std::vector<float>{1.0});
auto scale = std::make_shared<ngraph::opset1::Constant>(normalize->output(0).get_element_type(), Shape{1}, std::vector<float>{1.0});

auto normalize_ie = std::make_shared<ngraph::op::NormalizeIE> (normalize->input(0).get_source_output(),
scale->output(0),
normalize->get_eps(),
across_channels,
channel_shared);
channel_shared,
normalize->get_element_type());

normalize_ie->set_friendly_name(normalize->get_friendly_name());
ngraph::copy_runtime_info(normalize, normalize_ie);
@@ -33,7 +33,7 @@ ngraph::pass::ConvertPowerToPowerIEMatcher::ConvertPowerToPowerIEMatcher() {
        return false;
    }

-   auto power_ie = std::make_shared<ngraph::op::PowerIE>(power->input(0).get_source_output(), value, 1, 0);
+   auto power_ie = std::make_shared<ngraph::op::PowerIE>(power->input(0).get_source_output(), value, 1, 0, power->output(0).get_element_type());
    power_ie->set_friendly_name(power->get_friendly_name());
    ngraph::copy_runtime_info(power, power_ie);
    ngraph::replace_node(power, power_ie);

@@ -33,7 +33,7 @@ ngraph::pass::ConvertPReLUToReLUIE::ConvertPReLUToReLUIE() {
        return false;
    }

-   auto relu_ie = std::make_shared<ngraph::op::ReLUIE>(prelu->input(0).get_source_output(), value);
+   auto relu_ie = std::make_shared<ngraph::op::ReLUIE>(prelu->input(0).get_source_output(), value, prelu->output(0).get_element_type());
    relu_ie->set_friendly_name(prelu->get_friendly_name());
    ngraph::copy_runtime_info(prelu, relu_ie);
    ngraph::replace_node(prelu, relu_ie);

@@ -25,7 +25,7 @@ ngraph::pass::ConvertSqrtToPowerIEMatcher::ConvertSqrtToPowerIEMatcher() {
    if (!sqrt) {
        return false;
    }
-   auto power_ie = std::make_shared<ngraph::op::PowerIE>(sqrt->input(0).get_source_output(), 0.5f, 1, 0);
+   auto power_ie = std::make_shared<ngraph::op::PowerIE>(sqrt->input(0).get_source_output(), 0.5f, 1, 0, sqrt->output(0).get_element_type());
    power_ie->set_friendly_name(sqrt->get_friendly_name());
    ngraph::copy_runtime_info(sqrt, power_ie);
    ngraph::replace_node(sqrt, power_ie);
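
All three matcher fixes above follow one pattern: read the element type from the matched node's output and forward it to the legacy op's constructor, so the replacement keeps the original output precision. A minimal sketch of the pattern as applied to Sqrt; the helper name is hypothetical and the include path for PowerIE is an assumption:

#include <memory>
#include <ngraph/rt_info.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <legacy/ngraph_ops/power.hpp>  // assumed location of ngraph::op::PowerIE

// Hypothetical helper mirroring ConvertSqrtToPowerIEMatcher above.
void replace_sqrt_with_power_ie(const std::shared_ptr<ngraph::opset1::Sqrt>& sqrt) {
    // sqrt(x) == x^0.5; the 5th argument is the output element type this PR adds.
    auto power_ie = std::make_shared<ngraph::op::PowerIE>(
        sqrt->input(0).get_source_output(), 0.5f, 1, 0,
        sqrt->output(0).get_element_type());
    power_ie->set_friendly_name(sqrt->get_friendly_name());
    ngraph::copy_runtime_info(sqrt, power_ie);
    ngraph::replace_node(sqrt, power_ie);
}
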
@@ -65,7 +65,8 @@ ngraph::pass::FullyConnectedBiasFusion::FullyConnectedBiasFusion() {
    auto new_fc = std::make_shared<op::FullyConnected>(m_fc->input(0).get_source_output(),
                                                       m_fc->input(1).get_source_output(),
                                                       final_bias,
-                                                      m_fc->get_shape());
+                                                      m_fc->get_shape(),
+                                                      m_fc->get_output_type());
    new_ops.push_back(new_fc);

    new_fc->set_friendly_name(add->get_friendly_name());

@@ -44,6 +44,7 @@ std::shared_ptr<Node> convert(const Output<Node> & data, std::shared_ptr<op::Con
                              new_dilations,
                              new_pads_begin,
                              new_pad_end,
+                             node->get_output_element_type(0),
                              node->get_group(),
                              node->get_auto_pad());
    } else {

@@ -54,6 +55,7 @@ std::shared_ptr<Node> convert(const Output<Node> & data, std::shared_ptr<op::Con
                              new_dilations,
                              new_pads_begin,
                              new_pad_end,
+                             node->get_output_element_type(0),
                              node->get_group(),
                              node->get_auto_pad());
    }

@@ -52,7 +52,8 @@ ngraph::pass::ReshapeFullyConnected::ReshapeFullyConnected() {
    auto fc_new = std::make_shared<op::FullyConnected>(reshape,
                                                       fc->input_value(1),
                                                       fc->input_value(2),
-                                                      output_shape_new);
+                                                      output_shape_new,
+                                                      fc->get_output_type());
    new_ops.push_back(fc_new);

    if (output_shape != output_shape_new) {
@@ -51,3 +51,7 @@ install(TARGETS ${TARGET_NAME}
        RUNTIME DESTINATION ${IE_CPACK_RUNTIME_PATH} COMPONENT core
        ARCHIVE DESTINATION ${IE_CPACK_ARCHIVE_PATH} COMPONENT core
        LIBRARY DESTINATION ${IE_CPACK_LIBRARY_PATH} COMPONENT core)
+
+if (USE_CNNNETWORK_LPT)
+    target_compile_definitions(${TARGET_NAME} PUBLIC USE_CNNNETWORK_LPT)
+endif()
@@ -103,16 +103,5 @@ void ActivationTransformation::transform(TransformationContext& context, CNNLaye
    CNNNetworkHelper::removeLayer(context.network, scaleShift);
    context.removeLayer(*scaleShift);

-   const std::vector<CNNLayerPtr> children = CNNNetworkHelper::getChildren(*activationLayer);
-   for (const CNNLayerPtr& child : children) {
-       const std::vector<CNNLayerPtr> dequantizationLayers = CNNNetworkHelper::addScaleShiftBetween(
-           context,
-           activationLayer,
-           child,
-           DequantizationDetails(scales, shifts));
-
-       for (const auto& dequantizationLayer : dequantizationLayers) {
-           context.dequantizationLayersNames.insert(dequantizationLayer->name);
-       }
-   }
+   addDequantizationLayer(context, *activationLayer, scales, shifts);
}

@@ -1332,6 +1332,8 @@ void CNNNetworkHelper::addLayerToCNNNetworkAfterData(
        THROW_IE_EXCEPTION << "parent data is absent";
    }
+   netImpl->removeOutput(parent->name);
+   netImpl->addData(parent->name.c_str(), parentOutData);

    netImpl->addData(layer->name.c_str(), newEdgeAfterLayer);
    netImpl->addOutput(layer->name);
}

@@ -329,7 +329,7 @@ void WeightableLayerTransformation::updateToSupportAsymmetricQuantization(
    const PrecisionsInfo& weightsPrecisionsInfo,
    std::vector<float>& weightsShifts) const {
    const CNNLayerPtr parentOnData = CNNNetworkHelper::getParent(layer, 0ul);
-   if (parentOnData->type == "ScaleShift") {
+   if (parentOnData->type == "ScaleShift") {  // FIXME: it is always true
        const std::shared_ptr<float> dataConvertedInBlob = CNNNetworkHelper::convertFloatData(
            dataShifts.data(),
            dataShifts.size(),
@@ -167,9 +167,13 @@ ie_add_plugin(NAME ${TARGET_NAME}
set_ie_threading_interface_for(${TARGET_NAME})

target_compile_definitions(${TARGET_NAME} PUBLIC -DMKLDNN_THR=${MKLDNN_THR})
-target_link_libraries(${TARGET_NAME} PRIVATE inference_engine inference_engine_lp_transformations
+target_link_libraries(${TARGET_NAME} PRIVATE inference_engine
        inference_engine_transformations mkldnn)

+if (USE_CNNNETWORK_LPT)
+    target_link_libraries(${TARGET_NAME} PRIVATE inference_engine_lp_transformations)
+endif()
+
# Cross compiled function
# TODO: The same for proposal, proposalONNX, topk
cross_compiled_file(${TARGET_NAME}
@@ -16,17 +16,20 @@
#include <legacy/ie_util_internal.hpp>
#include <legacy/graph_tools.hpp>
#include <threading/ie_executor_manager.hpp>

+#ifdef USE_CNNNETWORK_LPT
#include "low_precision_transformations/convolution.hpp"
#include "low_precision_transformations/eltwise.hpp"
#include "low_precision_transformations/fully_connected.hpp"
#include "low_precision_transformations/scaleshift_to_convolution.hpp"
#include "low_precision_transformations/transformer.hpp"
+#endif

#include <threading/ie_cpu_streams_executor.hpp>
#include <ie_system_conf.h>
#include <threading/ie_thread_affinity.hpp>
#include <algorithm>
#include <unordered_set>
#include <utility>
#include <cstring>

using namespace MKLDNNPlugin;
using namespace InferenceEngine;

@@ -51,6 +54,7 @@ MKLDNNExecNetwork::MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network
    // we are cloning network if we have statistics and we can transform network.
    _clonedNetwork = cloneNet(network);

+#ifdef USE_CNNNETWORK_LPT
    if (_cfg.lpTransformsMode == Config::LPTransformsMode::On) {
        auto params = LayerTransformation::Params(true,  // updatePrecisions
                                                  true,  // quantizeOutputs

@@ -94,6 +98,7 @@ MKLDNNExecNetwork::MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network
            bf16Transformer.convertToFloat(cnnetwork);
        }
    }
+#endif

    MKLDNNGraph::ApplyUnrollPasses(static_cast<ICNNNetwork&>(*_clonedNetwork));
@@ -32,7 +32,6 @@

#include "precision_utils.h"
#include <ie_plugin_config.hpp>
-#include "low_precision_transformations/transformer.hpp"

#include "utils/blob_dump.h"

@@ -256,6 +256,10 @@ void MKLDNNGraphOptimizer::FuseConvolutionAndZeroPoints(MKLDNNGraph &graph) {
        if (arg0->getCnnLayer()->outData[0]->getPrecision() != Precision::U8)
            return false;

+       if (parent0->getParentEdgesAtPort(1)[0]->getDims().size() < 2) {
+           return false;
+       }
+
        if (parent0->getParentEdgesAtPort(1)[0]->getDims()[1] != 1 &&
            parent0->getParentEdgesAtPort(1)[0]->getDims()[1] != IC)
            return false;

@@ -495,6 +499,9 @@ void MKLDNNGraphOptimizer::MergeTwoEqualScaleShifts(MKLDNNGraph& graph) {
    };

+   auto isEqualScaleShiftNodes = [](MKLDNNNodePtr node1, MKLDNNNodePtr node2) {
+       if (node1->getParentEdgeAt(0) != node2->getParentEdgeAt(0))
+           return false;

        auto *depthwiseNode1 = dynamic_cast<MKLDNNDepthwiseNode *>(node1.get());
        auto *depthwiseNode2 = dynamic_cast<MKLDNNDepthwiseNode *>(node2.get());
@@ -53,6 +53,12 @@
#include <ngraph/op/util/op_types.hpp>
#include <ngraph/pass/manager.hpp>

+#include <transformations/common_optimizations/lin_op_sequence_fusion.hpp>
+#include <transformations/low_precision/transformer.hpp>
+#include <transformations/low_precision/convolution.hpp>
+#include <transformations/low_precision/group_convolution.hpp>
+#include <transformations/low_precision/multiply_to_group_convolution.hpp>

#if !defined(__arm__) && !defined(_M_ARM) && !defined(__aarch64__) && !defined(_M_ARM64)
#if defined(_WIN32) || defined(WIN32)
#include <intrin.h>

@@ -76,7 +82,7 @@ Engine::~Engine() {
    ExecutorManager::getInstance()->clear("CPUCallbackExecutor");
}

-static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
+static void Transformation(ICNNNetwork::Ptr& clonedNetwork, const Config& conf) {
    OV_ITT_SCOPED_TASK(MKLDNNPlugin::itt::domains::MKLDNNPlugin, "Transformation");

    auto nGraphFunc = clonedNetwork->getFunction();

@@ -104,9 +110,6 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {
        manager.register_pass<ngraph::pass::ConvertPrecision>(precision.first, precision.second);
    }

-   manager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
-   manager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
-
    auto pass_config = manager.get_pass_config();

    using const_node_ptr = const std::shared_ptr<const ngraph::Node>;

@@ -144,6 +147,47 @@ static void Transformation(ICNNNetwork::Ptr& clonedNetwork) {

    manager.run_passes(nGraphFunc);

+#ifndef USE_CNNNETWORK_LPT
+   using namespace ngraph::pass::low_precision;
+   if (conf.lpTransformsMode == Config::LPTransformsMode::On) {
+       auto params = LayerTransformation::Params(
+           true,  // updatePrecisions
+           LayerTransformation::QuantizedTensorAlignment::UpdateLevel,  // quantizedTensorAlignmentOnActivations
+           LayerTransformation::QuantizedTensorAlignment::None,  // quantizedTensorAlignmentOnWeights
+           true);  // supportAsymmetricQuantization
+       LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
+           .add<ConvolutionTransformation, ngraph::opset1::Convolution>(
+               LayerTransformation::Params(params).setPrecisionsOnActivations({ngraph::element::u8}).setSupportAsymmetricQuantization(true))
+           .add<GroupConvolutionTransformation, ngraph::opset1::GroupConvolution>(
+               LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 }).setSupportAsymmetricQuantization(true))
+           .addStandaloneCleanup<MultiplyToGroupConvolutionTransformation, ngraph::opset1::Multiply>(
+               LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 })));
+
+       transformer.transform(nGraphFunc);
+   }
+#endif

+   ngraph::pass::Manager legacyManager;
+   legacyManager.register_pass<ngraph::pass::ConvertOpSet1ToLegacy>();
+   legacyManager.register_pass<ngraph::pass::ConvertPrecision>(ngraph::element::i64, ngraph::element::i32);
+
+   auto legacyPassConfig = legacyManager.get_pass_config();
+   legacyPassConfig->set_callback<ngraph::pass::AddMultiplyFusion>([](const_node_ptr &node) -> bool {
+       if (auto mul_op = std::dynamic_pointer_cast<const ngraph::opset1::Multiply>(node)) {
+           auto add_op = std::dynamic_pointer_cast<const ngraph::opset1::Add>(mul_op->get_input_node_shared_ptr(0));
+           auto constant = std::dynamic_pointer_cast<const ngraph::opset1::Constant>(mul_op->get_input_node_shared_ptr(1));
+           bool is_dequantization = mul_op->get_rt_info().count("DEQUANTIZATION") != 0;
+           if (add_op && constant && is_dequantization) {
+               return ngraph::is_type<ngraph::opset1::Convolution>(add_op->get_input_node_shared_ptr(0)) ||
+                      ngraph::is_type<ngraph::opset1::GroupConvolution>(add_op->get_input_node_shared_ptr(0)) ||
+                      ngraph::is_type<ngraph::opset1::MatMul>(add_op->get_input_node_shared_ptr(0));
+           }
+       }
+       return false;
+   });
+
+   legacyManager.run_passes(nGraphFunc);

    clonedNetwork = InferenceEngine::details::convertFunctionToICNNNetwork(nGraphFunc, *clonedNetwork);

    // WA: after conversion to CNNNetwork user precision can redefine input/output precisions

@@ -187,7 +231,7 @@ Engine::LoadExeNetworkImpl(const InferenceEngine::ICNNNetwork &network, const st
    std::shared_ptr<ICNNNetwork> clonedNetwork = cloneNetwork(network);
    bool is_transformed = false;
    if (clonedNetwork->getFunction()) {
-       Transformation(clonedNetwork);
+       Transformation(clonedNetwork, conf);
        is_transformed = true;
    }
    auto implNetwork = std::dynamic_pointer_cast<details::CNNNetworkImpl>(clonedNetwork);

@@ -312,8 +356,17 @@ QueryNetworkResult Engine::QueryNetwork(const ICNNNetwork& network, const std::m
    for (auto&& node : function->get_ops()) {
        originalOps.emplace(node->get_friendly_name());
    }

+   // TODO: Clarify the behavior of SetConfig method. Skip eng_config or not?
+   Config conf = engConfig;
+   conf.readProperties(config);
+
+   if (conf.enableDynamicBatch) {
+       conf.batchLimit = static_cast<int>(network.getBatchSize());
+   }
+
    auto clonedNetwork = cloneNetwork(network);
-   Transformation(clonedNetwork);
+   Transformation(clonedNetwork, conf);
    std::unordered_set<std::string> supported;
    std::unordered_set<std::string> unsupported;
    for (details::CNNNetworkIterator itLayer{clonedNetwork.get()}; itLayer != details::CNNNetworkIterator(); itLayer++) {
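
For reference, the nGraph LPT pipeline added in the hunk above can be driven standalone on any ngraph::Function. This is a minimal sketch, reusing only the calls and include paths that appear in this diff; the function name is hypothetical:

#include <ngraph/opsets/opset1.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/convolution.hpp>

// Sketch: apply the default set of low precision transformations to a function,
// with u8 activations for Convolution, as the plugin code above does.
void applyLptSketch(const std::shared_ptr<ngraph::Function>& nGraphFunc) {
    using namespace ngraph::pass::low_precision;
    auto params = LayerTransformation::Params(
        true,  // updatePrecisions
        LayerTransformation::QuantizedTensorAlignment::UpdateLevel,
        LayerTransformation::QuantizedTensorAlignment::None,
        true); // supportAsymmetricQuantization
    LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
        .add<ConvolutionTransformation, ngraph::opset1::Convolution>(
            LayerTransformation::Params(params).setPrecisionsOnActivations({ ngraph::element::u8 })));
    transformer.transform(nGraphFunc);
}
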
@@ -112,7 +112,10 @@ public:
            exec_cast<PrecisionTrait<Precision::U8>::value_type, PrecisionTrait<Precision::I32>::value_type>(inputs[0], outputs[0]);
            break;
        default:
-           std::string errorMsg = "Unsupported precisions!";
+           std::stringstream ss;
+           ss << "Unsupported precisions: " << inputs[0]->getTensorDesc().getPrecision() << " -> " << outputs[0]->getTensorDesc().getPrecision();
+           std::string errorMsg = ss.str();

            if (resp) {
                errorMsg.copy(resp->msg, sizeof(resp->msg)-1);
            }

@@ -158,7 +158,7 @@ void MKLDNNGenericNode::execLayer() {
    InferenceEngine::ResponseDesc resp;
    InferenceEngine::StatusCode rc = impls[0]->execute(inputs, outputs, &resp);
    if (rc != InferenceEngine::OK) {
-       THROW_IE_EXCEPTION << resp.msg;
+       THROW_IE_EXCEPTION << this->getTypeStr() << ":" << this->getName() << ": " << resp.msg;
    }
}
@@ -47,6 +47,7 @@ public:
                  const Strides& dilations,
                  const CoordinateDiff& pads_begin,
                  const CoordinateDiff& pads_end,
+                 const element::Type output_type,
                  const size_t& group = 1,
                  const PadType& auto_pad = PadType::EXPLICIT);

@@ -57,9 +58,32 @@ public:
                  const Strides& dilations,
                  const CoordinateDiff& pads_begin,
                  const CoordinateDiff& pads_end,
+                 const element::Type output_type,
                  const size_t& group = 1,
                  const PadType& auto_pad = PadType::EXPLICIT);

+   // KMB compilation support
+   ConvolutionIE(const Output<Node>& data_batch,
+                 const Output<Node>& filters,
+                 const Strides& strides,
+                 const Strides& dilations,
+                 const CoordinateDiff& pads_begin,
+                 const CoordinateDiff& pads_end,
+                 const size_t& group = 1,
+                 const PadType& auto_pad = PadType::EXPLICIT);
+
+   // KMB compilation support
+   ConvolutionIE(const Output<Node>& data_batch,
+                 const Output<Node>& filters,
+                 const Output<Node>& bias,
+                 const Strides& strides,
+                 const Strides& dilations,
+                 const CoordinateDiff& pads_begin,
+                 const CoordinateDiff& pads_end,
+                 const size_t& group = 1,
+                 const PadType& auto_pad = PadType::EXPLICIT);

    void validate_and_infer_types() override;

    std::shared_ptr<Node> clone_with_new_inputs(const OutputVector & new_args) const override;

@@ -90,6 +114,7 @@ protected:
    CoordinateDiff m_pads_end;
    PadType m_auto_pad;
    size_t m_group;
+   element::Type m_output_type;
};

} // namespace op

@@ -12,6 +12,7 @@
#include <transformations_visibility.hpp>

#include "ngraph/op/op.hpp"
+#include "transformations/low_precision/common/dequantization_op.hpp"

namespace ngraph {
namespace op {

@@ -190,6 +191,7 @@ void TypeRelaxed<BaseOp>::validate_and_infer_types() {
        BaseOp::get_input_tensor(i).set_tensor_type(old_input_types[i], BaseOp::get_input_partial_shape(i));
    }

    // Override (some) output types
    for (size_t i = 0; i < BaseOp::get_output_size(); ++i) {
        auto overridden_output_type = get_overridden_output_type(i);
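
The new `output_type` parameter threads the output precision through ConvolutionIE explicitly rather than inferring it from the inputs. A construction sketch against the first declaration above (the include path is an assumption, shapes and values are illustrative):

#include <memory>
#include <legacy/ngraph_ops/convolution_ie.hpp>  // assumed location of ngraph::op::ConvolutionIE

// Sketch: build a group-1 ConvolutionIE whose output element type is forced to f32;
// group and auto_pad keep their declared defaults.
std::shared_ptr<ngraph::Node> makeConvIE(const ngraph::Output<ngraph::Node>& data,
                                         const ngraph::Output<ngraph::Node>& weights) {
    return std::make_shared<ngraph::op::ConvolutionIE>(
        data, weights,
        ngraph::Strides{1, 1},            // strides
        ngraph::Strides{1, 1},            // dilations
        ngraph::CoordinateDiff{0, 0},     // pads_begin
        ngraph::CoordinateDiff{0, 0},     // pads_end
        ngraph::element::f32);            // output_type
}
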
@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/eltwise_base_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API AddTransformation : public EltwiseBaseTransformation {
public:
    AddTransformation(const Params& params) : EltwiseBaseTransformation(params) {}
    ~AddTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <algorithm>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API AvgPoolTransformation : public LayerTransformation {
public:
    AvgPoolTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ClampTransformation : public LayerTransformation {
public:
    ClampTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,138 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include <ngraph/ngraph.hpp>
#include <ngraph/check.hpp>
#include <ngraph/opsets/opset1.hpp>

#include "transformations_visibility.hpp"
#include "transformations/rt_info/dequantization_attribute.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

// template<typename BaseOp2>
// class TRANSFORMATIONS_API DequantizationOp : public BaseOp2 {
// public:
//     template <typename ... Args>
//     DequantizationOp(Args&&... args) : BaseOp2(std::forward<Args>(args)...) {
//         init();
//     }
//
//     std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
//         std::shared_ptr<Node> cloned = BaseOp2::clone_with_new_inputs(inputs);
//         auto& rtInfo = cloned->get_rt_info();
//         rtInfo = get_rt_info();
//
//         return cloned;
//     }
//
// protected:
//     void init() {
//         auto& rtInfo = get_rt_info();
//         rtInfo["DEQUANTIZATION"] = std::make_shared<ngraph::VariantWrapper<std::string>>("");
//     }
// };
//
// using DequantizationConvert = DequantizationOp<ngraph::opset1::Convert>;
// using DequantizationSubtract = DequantizationOp<ngraph::opset1::Subtract>;
// using DequantizationMultiply = DequantizationOp<ngraph::opset1::Multiply>;

namespace {
void initRuntimeInfo(ngraph::Node& operation) {
    auto& rtInfo = operation.get_rt_info();
    rtInfo["DEQUANTIZATION"] = std::make_shared<VariantWrapper<DequantizationAttr>>(DequantizationAttr());
}

// #include <ngraph/rt_info.hpp>
// ngraph::copy_runtime_info(from, to);
void copyRuntimeInfo(const ngraph::Node& from, ngraph::Node& to) {
    const auto& rtInfoFrom = from.get_rt_info();
    auto& rtInfoTo = to.get_rt_info();
    rtInfoTo = rtInfoFrom;
}

} // namespace

class TRANSFORMATIONS_API DequantizationConvert : public ngraph::opset1::Convert {
public:
    DequantizationConvert(const ngraph::Output<Node>& arg, const ngraph::element::Type& destination_type) :
        ngraph::opset1::Convert(arg, destination_type) {
        initRuntimeInfo(*this);
    }

    std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
        std::shared_ptr<Node> cloned = ngraph::opset1::Convert::clone_with_new_inputs(inputs);
        copyRuntimeInfo(*this, *cloned);
        return cloned;
    }
};

class TRANSFORMATIONS_API DequantizationSubtract : public ngraph::opset1::Subtract {
public:
    DequantizationSubtract(
        const ngraph::Output<Node>& arg0,
        const ngraph::Output<Node>& arg1,
        const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
        ngraph::opset1::Subtract(arg0, arg1, auto_broadcast) {
        initRuntimeInfo(*this);
    }

    std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
        std::shared_ptr<Node> cloned = ngraph::opset1::Subtract::clone_with_new_inputs(inputs);
        copyRuntimeInfo(*this, *cloned);
        return cloned;
    }
};

class TRANSFORMATIONS_API DequantizationMultiply : public ngraph::opset1::Multiply {
public:
    DequantizationMultiply(
        const Output<Node>& arg0,
        const Output<Node>& arg1,
        const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
        ngraph::opset1::Multiply(arg0, arg1, auto_broadcast) {
        initRuntimeInfo(*this);
    }

    DequantizationMultiply(const ngraph::opset1::Multiply& multiply) :
        ngraph::opset1::Multiply(multiply) {
        initRuntimeInfo(*this);
    }

    std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
        std::shared_ptr<Node> cloned = ngraph::opset1::Multiply::clone_with_new_inputs(inputs);
        copyRuntimeInfo(*this, *cloned);
        return cloned;
    }
};

class TRANSFORMATIONS_API DequantizationAdd : public ngraph::opset1::Add {
public:
    DequantizationAdd(
        const ngraph::Output<Node>& arg0,
        const ngraph::Output<Node>& arg1,
        const ngraph::op::AutoBroadcastSpec& auto_broadcast = ngraph::op::AutoBroadcastSpec(ngraph::op::AutoBroadcastType::NUMPY)) :
        ngraph::opset1::Add(arg0, arg1, auto_broadcast) {
        initRuntimeInfo(*this);
    }

    std::shared_ptr<Node> clone_with_new_inputs(const OutputVector& inputs) const override {
        std::shared_ptr<Node> cloned = ngraph::opset1::Add::clone_with_new_inputs(inputs);
        copyRuntimeInfo(*this, *cloned);
        return cloned;
    }
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
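
The point of these wrappers is that every dequantization op carries the "DEQUANTIZATION" rt_info entry from construction and keeps it across clone_with_new_inputs, which is what the AddMultiplyFusion callback in the CPU plugin hunk above checks. A usage sketch (constants are illustrative):

#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/dequantization_op.hpp"

using namespace ngraph::pass::low_precision;

void dequantizationTagSketch(const ngraph::Output<ngraph::Node>& data) {
    auto scale = ngraph::opset1::Constant::create(ngraph::element::f32, ngraph::Shape{}, { 0.1f });
    auto multiply = std::make_shared<DequantizationMultiply>(data, scale->output(0));
    // The rt_info entry is set by the constructor ...
    bool tagged = multiply->get_rt_info().count("DEQUANTIZATION") != 0;  // true
    // ... and survives cloning, unlike rt_info set by hand on a plain opset1::Multiply.
    auto cloned = multiply->clone_with_new_inputs({ data, scale->output(0) });
    (void)tagged; (void)cloned;
}
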
@@ -0,0 +1,41 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <tuple>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>

namespace ngraph {
namespace pass {
namespace low_precision {

typedef std::tuple<std::shared_ptr<Node>, std::shared_ptr<Node>> FakeQuantizeDequantizationValues;

class FakeQuantizeDequantization {
public:
    FakeQuantizeDequantization();

    FakeQuantizeDequantization(
        Output<Node> data,
        std::shared_ptr<ngraph::opset1::Convert> convert,
        std::shared_ptr<ngraph::opset1::Subtract> subtract,
        std::shared_ptr<ngraph::opset1::Multiply> multiply);

    bool empty() const;
    bool isShared() const;
    bool isLowPrecision() const;
    static bool checkElementwise(const std::shared_ptr<ngraph::Node>& elementwise);

    Output<Node> data;
    std::shared_ptr<opset1::Convert> convert;
    std::shared_ptr<opset1::Subtract> subtract;
    std::shared_ptr<opset1::Multiply> multiply;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
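
FakeQuantizeDequantization is a value object bundling the data output with the optional Convert -> Subtract -> Multiply chain that follows it. A construction sketch over a hand-built chain (the include path is an assumption, constants are illustrative; in real graphs any of the three ops may be absent, i.e. null):

#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"

using namespace ngraph;

pass::low_precision::FakeQuantizeDequantization makeDeqSketch(const Output<Node>& data) {
    auto convert = std::make_shared<opset1::Convert>(data, element::f32);
    auto zeroPoint = opset1::Constant::create(element::f32, Shape{}, { 128.f });
    auto subtract = std::make_shared<opset1::Subtract>(convert, zeroPoint);
    auto scale = opset1::Constant::create(element::f32, Shape{}, { 0.1f });
    auto multiply = std::make_shared<opset1::Multiply>(subtract, scale);
    // data stays the original (usually low precision) output; the chain restores FP values.
    return pass::low_precision::FakeQuantizeDequantization(data, convert, subtract, multiply);
}
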
@@ -0,0 +1,52 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <exception>
#include <memory>
#include <sstream>
#include <string>
#include <ngraph/node.hpp>
#include <transformations_visibility.hpp>

/**
 * @def THROW_IE_LPT_EXCEPTION
 * @brief A macro used to throw the exception with a notable description for low precision transformations
 */
#define THROW_IE_LPT_EXCEPTION(node) throw ::ngraph::pass::low_precision::InferenceEngineLptException(__FILE__, __LINE__, node)

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API InferenceEngineException : std::exception {
    std::shared_ptr<std::ostringstream> buffer = std::make_shared<std::ostringstream>();
    mutable std::string buffer_str;
public:
    template <typename T>
    InferenceEngineException& operator<< (const T& x) {
        *buffer << x;
        return *this;
    }

    const char* what() const noexcept override {
        buffer_str = buffer->str();
        return buffer_str.c_str();
    }
};

#define THROW_TRANSFORMATION_EXCEPTION throw ::ngraph::pass::low_precision::InferenceEngineException() << __FILE__ << ":" << __LINE__ << " "

class TRANSFORMATIONS_API InferenceEngineLptException : public InferenceEngineException {
public:
    InferenceEngineLptException(const std::string& filename, const size_t line, const Node& node) {
        *this
            << filename << ":" << line << " Exception during low precision transformation for "
            << node << " node with type '" << node.get_type_name() << "', name '" << node.get_friendly_name() << "'. ";
    }
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
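
Both macros are streamable; the node-aware variant prefixes the message with the offending node's type and friendly name. A usage sketch:

#include "transformations/low_precision/common/ie_lpt_exception.hpp"

void checkNodeSketch(const ngraph::Node& node) {
    // Generic LPT error with a file:line prefix:
    if (node.get_output_size() == 0)
        THROW_TRANSFORMATION_EXCEPTION << "node has no outputs";
    // Node-aware variant; prints the node, its type and friendly name:
    if (node.get_output_partial_shape(0).rank().is_dynamic())
        THROW_IE_LPT_EXCEPTION(node) << "dynamic rank is not supported";
}
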
@@ -0,0 +1,41 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

#include <ngraph/ngraph.hpp>
#include <ngraph/check.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "../ilayer_transformations_manager.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class Subgraph {
public:
    Subgraph(ngraph::pass::ILayerTransformationsManager* layerTransformationsManager);

    bool fillSubgraphForConcat(const std::shared_ptr<ngraph::opset1::Concat>& concat, std::unordered_set<std::string>& handledLayers);
    bool empty() const;

    std::vector<std::shared_ptr<ngraph::Node>> quantizationLayers;
    std::vector<std::shared_ptr<ngraph::opset1::Concat>> concatLayers;
    std::unordered_map<std::string, std::shared_ptr<ngraph::Node>> layers;

private:
    bool fillSubgraphForQuantization(const std::shared_ptr<ngraph::opset1::FakeQuantize>& fakeQuantize, std::unordered_set<std::string>& handledLayers);
    bool fillSubgraphForIntermediate(const std::shared_ptr<ngraph::Node>& intermediate, std::unordered_set<std::string>& handledLayers);
    bool fill(const std::shared_ptr<ngraph::Node>& concat, std::unordered_set<std::string>& handledLayers);
    const ngraph::pass::ILayerTransformationsManager* layerTransformationsManager;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,56 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <algorithm>
#include <functional>
#include <memory>
#include <string>
#include <vector>

#include <ngraph/ngraph.hpp>

#include "layer_transformation.hpp"
#include "common/subgraph.hpp"
#include "common/fake_quantize_dequantization.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ConcatTransformation : public LayerTransformation {
public:
    ConcatTransformation(const Params& params) : LayerTransformation(params) {}
    ~ConcatTransformation() override {};
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;

protected:
    void addDequantizationLayers(
        TransformationContext& context,
        ngraph::pass::low_precision::Subgraph& subgraph,
        std::function<void(
            std::shared_ptr<ngraph::Node> layer,
            const std::string originalLayerName,
            std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate)> getLayerDequantizationCallback) const;

    static bool isHandled(
        const TransformationContext& context,
        const std::vector<std::shared_ptr<ngraph::Node>>& quantizationOperations);

private:
    size_t getMinQuantizationLevels(
        const DataPrecision& dataPrecision,
        const float maxOutputInterval,
        const std::vector<QuantizationDetails>& quantizationLayersDetails,
        const float outputLowValue,
        const float outputHighValue) const;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
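
Like the Convolution entries in the CPU plugin hunk earlier, a Concat-specific parameterization can be attached through the same builder API; a sketch (whether Concat is already part of getAllTransformations by default is not shown in this diff, and the concat include path is an assumption):

#include <ngraph/opsets/opset1.hpp>
#include <transformations/low_precision/transformer.hpp>
#include <transformations/low_precision/concat.hpp>  // assumed install path of this header

using namespace ngraph::pass::low_precision;

void registerConcatSketch(const std::shared_ptr<ngraph::Function>& f) {
    LayerTransformation::Params params;  // defaults from layer_transformation.hpp below
    LowPrecisionTransformer transformer(LowPrecisionTransformer::getAllTransformations(params)
        .add<ConcatTransformation, ngraph::opset1::Concat>(params));
    transformer.transform(f);
}
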
@@ -0,0 +1,47 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <string>
#include <unordered_map>

#include <ngraph/ngraph.hpp>

#include "concat.hpp"
#include "common/subgraph.hpp"
#include "common/fake_quantize_dequantization.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ConcatMultiChannelsTransformation : public ConcatTransformation {
public:
    ConcatMultiChannelsTransformation(const Params& params) : ConcatTransformation(params) {}
    ~ConcatMultiChannelsTransformation() override {};
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;

private:
    static void fillDequantization(
        std::shared_ptr<ngraph::Node> layer,
        std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize,
        std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate);

    static void fillQuantization(const std::shared_ptr<ngraph::Node> layer, std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes);

    static void updateDequantizationShapesIfNecessary(
        std::shared_ptr<ngraph::Node> layer,
        std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes,
        std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize);

    bool isMultiChannel(const std::vector<std::shared_ptr<ngraph::opset1::Concat>>& concatLayers) const noexcept;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ConvertTransformation : public LayerTransformation {
public:
    ConvertTransformation(const Params& params) : LayerTransformation(params) {}
    ~ConvertTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "weightable_layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ConvolutionTransformation : public WeightableLayerTransformation {
public:
    ConvolutionTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "transparent_base_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API DepthToSpaceTransformation : public TransparentBaseTransformation {
public:
    DepthToSpaceTransformation(const Params& params) : TransparentBaseTransformation(params) {}
    ~DepthToSpaceTransformation() override {}
    bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,29 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API EltwiseBaseTransformation : public LayerTransformation {
public:
    EltwiseBaseTransformation(const Params& params) : LayerTransformation(params) {}
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;

    static bool isBroadcasted(const Shape& shape) noexcept;
protected:
    int getNotEmpty(const std::shared_ptr<Node>& eltwise) const;
    std::pair<int, int> getMultiplyConstBranch(const std::shared_ptr<Node>& eltwise) const;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,33 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"
#include "transformations/low_precision/fuse_fake_quantize.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API FakeQuantizeTransformation : public LayerTransformation {
public:
    FakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
    ~FakeQuantizeTransformation() override {};
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;

    static bool checkElementwise(const std::shared_ptr<Node>& eltwise);
private:
    std::shared_ptr<opset1::FakeQuantize> fuseElementwise(
        TransformationContext& context,
        const std::shared_ptr<opset1::FakeQuantize>& fakeQuantize) const;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
#include "transformations/low_precision/eltwise_base_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API FuseConvertTransformation : public LayerTransformation {
public:
    FuseConvertTransformation(const Params& params) : LayerTransformation(params) {}
    ~FuseConvertTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,31 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API FuseFakeQuantizeTransformation : public LayerTransformation {
public:
    FuseFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
    ~FuseFakeQuantizeTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;

private:
    std::shared_ptr<opset1::FakeQuantize> handle(
        TransformationContext& context,
        const std::shared_ptr<opset1::FakeQuantize>& fakeQuantize) const;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API FuseMultiplyToFakeQuantizeTransformation : public LayerTransformation {
public:
    FuseMultiplyToFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
    ~FuseMultiplyToFakeQuantizeTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API FuseSubtractToFakeQuantizeTransformation : public LayerTransformation {
public:
    FuseSubtractToFakeQuantizeTransformation(const Params& params) : LayerTransformation(params) {}
    ~FuseSubtractToFakeQuantizeTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "convolution.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API GroupConvolutionTransformation : public ConvolutionTransformation {
public:
    GroupConvolutionTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher &m) const override;
    bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/node.hpp>
#include "transformations_visibility.hpp"

namespace ngraph {
namespace pass {

/**
 * @brief low precision transformation component interface.
 */
class TRANSFORMATIONS_API ILayerTransformationsManager {
public:
    virtual bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept = 0;
    virtual bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept = 0;
};

} // namespace pass
} // namespace ngraph
@@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "transparent_base_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API InterpolateTransformation : public LayerTransformation {
public:
    InterpolateTransformation(const Params& params) : LayerTransformation(params) {}
    ~InterpolateTransformation() override {}
    bool transform(TransformationContext &context, ngraph::pattern::Matcher &m) const override;
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph

@@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <vector>
#include <ngraph/ngraph.hpp>
#include <transformations_visibility.hpp>

namespace ngraph {
namespace pass {

/**
 * @brief low precision transformation component interface.
 */
class TRANSFORMATIONS_API IParamsManager {
public:
    // TODO FIXME: it is not correct to have a string as a key here, try to use NodeTypeInfo
    virtual std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept = 0;
};

} // namespace pass
} // namespace ngraph
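
The two interfaces above decouple individual transformations from the transformer that owns them; each LayerTransformation receives them via setParamsManager/setLayerTransformationsManager (declared in the next file). A minimal stub implementation, e.g. for tests, might look like the following sketch (the class name and return values are hypothetical):

#include <memory>
#include <vector>
#include <ngraph/ngraph.hpp>

// Hypothetical test stub implementing both LPT manager interfaces.
class StubManagers : public ngraph::pass::IParamsManager,
                     public ngraph::pass::ILayerTransformationsManager {
public:
    std::vector<ngraph::element::Type> getPrecisionsOnActivations(const ngraph::Node&) const noexcept override {
        return { ngraph::element::u8, ngraph::element::i8 };  // mirror the Params defaults
    }
    bool isQuantized(const std::shared_ptr<ngraph::Node>&) const noexcept override { return true; }
    bool isPrecisionPreserved(const std::shared_ptr<ngraph::Node>&) const noexcept override { return true; }
};
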
@ -0,0 +1,380 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <algorithm>
|
||||
#include <limits>
|
||||
#include <list>
|
||||
#include <memory>
|
||||
#include <vector>
|
||||
|
||||
#include <ngraph/ngraph.hpp>
|
||||
#include <ngraph/pass/graph_rewrite.hpp>
|
||||
|
||||
#include "iparams_manager.hpp"
|
||||
#include "ilayer_transformations_manager.hpp"
|
||||
#include "transformation_context.hpp"
|
||||
#include "quantization_details.hpp"
|
||||
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
|
||||
#include "common/fake_quantize_dequantization.hpp"
|
||||
|
||||
/*****************************************************
|
||||
* Debug capability
|
||||
* - ORIGINAL_MODEL_PATH : Specify with existing folder name
|
||||
* to serialize original model into it (XML & BIN extensions were added)
|
||||
* - TRANSFORMED_MODEL_PATH : Specify with existing folder name
|
||||
* to serialize original model into it (XML & BIN extensions were added)
|
||||
* - LPT_PRINT_DEQUANTIZATION_INFO : Define it to enable
|
||||
* dequantization layers printing
|
||||
* - LPT_DISPLAY_PRECISION : Define it to to display precision info
|
||||
* during low precision transformations
|
||||
*
|
||||
*****************************************************/
|
||||
// #define LPT_ORIGINAL_MODEL_PATH "/localdisk/orig.model"
|
||||
// #define LPT_TRANSFORMED_MODEL_PATH "/localdisk/transformed.model"
|
||||
// #define LPT_PRINT_DEQUANTIZATION_INFO
|
||||
// #define LPT_DISPLAY_PRECISION
|
||||
|
||||
namespace ngraph {
|
||||
namespace pass {
|
||||
namespace low_precision {
|
||||
|
||||
class TRANSFORMATIONS_API DataPrecision {
|
||||
public:
|
||||
DataPrecision() : precision(element::undefined), min(0.f), max(0.f), hasZeroPoint(false) {}
|
||||
|
||||
DataPrecision(const element::Type precision, const float min, const float max, const bool hasZeroPoint) :
|
||||
precision(precision),
|
||||
min(min),
|
||||
max(max),
|
||||
hasZeroPoint(hasZeroPoint) {}
|
||||
|
||||
static float getMinValue(const element::Type precision, const size_t levels) {
|
||||
if (precision == element::i8) {
|
||||
if (levels == 255) {
|
||||
return static_cast<float>(std::numeric_limits<signed char>::lowest()) + 1.f;
|
||||
} else if (levels == 256) {
|
||||
return static_cast<float>(std::numeric_limits<signed char>::lowest());
|
||||
} else {
|
||||
NGRAPH_CHECK(false, "unexpected levels ", levels, " for precision ", precision);
|
||||
}
|
||||
} else if (precision == element::u8) {
|
||||
return static_cast<float>(std::numeric_limits<unsigned char>::lowest());
|
||||
} else if (precision == element::f16) {
|
||||
return -1.0e15f;
|
||||
} else if (precision == element::f32) {
|
||||
return std::numeric_limits<float>::lowest();
|
||||
} else {
|
||||
NGRAPH_CHECK(false, "unexpected precision ", precision);
|
||||
}
|
||||
}
|
||||
|
||||
static float getMaxValue(const element::Type precision, const size_t levels) {
|
||||
if ((levels != 255ul) && (levels != 256ul)) {
|
||||
THROW_TRANSFORMATION_EXCEPTION << "unexpected levels " << levels;
|
||||
}
|
||||
|
||||
if (precision == element::i8) {
|
||||
return static_cast<float>(std::numeric_limits<signed char>::max());
|
||||
} else if (precision == element::u8) {
|
||||
return static_cast<float>(std::numeric_limits<unsigned char>::max()) - (256 - levels);
|
||||
} else if (precision == element::f16) {
|
||||
return 1.0e15f;
|
||||
} else if (precision == element::f32) {
|
||||
return std::numeric_limits<float>::max();
|
||||
} else {
|
||||
THROW_TRANSFORMATION_EXCEPTION << "unexpected precision " << precision;
|
||||
}
|
||||
}
|
||||
|
||||
static bool hasNegativeValues(const std::vector<float>& values) {
|
||||
for (const float value : values) {
|
||||
if (value < 0.0) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
element::Type precision;
|
||||
float min;
|
||||
float max;
|
||||
bool hasZeroPoint;
|
||||
|
||||
static element::Type getPrecision(const std::vector<float>& outputLowValues, const std::vector<float>& outputHighValues) {
|
||||
return (hasNegativeValues(outputLowValues) || hasNegativeValues(outputHighValues)) ? element::i8 : element::u8;
|
||||
}
|
||||
|
||||
static element::Type getPrecision(const size_t /* quantizationLevels */, const bool signedInterval) {
|
||||
return signedInterval ? element::i8 : element::u8;
|
||||
}
|
||||
|
||||
static float getMin(const size_t quantizationLevels, const bool signedInterval) {
|
||||
if (quantizationLevels == 255) {
|
||||
return signedInterval ? -127.0 : 0.0;
|
||||
} else if (quantizationLevels == 256) {
|
||||
return signedInterval ? -128.0 : 0.0;
|
||||
} else {
|
||||
// THROW_TRANSFORMATION_EXCEPTION << "quantization level " << quantizationLevels << " is not supported";
|
||||
// FIXME: not completed
|
||||
return signedInterval ? -128.0 : 0.0;
|
||||
}
|
||||
}
|
||||
|
||||
static float getMax(const size_t quantizationLevels, const bool signedInterval) {
|
||||
if ((quantizationLevels == 255) || (quantizationLevels == 256)) {
|
||||
return signedInterval ? 127.0 : 255.0;
|
||||
} else {
|
||||
// THROW_TRANSFORMATION_EXCEPTION << "quantization level " << quantizationLevels << " is not supported";
|
||||
// FIXME: not completed
|
||||
// return quantizationLevels - 1.0;
|
||||
return signedInterval ? 127.0 : 255.0;
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
inline bool operator==(const DataPrecision& value1, const DataPrecision& value2) {
|
||||
return
|
||||
(value1.precision == value2.precision) &&
|
||||
(value1.min == value1.min) &&
|
||||
(value1.max == value1.max);
|
||||
}
|
||||
|
||||
inline bool operator!=(const DataPrecision& value1, const DataPrecision& value2) {
|
||||
return !(value1 == value2);
|
||||
}
|
||||
|
||||
inline std::ostream &operator << (std::ostream &os, const DataPrecision& value) {
|
||||
os << value.precision << ", min: " << value.min << ", max: " << value.max;
|
||||
return os;
|
||||
}
|
||||
|
||||
// Base class for all LP transformations, holds some common data structures
|
||||
class TRANSFORMATIONS_API LayerTransformation {
|
||||
public:
|
||||
enum QuantizedTensorAlignment {
|
||||
None,
|
||||
UpdateLevel
|
||||
};
|
||||
|
||||
class Params {
|
||||
public:
|
||||
Params(
|
||||
const bool updatePrecisions = true,
|
||||
const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations = QuantizedTensorAlignment::UpdateLevel,
|
||||
const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights = QuantizedTensorAlignment::None,
|
||||
bool supportAsymmetricQuantization = false,
|
||||
std::vector<element::Type> precisionsOnActivations = { element::u8, element::i8 },
|
||||
std::vector<element::Type> precisionsOnWeights = { element::i8 }) :
|
||||
updatePrecisions(updatePrecisions),
|
||||
quantizedTensorAlignmentOnActivations(quantizedTensorAlignmentOnActivations),
|
||||
quantizedTensorAlignmentOnWeights(quantizedTensorAlignmentOnWeights),
|
||||
supportAsymmetricQuantization(supportAsymmetricQuantization),
|
||||
precisionsOnActivations(precisionsOnActivations),
|
||||
precisionsOnWeights(precisionsOnWeights) {
|
||||
if (precisionsOnActivations.size() == 0ul) {
|
||||
THROW_TRANSFORMATION_EXCEPTION << "precisions on activations are not specisifed";
|
||||
}
|
||||
|
||||
if (precisionsOnWeights.size() == 0ul) {
|
||||
THROW_TRANSFORMATION_EXCEPTION << "precisions on weights are not specisifed";
|
||||
}
|
||||
}
|
||||
|
||||
Params& setUpdatePrecisions(const bool updatePrecisions) {
|
||||
this->updatePrecisions = updatePrecisions;
|
||||
return *this;
|
||||
}
|
||||
|
||||
Params& setQuantizedTensorAlignmentOnActivations(const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations) {
|
||||
this->quantizedTensorAlignmentOnActivations = quantizedTensorAlignmentOnActivations;
|
||||
return *this;
|
||||
}
|
||||
|
||||
Params& setQuantizedTensorAlignmentOnWeights(const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights) {
|
||||
this->quantizedTensorAlignmentOnWeights = quantizedTensorAlignmentOnWeights;
|
||||
return *this;
|
||||
}
|
||||
|
||||
Params& setSupportAsymmetricQuantization(const bool supportAsymmetricQuantization) {
|
||||
this->supportAsymmetricQuantization = supportAsymmetricQuantization;
|
||||
return *this;
|
||||
}
|
||||
|
||||
Params& setPrecisionsOnActivations(const std::vector<element::Type>& precisionsOnActivations) {
|
||||
this->precisionsOnActivations = precisionsOnActivations;
|
||||
return *this;
|
||||
}
|
||||
|
||||
Params& setPrecisionsOnWeights(const std::vector<element::Type>& precisionsOnWeights) {
|
||||
this->precisionsOnWeights = precisionsOnWeights;
|
||||
return *this;
|
||||
}
|
||||
|
||||
bool updatePrecisions;
|
||||
QuantizedTensorAlignment quantizedTensorAlignmentOnActivations;
|
||||
QuantizedTensorAlignment quantizedTensorAlignmentOnWeights;
|
||||
bool supportAsymmetricQuantization;
|
||||
std::vector<element::Type> precisionsOnActivations;
|
||||
std::vector<element::Type> precisionsOnWeights;
|
||||
};

    class PrecisionDetails {
    public:
        PrecisionDetails(const element::Type& precision, const bool hasNegativeOutput, const bool hasZeroPoint) :
            precision(precision),
            hasNegativeOutput(hasNegativeOutput),
            hasZeroPoint(hasZeroPoint) {}

        const element::Type precision;
        const bool hasNegativeOutput;
        const bool hasZeroPoint;
    };

    LayerTransformation(const Params& params);
    virtual ~LayerTransformation() = default;
    virtual void registerMatcherIn(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const = 0;
    virtual bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const = 0;

    void setParamsManager(IParamsManager* paramsManager) noexcept;
    void setLayerTransformationsManager(ILayerTransformationsManager* layerTransformationsManager) noexcept;

    void setUpdatePrecisions(const bool updatePrecisions);
    void setQuantizedTensorAlignmentOnActivations(const QuantizedTensorAlignment quantizedTensorAlignmentOnActivations);
    void setQuantizedTensorAlignmentOnWeights(const QuantizedTensorAlignment quantizedTensorAlignmentOnWeights);

    void setQuantizationIntervalAsymmetryThreshold(const float value);
    void setZeroThreshold(const float value);
    void setMinQuantizationLevels(const size_t levels);

    const std::vector<element::Type>& getPrecisionsOnActivations() const;
    const std::vector<element::Type>& getPrecisionsOnWeights() const;

    virtual bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const;

    bool canSubtractBeHandled(const std::shared_ptr<Node>& op, const size_t parentIndex = 0ul) const;

    bool canSubtractBeHandled(const std::shared_ptr<Node>& op, const FakeQuantizeDequantization& dequantization) const;

    PrecisionDetails getPrecisionDetails(const QuantizationDetails& quantizationDetails) const;

    // Returns true if the operation can be quantized, false otherwise.
    // For example, if convolution weights are not quantized, isQuantized returns false.
    // Note: dequantization operations on activations are absent during method execution.
    virtual bool isQuantized(std::shared_ptr<Node> layer) const noexcept;

    // Returns true if the operation keeps its input precision on the output (precision preserved).
    // Note: dequantization operations on activations are absent during method execution.
    virtual bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept = 0;

    DataPrecision getDataPrecision(
        std::shared_ptr<Node> layer,
        const QuantizationDetails& quantizationDetails,
        const bool onWeights) const;

    void fillAvailablePrecisions(std::shared_ptr<Node> layer, std::vector<element::Type>& availablePrecisions) const;

    std::vector<std::shared_ptr<Node>> getChildrenRecursivelyExceptPrecisionPreserved(const std::shared_ptr<Node>& op) const noexcept;

protected:
#ifdef LPT_PRINT_DEQUANTIZATION_INFO
    static void printDequantizationInfo(const std::shared_ptr<Node>& layer);
    static void printDequantizationInfo(const DataPrecision& dataPrecision);
    static void printDequantizationValues(
        const std::vector<float>& dequantizationScales,
        const std::vector<float>& dequantizationShifts);
#endif

    bool updatePrecisions;
    QuantizedTensorAlignment quantizedTensorAlignmentOnActivations;
    QuantizedTensorAlignment quantizedTensorAlignmentOnWeights;
    bool supportAsymmetricQuantization;
    std::vector<element::Type> precisionsOnActivations;
    std::vector<element::Type> precisionsOnWeights;

    // Absolute value used to detect quantization interval asymmetry.
    float quantizationIntervalAsymmetryThreshold;
    // Absolute value used to treat a value as zero.
    float zeroThreshold;
    size_t minQuantizationLevels;

    static const char originalLayerPostfix[];
    IParamsManager* paramsManager;
    ILayerTransformationsManager* layerTransformationsManager;

protected:
    std::shared_ptr<ngraph::Node> separateInStandaloneBranch(std::shared_ptr<ngraph::Node> node) const;

    std::shared_ptr<ngraph::Node> moveDequantizationAfter(
        TransformationContext& context,
        const std::shared_ptr<ngraph::Node>& operation,
        const FakeQuantizeDequantization& dequantization,
        const bool updatePrecision,
        const bool moveSubtract = true) const;

    void fuseConvertIfPossible(const std::shared_ptr<ngraph::Node>& operation) const;

    void updateOutput(
        TransformationContext& context,
        std::shared_ptr<ngraph::Node> lastNode,
        std::shared_ptr<ngraph::Node> originalNode) const;

    void updateOutput(
        TransformationContext& context,
        std::shared_ptr<ngraph::Node> lastNode,
        std::string originalName) const;

    void addPattern(ngraph::pass::GraphRewrite& pass, TransformationContext& context, std::shared_ptr<Node> patternRoot) const;

    template <typename Operation>
    void addSingleNodePattern(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const {
        using namespace ngraph;

        auto is_op_type = [](std::shared_ptr<Node> n) {
            return !!as_type_ptr<Operation>(n);
        };
        auto p_node = std::make_shared<pattern::op::Label>(element::f32, Shape{}, is_op_type);

        addPattern(pass, context, p_node);
    }
};
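
// Usage sketch (hypothetical subclass, illustrative only): a derived transformation
// typically registers a single-node matcher for its target operation type:
//
//     void MyReluTransformation::registerMatcherIn(ngraph::pass::GraphRewrite& pass, TransformationContext& context) const {
//         addSingleNodePattern<opset1::Relu>(pass, context);
//     }
//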

inline std::ostream& operator << (std::ostream& os, const LayerTransformation::QuantizedTensorAlignment& value) {
    switch (value) {
        case LayerTransformation::QuantizedTensorAlignment::None: {
            os << "None";
            break;
        }
        case LayerTransformation::QuantizedTensorAlignment::UpdateLevel: {
            os << "UpdateLevel";
            break;
        }
        default: {
            os << static_cast<int>(value);
            break;
        }
    }
    return os;
}

inline std::ostream& operator << (std::ostream& os, const std::vector<element::Type>& values) {
    os << "{";
    for (size_t i = 0; i < values.size(); ++i) {
        const element::Type& value = values[i];
        // Print the separator before every element except the first one.
        if (i > 0) {
            os << ", " << value;
        } else {
            os << value;
        }
    }
    os << "}";
    return os;
}

typedef std::shared_ptr<LayerTransformation> LayerTransformationPtr;

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,36 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>

#include <ie_api.h>

#include <ngraph/ngraph.hpp>

#include <ngraph/pass/graph_rewrite.hpp>
#include <transformations/low_precision/ilayer_transformations_manager.hpp>
#include <transformations/low_precision/iparams_manager.hpp>

using namespace std;

namespace ngraph {
namespace pass {

class TRANSFORMATIONS_API LowPrecisionTransformations: public ngraph::pass::GraphRewrite, IParamsManager, ILayerTransformationsManager {
public:
    bool run_on_function(std::shared_ptr<ngraph::Function> f) override;

    // IParamsManager interface implementation
    std::vector<element::Type> getPrecisionsOnActivations(const NodeTypeInfo& layerName) const noexcept override;

    // ILayerTransformationsManager interface implementation
    bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace pass
} // namespace ngraph
@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API MatMulTransformation : public LayerTransformation {
public:
    MatMulTransformation(const Params& params) : LayerTransformation(params) {}
    ~MatMulTransformation() override {}
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,26 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API MaxPoolTransformation : public LayerTransformation {
public:
    MaxPoolTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/eltwise_base_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API MultiplyTransformation : public EltwiseBaseTransformation {
public:
    MultiplyTransformation(const Params& params) : EltwiseBaseTransformation(params) {}
    ~MultiplyTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,33 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API MultiplyToGroupConvolutionTransformation : public LayerTransformation {
public:
    MultiplyToGroupConvolutionTransformation(const Params& params) : LayerTransformation(params), groupSize(1ul) {}
    ~MultiplyToGroupConvolutionTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool isQuantized(std::shared_ptr<Node> layer) const noexcept override;

    void setGroupSize(const size_t groupSize);
    size_t getGroupSize() const;

private:
    size_t groupSize;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API MVNTransformation : public LayerTransformation {
public:
    MVNTransformation(const Params& params) : LayerTransformation(params) {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,245 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <cmath>
#include <memory>
#include <string>
#include <vector>
#include <unordered_set>

#include <ngraph/ngraph.hpp>
#include <ngraph/pattern/matcher.hpp>
#include <ngraph/opsets/opset1.hpp>
#include "ngraph_ops/type_relaxed.hpp"
#include <ngraph/rt_info.hpp>

#include "transformation_context.hpp"
#include "quantization_details.hpp"
#include "transformations/utils/utils.hpp"
#include "common/fake_quantize_dequantization.hpp"
#include "common/ie_lpt_exception.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

/**
 * @brief NetworkHelper class encapsulates manipulations with the nGraph function.
 */
class TRANSFORMATIONS_API NetworkHelper {
public:
    // Returns true if `type` is castable to at least one of `types`.
    static bool is_castable_to_one_of(NodeTypeInfo type, const std::unordered_set<NodeTypeInfo>& types);

    static std::vector<Input<Node>> consumer_inputs(std::shared_ptr<Node> node);

    // Collects and returns all nodes that consume any output of `node`.
    static std::vector<std::shared_ptr<Node>> consumers(std::shared_ptr<Node> node);

    static Shape alignShapeForChannelDim(const Shape& shape, Rank rank);

    // Returns true if at least one child uses the layer on the weights path.
    static bool onWeights(std::shared_ptr<Node> layer);

    template <typename OperationType>
    static std::shared_ptr<Node> setOutDataPrecisionForTypeRelaxed(std::shared_ptr<OperationType> operation, const element::Type& precision);

    template <typename OperationType>
    static std::shared_ptr<Node> setOutDataPrecision(std::shared_ptr<OperationType> operation, const element::Type& precision);

    static size_t getOutputChannelsCount(std::shared_ptr<const Node> layer, bool isOnWeights = false);

    static std::vector<std::shared_ptr<Node>> getParentsRecursivelyExceptTypes(
        std::shared_ptr<Node> layer,
        const std::unordered_set<NodeTypeInfo>& exceptionLayerTypes = {},
        const int portIndex = -1);

    static size_t getInputChannelsCount(std::shared_ptr<Node> layer);

    static size_t getGroupsCount(std::shared_ptr<Node> layer);

    // Removes the node by connecting its 0th input with its 0th output.
    static void removeLayer(std::shared_ptr<Node> node);

    static std::shared_ptr<Node> swapMultiplyAndAdd(std::shared_ptr<opset1::Add> addAfterMultiply, const int multiplyBranch);

    static void copyInfo(const std::shared_ptr<Node>& source, const std::shared_ptr<Node>& target);

    static void cleanRunTimeInfo(const std::shared_ptr<Node>& layer);

    static bool isScalarLike(std::shared_ptr<opset1::Constant> constant);

    static bool isZero(std::shared_ptr<opset1::Constant> constant);

    static std::shared_ptr<opset1::Constant> toScalar(std::shared_ptr<opset1::Constant> constant);

    static std::shared_ptr<Node> getConstantInput(std::shared_ptr<Node> node);

    // Optimizes the series of multiplies after a given output port.
    static std::shared_ptr<ngraph::opset1::Multiply> optimizeMultipliesAfter(std::shared_ptr<Node> multiply);

    static std::shared_ptr<opset1::Constant> roundWithTolerance(std::shared_ptr<Node> node, element::Type target_type, float tolerance = 0.1f);

    static std::tuple<std::shared_ptr<Node>, std::shared_ptr<Node>> decomposeFakeQuantize(
        std::shared_ptr<opset1::FakeQuantize> fq,
        const element::Type precision,
        const float min,
        const float max,
        const bool hasZeroPoint,
        const bool updatePrecision);

    static std::shared_ptr<opset1::FakeQuantize> updateFakeQuantize(
        std::shared_ptr<opset1::FakeQuantize> fq,
        element::Type precision,
        float min,
        float max);

    static FakeQuantizeDequantization makeDequantization(
        const float dequantizationMul,
        const float dequantizationSub,
        const ngraph::element::Type originalPrecision,
        const ngraph::Shape dataNodeOutputShape,
        element::Type precision,
        float min,
        float max);

    static FakeQuantizeDequantization createDequantizationFromFakeQuantize(
        std::shared_ptr<opset1::FakeQuantize> fq,
        element::Type precision,
        float min,
        float max,
        const bool hasZeroPoint,
        const bool updatePrecision);

    static FakeQuantizeDequantization getDequantization(const std::shared_ptr<Node> node, const size_t parentIndex = 0ul, const bool inPlace = false);

    static std::shared_ptr<Node> optimizeSubtract(std::shared_ptr<opset1::Subtract> subtract);

    class InsertDequantizationResult {
    public:
        InsertDequantizationResult(
            const std::shared_ptr<Node>& newOperation,
            const std::shared_ptr<Node>& lastDequantization) : newOperation(newOperation), lastDequantization(lastDequantization) {}

        std::shared_ptr<Node> newOperation;
        std::shared_ptr<Node> lastDequantization;
    };

    static InsertDequantizationResult moveDequantizationAfter(
        const std::shared_ptr<ngraph::Node>& operation,
        const FakeQuantizeDequantization& dequantization,
        const bool updatePrecision,
        const bool moveSubtract);

    // TODO: rename to fuseConvertIfPossible
    static void removeConvertIfPossible(
        const std::shared_ptr<ngraph::Node>& operation,
        const FakeQuantizeDequantization& dequantization);

    static bool checkConstantValuePrecision(const element::Type expectedPrecision, const std::shared_ptr<Node>& constant);

    static size_t getChildInputIndex(const std::shared_ptr<ngraph::Node>& parent, const std::shared_ptr<ngraph::Node>& child);

    static size_t getParentOutputIndex(const std::shared_ptr<ngraph::Node>& parent, const std::shared_ptr<ngraph::Node>& child);

    static std::vector<Output<Node>> getInputs(const std::shared_ptr<ngraph::Node>& node);

    static FakeQuantizeDequantizationValues createEmptyValues(const FakeQuantizeDequantization& dequantization);

    static bool isZeroConst(const std::shared_ptr<Node>& node);

    static std::shared_ptr<Node> toScalarIfPossible(std::shared_ptr<Node> node);

    static std::shared_ptr<Node> fold_fake_quantize(const std::shared_ptr<opset1::FakeQuantize>& fq);
    static std::shared_ptr<Node> fold_fake_quantize(const std::shared_ptr<opset1::FakeQuantize>& fq, const bool roundValues);

    // Multi-precision constant folding.
    // Handles only the specific case: Constant -> [dequantization operations] -> [node].
    static void foldDequantization(std::shared_ptr<Node>& node, const size_t branchIndex, const bool inPlace = false);

private:
    static std::shared_ptr<Node> foldFakeQuantize(const std::shared_ptr<opset1::FakeQuantize>& fq, const bool roundValues, const bool roundValuesWasSet);

    // Returns  1 if the layer is on the weights path,
    //         -1 if it is on the activations path,
    //          0 if no weightable layer was found.
    static int onWeightsInDepth(std::shared_ptr<Node> layer);
};

template <typename OperationType>
std::shared_ptr<Node> NetworkHelper::setOutDataPrecisionForTypeRelaxed(std::shared_ptr<OperationType> layer, const element::Type& precision) {
    // Check whether the node is already an extended (type-relaxed) operation.
    if (auto relaxed_layer = std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(layer)) {
        relaxed_layer->set_overridden_output_type(precision);
        std::dynamic_pointer_cast<ngraph::Node>(layer)->validate_and_infer_types();
        return layer;
    } else {
        THROW_IE_LPT_EXCEPTION(*layer) << "TypeRelaxed type is expected";
    }
}

template <typename OperationType>
std::shared_ptr<Node> NetworkHelper::setOutDataPrecision(std::shared_ptr<OperationType> layer, const element::Type& precision) {
    // Check whether the node is already an extended (type-relaxed) operation.
    if (auto relaxed_layer = std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(layer)) {
        relaxed_layer->set_overridden_output_type(precision);
        std::dynamic_pointer_cast<ngraph::Node>(layer)->validate_and_infer_types();
        return layer;
    } else {
        // Make such replacements in advance for all supported polymorphic layer types.
        // Extend the node with new semantics: an overridden output data type.
        // OperationType should be the real type of the object, otherwise this leads to undefined behavior.
        auto replacement = std::make_shared<ngraph::op::TypeRelaxed<OperationType>>(*layer, precision);
        copy_runtime_info(layer, replacement);
        replace_node(layer, replacement);
        return replacement;
    }
}
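
// Illustrative sketch (not part of this header): force a matched Multiply node to emit u8.
// On first call the node is replaced by TypeRelaxed<Multiply>, as implemented above.
inline std::shared_ptr<Node> forceU8Output(const std::shared_ptr<opset1::Multiply>& multiply) {
    return NetworkHelper::setOutDataPrecision(multiply, element::u8);
}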

template <typename T>
std::shared_ptr<Node> make_op_pattern(const ngraph::NodeVector& args) {
    return std::make_shared<ngraph::pattern::op::Any>(element::undefined, PartialShape{}, [](std::shared_ptr<Node> n) { return !!as_type_ptr<T>(n); }, args);
}

template <typename T>
std::shared_ptr<Node> make_op_label() {
    return std::make_shared<ngraph::pattern::op::Label>(
        element::undefined,
        PartialShape{},
        [](std::shared_ptr<Node> n) { return !!as_type_ptr<T>(n); });
}

template <typename T, typename... Args>
std::shared_ptr<Node> fold(Args&&... args) {
    auto node = std::make_shared<T>(std::forward<Args>(args)...);
    if (node->get_output_size() == 1) {
        OutputVector folded(node->get_output_size());
        if (node->constant_fold(folded, node->input_values())) {
            return folded[0].get_node_shared_ptr();
        }
    }
    return node;
}
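
// Illustrative sketch: fold<> builds the node and constant-folds it immediately,
// so constant arithmetic yields an opset1::Constant instead of a live subgraph.
inline std::shared_ptr<Node> foldExample() {
    auto scale = std::make_shared<opset1::Constant>(element::f32, Shape{}, std::vector<float>{ 0.5f });
    auto shift = std::make_shared<opset1::Constant>(element::f32, Shape{}, std::vector<float>{ 2.0f });
    // Returns a Constant holding 1.0f when folding succeeds, otherwise the Multiply node itself.
    return fold<opset1::Multiply>(scale, shift);
}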

template <typename T, typename... Args>
std::shared_ptr<Node> fold_reshape(Args&&... args) {
    std::shared_ptr<Node> node = std::make_shared<T>(std::forward<Args>(args)...);
    if (node->get_output_size() == 1) {
        if (is_type<opset1::Constant>(node->input_value(0).get_node_shared_ptr()) &&
            is_type<opset1::Constant>(node->input_value(1).get_node_shared_ptr())) {
            // Reshape of a constant only reinterprets the shape: reuse the data pointer.
            return std::make_shared<opset1::Constant>(
                node->get_input_element_type(0),
                Shape(as_type_ptr<opset1::Constant>(node->input_value(1).get_node_shared_ptr())->template cast_vector<size_t>()),
                as_type_ptr<opset1::Constant>(node->input_value(0).get_node_shared_ptr())->get_data_ptr());
        }
    }
    return node;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API NormalizeL2Transformation : public LayerTransformation {
public:
    NormalizeL2Transformation(const Params& params) : LayerTransformation(params) {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API PReluTransformation : public LayerTransformation {
public:
    PReluTransformation(const Params& params) : LayerTransformation(params) {}
    ~PReluTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,89 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ostream>
#include <vector>

#include <transformations_visibility.hpp>

#include <ngraph/node.hpp>
#include <ngraph/opsets/opset1.hpp>
#include <ngraph/type.hpp>

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API QuantizationDetails {
public:
    QuantizationDetails();
    QuantizationDetails(const QuantizationDetails& quantizationDetails);
    QuantizationDetails(
        const size_t levels,
        const std::vector<float>& inputLowValues,
        const std::vector<float>& inputHighValues,
        const std::vector<float>& outputLowValues,
        const std::vector<float>& outputHighValues,
        const size_t inputIntervalsCount,
        const size_t outputIntervalsCount,
        const size_t outputChannelsCount);

    static bool outputLayoutIsSupported(std::shared_ptr<opset1::FakeQuantize> quantize);

    static void getInputIntervals(
        std::shared_ptr<opset1::FakeQuantize> quantize,
        std::vector<float>& inputLowValues,
        std::vector<float>& inputHighValues,
        size_t& inputIntervalsCount);

    static void getOutputIntervals(
        std::shared_ptr<opset1::FakeQuantize> quantize,
        std::vector<float>& outputLowValues,
        std::vector<float>& outputHighValues,
        size_t& outputIntervalsCount);

    static QuantizationDetails getDetails(std::shared_ptr<opset1::FakeQuantize> quantize);
    bool hasNegativeOutput() const;
    float maxOutput(const size_t channel) const;
    float maxInput(const size_t channel) const;

    float maxOutputHigh() const;
    float minOutputLow() const;

    float getInputLowValue(const size_t channel) const;
    float getInputHighValue(const size_t channel) const;
    float getOutputLowValue(const size_t channel) const;
    float getOutputHighValue(const size_t channel) const;

    static bool isSupportedLevel(const size_t level);

    const size_t levels;
    const std::vector<float> inputLowValues;
    const std::vector<float> inputHighValues;
    const std::vector<float> outputLowValues;
    const std::vector<float> outputHighValues;
    const size_t inputIntervalsCount;
    const size_t outputIntervalsCount;
    const size_t outputChannelsCount;

private:
    QuantizationDetails& operator=(const QuantizationDetails& /*target*/) { return *this; }
    static void validate(std::shared_ptr<Node> constantLayer);
    static std::vector<float> getBlobValue(std::shared_ptr<Node> constantLayer);
};

inline std::ostream& operator << (std::ostream& os, const QuantizationDetails& value) {
    os << "levels: " << value.levels <<
        ", input 1/" << value.inputIntervalsCount << ": [" << value.getInputLowValue(0) << " : " << value.getInputHighValue(0) << "]" <<
        ", output 1/" << value.outputIntervalsCount << ": [" << value.getOutputLowValue(0) << " : " << value.getOutputHighValue(0) << "]";
    return os;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
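// Illustrative sketch (not part of this patch): print the intervals of a matched
// FakeQuantize via the operator<< above; `fq` is an assumed handle from a matcher callback.
inline void dumpQuantization(std::ostream& os, const std::shared_ptr<ngraph::opset1::FakeQuantize>& fq) {
    using ngraph::pass::low_precision::QuantizationDetails;
    if (QuantizationDetails::outputLayoutIsSupported(fq)) {
        os << QuantizationDetails::getDetails(fq);
    }
}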
@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ReluTransformation : public LayerTransformation {
public:
    ReluTransformation(const Params& params) : LayerTransformation(params) {}
    ~ReluTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,32 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <algorithm>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API ReshapeTransformation : public LayerTransformation {
public:
    ReshapeTransformation(const Params& params) : LayerTransformation(params) {}
    ~ReshapeTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;

    static bool canBeTransformed(
        const ngraph::Shape& subtractShape,
        const ngraph::Shape& multiplyShape,
        const ngraph::Shape& inputShape,
        const ngraph::Shape& outputShape);
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,39 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <vector>

#include "layer_transformation.hpp"
#include "ngraph/node.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API SplitTransformation : public LayerTransformation {
public:
    SplitTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    void updateOutputs(
        TransformationContext& context,
        std::vector<std::shared_ptr<ngraph::Node>> lastNodes,
        std::shared_ptr<ngraph::Node> originalNode) const;

protected:
    ngraph::Shape getConstSplitShape(
        const std::vector<size_t>& constSplitLengths,
        const ngraph::Shape& constShape,
        const size_t axis,
        const size_t idx) const;
    virtual std::vector<size_t> getConstSplitLengths(
        const OutputVector& inputs,
        const ngraph::Shape& constShape,
        const size_t outputSize) const;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API SqueezeTransformation : public LayerTransformation {
public:
    SqueezeTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,24 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API SubtractTransformation : public LayerTransformation {
public:
    SubtractTransformation(const Params& params) : LayerTransformation(params) {}
    ~SubtractTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"
#include "transformations/low_precision/eltwise_base_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API SubtractMultiplyToMultiplyAddTransformation : public LayerTransformation {
public:
    SubtractMultiplyToMultiplyAddTransformation(const Params& params) : LayerTransformation(params) {}
    ~SubtractMultiplyToMultiplyAddTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,35 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <string>
#include <unordered_set>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/quantization_details.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API TransformationContext {
public:
    explicit TransformationContext(std::shared_ptr<Function> function);
    std::shared_ptr<Function> function;

    // Used to store already handled FakeQuantize operations.
    // Both ConcatTransformation and FakeQuantizeTransformation handle FakeQuantize operations, and ConcatTransformation handles them first.
    // If the updatePrecisions transformation option is set to false, there are no FakeQuantize operation attributes left to identify
    // that the operation has already been handled by ConcatTransformation:
    //   - the output precision is the original one (FP32),
    //   - the intervals are changed but not equal to the precision boundaries,
    //   - the quantization level may or may not be changed.
    // To avoid handling a FakeQuantize operation twice (by FakeQuantizeTransformation after ConcatTransformation),
    // FakeQuantizeTransformation has to consult this member.
    std::unordered_set<std::string> quantizedFakeQuantizeNames;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
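// Hedged sketch of the guard described above (not the actual implementation):
// FakeQuantizeTransformation skips nodes already registered by ConcatTransformation.
inline bool alreadyHandled(const ngraph::pass::low_precision::TransformationContext& context,
                           const std::shared_ptr<ngraph::Node>& fakeQuantize) {
    return context.quantizedFakeQuantizeNames.find(fakeQuantize->get_friendly_name()) !=
           context.quantizedFakeQuantizeNames.end();
}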
@ -0,0 +1,214 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <vector>

#include <ngraph/ngraph.hpp>
#include <ngraph_ops/type_relaxed.hpp>

#include "layer_transformation.hpp"
#include "iparams_manager.hpp"
#include "ilayer_transformations_manager.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

struct StandaloneCleanup {
    std::string typeName;
    std::string typeId;
    LayerTransformationPtr transformation;
};

class TRANSFORMATIONS_API LowPrecisionTransformations {
public:
    LowPrecisionTransformations() {}
    LowPrecisionTransformations(
        const std::map<std::string, LayerTransformationPtr>& branchSpecificTransformations,
        const std::map<std::string, LayerTransformationPtr>& transformations,
        const std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& cleanupTransformations,
        const std::vector<StandaloneCleanup>& standaloneCleanupTransformations);

    void setUpdatePrecisions(const bool updatePrecisions);
    void setQuantizedTensorAlignmentOnActivations(const LayerTransformation::QuantizedTensorAlignment quantizedTensorAlignmentOnActivations);
    void setQuantizedTensorAlignmentOnWeights(const LayerTransformation::QuantizedTensorAlignment quantizedTensorAlignmentOnWeights);
    LowPrecisionTransformations& remove(const std::string& operationType);
    LowPrecisionTransformations& removeBranchSpecificTransformations(const std::string& operationType);
    LowPrecisionTransformations& removeTransformations(const std::string& operationType);
    LowPrecisionTransformations& removeCleanupTransformations(const std::string& operationType);

    /**
     * Add a branch-specific transformation. The transformation type and the operation type are required.
     * The operation type is used to find the transformation by operation during precision definition.
     */
    template <class Transformation, class Operation>
    LowPrecisionTransformations& addBranchSpecific(const LayerTransformation::Params& params) {
        const std::string typeName = getType<Operation>();
        const auto it = branchSpecificTransformations.find(typeName);
        if (it != branchSpecificTransformations.end()) {
            branchSpecificTransformations.erase(it);
        }

        branchSpecificTransformations.emplace(typeName, std::make_shared<Transformation>(params));
        return *this;
    }

    /**
     * Add a transformation. The transformation type and the operation type are required.
     * The operation type is used to find the transformation by operation during precision definition.
     */
    template <class Transformation, class Operation>
    LowPrecisionTransformations& add(const LayerTransformation::Params& params) {
        const std::string typeName = getType<Operation>();
        const auto it = transformations.find(typeName);
        if (it != transformations.end()) {
            transformations.erase(it);
        }

        transformations.emplace(typeName, std::make_shared<Transformation>(params));
        return *this;
    }

    /**
     * Add a cleanup transformation. The transformation type and the operation type are required.
     * The operation type is used to find the transformation by operation during precision definition.
     */
    template <class Transformation, class Operation>
    LowPrecisionTransformations& addCleanup(const LayerTransformation::Params& params) {
        const std::string typeName = getType<Operation>();
        const std::string typeId = typeid(Transformation).name();
        const auto it = cleanupTransformations.find(typeName);
        if (it == cleanupTransformations.end()) {
            cleanupTransformations.emplace(typeName,
                std::vector<std::pair<std::string, LayerTransformationPtr>>{ std::make_pair(typeId, std::make_shared<Transformation>(params)) });
        } else {
            // Replace a previously registered cleanup of the same transformation type (entries are keyed by typeId).
            const auto it1 = std::find_if(it->second.begin(), it->second.end(),
                [&](const std::pair<std::string, LayerTransformationPtr>& transformation) {
                    return transformation.first == typeId;
                });
            if (it1 != it->second.end()) {
                it->second.erase(it1);
            }
            it->second.emplace_back(std::make_pair(typeId, std::make_shared<Transformation>(params)));
        }
        return *this;
    }

    /**
     * Add a standalone cleanup transformation. The transformation type and the operation type are required.
     * The operation type is used to find the transformation by operation during precision definition.
     */
    template <class Transformation, class Operation>
    LowPrecisionTransformations& addStandaloneCleanup(const LayerTransformation::Params& params) {
        const std::string typeName = getType<Operation>();
        const std::string typeId = typeid(Transformation).name();
        const auto it = std::find_if(standaloneCleanupTransformations.begin(), standaloneCleanupTransformations.end(),
            [&](const StandaloneCleanup& transformation) {
                return transformation.typeName == typeName && transformation.typeId == typeId;
            });
        if (it == standaloneCleanupTransformations.end()) {
            standaloneCleanupTransformations.emplace_back(StandaloneCleanup{ typeName, typeId, std::make_shared<Transformation>(params) });
        } else {
            *it = { typeName, typeId, std::make_shared<Transformation>(params) };
        }

        return *this;
    }

    template <class Operation>
    static std::string getType() {
        return Operation::get_type_info_static().name;
    }

    static std::string getType(const Node& operation) {
        return operation.get_type_name();
    }

    std::vector<LayerTransformationPtr> find(const std::string& transformationName) const;

    template <class Operation>
    std::vector<LayerTransformationPtr> find() const {
        const std::string transformationKey = getType<Operation>();
        return find(transformationKey);
    }

    void setParamsManager(IParamsManager* paramsManager) noexcept;
    void setLayerTransformationsManager(ILayerTransformationsManager* layerTransformationsManager) noexcept;

    // The key is not a layer type, but the name of the transformation.
    // The layer type (or a pattern) is defined by the transformation itself as an ngraph matcher.
    std::map<std::string, LayerTransformationPtr> branchSpecificTransformations;
    std::map<std::string, LayerTransformationPtr> transformations;
    std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>> cleanupTransformations;
    std::vector<StandaloneCleanup> standaloneCleanupTransformations;

private:
    static void setParamsManager(IParamsManager* paramsManager, std::map<std::string, LayerTransformationPtr>& transformations) noexcept;
    static void setParamsManager(
        IParamsManager* paramsManager,
        std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& transformations) noexcept;
    static void setParamsManager(IParamsManager* paramsManager, std::vector<StandaloneCleanup>& transformations) noexcept;
    static void setLayerTransformationsManager(
        ILayerTransformationsManager* layerTransformationsManager,
        std::map<std::string, LayerTransformationPtr>& transformations) noexcept;
    static void setLayerTransformationsManager(
        ILayerTransformationsManager* layerTransformationsManager,
        std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>>& transformations) noexcept;
    static void setLayerTransformationsManager(
        ILayerTransformationsManager* layerTransformationsManager,
        std::vector<StandaloneCleanup>& transformations) noexcept;
};

/**
 * @brief Low precision transformation component.
 */
class TRANSFORMATIONS_API LowPrecisionTransformer : public IParamsManager, ILayerTransformationsManager {
public:
    static LowPrecisionTransformations getAllTransformations(const LayerTransformation::Params& params = LayerTransformation::Params());

    static bool isFunctionQuantized(const std::shared_ptr<Function>& function);

    LowPrecisionTransformer();
    LowPrecisionTransformer(const LowPrecisionTransformations& transformations);
    void transform(std::shared_ptr<Function> network);

    // IParamsManager interface implementation
    std::vector<element::Type> getPrecisionsOnActivations(const Node& op) const noexcept override;

    // ILayerTransformationsManager interface implementation
    bool isQuantized(const std::shared_ptr<Node>& layer) const noexcept override;
    bool isPrecisionPreserved(const std::shared_ptr<Node>& layer) const noexcept override;

private:
    LowPrecisionTransformations transformations;

    void registerAllMatchers(
        std::map<std::string, LayerTransformationPtr> transformations,
        GraphRewrite& pass,
        TransformationContext& context);

    void registerAllMatchers(
        std::map<std::string, std::vector<std::pair<std::string, LayerTransformationPtr>>> transformations,
        GraphRewrite& pass,
        TransformationContext& context);

    std::vector<element::Type> precisionIntersection(
        const std::vector<element::Type>& v1,
        const std::vector<element::Type>& v2) const noexcept;
};

class TRANSFORMATIONS_API TypeRelaxedReplacer : public GraphRewrite {
public:
    TypeRelaxedReplacer();
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
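// Hedged usage sketch (not from this patch): build the default pipeline, optionally
// re-register one transformation with custom params, then run it on a function `f`.
// The headers for the referenced transformations are assumed to be included.
inline void runLowPrecision(const std::shared_ptr<ngraph::Function>& f) {
    using namespace ngraph::pass::low_precision;

    const LayerTransformation::Params params = LayerTransformation::Params().setUpdatePrecisions(true);
    LowPrecisionTransformations transformations = LowPrecisionTransformer::getAllTransformations(params);
    // add<>() replaces any previously registered transformation for the same operation type.
    transformations.add<MultiplyTransformation, ngraph::opset1::Multiply>(params);

    LowPrecisionTransformer transformer(transformations);
    transformer.transform(f);
}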
@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API TransparentBaseTransformation : public LayerTransformation {
public:
    TransparentBaseTransformation(const Params& params) : LayerTransformation(params) {}
    ~TransparentBaseTransformation() override {}
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,27 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API TransposeTransformation : public LayerTransformation {
public:
    TransposeTransformation(const Params& params) : LayerTransformation(params) {}
    ~TransposeTransformation() override {}
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,25 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <ngraph/ngraph.hpp>
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API UnsqueezeTransformation : public LayerTransformation {
public:
    UnsqueezeTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;
    bool transform(TransformationContext& context, ngraph::pattern::Matcher& m) const override;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,28 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <vector>

#include "split.hpp"
#include "ngraph/node.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API VariadicSplitTransformation : public SplitTransformation {
public:
    VariadicSplitTransformation(const Params& params);
    void registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const override;

protected:
    std::vector<size_t> getConstSplitLengths(
        const OutputVector& inputs,
        const ngraph::Shape& constShape,
        const size_t outputSize) const override;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,34 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformation_context.hpp"
#include "layer_transformation.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

class TRANSFORMATIONS_API WeightableLayerTransformation : public LayerTransformation {
public:
    WeightableLayerTransformation(const Params& params);
    bool canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const override;
    bool isQuantized(std::shared_ptr<Node> layer, bool isReshape) const noexcept;
    bool isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept override;

protected:
    DataPrecision decomposeFakeQuantizeForWeightsPath(std::shared_ptr<Node> weightableLayer) const;
    static bool isGroup(const std::shared_ptr<Node>& node);
    static bool isDepthwise(const std::shared_ptr<Node>& node);

    std::shared_ptr<opset1::FakeQuantize> getFakeQuantizeOnWeights(const std::shared_ptr<Node>& node) const;
    DataPrecision getDataPrecisionOnWeights(const std::shared_ptr<Node>& node) const;
};

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@ -0,0 +1,75 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
 * @brief Defines the dequantization runtime attribute
 * @file dequantization_attribute.hpp
 */

#pragma once

#include <assert.h>
#include <functional>
#include <memory>
#include <string>
#include <set>

#include <ngraph/node.hpp>
#include <ngraph/variant.hpp>
#include <transformations_visibility.hpp>

namespace ngraph {

/**
 * @ingroup ie_runtime_attr_api
 * @brief DequantizationAttr represents a runtime info attribute that indicates
 * whether the operation is a dequantization operation
 */
class TRANSFORMATIONS_API DequantizationAttr {
private:
    std::string dequantization_attribute;

public:
    /**
     * A default constructor
     */
    DequantizationAttr() = default;

    /**
     * @brief Constructs a new object consisting of a single name
     * @param[in] name The name
     */
    explicit DequantizationAttr(const std::string& name) : dequantization_attribute(name) {}

    /**
     * @brief Returns the string with the dequantization value
     */
    std::string getDequantizationAttr() const;
};

extern template class TRANSFORMATIONS_API VariantImpl<DequantizationAttr>;

template<>
class TRANSFORMATIONS_API VariantWrapper<DequantizationAttr> : public VariantImpl<DequantizationAttr> {
public:
    static constexpr VariantTypeInfo type_info{"DEQUANTIZATION", 0};

    const VariantTypeInfo& get_type_info() const override {
        return type_info;
    }

    VariantWrapper(const value_type& value) : VariantImpl<value_type>(value) {}

    std::shared_ptr<ngraph::Variant> merge(const ngraph::NodeVector& nodes) override;

    std::shared_ptr<ngraph::Variant> init(const std::shared_ptr<ngraph::Node>& node) override;
};

/**
 * @ingroup ie_runtime_attr_api
 * @brief getDequantization returns the dequantization value stored on the node
 * @param[in] node The node used to get the Dequantization attribute
 */
TRANSFORMATIONS_API std::string getDequantization(const std::shared_ptr<ngraph::Node>& node);

} // namespace ngraph
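// Hedged sketch (not from this patch): mark a node as dequantization via rt_info
// and read the value back. The rt_info key is assumed to match
// VariantWrapper<DequantizationAttr>::type_info.name ("DEQUANTIZATION").
inline void markAsDequantization(const std::shared_ptr<ngraph::Node>& node) {
    auto& rtInfo = node->get_rt_info();
    rtInfo["DEQUANTIZATION"] = std::make_shared<ngraph::VariantWrapper<ngraph::DequantizationAttr>>(
        ngraph::DequantizationAttr("dequantization"));
}

inline std::string readDequantization(const std::shared_ptr<ngraph::Node>& node) {
    return ngraph::getDequantization(node);  // an empty string is assumed when the attribute is absent
}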
@ -22,6 +22,7 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
                                 const Strides& dilations,
                                 const CoordinateDiff& pads_begin,
                                 const CoordinateDiff& pads_end,
                                 const element::Type output_type,
                                 const size_t& group,
                                 const PadType& auto_pad)
    : Op({data_batch, filters})
@ -30,10 +31,53 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
    , m_pads_begin(pads_begin)
    , m_pads_end(pads_end)
    , m_auto_pad(auto_pad)
    , m_group(group) {
    , m_group(group)
    , m_output_type(output_type) {
    constructor_validate_and_infer_types();
}

op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
                                 const Output<Node>& filters,
                                 const Output<Node>& bias,
                                 const Strides& strides,
                                 const Strides& dilations,
                                 const CoordinateDiff& pads_begin,
                                 const CoordinateDiff& pads_end,
                                 const element::Type output_type,
                                 const size_t& group,
                                 const PadType& auto_pad)
    : Op({data_batch, filters, bias})
    , m_strides(strides)
    , m_dilations(dilations)
    , m_pads_begin(pads_begin)
    , m_pads_end(pads_end)
    , m_auto_pad(auto_pad)
    , m_group(group)
    , m_output_type(output_type) {
    constructor_validate_and_infer_types();
}

// KMB compilation support
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
                                 const Output<Node>& filters,
                                 const Strides& strides,
                                 const Strides& dilations,
                                 const CoordinateDiff& pads_begin,
                                 const CoordinateDiff& pads_end,
                                 const size_t& group,
                                 const PadType& auto_pad)
    : Op({data_batch, filters})
    , m_strides(strides)
    , m_dilations(dilations)
    , m_pads_begin(pads_begin)
    , m_pads_end(pads_end)
    , m_auto_pad(auto_pad)
    , m_group(group)
    , m_output_type(element::undefined) {
    constructor_validate_and_infer_types();
}

// KMB compilation support
op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
                                 const Output<Node>& filters,
                                 const Output<Node>& bias,
@ -49,7 +93,8 @@ op::ConvolutionIE::ConvolutionIE(const Output<Node>& data_batch,
|
||||
, m_pads_begin(pads_begin)
|
||||
, m_pads_end(pads_end)
|
||||
, m_auto_pad(auto_pad)
|
||||
, m_group(group) {
|
||||
, m_group(group)
|
||||
, m_output_type(element::undefined) {
|
||||
constructor_validate_and_infer_types();
|
||||
}
|
||||
|
||||
@ -59,23 +104,12 @@ void op::ConvolutionIE::validate_and_infer_types() {
|
||||
PartialShape filters_shape = get_input_partial_shape(1);
|
||||
element::Type filters_et = get_input_element_type(1);
|
||||
|
||||
element::Type result_et;
|
||||
|
||||
NODE_VALIDATION_CHECK(
|
||||
this,
|
||||
element::Type::merge(result_et, data_batch_et, filters_et),
|
||||
"Element types for data batch and filters do not match (data batch element type: ",
|
||||
data_batch_et,
|
||||
", filters element type: ",
|
||||
filters_et,
|
||||
").");
|
||||
|
||||
PartialShape result_shape{PartialShape::dynamic()};
|
||||
|
||||
// In case if number of groups greater than 1 and channel dimension is dynamic we can't calculate output shape
|
||||
if (m_group > 1) {
|
||||
if (data_batch_shape.rank().is_dynamic() || data_batch_shape[1].is_dynamic()) {
|
||||
set_output_type(0, result_et, result_shape);
|
||||
set_output_type(0, m_output_type, result_shape);
|
||||
return;
|
||||
} else {
|
||||
// Update channel dimension according to groups count
|
||||
@ -109,7 +143,7 @@ void op::ConvolutionIE::validate_and_infer_types() {
|
||||
m_strides,
|
||||
m_dilations);
|
||||
|
||||
set_output_type(0, result_et, result_shape);
|
||||
set_output_type(0, m_output_type, result_shape);
|
||||
}
|
||||
|
||||
shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVector & new_args) const {
|
||||
@ -120,6 +154,7 @@ shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVe
|
||||
m_dilations,
|
||||
m_pads_begin,
|
||||
m_pads_end,
|
||||
m_output_type,
|
||||
m_group,
|
||||
m_auto_pad);
|
||||
} else if (new_args.size() == 3) {
|
||||
@ -130,6 +165,7 @@ shared_ptr<Node> op::ConvolutionIE::clone_with_new_inputs(const ngraph::OutputVe
|
||||
m_dilations,
|
||||
m_pads_begin,
|
||||
m_pads_end,
|
||||
m_output_type,
|
||||
m_group,
|
||||
m_auto_pad);
|
||||
}
|
||||
|
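The thread through these hunks: ConvolutionIE now carries an explicit m_output_type, so LPT can keep INT8 inputs while the op still reports the original (e.g. FP32) output precision, and validate_and_infer_types no longer derives the result type by merging the input types. A hedged usage sketch; the variable names and surrounding setup are assumptions, not part of the diff:

// u8 activations and i8 weights, but the convolution still reports f32 output:
auto conv = std::make_shared<ngraph::op::ConvolutionIE>(
    u8_activations, i8_weights,
    ngraph::Strides{1, 1}, ngraph::Strides{1, 1},
    ngraph::CoordinateDiff{0, 0}, ngraph::CoordinateDiff{0, 0},
    ngraph::element::f32,  // output_type: decoupled from the input element types
    1ul,                   // group
    ngraph::op::PadType::EXPLICIT);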
@@ -36,6 +36,32 @@ std::pair<std::shared_ptr<A>, std::shared_ptr<B>> parse_eltwise_inputs(std::shar
    return {eltwise, constant};
}

template <class Conv>
bool IsConvInLowPrecision(const std::shared_ptr<Conv>& conv) {
    if (!ngraph::is_type<ngraph::op::ConvolutionIE>(conv)) {
        return false;
    }

    auto isLowPrecision = [](const std::shared_ptr<ngraph::Node>& node, const size_t index) {
        const ngraph::element::Type inputType = node->get_input_element_type(index);
        return (inputType == ngraph::element::i8) || (inputType == ngraph::element::u8);
    };

    // Convolution operation has to be executed in INT8 if ...
    if (isLowPrecision(conv, 0) && isLowPrecision(conv, 1)) {
        // ... INT8 on activations && INT8 on weights
        return true;
    }

    const std::shared_ptr<ngraph::opset1::Subtract> subtract = ngraph::as_type_ptr<ngraph::opset1::Subtract>(conv->get_input_node_shared_ptr(0));
    if (subtract == nullptr) {
        return false;
    }

    // ... INT8 on activations with asymmetric quantization && INT8 on weights
    return isLowPrecision(subtract, 0) && isLowPrecision(subtract, 1) && isLowPrecision(conv, 1);
}

template <class Conv>
ngraph::graph_rewrite_callback get_callback() {
    ngraph::graph_rewrite_callback callback = [](ngraph::pattern::Matcher &m) {
@@ -95,7 +121,8 @@ ngraph::graph_rewrite_callback get_callback() {
            new_bias = std::make_shared<ngraph::opset1::Add>(final_const, m_conv->input_value(2));
        }
        new_conv = m_conv->clone_with_new_inputs({m_conv->input_value(0), m_conv->input_value(1), new_bias});
    } else if (std::is_same<Conv, ngraph::op::ConvolutionIE>() && std::dynamic_pointer_cast<ngraph::opset1::Multiply>(eltwise)) {
    } else if (std::is_same<Conv, ngraph::op::ConvolutionIE>() && std::dynamic_pointer_cast<ngraph::opset1::Multiply>(eltwise) &&
               !IsConvInLowPrecision(m_conv)) {
        // Fuse: ConvolutionIE->Mul
        auto weights_shape = m_conv->input(1).get_shape();
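IsConvInLowPrecision gates the Mul fusion above: folding a Multiply into the weights of a convolution that is meant to run in INT8 would break the quantized execution path. The two graph shapes it recognizes, sketched for illustration:

// Case 1: both convolution inputs are already low precision:
//   u8 activations ---+
//                     +--> ConvolutionIE         -> IsConvInLowPrecision == true
//   i8 weights -------+
//
// Case 2: asymmetric quantization, the zero point is a Subtract on the activation path:
//   u8 activations --> Subtract(u8 zero point) --+
//                                                +--> ConvolutionIE -> true
//   i8 weights ----------------------------------+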
@@ -44,10 +44,18 @@ ngraph::pass::AddMultiplyFusion::AddMultiplyFusion() {
        auto mul = label_to_output[m_mul].get_node_shared_ptr();
        auto add = label_to_output[m_add].get_node_shared_ptr();

        if (m_transformation_callback(mul)) {
            return false;
        }

        Output<Node> input = label_to_output[m_data];
        Output<Node> mul_const = label_to_output[m_mul_constant];
        Output<Node> add_const = label_to_output[m_add_constant];

        if ((input.get_element_type() != mul_const.get_element_type()) || (add_const.get_element_type() != mul_const.get_element_type())) {
            return false;
        }

        // Replace Add->Multiply with Multiply->Add.
        // As the new Multiply can be fused with an operation above it, we add this Multiply
        // to the list of operations that will be used in additional matching.
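The rewrite rests on the identity (x + a) * m == (x * m) + (a * m), which is also why the element-type check above matters: folding a * m requires a common type. A minimal sketch of the replacement under that identity, reusing the matched outputs above (constant folding collapses a * m later); this is an illustration, not the pass's literal code:

// (x + a) * m  ->  (x * m) + (a * m)
auto new_mul = std::make_shared<ngraph::opset1::Multiply>(input, mul_const);
auto new_add_const = std::make_shared<ngraph::opset1::Multiply>(add_const, mul_const);  // a * m
auto new_add = std::make_shared<ngraph::opset1::Add>(new_mul, new_add_const);
ngraph::replace_node(mul, new_add);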
@@ -161,6 +161,7 @@ bool ngraph::pass::ConvertPrecision::run_on_function(std::shared_ptr<ngraph::Fun
    // If the output type does not match the given type, we try to fuse the type into this operation;
    // otherwise we insert a Convert operation.
    for (auto &node : f->get_ordered_ops()) {
        m_transformation_callback(node);
        // Recursively apply transformation for sub-graph based operations
        if (auto sub_graph_node = std::dynamic_pointer_cast<op::util::SubGraphOp>(node)) {
            if (auto sub_graph = sub_graph_node->get_function()) {
@@ -0,0 +1,203 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "transformations/low_precision/add.hpp"

#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "ngraph_ops/type_relaxed.hpp"

#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
#include "transformations/low_precision/network_helper.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

std::shared_ptr<opset1::Subtract> replaceToSubtract(const std::shared_ptr<Node>& op) {
    // TODO: separate this part into a standalone transformation: AddToSubtractTransformation
    // motivation:
    //   - single responsibility
    //   - keep AddTransformation and AddToSubtractTransformation transformations independent and optional
    const auto add = as_type_ptr<opset1::Add>(op);
    if (add == nullptr) {
        return nullptr;
    }

    // TODO: use the general way from getDequantization: is eltwise with Constant
    const int constBranchIndex = is_type<opset1::Constant>(add->get_input_node_ptr(0)) ?
        0 :
        (is_type<opset1::Constant>(add->get_input_node_ptr(1)) ? 1 : -1);
    if (constBranchIndex == -1) {
        return nullptr;
    }
    const size_t dataBranchIndex = constBranchIndex == 0 ? 1ul : 0;

    const auto parent = add->get_input_node_shared_ptr(dataBranchIndex);
    if (is_type<opset1::Convolution>(parent) ||
        is_type<opset1::GroupConvolution>(parent) ||
        (is_type<opset1::MatMul>(parent) &&
        (is_type<opset1::Constant>(parent->get_input_node_ptr(0)) || is_type<opset1::Constant>(parent->get_input_node_ptr(1))))) {
        return nullptr;
    }

    auto constant = fold<opset1::Negative>(add->get_input_node_shared_ptr(constBranchIndex));
    auto constOutput = constant->output(0);

    const auto subtract = std::make_shared<DequantizationSubtract>(
        add->get_input_node_shared_ptr(dataBranchIndex),
        constOutput,
        add->get_autob());
    NetworkHelper::copyInfo(add, subtract);

    replace_node(add, subtract);
    return subtract;
}
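// Worked example of replaceToSubtract (illustrative values): the pass rewrites
//   y = x + (-128.0)   as   y = x - 128.0
// by folding Negative over the constant branch, so the result matches the
// canonical dequantization form Subtract -> Multiply expected by later passes.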

std::shared_ptr<opset1::Subtract> fuseWithSubtract(const std::shared_ptr<Node>& op) {
    const auto add = as_type_ptr<opset1::Add>(op);
    if ((add == nullptr) ||
        !is_type<opset1::Subtract>(add->get_input_node_shared_ptr(0)) ||
        // TODO: use the general way from getDequantization: is eltwise with Constant
        !is_type<opset1::Constant>(add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(1))) {
        return nullptr;
    }

    const auto newSubConst = fold<opset1::Subtract>(
        add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(1),
        add->get_input_node_shared_ptr(1));

    const auto newSubtract = std::make_shared<op::TypeRelaxed<DequantizationSubtract>>(
        std::vector<element::Type>{element::f32, element::f32},
        std::vector<element::Type>{ element::f32 },
        ngraph::op::TemporaryReplaceOutputType(add->get_input_node_shared_ptr(0)->get_input_node_shared_ptr(0), element::f32).get(),
        ngraph::op::TemporaryReplaceOutputType(newSubConst, element::f32).get());
    NetworkHelper::copyInfo(add, newSubtract);

    replace_node(add, newSubtract);
    return newSubtract;
}

void AddTransformation::registerMatcherIn(GraphRewrite &pass, TransformationContext &context) const {
    addSingleNodePattern<opset1::Add>(pass, context);
}

bool AddTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
    std::shared_ptr<opset1::Add> op = as_type_ptr<opset1::Add>(m.get_match_root());
    if (!canBeTransformed(context, op)) {
        return false;
    }

    std::shared_ptr<Node> addNode = separateInStandaloneBranch(op);
    std::shared_ptr<opset1::Add> add = as_type_ptr<opset1::Add>(addNode);

    const int fullPathIndex = getNotEmpty(add);
    std::shared_ptr<Node> newMultiply;
    std::shared_ptr<Node> newAddOrSubtract;

    if (fullPathIndex == -1) {
        // swap the constant multiply and add, and possibly fuse to subtract
        const auto multiplyBranch = getMultiplyConstBranch(add);

        if (multiplyBranch.first == -1) {
            NetworkHelper::foldDequantization(addNode, 0);
            NetworkHelper::foldDequantization(addNode, 1);
            return false;
        }

        newMultiply = NetworkHelper::swapMultiplyAndAdd(add, multiplyBranch.first);

        if (is_type<opset1::Add>(newMultiply->get_input_node_shared_ptr(0))) {
            newAddOrSubtract = newMultiply->get_input_node_shared_ptr(0);

            auto subtract = fuseWithSubtract(newAddOrSubtract);
            if (subtract != nullptr) {
                newAddOrSubtract = subtract;
            }

            subtract = replaceToSubtract(newAddOrSubtract);
            if (subtract != nullptr) {
                newAddOrSubtract = subtract;
            }
        } else {
            newAddOrSubtract = newMultiply;
        }
    } else {
        // dequantizations are on both branches
        const int emptyPathIndex = fullPathIndex == 0 ? 1 : 0;

        FakeQuantizeDequantization dequantizationEmptyPath = NetworkHelper::getDequantization(add, emptyPathIndex);
        if (updatePrecisions && !dequantizationEmptyPath.empty() && !dequantizationEmptyPath.isLowPrecision()) {
            return false;
        }

        std::shared_ptr<Node> subtractEmptyPathValues;
        std::shared_ptr<Node> multiplyEmptyPathValues;
        std::tie(subtractEmptyPathValues, multiplyEmptyPathValues) = NetworkHelper::createEmptyValues(dequantizationEmptyPath);

        FakeQuantizeDequantization dequantizationFullPath = NetworkHelper::getDequantization(add, fullPathIndex);
        if (updatePrecisions && !dequantizationFullPath.empty() && !dequantizationFullPath.isLowPrecision()) {
            return false;
        }

        std::shared_ptr<Node> subtractFullPathValues;
        std::shared_ptr<Node> multiplyFullPathValues;
        std::tie(subtractFullPathValues, multiplyFullPathValues) = NetworkHelper::createEmptyValues(dequantizationFullPath);

        // calculation
        // before: Y = (SC1 * (X1 - SH1)) + (SC2 * (X2 - SH2))
        // after : Y = SC2 * ( SC1' * (X1 - SH1') + X2 ), where:
        //         SC1' = SC1 / SC2
        //         SH1' = SH1 + SC2 * SH2 / SC1
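        // Illustrative numeric check of the formulas above:
        //   SC1 = 2, SH1 = 1, SC2 = 4, SH2 = 3, X1 = 5, X2 = 7
        //   before: 2 * (5 - 1) + 4 * (7 - 3) = 8 + 16 = 24
        //   after : SC1' = 2 / 4 = 0.5;  SH1' = 1 + 4 * 3 / 2 = 7
        //           4 * (0.5 * (5 - 7) + 7) = 4 * 6 = 24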
        std::shared_ptr<Node> newSubtractFullPathValues = fold<opset1::Add>(
            subtractFullPathValues,
            fold<opset1::Divide>(
                fold<opset1::Multiply>(subtractEmptyPathValues, multiplyEmptyPathValues),
                multiplyFullPathValues));

        std::shared_ptr<Node> newMultiplyFullPathValues = fold<opset1::Divide>(multiplyFullPathValues, multiplyEmptyPathValues);

        if (NetworkHelper::isZeroConst(newSubtractFullPathValues)) {
            newSubtractFullPathValues = nullptr;
        }

        // graph update
        std::vector<std::shared_ptr<Node>> inputs{ {}, {} };
        auto fullPathInput = dequantizationFullPath.convert == nullptr ? dequantizationFullPath.data : dequantizationFullPath.convert;

        inputs[emptyPathIndex] = dequantizationEmptyPath.data.get_node_shared_ptr();
        inputs[fullPathIndex] = std::make_shared<DequantizationMultiply>(
            newSubtractFullPathValues == nullptr ?
                fullPathInput :
                std::make_shared<DequantizationSubtract>(fullPathInput, newSubtractFullPathValues),
            newMultiplyFullPathValues);

        newAddOrSubtract = std::make_shared<op::TypeRelaxed<opset1::Add>>(
            std::vector<element::Type>{element::f32, element::f32}, std::vector<element::Type>{ element::f32 },
            ngraph::op::TemporaryReplaceOutputType(inputs[0], element::f32).get(),
            ngraph::op::TemporaryReplaceOutputType(inputs[1], element::f32).get());
        newMultiply = std::make_shared<DequantizationMultiply>(newAddOrSubtract, multiplyEmptyPathValues);

        replace_node(add, newMultiply);
        NetworkHelper::copyInfo(add, newAddOrSubtract);
    }

    updateOutput(context, newMultiply, newAddOrSubtract);

    if (fullPathIndex != -1) {
        std::shared_ptr<Node> node = add;
        NetworkHelper::foldDequantization(node, fullPathIndex);
    }

    return true;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
@@ -0,0 +1,80 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "transformations/low_precision/avg_pool.hpp"

#include <memory>
#include <ngraph/ngraph.hpp>
#include <ngraph/opsets/opset1.hpp>

#include "transformations/low_precision/network_helper.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

AvgPoolTransformation::AvgPoolTransformation(const Params& params) : LayerTransformation(params) {
}

void AvgPoolTransformation::registerMatcherIn(GraphRewrite &pass, TransformationContext &context) const {
    addPattern(
        pass,
        context,
        make_op_pattern<opset1::AvgPool>({ make_op_label<opset1::Multiply>() }));
}

bool AvgPoolTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
    if (!canBeTransformed(context, m.get_match_root())) {
        return false;
    }

    const std::shared_ptr<Node> pooling = separateInStandaloneBranch(m.get_match_root());

    const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(pooling);

    bool updatePrecision;
    // issue #40768
    if ((children.size() == 1ul) && (!this->layerTransformationsManager->isQuantized(children[0]))) {
        updatePrecision = false;
    } else {
        updatePrecision = false;
        // NOTE: This check was added for models that don't have FQ after AvgPool.
        // They will have transparent precision as it was in the old LPT.
        for (const auto& child : children) {
            if (!is_type<opset1::FakeQuantize>(child)) {
                updatePrecision = true;
                break;
            }
        }
    }

    moveDequantizationAfter(context, pooling, NetworkHelper::getDequantization(pooling), updatePrecision);
    return true;
}

bool AvgPoolTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> operation) const {
    if (!LayerTransformation::canBeTransformed(context, operation)) {
        return false;
    }

    auto dequantization = NetworkHelper::getDequantization(operation);

    return !!dequantization.multiply;
}

bool AvgPoolTransformation::isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept {
    const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(layer);
    // NOTE: This check was added for models that don't have FQ after AvgPool.
    // They will have transparent precision as it was in the old LPT.
    for (const auto& child : children) {
        if (!is_type<opset1::FakeQuantize>(child)) {
            return true;
        }
    }
    return false;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
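moveDequantizationAfter is valid for AvgPool because average pooling is linear, so the scale and shift commute with the mean. A numeric sketch of the identity (values illustrative):

// avgpool(s * (x - sh)) == s * (avgpool(x) - sh)
// window {2, 4}, s = 0.5, sh = 1:
//   avg(0.5 * (2 - 1), 0.5 * (4 - 1)) = avg(0.5, 1.5) = 1.0
//   0.5 * (avg(2, 4) - 1)             = 0.5 * 2       = 1.0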
@@ -0,0 +1,97 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "transformations/low_precision/clamp.hpp"
#include <algorithm>
#include <memory>
#include <ngraph/ngraph.hpp>
#include "transformations/low_precision/network_helper.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

ClampTransformation::ClampTransformation(const Params& params) : LayerTransformation(params) {}

void ClampTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
    addPattern(pass,
               context,
               make_op_pattern<opset1::Clamp>({ make_op_label<opset1::Multiply>() }));
}

bool ClampTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher& m) const {
    auto subWithTheSameValues = [](std::shared_ptr<ngraph::opset1::Subtract> sub) {
        if (sub == nullptr) {
            return false;
        }
        const auto constant = as_type_ptr<ngraph::opset1::Constant>(sub->get_input_node_shared_ptr(1));

        if (constant == nullptr) {
            return false;
        }

        return NetworkHelper::isScalarLike(constant);
    };

    if (!canBeTransformed(context, m.get_match_root())) {
        return false;
    }

    const std::shared_ptr<Node> clamp = separateInStandaloneBranch(m.get_match_root());
    const FakeQuantizeDequantization dequantization = NetworkHelper::getDequantization(clamp);

    const bool moveSubtract = subWithTheSameValues(dequantization.subtract);
    if (!moveSubtract && !canSubtractBeHandled(clamp, dequantization)) {
        return false;
    }
    const auto newClamp = as_type_ptr<opset1::Clamp>(moveDequantizationAfter(context, clamp, dequantization, false, moveSubtract));
    double min = newClamp->get_min();
    double max = newClamp->get_max();

    if (dequantization.multiply != nullptr) {
        double scale = as_type_ptr<opset1::Constant>(dequantization.multiply->get_input_node_shared_ptr(1))->cast_vector<double>()[0];
        if (scale < 0.0) {
            std::swap(min, max);
        }
        min /= scale;
        max /= scale;
    }

    if (dequantization.subtract != nullptr && moveSubtract) {
        double shift = as_type_ptr<opset1::Constant>(dequantization.subtract->get_input_node_shared_ptr(1))->cast_vector<double>()[0];
        min += shift;
        max += shift;
    }

    const std::shared_ptr<ngraph::opset1::Clamp> replacement = std::make_shared<ngraph::opset1::Clamp>(newClamp->get_input_node_shared_ptr(0), min, max);
    replace_node(newClamp, replacement);

    element::Type outputClampType = dequantization.multiply ?
        dequantization.multiply->get_output_element_type(0) :
        dequantization.subtract->get_output_element_type(0);
    ngraph::pass::low_precision::NetworkHelper::setOutDataPrecision(replacement, outputClampType);
    return true;
}

bool ClampTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> op) const {
    if (!LayerTransformation::canBeTransformed(context, op)) {
        return false;
    }
    const FakeQuantizeDequantization dequantization = NetworkHelper::getDequantization(op);

    const auto mulConst = as_type_ptr<ngraph::opset1::Constant>(dequantization.multiply->get_input_node_shared_ptr(1));
    if (mulConst == nullptr) {
        return false;
    }

    return NetworkHelper::isScalarLike(mulConst);
}

bool ClampTransformation::isPrecisionPreserved(std::shared_ptr<Node> layer) const noexcept {
    return false;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
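The min/max rewrite above follows from clamp(s * (x - sh), lo, hi) == s * (clamp(x, lo/s + sh, hi/s + sh) - sh) for s > 0, which is why the code divides by the scale first and then adds the shift (and swaps the bounds for a negative scale). A numeric sketch:

// s = 2, sh = 1, lo = -4, hi = 8  ->  new bounds: -4/2 + 1 = -1,  8/2 + 1 = 5
// x = 7: original : clamp(2 * (7 - 1), -4, 8) = clamp(12, -4, 8) = 8
//        rewritten: 2 * (clamp(7, -1, 5) - 1) = 2 * (5 - 1)      = 8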
@@ -0,0 +1,103 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include <algorithm>
#include <memory>
#include <ngraph/opsets/opset1.hpp>
#include "transformations/low_precision/common/ie_lpt_exception.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

FakeQuantizeDequantization::FakeQuantizeDequantization() {}

FakeQuantizeDequantization::FakeQuantizeDequantization(
    Output<Node> data,
    std::shared_ptr<opset1::Convert> convert,
    std::shared_ptr<opset1::Subtract> subtract,
    std::shared_ptr<opset1::Multiply> multiply) :
    data(data),
    convert(convert),
    subtract(subtract),
    multiply(multiply) {
}

bool FakeQuantizeDequantization::empty() const {
    return (convert == nullptr) && (subtract == nullptr) && (multiply == nullptr);
}

bool FakeQuantizeDequantization::isShared() const {
    if ((convert != nullptr) && (convert->get_output_target_inputs(0).size() > 1ul)) {
        return true;
    }

    if ((subtract != nullptr) && (subtract->get_output_target_inputs(0).size() > 1ul)) {
        return true;
    }

    if ((multiply != nullptr) && (multiply->get_output_target_inputs(0).size() > 1ul)) {
        return true;
    }

    return false;
}

bool FakeQuantizeDequantization::isLowPrecision() const {
    return (data.get_element_type() == element::i8) || (data.get_element_type() == element::u8);
}

bool FakeQuantizeDequantization::checkElementwise(const std::shared_ptr<ngraph::Node>& dequantizationElementwise) {
    const ngraph::PartialShape partialShape = dequantizationElementwise->get_input_partial_shape(0);
    if (partialShape.is_dynamic()) {
        return false;
    }

    std::shared_ptr<opset1::Constant> constant = as_type_ptr<opset1::Constant>(dequantizationElementwise->get_input_node_shared_ptr(1));
    if (constant == nullptr) {
        constant = as_type_ptr<opset1::Constant>(dequantizationElementwise->get_input_node_shared_ptr(0));
    }
    if (constant == nullptr) {
        THROW_IE_LPT_EXCEPTION(*dequantizationElementwise) << "unexpected operation type " <<
            dequantizationElementwise->get_type_info().name << " on the second branch";
    }

    const ngraph::Shape constShape = constant->get_output_shape(0);
    if (constShape.size() > 5ul) {
        return false;
    }

    if ((constShape.size() <= 1ul) || (std::all_of(constShape.begin(), constShape.end(), [](const size_t value) { return value == 1ul; }))) {
        return true;
    }

    const ngraph::Shape shape = partialShape.to_shape();
    if (constShape.size() == shape.size()) {
        if ((constShape[0] != 1ul) || (constShape[1] != shape[1])) {
            return false;
        }
        for (size_t i = 2ul; i < constShape.size(); ++i) {
            if (constShape[i] != 1ul) {
                return false;
            }
        }
    } else if (constShape.size() == (shape.size() - 1)) {
        if (constShape[0] != shape[1]) {
            return false;
        }
        for (size_t i = 1ul; i < constShape.size(); ++i) {
            if (constShape[i] != 1ul) {
                return false;
            }
        }
    } else {
        return false;
    }

    return true;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
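checkElementwise therefore accepts only per-tensor or per-channel constants. For a 4D input the accepted shapes reduce to (illustrative values):

// input x: [1, 8, 16, 16]
// accepted: []  [1]  [1, 1, 1, 1]   per-tensor, scalar-like
//           [1, 8, 1, 1]            per-channel, same rank
//           [8, 1, 1]               per-channel, rank - 1
// rejected: [1, 8, 16, 16], [16, 16], anything with rank > 5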
@@ -0,0 +1,179 @@
// Copyright (C) 2018-2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <transformations/low_precision/common/subgraph.hpp>

#include <algorithm>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

#include <ngraph/rt_info.hpp>
#include <ngraph/opsets/opset1.hpp>

#include "transformations/low_precision/quantization_details.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/network_helper.hpp"


namespace ngraph {
namespace pass {
namespace low_precision {

bool isQuantizationPerChannel(const std::shared_ptr<ngraph::Node>& node) {
    if (node->outputs().size() > 1ul) {
        return false;
    }

    const auto inputs = ngraph::pass::low_precision::NetworkHelper::getInputs(node);
    for (const auto& input : inputs) {
        if (ngraph::is_type<opset1::Constant>(input.get_node())) {
            continue;
        }

        const Shape& in = input.get_shape();
        const Shape& out = node->output(0).get_shape();
        for (size_t i = 0; i < 2; ++i) {
            if (in[i] != out[i]) {
                return false;
            }
        }
    }

    return true;
}

Subgraph::Subgraph(ngraph::pass::ILayerTransformationsManager* layerTransformationsManager) : layerTransformationsManager(layerTransformationsManager) {
}

bool Subgraph::fillSubgraphForQuantization(
    const std::shared_ptr<ngraph::opset1::FakeQuantize>& fakeQuantize,
    std::unordered_set<std::string>& handledLayers) {
    quantizationLayers.push_back(fakeQuantize);
    handledLayers.insert(fakeQuantize->get_friendly_name());
    layers.emplace(fakeQuantize->get_friendly_name(), fakeQuantize);

    for (size_t index = 0; index < fakeQuantize->get_output_size(); ++index) {
        const auto childInputs = fakeQuantize->get_output_target_inputs(index);
        for (const auto childInput : childInputs) {
            const std::shared_ptr<ngraph::Node> child = childInput.get_node()->shared_from_this();
            if (handledLayers.find(child->get_friendly_name()) != handledLayers.end()) {
                continue;
            }

            const std::shared_ptr<ngraph::opset1::Concat> concatChild = ngraph::as_type_ptr<ngraph::opset1::Concat>(child);
            if (concatChild != nullptr) {
                if (!fillSubgraphForConcat(concatChild, handledLayers)) {
                    return false;
                }
            } else {
                const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeChild = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(child);
                if (fakeQuantizeChild != nullptr) {
                    //
                } else {
                    if (layerTransformationsManager->isPrecisionPreserved(child) && isQuantizationPerChannel(child)) {
                        if (!fillSubgraphForIntermediate(child, handledLayers)) {
                            return false;
                        }
                    }
                }
            }
        }
    }

    return true;
}

bool Subgraph::fill(const std::shared_ptr<ngraph::Node>& layer, std::unordered_set<std::string>& handledLayers) {
    // if at least one parent is handled incorrectly then the subgraph is not in low precision
    for (size_t index = 0; index < layer->get_input_size(); ++index) {
        const std::shared_ptr<ngraph::Node> parent = layer->get_input_node_shared_ptr(index);
        if (handledLayers.find(parent->get_friendly_name()) != handledLayers.end()) {
            continue;
        }

        const std::shared_ptr<ngraph::opset1::Concat> concatParent = ngraph::as_type_ptr<ngraph::opset1::Concat>(parent);
        if (concatParent != nullptr) {
            if (!fillSubgraphForConcat(concatParent, handledLayers)) {
                return false;
            }
        } else {
            const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeParent = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(parent);
            if (fakeQuantizeParent != nullptr) {
                if (!fillSubgraphForQuantization(fakeQuantizeParent, handledLayers)) {
                    //
                }
            } else {
                const std::shared_ptr<ngraph::opset1::Constant> constant = ngraph::as_type_ptr<ngraph::opset1::Constant>(parent);
                if (constant != nullptr) {
                    //
                } else {
                    if (layerTransformationsManager->isPrecisionPreserved(parent) && isQuantizationPerChannel(parent)) {
                        if (!fillSubgraphForIntermediate(parent, handledLayers)) {
                            return false;
                        }
                    } else {
                        return false;
                    }
                }
            }
        }
    }

    // TODO: if at least one child was handled correctly then the subgraph is in low precision
    for (size_t index = 0; index < layer->get_output_size(); ++index) {
        const auto childInputs = layer->get_output_target_inputs(index);
        for (const auto childInput : childInputs) {
            const std::shared_ptr<ngraph::Node> child = childInput.get_node()->shared_from_this();

            if (handledLayers.find(child->get_friendly_name()) != handledLayers.end()) {
                continue;
            }

            const std::shared_ptr<ngraph::opset1::Concat> concatChild = ngraph::as_type_ptr<ngraph::opset1::Concat>(child);
            if (concatChild != nullptr) {
                if (!fillSubgraphForConcat(concatChild, handledLayers)) {
                    return false;
                }
            } else {
                const std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeChild = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(child);
                if (fakeQuantizeChild != nullptr) {
                    //
                } else if (layerTransformationsManager->isPrecisionPreserved(child) && isQuantizationPerChannel(child)) {
                    if (!fillSubgraphForIntermediate(child, handledLayers)) {
                        return false;
                    }
                }
            }
        }
    }

    return true;
}

bool Subgraph::fillSubgraphForIntermediate(const std::shared_ptr<ngraph::Node>& intermediate, std::unordered_set<std::string>& handledLayers) {
    handledLayers.insert(intermediate->get_friendly_name());
    layers.emplace(intermediate->get_friendly_name(), intermediate);

    return fill(intermediate, handledLayers);
}

bool Subgraph::empty() const {
    return quantizationLayers.empty();
}

bool Subgraph::fillSubgraphForConcat(const std::shared_ptr<ngraph::opset1::Concat>& concat, std::unordered_set<std::string>& handledLayers) {
    concatLayers.push_back(concat);
    handledLayers.insert(concat->get_friendly_name());
    layers.emplace(concat->get_friendly_name(), concat);

    std::shared_ptr<ngraph::Node> node = concat;
    return fill(node, handledLayers);
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
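isQuantizationPerChannel treats a node as safe for per-channel propagation only when its non-constant inputs keep the batch and channel dimensions of the output. A shape sketch (illustrative values):

// MaxPool: [1, 8, 16, 16] -> [1, 8, 8, 8]   N and C unchanged       -> true
// Reshape: [1, 8, 16, 16] -> [1, 2048]      C changes (8 -> 2048)   -> false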
@@ -0,0 +1,428 @@
// Copyright (C) 2020 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "transformations/low_precision/concat.hpp"

#include <algorithm>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include <ngraph/opsets/opset1.hpp>

#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
#include "transformations/low_precision/common/subgraph.hpp"
#include "transformations/low_precision/common/dequantization_op.hpp"
#include "transformations/low_precision/network_helper.hpp"

namespace ngraph {
namespace pass {
namespace low_precision {

void ConcatTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
    addSingleNodePattern<opset1::Concat>(pass, context);
}

bool ConcatTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
    std::shared_ptr<ngraph::opset1::Concat> concat = ngraph::as_type_ptr<ngraph::opset1::Concat>(m.get_match_root());
    if (!canBeTransformed(context, concat)) {
        return false;
    }

    ngraph::pass::low_precision::Subgraph subgraph(layerTransformationsManager);
    std::unordered_set<std::string> handledLayers;
    if (!subgraph.fillSubgraphForConcat(concat, handledLayers)) {
        return false;
    }

    if (subgraph.quantizationLayers.empty() || isHandled(context, subgraph.quantizationLayers)) {
        return false;
    }

    // precisions can be different
    ngraph::Node& quantizationLayer = *subgraph.quantizationLayers[0];
    std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(quantizationLayer.shared_from_this());
    DataPrecision dataPrecision = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
    if (dataPrecision.precision == ngraph::element::undefined) {
        return false;
    }

    std::unordered_map<std::string, ngraph::pass::low_precision::FakeQuantizeDequantization> dequantizations;
    std::vector<QuantizationDetails> quantizationLayersDetails;

    for (size_t i = 0; i < subgraph.quantizationLayers.size(); ++i) {
        const std::shared_ptr<ngraph::Node> fakeQuantizeLayer = subgraph.quantizationLayers[i];

        const ngraph::Shape shape = fakeQuantizeLayer->get_output_shape(0);
        if (shape.size() < 4ul) {
            return false;
        }

        const std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(fakeQuantizeLayer->shared_from_this());
        if (fq == nullptr) {
            return false;
        }

        const QuantizationDetails& quantizationDetails = QuantizationDetails::getDetails(fq);
        quantizationLayersDetails.push_back(quantizationDetails);

        const DataPrecision dataPrecision2 = getDataPrecision(subgraph.quantizationLayers[i]->shared_from_this(), quantizationDetails, false);
        if (dataPrecision2.precision == ngraph::element::undefined) {
            return false;
        }

        if (dataPrecision.precision != dataPrecision2.precision) {
            // quantization levels are the same, the difference can be in sign;
            // the wider interval (precision) is preferable: use signed if at least one interval is signed
            dataPrecision = dataPrecision.precision.is_signed() ? dataPrecision : dataPrecision2;
        }
    }

    if (dataPrecision.precision == ngraph::element::undefined) {
        return false;
    }

    // only per-tensor scale is supported
    if (quantizationLayersDetails.empty() || (quantizationLayersDetails[0].inputHighValues.size() != 1ul)) {
        return false;
    }

    FakeQuantizeDequantization dequantization;

    if (quantizationLayersDetails[0].inputHighValues.size() == 1) {
        float outputLowValue = quantizationLayersDetails[0].outputLowValues[0];
        float outputHighValue = quantizationLayersDetails[0].outputHighValues[0];

        for (size_t index = 0lu; index < subgraph.quantizationLayers.size(); index++) {
            const QuantizationDetails& quantizationDetails = quantizationLayersDetails[index];
            if (outputLowValue > quantizationDetails.outputLowValues[0]) {
                outputLowValue = quantizationDetails.outputLowValues[0];
            }
            if (outputHighValue < quantizationDetails.outputHighValues[0]) {
                outputHighValue = quantizationDetails.outputHighValues[0];
            }
        }

        if ((outputLowValue == 0.f) && (outputHighValue == 0.f)) {
            return false;
        }

        const float maxOutputInterval = outputHighValue - outputLowValue;
        if (quantizedTensorAlignmentOnActivations == QuantizedTensorAlignment::UpdateLevel) {
            const size_t minLevels = getMinQuantizationLevels(
                dataPrecision,
                maxOutputInterval,
                quantizationLayersDetails,
                outputLowValue,
                outputHighValue);
            if (minLevels < this->minQuantizationLevels) {
                return false;
            }
        }

        // FQ -> SUB_quantization -> MUL_quantization -[INT8]-> SUB_dequantization -> MUL_dequantization ->
        const float quantizationMul = (dataPrecision.max - dataPrecision.min) / maxOutputInterval;
        const float dequantizationMul = maxOutputInterval / (dataPrecision.max - dataPrecision.min);

        // FQ outputLowValue = dataPrecision.min * dequantizationMul + quantizationSub
        const float quantizationSub = outputLowValue - dataPrecision.min * dequantizationMul;
        const float dequantizationSub = std::round(-quantizationSub * quantizationMul);
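        // A numeric sketch of these constants (values illustrative, not from the PR):
        //   u8 [0, 255], combined range [-0.64, 1.91], maxOutputInterval = 2.55
        //   quantizationMul   = 255 / 2.55 = 100;  dequantizationMul = 0.01
        //   quantizationSub   = -0.64 - 0 * 0.01 = -0.64
        //   dequantizationSub = round(0.64 * 100) = 64
        // so x = 1.91 quantizes to (1.91 - (-0.64)) * 100 = 255
        // and dequantizes back to (255 - 64) * 0.01 = 1.91.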

        // 1. get data for dequantization. Dequantization data will be used several times later.
        dequantization = ngraph::pass::low_precision::NetworkHelper::makeDequantization(
            dequantizationMul,
            dequantizationSub,
            subgraph.quantizationLayers[0]->get_output_element_type(0),
            subgraph.quantizationLayers[0]->get_output_shape(0),
            dataPrecision.precision,
            dataPrecision.min,
            dataPrecision.max);

        for (int index = 0; index < subgraph.quantizationLayers.size(); index++) {
            std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantizeLayer = as_type_ptr<ngraph::opset1::FakeQuantize>(
                subgraph.quantizationLayers[index]->shared_from_this());

            const QuantizationDetails& quantizationDetails = quantizationLayersDetails[index];

            switch (quantizedTensorAlignmentOnActivations) {
                case QuantizedTensorAlignment::None: {
                    THROW_TRANSFORMATION_EXCEPTION << "not implemented: " << quantizedTensorAlignmentOnActivations;
                }
                case QuantizedTensorAlignment::UpdateLevel: {
                    const float updatedOutputLowValue = (quantizationDetails.outputLowValues[0] - quantizationSub) * quantizationMul;
                    const float updatedOutputHighValue = (quantizationDetails.outputHighValues[0] - quantizationSub) * quantizationMul;

                    // 2. update FakeQuantize - one time action
                    std::shared_ptr<opset1::FakeQuantize> newFakeQuantizeLayer = ngraph::pass::low_precision::NetworkHelper::updateFakeQuantize(
                        fakeQuantizeLayer,
                        updatePrecisions ? dataPrecision.precision : fakeQuantizeLayer->get_output_element_type(0),
                        roundf(updatedOutputLowValue),
                        roundf(updatedOutputHighValue));

                    const size_t levels = static_cast<size_t>(fabs(roundf(updatedOutputHighValue) - roundf(updatedOutputLowValue)) + 1.0);
                    newFakeQuantizeLayer->set_levels(levels);

                    subgraph.quantizationLayers[index] = newFakeQuantizeLayer;
                    subgraph.layers[fakeQuantizeLayer->get_friendly_name()] = newFakeQuantizeLayer;
                    break;
                }
                default: {
                    THROW_TRANSFORMATION_EXCEPTION << "unexpected value " << quantizedTensorAlignmentOnActivations;
                }
            }
        }
    } else {
        return false;
    }

    auto dequantizationValuesCallback = [&](
        std::shared_ptr<ngraph::Node> layer,
        const std::string originalLayerName,
        std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
        dequantizationsToConcatenate.push_back(dequantization);
    };

    addDequantizationLayers(context, subgraph, dequantizationValuesCallback);

    if (updatePrecisions) {
        for (const auto it : subgraph.layers) {
            const std::shared_ptr<ngraph::Node>& node = it.second;
            if (std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(node) != nullptr) {
                ngraph::pass::low_precision::NetworkHelper::setOutDataPrecisionForTypeRelaxed(node->shared_from_this(), dataPrecision.precision);
            } else {
                // set the precision explicitly to have the updated precision during transformation
                for (size_t i = 0; i < node->get_output_size(); ++i) {
                    node->set_output_type(i, dataPrecision.precision, node->get_output_partial_shape(i));
                }
            }
        }
    }

    for (const std::shared_ptr<ngraph::Node>& quantizationLayer : subgraph.quantizationLayers) {
        context.quantizedFakeQuantizeNames.insert(quantizationLayer->get_friendly_name());
    }
    return true;
}

bool ConcatTransformation::isPrecisionPreserved(std::shared_ptr<Node>) const noexcept {
    return true;
}

bool ConcatTransformation::canBeTransformed(const TransformationContext& context, std::shared_ptr<Node> layer) const {
    std::shared_ptr<opset1::Concat> concat = as_type_ptr<opset1::Concat>(layer);
    return concat->get_axis() == 1ul;
}


void ConcatTransformation::addDequantizationLayers(
    TransformationContext& context,
    ngraph::pass::low_precision::Subgraph& subgraph,
    std::function<void(
        std::shared_ptr<ngraph::Node> layer,
        const std::string originalLayerName,
        std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate)> getLayerDequantizationCallback) const {
    std::unordered_map<std::string, ngraph::Node*> outputs;
    for (size_t i = 0; i < context.function->get_output_size(); ++i) {
        ngraph::Node* node = context.function->get_output_op(i).get();
        if (node->get_input_size() != 1ul) {
            THROW_IE_LPT_EXCEPTION(*node) << "unexpected inputs count for result node";
        }

        outputs.emplace(node->get_input_node_shared_ptr(0)->get_friendly_name(), node);
    }

    std::unordered_map<std::string, std::shared_ptr<ngraph::Node>> notHandledSubgraphLayers = subgraph.layers;
    while (notHandledSubgraphLayers.size() != 0ul) {
        const auto layerIt = notHandledSubgraphLayers.begin();
        std::shared_ptr<ngraph::Node> layer = layerIt->second;
        notHandledSubgraphLayers.erase(layerIt);

        std::vector<FakeQuantizeDequantization> layerDequantizations;

        for (int i = 0; i < layer->get_output_size(); ++i) {
            const auto childInputs = layer->get_output_target_inputs(i);
            for (const auto childInput : childInputs) {
                ngraph::Node& child = *childInput.get_node();

                if (subgraph.layers.find(child.get_friendly_name()) == subgraph.layers.end()) {
                    if (layerDequantizations.size() == 0ul) {
                        getLayerDequantizationCallback(layer, layer->get_friendly_name(), layerDequantizations);
                    }

                    std::shared_ptr<ngraph::Node> source = layer->shared_from_this();
                    {
                        std::vector<std::shared_ptr<ngraph::Node>> convertNodes;
                        std::vector<std::shared_ptr<ngraph::Node>> subtractNodes;
                        std::vector<std::shared_ptr<ngraph::Node>> multiplyNodes;

                        if (layerDequantizations.size() > 1ul) {
                            auto broadcastElementWiseConst = [](
                                std::shared_ptr<ngraph::opset1::Constant> operation,
                                const ngraph::Shape targetShape) -> std::shared_ptr<Node> {
                                auto unsqueeze = ngraph::pass::low_precision::fold<ngraph::opset1::Unsqueeze>(
                                    operation->shared_from_this(),
                                    std::make_shared<ngraph::opset1::Constant>(element::i64, ngraph::Shape{ 4 }, std::vector<size_t>{ 0, 1, 2, 3 }));

                                auto targetShapeConst = std::make_shared<ngraph::opset1::Constant>(
                                    element::i64, ngraph::Shape{ targetShape.size() },
                                    targetShape);

                                auto broadcast = ngraph::pass::low_precision::fold<ngraph::opset1::Broadcast>(
                                    unsqueeze,
                                    targetShapeConst,
                                    ngraph::op::AutoBroadcastType::NUMPY);

                                return broadcast;
                            };

                            bool allDequantizationShiftAreZero = true;
                            bool allDequantizationMultiplyAreZero = true;
                            for (FakeQuantizeDequantization dequantization : layerDequantizations) {
                                if (dequantization.subtract != nullptr) {
                                    allDequantizationShiftAreZero = false;
                                }
                                if (dequantization.multiply != nullptr) {
                                    allDequantizationMultiplyAreZero = false;
                                }
                            }

                            for (size_t i = 0; i < layerDequantizations.size(); ++i) {
                                const auto& dequantization = layerDequantizations[i];

                                convertNodes.push_back(dequantization.convert);

                                const ngraph::element::Type precision = dequantization.data.get_element_type();
                                ngraph::Shape targetShape = dequantization.data.get_shape();

                                targetShape[0] = 1ul;
                                for (size_t i = 2; i < targetShape.size(); ++i) {
                                    targetShape[i] = 1ul;
                                }

                                if (!allDequantizationShiftAreZero) {
                                    subtractNodes.push_back(dequantization.subtract == nullptr ?
                                        std::make_shared<ngraph::opset1::Constant>(precision, targetShape, std::vector<float>({ 0.f })) :
                                        broadcastElementWiseConst(
                                            as_type_ptr<ngraph::opset1::Constant>(dequantization.subtract->input_value(1).get_node_shared_ptr()),
                                            targetShape));
                                }

                                if (!allDequantizationMultiplyAreZero) {
                                    multiplyNodes.push_back(dequantization.multiply == nullptr ?
                                        std::make_shared<ngraph::opset1::Constant>(precision, targetShape, std::vector<float>({ 1.0f })) :
                                        broadcastElementWiseConst(
                                            as_type_ptr<ngraph::opset1::Constant>(dequantization.multiply->input_value(1).get_node_shared_ptr()),
                                            targetShape));
                                }
                            }
                        } else {
                            // TODO: check constant shapes here - they have to be scalar
                            if (layerDequantizations[0].convert != nullptr) {
                                convertNodes.push_back(layerDequantizations[0].convert);
                            }

                            if (layerDequantizations[0].subtract != nullptr) {
                                subtractNodes.push_back(layerDequantizations[0].subtract->input_value(1).get_node_shared_ptr());
                            }

                            if (layerDequantizations[0].multiply != nullptr) {
                                multiplyNodes.push_back(layerDequantizations[0].multiply->input_value(1).get_node_shared_ptr());
                            }
                        }

                        // TODO: the second place (the first is FQ decomposition) where dequantization operations are inserted
                        const std::shared_ptr<ngraph::Node> destination = child.shared_from_this();

                        if (!convertNodes.empty()) {
                            const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
                            std::shared_ptr<ngraph::Node> convert =
                                convertNodes[0]->clone_with_new_inputs({ destination->get_input_source_output(sourceOutputIdx) });
                            insert_new_node_between(source, destination, convert);
                            source = convert;
                        }

                        // concatenation axis is 1
                        if (!subtractNodes.empty()) {
                            const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
                            std::shared_ptr<ngraph::opset1::Subtract> subtract = std::make_shared<DequantizationSubtract>(
                                destination->get_input_source_output(sourceOutputIdx),
                                NetworkHelper::toScalarIfPossible(subtractNodes.size() == 1ul ?
                                    subtractNodes[0] :
                                    ngraph::pass::low_precision::fold<ngraph::opset1::Concat>(subtractNodes, 1)));
                            insert_new_node_between(source, destination, subtract);
                            source = subtract;
                        }

                        if (!multiplyNodes.empty()) {
                            const size_t sourceOutputIdx = NetworkHelper::getChildInputIndex(source, destination);
                            std::shared_ptr<ngraph::opset1::Multiply> multiply = std::make_shared<DequantizationMultiply>(
                                destination->get_input_source_output(sourceOutputIdx),
                                NetworkHelper::toScalarIfPossible(multiplyNodes.size() == 1ul ?
                                    multiplyNodes[0] :
                                    ngraph::pass::low_precision::fold<ngraph::opset1::Concat>(multiplyNodes, 1)));
                            insert_new_node_between(source, destination, multiply);
                            source = multiply;
                        }
                    }

                    // the first input is used
                    const ngraph::element::Type precision = layerDequantizations[0].data.get_element_type();
                    layer->set_output_type(0, precision, layer->get_output_partial_shape(0));

                    const auto it = outputs.find(layer->get_friendly_name());
                    if (it != outputs.end()) {
                        const std::string originalName = layer->get_friendly_name();
                        const std::string newName = layer->get_friendly_name() + LayerTransformation::originalLayerPostfix;
                        layer->set_friendly_name(newName);
                        source->set_friendly_name(originalName);
                        subgraph.layers[layer->get_friendly_name()] = layer;
                    }
                }
            }
        }
    }
}

bool ConcatTransformation::isHandled(const TransformationContext& context, const std::vector<std::shared_ptr<ngraph::Node>>& quantizationOperations) {
    for (const std::shared_ptr<ngraph::Node>& quantizationLayer : quantizationOperations) {
        if (context.quantizedFakeQuantizeNames.find(quantizationLayer->get_friendly_name()) != context.quantizedFakeQuantizeNames.end()) {
            return true;
        }
    }

    return false;
}

size_t ConcatTransformation::getMinQuantizationLevels(
    const DataPrecision& dataPrecision,
    const float maxOutputInterval,
    const std::vector<QuantizationDetails>& quantizationLayersDetails,
    const float outputLowValue,
    const float outputHighValue) const {
    size_t minLevels = std::numeric_limits<std::size_t>::max();
    for (const QuantizationDetails quantizationDetails : quantizationLayersDetails) {
        // if there is a negative part, the calculation is based on `outputLowValue`; otherwise on `outputHighValue` only
        const float updatedOutputLowValue = outputLowValue != 0.f ?
            (quantizationDetails.outputLowValues[0] / outputLowValue) * dataPrecision.min :
            (quantizationDetails.outputLowValues[0] / outputHighValue) * dataPrecision.max;

        // if there is a positive part, the calculation is based on `outputHighValue`; otherwise on `outputLowValue` only
        const float updatedOutputHighValue = outputHighValue != 0.f ?
            (quantizationDetails.outputHighValues[0] / outputHighValue) * dataPrecision.max :
            (quantizationDetails.outputHighValues[0] / outputLowValue) * dataPrecision.min;

        const int levels = static_cast<int>(fabs(roundf(updatedOutputHighValue) - roundf(updatedOutputLowValue)) + 1.0);
        if (minLevels > levels) {
            minLevels = levels;
        }
    }
    return minLevels;
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph
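A numeric sketch of getMinQuantizationLevels (illustrative values, not from the PR): each FakeQuantize keeps only the slice of the shared quantized grid that its own range covers, and the transformation bails out when the narrowest FQ would keep too few levels.

// i8: min = -128, max = 127; combined range [-1.28, 2.55]
// one FQ has output range [-0.64, 2.55]:
//   updatedOutputLowValue  = (-0.64 / -1.28) * (-128) = -64
//   updatedOutputHighValue = ( 2.55 /  2.55) *   127  = 127
//   levels = |127 - (-64)| + 1 = 192   -> this FQ keeps 192 of 256 levels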
@ -0,0 +1,232 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#include "transformations/low_precision/concat_multi_channels.hpp"
|
||||
|
||||
#include <queue>
|
||||
#include <memory>
|
||||
#include <string>
|
||||
#include <unordered_map>
|
||||
#include <vector>
|
||||
|
||||
#include <ngraph/ngraph.hpp>
|
||||
#include <ngraph/opsets/opset1.hpp>
|
||||
|
||||
#include "transformations/low_precision/common/fake_quantize_dequantization.hpp"
|
||||
#include "transformations/low_precision/common/ie_lpt_exception.hpp"
|
||||
#include "transformations/low_precision/common/subgraph.hpp"
|
||||
#include "transformations/low_precision/network_helper.hpp"
|
||||
|
||||
namespace ngraph {
|
||||
namespace pass {
|
||||
namespace low_precision {
|
||||
|
||||
bool ConcatMultiChannelsTransformation::isMultiChannel(const std::vector<std::shared_ptr<ngraph::opset1::Concat>>& concatLayers) const noexcept {
|
||||
for (const std::shared_ptr<ngraph::opset1::Concat>& concat : concatLayers) {
|
||||
const std::vector<std::shared_ptr<ngraph::Node>> children = getChildrenRecursivelyExceptPrecisionPreserved(concat);
|
||||
for (const std::shared_ptr<ngraph::Node>& child : children) {
|
||||
if (is_type<ngraph::opset1::Convolution>(child.get())) {
|
||||
return false;
|
||||
}
|
||||
}
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
void ConcatMultiChannelsTransformation::registerMatcherIn(GraphRewrite& pass, TransformationContext& context) const {
|
||||
addSingleNodePattern<opset1::Concat>(pass, context);
|
||||
}
|
||||
|
||||
bool ConcatMultiChannelsTransformation::transform(TransformationContext& context, ngraph::pattern::Matcher &m) const {
|
||||
std::shared_ptr<ngraph::opset1::Concat> concat = ngraph::as_type_ptr<ngraph::opset1::Concat>(m.get_match_root());
|
||||
if (!canBeTransformed(context, concat)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
ngraph::pass::low_precision::Subgraph subgraph(layerTransformationsManager);
|
||||
std::unordered_set<std::string> handledLayers;
|
||||
if (!subgraph.fillSubgraphForConcat(concat, handledLayers)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
if (subgraph.quantizationLayers.empty() || isHandled(context, subgraph.quantizationLayers)) {
|
||||
return false;
|
||||
}
|
||||
|
||||
if (!isMultiChannel(subgraph.concatLayers)) {
|
||||
ConcatTransformation::transform(context, m);
|
||||
return false;
|
||||
}
|
||||
|
||||
DataPrecision dataPrecision;
|
||||
{
|
||||
for (auto quantizationLayer : subgraph.quantizationLayers) {
|
||||
std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(quantizationLayer->shared_from_this());
|
||||
const DataPrecision tmp = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
|
||||
|
||||
if (dataPrecision.precision == ngraph::element::undefined) {
|
||||
dataPrecision = tmp;
|
||||
continue;
|
||||
}
|
||||
|
||||
if ((tmp.precision != dataPrecision.precision) && (tmp.precision == ngraph::element::u8)) {
|
||||
dataPrecision = tmp;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
    std::unordered_map<std::string, ngraph::pass::low_precision::FakeQuantizeDequantization> dequantizations;

    for (size_t i = 0; i < subgraph.quantizationLayers.size(); ++i) {
        const std::shared_ptr<ngraph::Node>& fakeQuantizeLayer = subgraph.quantizationLayers[i];
        const ngraph::Shape shape = fakeQuantizeLayer->get_output_shape(0);
        if (shape.size() < 4ul) {
            return false;
        }

        const std::shared_ptr<ngraph::opset1::FakeQuantize> fq = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(fakeQuantizeLayer->shared_from_this());
        if (fq == nullptr) {
            return false;
        }

        const DataPrecision currentDataPrecision = getDataPrecision(fq, QuantizationDetails::getDetails(fq), false);
        const QuantizationDetails quantizationDetails = QuantizationDetails::getDetails(fq);

        // 1. get data for dequantization. Dequantization data will be used several times later.
        const FakeQuantizeDequantization fakeQuantizeDequantization = ngraph::pass::low_precision::NetworkHelper::createDequantizationFromFakeQuantize(
            fq,
            dataPrecision.precision,
            dataPrecision.min,
            dataPrecision.max,
            dataPrecision.precision == currentDataPrecision.precision ? currentDataPrecision.hasZeroPoint : true,
            updatePrecisions);
        dequantizations[fakeQuantizeLayer->get_friendly_name()] = fakeQuantizeDequantization;

        // 2. update FakeQuantize - one time action
        const std::shared_ptr<opset1::FakeQuantize> newFakeQuantizeLayer = ngraph::pass::low_precision::NetworkHelper::updateFakeQuantize(
            fq,
            updatePrecisions ? dataPrecision.precision : fakeQuantizeLayer->get_output_element_type(0),
            roundf(dataPrecision.min),
            roundf(dataPrecision.max));

        subgraph.quantizationLayers[i] = newFakeQuantizeLayer;
        subgraph.layers[fakeQuantizeLayer->get_friendly_name()] = newFakeQuantizeLayer;
    }

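    // Callback used by addDequantizationLayers: if the layer was renamed while the
    // subgraph was being rewritten, re-key its dequantization entry first, then
    // collect the dequantization operations to concatenate for this layer.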
    auto dequantizationValuesCallback = [&](
        std::shared_ptr<ngraph::Node> layer,
        const std::string originalLayerName,
        std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
        if (layer->get_friendly_name() != originalLayerName) {
            const auto update = [](
                const std::string& originalLayerName,
                const std::string& newLayerName,
                std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationLayers) {
                auto it = dequantizationLayers.find(originalLayerName);
                if (it != dequantizationLayers.end()) {
                    dequantizationLayers.emplace(newLayerName, it->second);
                    dequantizationLayers.erase(it);
                }
            };
            update(originalLayerName, layer->get_friendly_name(), dequantizations);
        }

        fillDequantization(
            layer,
            dequantizations,
            dequantizationsToConcatenate);
    };

    addDequantizationLayers(context, subgraph, dequantizationValuesCallback);

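    // Propagate the selected precision to every layer of the handled subgraph.
    // TypeRelaxed nodes get it through their dedicated helper; for all other nodes
    // the output types are overwritten directly so validation sees the updated
    // precision.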
    if (updatePrecisions) {
        for (const auto& it : subgraph.layers) {
            const std::shared_ptr<ngraph::Node> node = it.second;
            if (std::dynamic_pointer_cast<ngraph::op::TypeRelaxedBase>(node)) {
                ngraph::pass::low_precision::NetworkHelper::setOutDataPrecisionForTypeRelaxed(node->shared_from_this(), dataPrecision.precision);
            } else {
                // set the precision explicitly to keep it updated during the transformation
                for (size_t i = 0; i < node->get_output_size(); ++i) {
                    node->set_output_type(i, dataPrecision.precision, node->get_output_partial_shape(i));
                }
            }
        }
    }

    for (const std::shared_ptr<ngraph::Node>& quantizationLayer : subgraph.quantizationLayers) {
        context.quantizedFakeQuantizeNames.insert(quantizationLayer->get_friendly_name());
    }
    return true;
}

bool ConcatMultiChannelsTransformation::isPrecisionPreserved(std::shared_ptr<Node>) const noexcept {
    return true;
}

// fill the dequantizationsToConcatenate collection for the layer using dequantizationByFakeQuantize
void ConcatMultiChannelsTransformation::fillDequantization(
    std::shared_ptr<ngraph::Node> layer,
    std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize,
    std::vector<FakeQuantizeDequantization>& dequantizationsToConcatenate) {
    std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>> fakeQuantizes;
    std::shared_ptr<ngraph::opset1::FakeQuantize> currentFakeQuantize = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(layer);
    if (currentFakeQuantize != nullptr) {
        fakeQuantizes.push_back(currentFakeQuantize);
    } else {
        fillQuantization(layer, fakeQuantizes);
        if (fakeQuantizes.size() == layer->get_input_size()) {
            updateDequantizationShapesIfNecessary(layer, fakeQuantizes, dequantizationByFakeQuantize);
        }
    }

    for (const auto& fakeQuantize : fakeQuantizes) {
        const auto it = dequantizationByFakeQuantize.find(fakeQuantize->get_friendly_name());
        if (it == dequantizationByFakeQuantize.end()) {
            THROW_IE_LPT_EXCEPTION(*fakeQuantize) << "dequantization scale values are not found";
        }
        const FakeQuantizeDequantization& fakeQuantizeDequantization = it->second;
        dequantizationsToConcatenate.push_back(fakeQuantizeDequantization);
    }
}

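// If the channel count of a layer input differs from the channel count of the
// corresponding dequantization constants, rebuild that dequantization with scalar
// scale/shift values broadcast to the input shape.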
void ConcatMultiChannelsTransformation::updateDequantizationShapesIfNecessary(
    std::shared_ptr<ngraph::Node> layer,
    std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes,
    std::unordered_map<std::string, FakeQuantizeDequantization>& dequantizationByFakeQuantize) {
    for (size_t i = 0; i < fakeQuantizes.size(); ++i) {
        const ngraph::Shape inputShape = layer->get_input_shape(i);
        const ngraph::Shape dequantizationShape = fakeQuantizes[i]->get_shape();
        if (inputShape[1] != dequantizationShape[1]) {
            FakeQuantizeDequantization replacedDequantization = dequantizationByFakeQuantize[fakeQuantizes[i]->get_friendly_name()];

            const float scale = as_type_ptr<ngraph::opset1::Constant>(replacedDequantization.multiply->get_input_node_shared_ptr(1))->cast_vector<float>()[0];
            const float shift = replacedDequantization.subtract ?
                as_type_ptr<ngraph::opset1::Constant>(replacedDequantization.subtract->get_input_node_shared_ptr(1))->cast_vector<float>()[0] : 0.f;
            const auto precisionBefore = replacedDequantization.data.get_element_type();
            const auto precisionAfter = replacedDequantization.multiply->get_element_type();

            auto newDequantization = ngraph::pass::low_precision::NetworkHelper::makeDequantization(
                scale, shift, precisionBefore, inputShape, precisionAfter, 0.f, 5.f);
            dequantizationByFakeQuantize[fakeQuantizes[i]->get_friendly_name()] = newDequantization;
        }
    }
}

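// Recursively walks up from each input of the layer and collects the closest
// FakeQuantize ancestors into the fakeQuantizes vector.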
void ConcatMultiChannelsTransformation::fillQuantization(
    const std::shared_ptr<ngraph::Node> layer,
    std::vector<std::shared_ptr<ngraph::opset1::FakeQuantize>>& fakeQuantizes) {
    for (size_t i = 0; i < layer->get_input_size(); ++i) {
        std::shared_ptr<ngraph::Node> parent = layer->get_input_node_shared_ptr(i);
        std::shared_ptr<ngraph::opset1::FakeQuantize> fakeQuantize = ngraph::as_type_ptr<ngraph::opset1::FakeQuantize>(parent);
        if (fakeQuantize != nullptr) {
            fakeQuantizes.push_back(fakeQuantize);
        } else {
            fillQuantization(parent, fakeQuantizes);
        }
    }
}

} // namespace low_precision
} // namespace pass
} // namespace ngraph