Model Optimization Techniques
Optimization offers methods to accelerate inference with convolutional neural networks (CNNs) that do not require model retraining.
Linear Operations Fusing
Many convolutional neural networks include `BatchNormalization` and `ScaleShift` layers (for example, Resnet\*, Inception\*) that can be represented as a sequence of linear operations: additions and multiplications. For example, a `ScaleShift` layer can be represented as a `Mul → Add` sequence. These layers can be fused into previous `Convolution` or `FullyConnected` layers, except for the case when a `Convolution` comes after an `Add` operation (due to `Convolution` paddings).
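The affine structure behind this fusing can be sketched numerically. The snippet below is an illustrative NumPy example (not Model Optimizer code) showing that both `ScaleShift` and `BatchNormalization` reduce to a per-channel `Mul → Add` pair:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))          # 4 samples, 3 channels

# ScaleShift is already a Mul -> Add pair (per-channel parameters)
scale = np.array([1.5, 0.5, 2.0])
shift = np.array([0.1, -0.2, 0.3])
scale_shift = x * scale + shift

# BatchNormalization parameters (per channel)
gamma = np.array([1.0, 2.0, 0.5])
beta = np.array([0.0, 0.1, -0.1])
mean = np.array([0.2, -0.3, 0.0])
var = np.array([1.2, 0.8, 1.0])
eps = 1e-5
bn = gamma * (x - mean) / np.sqrt(var + eps) + beta

# The same BatchNorm folded into a single Mul -> Add:
a = gamma / np.sqrt(var + eps)           # Mul factor
b = beta - mean * a                      # Add term
assert np.allclose(bn, x * a + b)
```

Because every such layer is an affine per-channel transform, a chain of them collapses into one multiplication and one addition, which can then be folded into adjacent layer weights.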
Usage
In the Model Optimizer, this optimization is turned on by default. To disable it, pass the `--disable_fusing` parameter to the Model Optimizer.
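For example, a typical invocation might look like the following (the model path is hypothetical; only the `--disable_fusing` flag comes from this document):

```shell
# Hypothetical model path; disables linear operations fusing
python3 mo.py --input_model model.caffemodel --disable_fusing
```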
Optimization Description
This optimization method consists of three stages:
1. `BatchNormalization` and `ScaleShift` decomposition: in this stage, the `BatchNormalization` layer is decomposed into a `Mul → Add → Mul → Add` sequence, and the `ScaleShift` layer is decomposed into a `Mul → Add` sequence.
2. Linear operations merge: in this stage, the tool merges sequences of `Mul` and `Add` operations into a single `Mul → Add` instance. For example, a `BatchNormalization → ScaleShift` sequence in the topology is replaced with `Mul → Add` by the first stage. In this stage, the latter is replaced with a `ScaleShift` layer if there is no available `Convolution` or `FullyConnected` layer to fuse into next.
3. Linear operations fusion: in this stage, the tool fuses `Mul` and `Add` operations into `Convolution` or `FullyConnected` layers. Note that it searches for `Convolution` and `FullyConnected` layers both backward and forward in the graph (except for the `Add` operation, which cannot be fused into a `Convolution` layer in the forward direction).
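The fusion in the third stage can be sketched with a `FullyConnected` layer, where folding a trailing `Mul → Add` pair into the weights and biases is a one-line identity. This is an illustrative NumPy example, not Model Optimizer internals:

```python
import numpy as np

# y = a * (W @ x + b) + c  ==  (a[:, None] * W) @ x + (a * b + c)
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8))   # FullyConnected weights (4 outputs, 8 inputs)
b = rng.standard_normal(4)        # FullyConnected biases
a = rng.standard_normal(4)        # per-output-channel Mul factor
c = rng.standard_normal(4)        # per-output-channel Add term
x = rng.standard_normal(8)

unfused = a * (W @ x + b) + c     # FullyConnected -> Mul -> Add

W_fused = a[:, None] * W          # scale each output row of W
b_fused = a * b + c               # fold Mul and Add into the bias
fused = W_fused @ x + b_fused     # a single FullyConnected layer

assert np.allclose(unfused, fused)
```

The same per-output-channel folding applies to `Convolution` weights, which is why the fused graph needs no extra runtime operations.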
Usage Examples
The picture below shows the part of the Caffe\* Resnet269 topology where the BatchNorm and ScaleShift layers will be fused into Convolution layers.
ResNet optimization (stride optimization)
ResNet optimization is a specific optimization that applies to Caffe ResNet topologies such as ResNet50, ResNet101, and ResNet152, as well as to ResNet-based topologies. This optimization is turned on by default and can be disabled with the `--disable_resnet_optimization` key.
Optimization Description
The picture below shows the original and optimized parts of a Caffe ResNet50 model. The main idea of this optimization is to move a stride that is greater than 1 from Convolution layers with kernel size = 1 to upper Convolution layers. In addition, the Model Optimizer adds a Pooling layer to align the input shape for an Eltwise layer if it was changed during the optimization.
In this example, the stride from the `res3a_branch1` and `res3a_branch2a` Convolution layers moves to the `res2c_branch2b` Convolution layer. Also, to align the input shape for the `res2c` Eltwise layer, the optimization inserts a Pooling layer with kernel size = 1 and stride = 2.
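The identity that makes the stride move valid can be sketched numerically: for a kernel-size-1 layer, a stride of 2 merely subsamples its stride-1 output, and that subsampling commutes with the preceding convolution. The snippet below is an illustrative single-channel NumPy example, not Model Optimizer code:

```python
import numpy as np

def conv2d(x, w, stride):
    """Minimal 'valid' 2-D convolution (single channel), for illustration only."""
    kh, kw = w.shape
    oh = (x.shape[0] - kh) // stride + 1
    ow = (x.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * w)
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((9, 9))
w = rng.standard_normal((3, 3))

# Subsampling a stride-1 output equals convolving with stride 2 directly,
# so the stride can be hoisted from the kernel-size-1 layer to the layer above.
assert np.allclose(conv2d(x, w, stride=1)[::2, ::2], conv2d(x, w, stride=2))
```

Running the upper convolution with the larger stride computes only the outputs that survive, which is where the speedup comes from.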
Grouped Convolution Fusing
Grouped convolution fusing is a specific optimization that applies to TensorFlow\* topologies. The main idea of this optimization is to combine the convolution results for the Split outputs and then recombine them using a Concat operation in the same order as they were output from the Split.
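The equivalence behind this pattern can be sketched with 1×1 convolutions, which act as matrix multiplies over the channel dimension. This is an illustrative NumPy example (not Model Optimizer code): the Split → Conv → Concat subgraph matches a single grouped convolution with a block-diagonal weight matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal((8, 5))           # 8 input channels, 5 spatial positions
w1 = rng.standard_normal((2, 4))          # 1x1 conv for the first Split output
w2 = rng.standard_normal((2, 4))          # 1x1 conv for the second Split output

# Split the channels into two groups, convolve each, then Concat in order.
x1, x2 = np.split(x, 2, axis=0)
split_concat = np.concatenate([w1 @ x1, w2 @ x2], axis=0)

# The equivalent single grouped convolution: block-diagonal weights.
w_grouped = np.block([[w1, np.zeros((2, 4))],
                      [np.zeros((2, 4)), w2]])
grouped = w_grouped @ x

assert np.allclose(split_concat, grouped)
```

Fusing the subgraph into one grouped convolution removes the Split and Concat nodes and lets the runtime execute the groups in a single kernel.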
Disable Fusing
The Model Optimizer allows you to disable optimizations for specified nodes via `--finegrain_fusing <node_name1>,<node_name2>,...` (regex is also supported). Using this key, you mark nodes that will not be touched by any optimizations.
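For example, an invocation might look like the following (the model path is hypothetical; the node name is taken from the InceptionV4 example below):

```shell
# Hypothetical model path; excludes the named node from all fusing optimizations
python3 mo.py --input_model model.pb \
    --finegrain_fusing InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D
```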
Examples of usage
The picture below shows two visualized Intermediate Representations (IR) of the TensorFlow InceptionV4 topology.
The first one is the original IR produced by the Model Optimizer.
The second one is produced by the Model Optimizer with the `--finegrain_fusing InceptionV4/InceptionV4/Conv2d_1a_3x3/Conv2D` key, where you can see that the Convolution was not fused with the `Mul1_3752` and `Mul1_4061/Fused_Mul_5096/FusedScaleShift_5987` operations.