Compare commits
75 Commits
2023.2.0.d
...
2021.2
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
4795391b73 | ||
|
|
f507ac0364 | ||
|
|
3347f930cb | ||
|
|
4dfeba454b | ||
|
|
f128545a2a | ||
|
|
2376b3b1a4 | ||
|
|
ca6f8fc9cf | ||
|
|
c6f2f9739c | ||
|
|
7d58dd59b6 | ||
|
|
5e81820a5b | ||
|
|
1d8112bba6 | ||
|
|
08794e70fe | ||
|
|
99384c7846 | ||
|
|
fc1e019ff5 | ||
|
|
8b4a953993 | ||
|
|
99c746f2c3 | ||
|
|
6b8f778d46 | ||
|
|
3d48b7f078 | ||
|
|
5c683584fc | ||
|
|
cf3e15bed5 | ||
|
|
106d5653c2 | ||
|
|
39447e08e7 | ||
|
|
176bdf5137 | ||
|
|
f730bf9ae9 | ||
|
|
422127e33d | ||
|
|
1e1eafe09d | ||
|
|
d2e3e1fc00 | ||
|
|
2a57c7bbe4 | ||
|
|
10274c0908 | ||
|
|
cb71090389 | ||
|
|
0335cdbbc6 | ||
|
|
ff878ffe92 | ||
|
|
8f507a07f5 | ||
|
|
a09de25dd2 | ||
|
|
86f96a8f3a | ||
|
|
6966e383d3 | ||
|
|
ab1a3e8d84 | ||
|
|
29d9462a9b | ||
|
|
c067ed1b9a | ||
|
|
bff2fe0b2f | ||
|
|
fd532c2611 | ||
|
|
1d68caf96f | ||
|
|
7bd3738afd | ||
|
|
5767a54fa2 | ||
|
|
56326a3d2d | ||
|
|
1baf72a780 | ||
|
|
c22e65e8a6 | ||
|
|
3964168205 | ||
|
|
a416a80c86 | ||
|
|
81152e07f3 | ||
|
|
a14fdef9b6 | ||
|
|
ce3bea3c07 | ||
|
|
707a0833d8 | ||
|
|
71cde81827 | ||
|
|
f4ed2572a5 | ||
|
|
ec02bc4dbd | ||
|
|
5a05e52d40 | ||
|
|
fee727f415 | ||
|
|
01c3951887 | ||
|
|
7715d033d2 | ||
|
|
1da3539bc1 | ||
|
|
ec8527f21e | ||
|
|
1c35875a60 | ||
|
|
1e392b0180 | ||
|
|
a05b7c76b2 | ||
|
|
3a9c731c2b | ||
|
|
641f1425df | ||
|
|
e67dffa6a3 | ||
|
|
7b85eef15f | ||
|
|
ea0f6fce5a | ||
|
|
6959f6671f | ||
|
|
322bc3498b | ||
|
|
6effe67358 | ||
|
|
759c91d547 | ||
|
|
5abbe2fec5 |
@@ -1,5 +1,5 @@
|
||||
# [OpenVINO™ Toolkit](https://01.org/openvinotoolkit) - Deep Learning Deployment Toolkit repository
|
||||
[](https://github.com/openvinotoolkit/openvino/releases/tag/2021.1)
|
||||
[](https://github.com/openvinotoolkit/openvino/releases/tag/2021.2)
|
||||
[](LICENSE)
|
||||

|
||||
|
||||
|
||||
@@ -132,14 +132,6 @@ function(build_docs)
|
||||
COMMAND ${Python3_EXECUTABLE} ${PYX_FILTER} ${PYTHON_API_OUT}
|
||||
COMMENT "Pre-process Python API")
|
||||
|
||||
# Plugin API
|
||||
|
||||
add_custom_target(plugin_api
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${PLUGIN_CONFIG_BINARY}
|
||||
WORKING_DIRECTORY ${DOCS_BINARY_DIR}
|
||||
COMMENT "Generating Plugin API Reference"
|
||||
VERBATIM)
|
||||
|
||||
# Preprocess docs
|
||||
|
||||
add_custom_target(preprocess_docs
|
||||
@@ -165,6 +157,15 @@ function(build_docs)
|
||||
WORKING_DIRECTORY ${DOCS_BINARY_DIR}
|
||||
VERBATIM)
|
||||
|
||||
# Plugin API
|
||||
|
||||
add_custom_target(plugin_api
|
||||
DEPENDS ie_docs
|
||||
COMMAND ${DOXYGEN_EXECUTABLE} ${PLUGIN_CONFIG_BINARY}
|
||||
WORKING_DIRECTORY ${DOCS_BINARY_DIR}
|
||||
COMMENT "Generating Plugin API Reference"
|
||||
VERBATIM)
|
||||
|
||||
# Umbrella OpenVINO target
|
||||
|
||||
add_custom_target(openvino_docs
|
||||
|
||||
@@ -1,200 +1,362 @@
|
||||
# Custom Layers Guide {#openvino_docs_HOWTO_Custom_Layers_Guide}
|
||||
# Custom Operations Guide {#openvino_docs_HOWTO_Custom_Layers_Guide}
|
||||
|
||||
The Intel® Distribution of OpenVINO™ toolkit supports neural network model layers in multiple frameworks including TensorFlow*, Caffe*, MXNet*, Kaldi* and ONNX*. The list of known layers is different for each of the supported frameworks. To see the layers supported by your framework, refer to [supported frameworks](../MO_DG/prepare_model/Supported_Frameworks_Layers.md).
|
||||
The Intel® Distribution of OpenVINO™ toolkit supports neural network models trained with multiple frameworks including
|
||||
TensorFlow*, Caffe*, MXNet*, Kaldi* and ONNX* file format. The list of supported operations (layers) is different for
|
||||
each of the supported frameworks. To see the operations supported by your framework, refer to
|
||||
[Supported Framework Layers](../MO_DG/prepare_model/Supported_Frameworks_Layers.md).
|
||||
|
||||
Custom layers are layers that are not included in the list of known layers. If your topology contains any layers that are not in the list of known layers, the Model Optimizer classifies them as custom.
|
||||
Custom operations are operations that are not included in the list of known operations. If your model contains any
|
||||
operation that is not in the list of known operations, the Model Optimizer is not able to generate an Intermediate
|
||||
Representation (IR) for this model.
|
||||
|
||||
This guide illustrates the workflow for running inference on topologies featuring custom layers, allowing you to plug in your own implementation for existing or completely new layers.
|
||||
For a step-by-step example of creating and executing a custom layer, see the [Custom Layer Implementation Tutorials for Linux and Windows.](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0)
|
||||
This guide illustrates the workflow for running inference on topologies featuring custom operations, allowing you to
|
||||
plug in your own implementation for existing or completely new operation.
|
||||
|
||||
## Terms used in this guide
|
||||
> **NOTE:** *Layer* — The legacy term for an *operation* which came from Caffe\* framework. Currently it is not used.
|
||||
> Refer to the [Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™](../MO_DG/IR_and_opsets.md)
|
||||
> for more information on the topic.
|
||||
|
||||
- *Layer* — The abstract concept of a math function that is selected for a specific purpose (relu, sigmoid, tanh, convolutional). This is one of a sequential series of building blocks within the neural network.
|
||||
- *Kernel* — The implementation of a layer function, in this case, the math programmed (in C++ and Python) to perform the layer operation for target hardware (CPU or GPU).
|
||||
- *Intermediate Representation (IR)* — Neural Network used only by the Inference Engine in OpenVINO abstracting the different frameworks and describing topology, layer parameters and weights.
|
||||
The original format will be a supported framework such as TensorFlow, Caffe, or MXNet.
|
||||
## Terms Used in This Guide
|
||||
|
||||
- *Model Extension Generator* — Generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine.
|
||||
- *Intermediate Representation (IR)* — Neural Network used only by the Inference Engine in OpenVINO abstracting the
|
||||
different frameworks and describing the model topology, operations parameters and weights.
|
||||
|
||||
- *Operation* — The abstract concept of a math function that is selected for a specific purpose. Operations supported by
|
||||
OpenVINO™ are listed in the supported operation set provided in the [Available Operations Sets](../ops/opset.md).
|
||||
Examples of the operations are: [ReLU](../ops/activation/ReLU_1.md), [Convolution](../ops/convolution/Convolution_1.md),
|
||||
[Add](../ops/arithmetic/Add_1.md), etc.
|
||||
|
||||
- *Kernel* — The implementation of a operation function in the OpenVINO™ plugin, in this case, the math programmed (in
|
||||
C++ and OpenCL) to perform the operation for a target hardware (CPU or GPU).
|
||||
|
||||
- *Inference Engine Extension* — Device-specific module implementing custom operations (a set of kernels).
|
||||
|
||||
## Custom Operation Support Overview
|
||||
|
||||
There are three steps to support inference of a model with custom operation(s):
|
||||
1. Add support for a custom operation in the [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) so
|
||||
the Model Optimizer can generate the IR with the operation.
|
||||
2. Create an operation set and implement a custom nGraph operation in it as described in the
|
||||
[Custom nGraph Operation](../IE_DG/Extensibility_DG/AddingNGraphOps.md).
|
||||
3. Implement a customer operation in one of the [Inference Engine](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md)
|
||||
plugins to support inference of this operation using a particular target hardware (CPU, GPU or VPU).
|
||||
|
||||
To see the operations that are supported by each device plugin for the Inference Engine, refer to the
|
||||
[Supported Devices](../IE_DG/supported_plugins/Supported_Devices.md).
|
||||
|
||||
> **NOTE:** If a device doesn't support a particular operation, an alternative to creating a new operation is to target
|
||||
> an additional device using the HETERO plugin. The [Heterogeneous Plugin](../IE_DG/supported_plugins/HETERO.md) may be
|
||||
> used to run an inference model on multiple devices allowing the unsupported operations on one device to "fallback" to
|
||||
> run on another device (e.g., CPU) that does support those operations.
|
||||
|
||||
### Custom Operation Support for the Model Optimizer
|
||||
|
||||
Model Optimizer model conversion pipeline is described in details in "Model Conversion Pipeline" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
|
||||
It is recommended to read that article first for a better understanding of the following material.
|
||||
|
||||
Model Optimizer provides extensions mechanism to support new operations and implement custom model transformations to
|
||||
generate optimized IR. This mechanism is described in the "Model Optimizer Extensions" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
|
||||
|
||||
Two types of the Model Optimizer extensions should be implemented to support custom operation at minimum:
|
||||
1. Operation class for a new operation. This class stores information about the operation, its attributes, shape
|
||||
inference function, attributes to be saved to an IR and some others internally used attributes. Refer to the
|
||||
"Model Optimizer Operation" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for the
|
||||
detailed instruction on how to implement it.
|
||||
2. Operation attributes extractor. The extractor is responsible for parsing framework-specific representation of the
|
||||
operation and uses corresponding operation class to update graph node attributes with necessary attributes of the
|
||||
operation. Refer to the "Operation Extractor" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for the
|
||||
detailed instruction on how to implement it.
|
||||
|
||||
> **NOTE:** In some cases you may need to implement some transformation to support the operation. This topic is covered
|
||||
> in the "Graph Transformation Extensions" section on the
|
||||
> [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md).
|
||||
|
||||
## Custom Operations Extensions for the Inference Engine
|
||||
|
||||
Inference Engine provides extensions mechanism to support new operations. This mechanism is described in the
|
||||
[Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md).
|
||||
|
||||
Each device plugin includes a library of optimized implementations to execute known operations which must be extended to
|
||||
execute a custom operation. The custom operation extension is implemented according to the target device:
|
||||
|
||||
- Custom Operation CPU Extension
|
||||
- A compiled shared library (`.so`, `.dylib` or `.dll`) needed by the CPU Plugin for executing the custom operation
|
||||
on a CPU. Refer to the [How to Implement Custom CPU Operations](../IE_DG/Extensibility_DG/CPU_Kernel.md) for more
|
||||
details.
|
||||
- Custom Operation GPU Extension
|
||||
- OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the GPU along with a
|
||||
operation description file (.xml) needed by the GPU Plugin for the custom operation kernel. Refer to the
|
||||
[How to Implement Custom GPU Operations](../IE_DG/Extensibility_DG/GPU_Kernel.md) for more details.
|
||||
- Custom Operation VPU Extension
|
||||
- OpenCL source code (.cl) for the custom operation kernel that will be compiled to execute on the VPU along with a
|
||||
operation description file (.xml) needed by the VPU Plugin for the custom operation kernel. Refer to the
|
||||
[How to Implement Custom Operations for VPU](../IE_DG/Extensibility_DG/VPU_Kernel.md) for more details.
|
||||
|
||||
Also, it is necessary to implement nGraph custom operation according to the
|
||||
[Custom nGraph Operation](../IE_DG/Extensibility_DG/AddingNGraphOps.md) so the Inference Engine can read an IR with this
|
||||
operation and correctly infer output tensors shape and type.
|
||||
|
||||
## Enabling Magnetic Resonance Image Reconstruction Model
|
||||
This chapter provides a step-by-step instruction on how to enable the magnetic resonance image reconstruction model
|
||||
implemented in the [repository](https://github.com/rmsouza01/Hybrid-CS-Model-MRI/) using a custom operation on CPU. The
|
||||
example is prepared for a model generated from the repository with hash `2ede2f96161ce70dcdc922371fe6b6b254aafcc8`.
|
||||
|
||||
### Download and Convert the Model to a Frozen TensorFlow\* Model Format
|
||||
The original pre-trained model is provided in the hdf5 format which is not supported by OpenVINO directly and needs to
|
||||
be converted to TensorFlow\* frozen model format first.
|
||||
|
||||
1. Download repository `https://github.com/rmsouza01/Hybrid-CS-Model-MRI`:<br
|
||||
```bash
|
||||
git clone https://github.com/rmsouza01/Hybrid-CS-Model-MRI
|
||||
git checkout 2ede2f96161ce70dcdc922371fe6b6b254aafcc8
|
||||
```
|
||||
|
||||
2. Convert pre-trained `.hdf5` to a frozen `.pb` graph using the following script (tested with TensorFlow==1.15.0 and
|
||||
Keras==2.2.4) which should be executed from the root of the cloned repository:<br>
|
||||
```py
|
||||
import keras as K
|
||||
import numpy as np
|
||||
import Modules.frequency_spatial_network as fsnet
|
||||
import tensorflow as tf
|
||||
|
||||
under_rate = '20'
|
||||
|
||||
stats = np.load("Data/stats_fs_unet_norm_" + under_rate + ".npy")
|
||||
var_sampling_mask = np.load("Data/sampling_mask_" + under_rate + "perc.npy")
|
||||
|
||||
model = fsnet.wnet(stats[0], stats[1], stats[2], stats[3], kshape = (5,5), kshape2=(3,3))
|
||||
model_name = "Models/wnet_" + under_rate + ".hdf5"
|
||||
model.load_weights(model_name)
|
||||
|
||||
inp = np.random.standard_normal([1, 256, 256, 2]).astype(np.float32)
|
||||
np.save('inp', inp)
|
||||
|
||||
sess = K.backend.get_session()
|
||||
sess.as_default()
|
||||
graph_def = sess.graph.as_graph_def()
|
||||
graph_def = tf.graph_util.convert_variables_to_constants(sess, graph_def, ['conv2d_44/BiasAdd'])
|
||||
with tf.gfile.FastGFile('wnet_20.pb', 'wb') as f:
|
||||
f.write(graph_def.SerializeToString())
|
||||
```
|
||||
|
||||
- *Inference Engine Extension* — Device-specific module implementing custom layers (a set of kernels).
|
||||
As a result the TensorFlow\* frozen model file "wnet_20.pb" is generated.
|
||||
|
||||
### Convert the Frozen TensorFlow\* Model to Intermediate Representation
|
||||
|
||||
## Custom Layer Overview
|
||||
|
||||
The [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) searches the list of known layers for each layer contained in the input model topology before building the model's internal representation, optimizing the model, and producing the Intermediate Representation files.
|
||||
|
||||
The [Inference Engine](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md) loads the layers from the input model IR files into the specified device plugin, which will search a list of known layer implementations for the device. If your topology contains layers that are not in the list of known layers for the device, the Inference Engine considers the layer to be unsupported and reports an error. To see the layers that are supported by each device plugin for the Inference Engine, refer to the [Supported Devices](../IE_DG/supported_plugins/Supported_Devices.md) documentation.
|
||||
<br>
|
||||
> **NOTE:** If a device doesn't support a particular layer, an alternative to creating a new custom layer is to target an additional device using the HETERO plugin. The [Heterogeneous Plugin](../IE_DG/supported_plugins/HETERO.md) may be used to run an inference model on multiple devices allowing the unsupported layers on one device to "fallback" to run on another device (e.g., CPU) that does support those layers.
|
||||
|
||||
## Custom Layer Implementation Workflow
|
||||
|
||||
When implementing a custom layer for your pre-trained model in the Intel® Distribution of OpenVINO™ toolkit, you will need to add extensions to both the Model Optimizer and the Inference Engine.
|
||||
|
||||
## Custom Layer Extensions for the Model Optimizer
|
||||
|
||||
The following figure shows the basic processing steps for the Model Optimizer highlighting the two necessary custom layer extensions, the Custom Layer Extractor and the Custom Layer Operation.
|
||||
|
||||

|
||||
|
||||
|
||||
The Model Optimizer first extracts information from the input model which includes the topology of the model layers along with parameters, input and output format, etc., for each layer. The model is then optimized from the various known characteristics of the layers, interconnects, and data flow which partly comes from the layer operation providing details including the shape of the output for each layer. Finally, the optimized model is output to the model IR files needed by the Inference Engine to run the model.
|
||||
|
||||
The Model Optimizer starts with a library of known extractors and operations for each [supported model framework](../MO_DG/prepare_model/Supported_Frameworks_Layers.md) which must be extended to use each unknown custom layer. The custom layer extensions needed by the Model Optimizer are:
|
||||
|
||||
- Custom Layer Extractor
|
||||
- Responsible for identifying the custom layer operation and extracting the parameters for each instance of the custom layer. The layer parameters are stored per instance and used by the layer operation before finally appearing in the output IR. Typically the input layer parameters are unchanged, which is the case covered by this tutorial.
|
||||
- Custom Layer Operation
|
||||
- Responsible for specifying the attributes that are supported by the custom layer and computing the output shape for each instance of the custom layer from its parameters. <br> The `--mo-op` command-line argument shown in the examples below generates a custom layer operation for the Model Optimizer.
|
||||
|
||||
## Custom Layer Extensions for the Inference Engine
|
||||
|
||||
The following figure shows the basic flow for the Inference Engine highlighting two custom layer extensions for the CPU and GPU Plugins, the Custom Layer CPU extension and the Custom Layer GPU Extension.
|
||||
|
||||

|
||||
|
||||
Each device plugin includes a library of optimized implementations to execute known layer operations which must be extended to execute a custom layer. The custom layer extension is implemented according to the target device:
|
||||
|
||||
- Custom Layer CPU Extension
|
||||
- A compiled shared library (.so or .dll binary) needed by the CPU Plugin for executing the custom layer on the CPU.
|
||||
- Custom Layer GPU Extension
|
||||
- OpenCL source code (.cl) for the custom layer kernel that will be compiled to execute on the GPU along with a layer description file (.xml) needed by the GPU Plugin for the custom layer kernel.
|
||||
|
||||
## Model Extension Generator
|
||||
|
||||
Using answers to interactive questions or a *.json* configuration file, the Model Extension Generator tool generates template source code files for each of the extensions needed by the Model Optimizer and the Inference Engine. To complete the implementation of each extension, the template functions may need to be edited to fill-in details specific to the custom layer or the actual custom layer functionality itself.
|
||||
|
||||
### Command-line
|
||||
|
||||
The Model Extension Generator is included in the Intel® Distribution of OpenVINO™ toolkit installation and is run using the command (here with the "--help" option):
|
||||
Firstly, open the model in the TensorBoard or other TensorFlow* model visualization tool. The model supports dynamic
|
||||
batch dimension because the value for the batch dimension is not hardcoded in the model. Model Optimizer need to set all
|
||||
dynamic dimensions to some specific value to create the IR, therefore specify the command line parameter `-b 1` to set
|
||||
the batch dimension equal to 1. The actual batch size dimension can be changed at runtime using the Inference Engine API
|
||||
described in the [Using Shape Inference](../IE_DG/ShapeInference.md). Also refer to
|
||||
[Converting a Model Using General Conversion Parameters](../MO_DG/prepare_model/convert_model/Converting_Model_General.md)
|
||||
and [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md)
|
||||
for more details and command line parameters used for the model conversion.
|
||||
|
||||
```bash
|
||||
python3 /opt/intel/openvino/deployment_tools/tools/extension_generator/extgen.py new --help
|
||||
./<MO_INSTALL_DIR>/mo.py --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1
|
||||
```
|
||||
|
||||
where the output will appear similar to:
|
||||
|
||||
```
|
||||
usage: You can use any combination of the following arguments:
|
||||
|
||||
Arguments to configure extension generation in the interactive mode:
|
||||
|
||||
optional arguments:
|
||||
-h, --help show this help message and exit
|
||||
--mo-caffe-ext generate a Model Optimizer Caffe* extractor
|
||||
--mo-mxnet-ext generate a Model Optimizer MXNet* extractor
|
||||
--mo-tf-ext generate a Model Optimizer TensorFlow* extractor
|
||||
--mo-op generate a Model Optimizer operation
|
||||
--ie-cpu-ext generate an Inference Engine CPU extension
|
||||
--ie-gpu-ext generate an Inference Engine GPU extension
|
||||
--output_dir OUTPUT_DIR
|
||||
set an output directory. If not specified, the current
|
||||
directory is used by default.
|
||||
Model Optimizer produces the following error:
|
||||
```bash
|
||||
[ ERROR ] List of operations that cannot be converted to Inference Engine IR:
|
||||
[ ERROR ] Complex (1)
|
||||
[ ERROR ] lambda_2/Complex
|
||||
[ ERROR ] IFFT2D (1)
|
||||
[ ERROR ] lambda_2/IFFT2D
|
||||
[ ERROR ] ComplexAbs (1)
|
||||
[ ERROR ] lambda_2/Abs
|
||||
[ ERROR ] Part of the nodes was not converted to IR. Stopped.
|
||||
```
|
||||
|
||||
The available command-line arguments are used to specify which extension(s) to generate templates for the Model Optimizer or Inference Engine. The generated extension files for each argument will appear starting from the top of the output directory as follows:
|
||||
The error means that the Model Optimizer doesn't know how to handle 3 types of TensorFlow\* operations: "Complex",
|
||||
"IFFT2D" and "ComplexAbs". In order to see more details about the conversion process run the model conversion with
|
||||
additional parameter `--log_level DEBUG`. It is worth to mention the following lines from the detailed output:
|
||||
|
||||
Command-line Argument | Output Directory Location |
|
||||
--------------------- | ------------------------------ |
|
||||
`--mo-caffe-ext` | user_mo_extensions/front/caffe |
|
||||
`--mo-mxnet-ext` | user_mo_extensions/front/mxnet |
|
||||
`--mo-tf-ext` | user_mo_extensions/front/tf |
|
||||
`--mo-op` | user_mo_extensions/ops |
|
||||
`--ie-cpu-ext` | user_ie_extensions/cpu |
|
||||
`--ie-gpu-ext` | user_ie_extensions/gpu |
|
||||
```bash
|
||||
[ INFO ] Called "tf_native_tf_node_infer" for node "lambda_2/Complex"
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:228 ] Added placeholder with name 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_3/strided_slice' with input 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '0' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_3/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:241 ] update_input_in_pbs: replace input 'lambda_2/lambda_4/strided_slice' with input 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:249 ] Replacing input '1' of the node 'lambda_2/Complex' with placeholder 'lambda_2/lambda_4/strided_slice_port_0_ie_placeholder'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ tf:148 ] Inferred shape of the output tensor with index '0' of the node 'lambda_2/Complex': '[ 1 256 256]'
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:145 ] Outputs:
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] output[0]: shape = [ 1 256 256], value = <UNKNOWN>
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:129 ] --------------------
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:130 ] Partial infer for lambda_2/IFFT2D
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:131 ] Op: IFFT2D
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:132 ] Inputs:
|
||||
[ <TIMESTAMP> ] [ DEBUG ] [ infer:32 ] input[0]: shape = [ 1 256 256], value = <UNKNOWN>
|
||||
```
|
||||
|
||||
### Extension Workflow
|
||||
This is a part of the log of the partial inference phase of the model conversion. See the "Partial Inference" section on
|
||||
the [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for
|
||||
more information about this phase. Model Optimizer inferred output shape for the unknown operation of type "Complex"
|
||||
using a "fallback" to TensorFlow\*. However, it is not enough to generate the IR because Model Optimizer doesn't know
|
||||
which attributes of the operation should be saved to IR. So it is necessary to implement Model Optimizer extensions to
|
||||
support these operations.
|
||||
|
||||
The workflow for each generated extension follows the same basic steps:
|
||||
Before going into the extension development it is necessary to understand what these unsupported operations do according
|
||||
to the TensorFlow\* framework specification.
|
||||
|
||||

|
||||
* "Complex" - returns a tensor of complex type constructed from two real input tensors specifying real and imaginary
|
||||
part of a complex number.
|
||||
* "IFFT2D" - returns a tensor with inverse 2-dimensional discrete Fourier transform over the inner-most 2 dimensions of
|
||||
an input.
|
||||
* "ComplexAbs" - returns a tensor with absolute values of input tensor with complex numbers.
|
||||
|
||||
**Step 1: Generate:** Use the Model Extension Generator to generate the Custom Layer Template Files.
|
||||
The part of the model with all three unsupported operations is depicted below:
|
||||
|
||||
**Step 2: Edit:** Edit the Custom Layer Template Files as necessary to create the specialized Custom Layer Extension Source Code.
|
||||

|
||||
|
||||
**Step 3: Specify:** Specify the custom layer extension locations to be used by the Model Optimizer or Inference Engine.
|
||||
This model uses complex numbers during the inference but Inference Engine does not support tensors of this data type. So
|
||||
it is necessary to find a way how to avoid using tensors of such a type in the model. Fortunately, the complex tensor
|
||||
appear as a result of "Complex" operation, is used as input in the "IFFT2D" operation then is passed to "ComplexAbs"
|
||||
which produces real value tensor as output. So there are just 3 operations consuming/producing complex tensors in the
|
||||
model.
|
||||
|
||||
## Caffe\* Models with Custom Layers <a name="caffe-models-with-custom-layers"></a>
|
||||
Let's design an OpenVINO operation "FFT" which get a single real number tensor describing the complex number and
|
||||
produces a single real number tensor describing output complex tensor. This way the fact that the model uses complex
|
||||
numbers is hidden inside the "FFT" operation implementation. The operation gets a tensor of shape `[N, H, W, 2]` and
|
||||
produces the output tensor with the same shape, where the innermost dimension contains pairs of real numbers describing
|
||||
the complex number (its real and imaginary part). As we will see further this operation will allow us to support the
|
||||
model. The implementation of the Model Optimizer operation should be saved to `mo_extensions/ops/FFT.py` file:
|
||||
|
||||
If your Caffe\* model has custom layers:
|
||||
@snippet FFT.py fft:operation
|
||||
|
||||
**Register the custom layers as extensions to the Model Optimizer**. For instructions, see [Extending Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You will need a bit of Python\* code that lets the Model Optimizer;
|
||||
The attribute `inverse` is a flag specifying type of the FFT to apply: forward or inverse.
|
||||
|
||||
- Generate a valid Intermediate Representation according to the rules you specified.
|
||||
- Be independent from the availability of Caffe on your computer.
|
||||
|
||||
If your model contains Custom Layers, it is important to understand the internal workflow of the Model Optimizer. Consider the following example.
|
||||
See the "Model Optimizer Operation" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for the
|
||||
detailed instruction on how to implement the operation.
|
||||
|
||||
**Example**:
|
||||
Now it is necessary to implement extractor for the "IFFT2D" operation according to the
|
||||
"Operation Extractor" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md). The
|
||||
following snippet provides two extractors: one for "IFFT2D", another one for "FFT2D", however only on of them is used
|
||||
in this example. The implementation should be saved to the file `mo_extensions/front/tf/FFT_ext.py`.
|
||||
|
||||
The network has:
|
||||
@snippet FFT_ext.py fft_ext:extractor
|
||||
|
||||
* One input layer (#1)
|
||||
* One output Layer (#5)
|
||||
* Three internal layers (#2, 3, 4)
|
||||
> **NOTE:** The graph is in inconsistent state after extracting node attributes because according to original operation
|
||||
> "IFFT2D" semantic it should have an input consuming a tensor of complex numbers, but the extractor instantiated an
|
||||
> operation "FFT" which expects a real tensor with specific layout. But the inconsistency will be resolved during
|
||||
> applying front phase transformations discussed below.
|
||||
|
||||
The custom and standard layer types are:
|
||||
The output shape of the operation "AddV2" from the picture above is `[N, H, W, 2]`. Where the innermost dimension
|
||||
contains pairs of real numbers describing the complex number (its real and imaginary part). The following "StridedSlice"
|
||||
operations split the input tensor into 2 parts to get a tensor of real and a tensor of imaginary parts which are then
|
||||
consumed with the "Complex" operation to produce a tensor of complex numbers. These "StridedSlice" and "Complex"
|
||||
operations can be removed so the "FFT" operation will get a real value tensor encoding complex numbers. To achieve this
|
||||
we implement the front phase transformation which searches for a pattern of two "StridedSlice" operations with specific
|
||||
attributes producing data to "Complex" operation and removes it from the graph. Refer to the
|
||||
"Pattern-Defined Front Phase Transformations" section on the
|
||||
[Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) for more
|
||||
information on how this type of transformation works. The code snippet should be saved to the file
|
||||
`mo_extensions/front/tf/Complex.py`.
|
||||
|
||||
* Layers #2 and #5 are implemented as Model Optimizer extensions.
|
||||
* Layers #1 and #4 are supported in Model Optimizer out-of-the box.
|
||||
* Layer #3 is neither in the list of supported layers nor in extensions, but is specified in CustomLayersMapping.xml.
|
||||
@snippet Complex.py complex:transformation
|
||||
|
||||
> **NOTE**: If any of the layers are not in one of three categories described above, the Model Optimizer fails with an appropriate message and a link to the corresponding question in [Model Optimizer FAQ](../MO_DG/prepare_model/Model_Optimizer_FAQ.md).
|
||||
> **NOTE:** The graph is in inconsistent state because the "ComplexAbs" operation consumes complex value tensor but
|
||||
> "FFT" produces real value tensor.
|
||||
|
||||
The general process is as shown:
|
||||
Now lets implement a transformation which replace a "ComplexAbs" operation with a sub-graph of primitive operations
|
||||
which calculate the result using the following formulae: \f$module(z) = \sqrt{real(z) \cdot real(z) + imag(z) \cdot imag(z)}\f$.
|
||||
Original "IFFT2D" operation produces tensor of complex values, but the "FFT" operation produces a real value tensor with
|
||||
the same format and shape as the input for the operation. So the input shape for the "ComplexAbs" will be `[N, H, W, 2]`
|
||||
with the innermost dimension containing tuple with real and imaginary part of a complex number. In order to calculate
|
||||
absolute values for the complex tensor we do the following:
|
||||
1. Raise all elements in the power of 2.
|
||||
2. Calculate a reduced sum over the innermost dimension.
|
||||
3. Calculate a square root.
|
||||
|
||||

|
||||
<br>
|
||||
The implementation should be saved to the file `mo_extensions/front/tf/ComplexAbs.py` and provided below:
|
||||
|
||||
**Step 1:** The example model is fed to the Model Optimizer that **loads the model** with the special parser built on top of the `caffe.proto` file. In case of failure, the Model Optimizer asks you to prepare the parser that can read the model. For more information, refer to the Model Optimizer, <a href="MO_FAQ.html#FAQ1">FAQ #1</a>.
|
||||
@snippet ComplexAbs.py complex_abs:transformation
|
||||
|
||||
**Step 2:** The Model Optimizer **extracts the attributes of all layers** by going through the list of layers and attempting to find the appropriate extractor. In order of priority, the Model Optimizer checks if the layer is:
|
||||
|
||||
* A. Registered as a Model Optimizer extension
|
||||
* B. Registered as a standard Model Optimizer layer
|
||||
|
||||
When the Model Optimizer finds a satisfying condition from the list above, it extracts the attributes according to the following rules:
|
||||
|
||||
* For A. - takes only the parameters specified in the extension
|
||||
* For B. - takes only the parameters specified in the standard extractor
|
||||
<br>
|
||||
Now it is possible to convert the model using the following command line:
|
||||
```bash
|
||||
./<MO_INSTALL_DIR>/mo.py --input_model <PATH_TO_MODEL>/wnet_20.pb -b 1 --extensions mo_extensions/
|
||||
```
|
||||
|
||||
**Step 3:** The Model Optimizer **calculates the output shape of all layers**. The logic is the same as it is for the priorities. **Important:** the Model Optimizer always takes the first available option.
|
||||
The sub-graph corresponding to the originally non-supported one is depicted on the image below:
|
||||
|
||||
**Step 4:** The Model Optimizer **optimizes the original model and produces the two Intermediate Representation (IR) files in .xml and .bin**.
|
||||
<br>
|
||||

|
||||
|
||||
## TensorFlow\* Models with Custom Layers <a name="Tensorflow-models-with-custom-layers"></a>
|
||||
> **NOTE:** Model Optimizer performed conversion of the model from NHWC to NCHW layout that is why the dimension with
|
||||
> the value 2 moved to another position.
|
||||
|
||||
You have two options for TensorFlow\* models with custom layers:
|
||||
<br>
|
||||
### Inference Engine Extension Implementation
|
||||
Now it is necessary to implement the extension for the CPU plugin with operation "FFT" introduced previously. The code
|
||||
below is based on the template extension described on the
|
||||
[Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md).
|
||||
|
||||
* **Register those layers as extensions to the Model Optimizer.** In this case, the Model Optimizer generates a valid and optimized Intermediate Representation.
|
||||
* **If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option.** This feature is helpful for many TensorFlow models. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
|
||||
|
||||
## MXNet\* Models with Custom Layers <a name="mxnet-models-with-custom-layers"></a>
|
||||
#### CMake Build File
|
||||
The first step is to create a CMake configuration file which builds the extension. The content of the "CMakeLists.txt"
|
||||
file is the following:
|
||||
|
||||
There are two options to convert your MXNet* model that contains custom layers:
|
||||
@snippet ie_cpu_extension/CMakeLists.txt fft_cmake_list:cmake
|
||||
|
||||
1. Register the custom layers as extensions to the Model Optimizer. For instructions, see [Extending MXNet Model Optimizer with New Primitives](../MO_DG/prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md). When your custom layers are registered as extensions, the Model Optimizer generates a valid and optimized Intermediate Representation. You can create Model Optimizer extensions for both MXNet layers with op `Custom` and layers which are not standard MXNet layers.
|
||||
The CPU FFT kernel implementation uses OpenCV to perform the FFT that is why the extension library is linked with
|
||||
"opencv_core" which comes with the OpenVINO.
|
||||
|
||||
2. If you have sub-graphs that should not be expressed with the analogous sub-graph in the Intermediate Representation, but another sub-graph should appear in the model, the Model Optimizer provides such an option. In MXNet the function is actively used for ssd models provides an opportunity to for the necessary subgraph sequences and replace them. To read more, see [Sub-graph Replacement in the Model Optimizer](../MO_DG/prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md).
|
||||
#### Custom nGraph Operation "FFT" Implementation
|
||||
The next step is to create the nGraph operation FFT. The header file "fft_op.hpp" has the following content:
|
||||
|
||||
## Kaldi\* Models with Custom Layers <a name="Kaldi-models-with-custom-layers"></a>
|
||||
For information on converting your Kaldi* model containing custom layers see [Converting a Kaldi Model in the Model Optimizer Developer Guide](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md).
|
||||
@snippet fft_op.hpp fft_op:header
|
||||
|
||||
## ONNX\* Models with Custom Layers <a name="ONNX-models-with-custom-layers"></a>
|
||||
For information on converting your ONNX* model containing custom layers see [Converting an ONNX Model in the Model Optimizer Developer Guide](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md).
|
||||
The operation has just one boolean attribute `inverse`. Implementation of the necessary nGraph operation functions are
|
||||
in the "fft_op.cpp" file with the following content:
|
||||
|
||||
## Step-by-Step Custom Layers Tutorial
|
||||
For a step-by-step walk-through creating and executing a custom layer, see [Custom Layer Implementation Tutorial for Linux and Windows.](https://github.com/david-drew/OpenVINO-Custom-Layers/tree/master/2019.r2.0)
|
||||
@snippet fft_op.cpp fft_op:implementation
|
||||
|
||||
Refer to the [Custom nGraph Operation](../IE_DG/Extensibility_DG/AddingNGraphOps.md) for more details.
|
||||
|
||||
#### CPU FFT Kernel Implementation
|
||||
The operation implementation for CPU plugin uses OpenCV to perform the FFT. The header file "fft_kernel.hpp" has the
|
||||
following content:
|
||||
|
||||
@snippet fft_kernel.hpp fft_kernel:header
|
||||
|
||||
The "fft_kernel.cpp" with the implementation of the CPU has the following content:
|
||||
|
||||
@snippet fft_kernel.cpp fft_kernel:implementation
|
||||
|
||||
Refer to the [How to Implement Custom CPU Operations](../IE_DG/Extensibility_DG/CPU_Kernel.md) for more details.
|
||||
|
||||
### Building and Running the Custom Extension
|
||||
In order to build the extension run the following:<br>
|
||||
```bash
|
||||
mkdir build && cd build
|
||||
source /opt/intel/openvino/bin/setupvars.sh
|
||||
cmake .. -DCMAKE_BUILD_TYPE=Release
|
||||
make --jobs=$(nproc)
|
||||
```
|
||||
|
||||
The result of this command is a compiled shared library (`.so`, `.dylib` or `.dll`). It should be loaded in the
|
||||
application using `Core` class instance method `AddExtension` like this
|
||||
`core.AddExtension(make_so_pointer<IExtension>(compiled_library_file_name), "CPU");`.
|
||||
|
||||
To test that the extension is implemented correctly we can run the [Benchmark App](../../inference-engine/tools/benchmark_tool/README.md)
|
||||
the following way:
|
||||
```bash
|
||||
python3 $INTEL_OPENVINO_DIR/deployment_tools/tools/benchmark_tool/benchmark_app.py \
|
||||
-m <PATH_TO_IR>/wnet_20.xml \
|
||||
-l <PATH_TO_BUILD_DIR>/libfft_cpu_extension.so \
|
||||
-d CPU
|
||||
```
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit)
|
||||
- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org)
|
||||
- [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
|
||||
- [Model Optimizer Extensibility](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
- [Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md)
|
||||
- [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md)
|
||||
- [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index)
|
||||
@@ -204,9 +366,7 @@ For a step-by-step walk-through creating and executing a custom layer, see [Cust
|
||||
## Converting Models:
|
||||
|
||||
- [Convert Your Caffe* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Caffe.md)
|
||||
- [Convert Your Kaldi* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md)
|
||||
- [Convert Your TensorFlow* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md)
|
||||
- [Convert Your MXNet* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_MxNet.md)
|
||||
- [Convert Your ONNX* Model](../MO_DG/prepare_model/convert_model/Convert_Model_From_ONNX.md)
|
||||
|
||||
|
||||
|
||||
|
||||
36
docs/HOWTO/ie_cpu_extension/CMakeLists.txt
Normal file
36
docs/HOWTO/ie_cpu_extension/CMakeLists.txt
Normal file
@@ -0,0 +1,36 @@
|
||||
#
|
||||
# Copyright (C) 2018-2019 Intel Corporation
|
||||
# SPDX-License-Identifier: Apache-2.0
|
||||
|
||||
# ===============================================================================
|
||||
# Generated file for building library with user generated CPU extensions
|
||||
#
|
||||
# Contains implementation of the basic layer methods
|
||||
#
|
||||
# Refer to the section "Adding Your Own Kernels to the Inference Engine" in
|
||||
# OpenVINO* documentation (either online or offline in
|
||||
# <INSTALL_DIR>/deployment_tools/documentation/docs/index.html an then navigate
|
||||
# to the corresponding section).
|
||||
# ===============================================================================
|
||||
|
||||
# [fft_cmake_list:cmake]
|
||||
set(CPU_EXTENSIONS_BASE_DIR ${CMAKE_CURRENT_SOURCE_DIR} CACHE INTERNAL "")
|
||||
|
||||
set(CMAKE_CXX_STANDARD 11)
|
||||
|
||||
find_package(ngraph REQUIRED OPTIONAL_COMPONENTS onnx_importer)
|
||||
find_package(InferenceEngine REQUIRED)
|
||||
find_package(OpenCV REQUIRED COMPONENTS core)
|
||||
|
||||
set(TARGET_NAME fft_cpu_extension)
|
||||
|
||||
file(GLOB SRC ${CPU_EXTENSIONS_BASE_DIR}/*.cpp)
|
||||
|
||||
add_library(${TARGET_NAME} SHARED ${SRC})
|
||||
|
||||
target_compile_definitions(${TARGET_NAME} PRIVATE IMPLEMENT_INFERENCE_EXTENSION_API)
|
||||
target_link_libraries(${TARGET_NAME} PRIVATE ${InferenceEngine_LIBRARIES}
|
||||
${NGRAPH_LIBRARIES}
|
||||
opencv_core)
|
||||
# [fft_cmake_list:cmake]
|
||||
|
||||
67
docs/HOWTO/ie_cpu_extension/extension.cpp
Normal file
67
docs/HOWTO/ie_cpu_extension/extension.cpp
Normal file
@@ -0,0 +1,67 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
// source: https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension
|
||||
|
||||
//! [fft_extension:implementation]
|
||||
#include "extension.hpp"
|
||||
#include "fft_kernel.hpp"
|
||||
#include "fft_op.hpp"
|
||||
#include <ngraph/factory.hpp>
|
||||
#include <ngraph/opsets/opset.hpp>
|
||||
|
||||
#include <map>
|
||||
#include <memory>
|
||||
#include <string>
|
||||
#include <unordered_map>
|
||||
#include <vector>
|
||||
|
||||
using namespace FFTExtension;
|
||||
|
||||
void Extension::GetVersion(const InferenceEngine::Version *&versionInfo) const noexcept {
|
||||
static InferenceEngine::Version ExtensionDescription = {
|
||||
{1, 0}, // extension API version
|
||||
"1.0",
|
||||
"The CPU plugin extension with FFT operation" // extension description message
|
||||
};
|
||||
|
||||
versionInfo = &ExtensionDescription;
|
||||
}
|
||||
|
||||
std::map<std::string, ngraph::OpSet> Extension::getOpSets() {
|
||||
std::map<std::string, ngraph::OpSet> opsets;
|
||||
ngraph::OpSet opset;
|
||||
opset.insert<FFTOp>();
|
||||
opsets["fft_extension"] = opset;
|
||||
return opsets;
|
||||
}
|
||||
|
||||
std::vector<std::string> Extension::getImplTypes(const std::shared_ptr<ngraph::Node> &node) {
|
||||
if (std::dynamic_pointer_cast<FFTOp>(node)) {
|
||||
return {"CPU"};
|
||||
}
|
||||
return {};
|
||||
}
|
||||
|
||||
InferenceEngine::ILayerImpl::Ptr Extension::getImplementation(const std::shared_ptr<ngraph::Node> &node, const std::string &implType) {
|
||||
if (std::dynamic_pointer_cast<FFTOp>(node) && implType == "CPU") {
|
||||
return std::make_shared<FFTImpl>(node);
|
||||
}
|
||||
return nullptr;
|
||||
}
|
||||
|
||||
INFERENCE_EXTENSION_API(InferenceEngine::StatusCode) InferenceEngine::CreateExtension(InferenceEngine::IExtension *&ext,
|
||||
InferenceEngine::ResponseDesc *resp) noexcept {
|
||||
try {
|
||||
ext = new Extension();
|
||||
return OK;
|
||||
} catch (std::exception &ex) {
|
||||
if (resp) {
|
||||
std::string err = ((std::string) "Couldn't create extension: ") + ex.what();
|
||||
err.copy(resp->msg, 255);
|
||||
}
|
||||
return InferenceEngine::GENERAL_ERROR;
|
||||
}
|
||||
}
|
||||
//! [fft_extension:implementation]
|
||||
|
||||
32
docs/HOWTO/ie_cpu_extension/extension.hpp
Normal file
32
docs/HOWTO/ie_cpu_extension/extension.hpp
Normal file
@@ -0,0 +1,32 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
// source: https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension
|
||||
|
||||
//! [fft_extension:header]
|
||||
#pragma once
|
||||
|
||||
#include <ie_iextension.h>
|
||||
#include <ie_api.h>
|
||||
#include <ngraph/ngraph.hpp>
|
||||
#include <memory>
|
||||
#include <vector>
|
||||
#include <string>
|
||||
#include <map>
|
||||
|
||||
namespace FFTExtension {
|
||||
|
||||
class Extension : public InferenceEngine::IExtension {
|
||||
public:
|
||||
Extension() = default;
|
||||
void GetVersion(const InferenceEngine::Version*& versionInfo) const noexcept override;
|
||||
void Unload() noexcept override {}
|
||||
void Release() noexcept override { delete this; }
|
||||
|
||||
std::map<std::string, ngraph::OpSet> getOpSets() override;
|
||||
std::vector<std::string> getImplTypes(const std::shared_ptr<ngraph::Node>& node) override;
|
||||
InferenceEngine::ILayerImpl::Ptr getImplementation(const std::shared_ptr<ngraph::Node>& node, const std::string& implType) override;
|
||||
};
|
||||
|
||||
}
|
||||
//! [fft_extension:header]
|
||||
119
docs/HOWTO/ie_cpu_extension/fft_kernel.cpp
Normal file
119
docs/HOWTO/ie_cpu_extension/fft_kernel.cpp
Normal file
@@ -0,0 +1,119 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
//! [fft_kernel:implementation]
|
||||
#include "fft_kernel.hpp"
|
||||
#include "fft_op.hpp"
|
||||
#include <details/ie_exception.hpp>
|
||||
#include <ie_layouts.h>
|
||||
|
||||
#include <opencv2/opencv.hpp>
|
||||
|
||||
using namespace FFTExtension;
|
||||
|
||||
FFTImpl::FFTImpl(const std::shared_ptr<ngraph::Node> &node) {
|
||||
auto castedNode = std::dynamic_pointer_cast<FFTOp>(node);
|
||||
if (!castedNode)
|
||||
THROW_IE_EXCEPTION << "Cannot create implementation for unknown operation!";
|
||||
if (castedNode->inputs().size() != 1 || castedNode->outputs().size() != 1)
|
||||
THROW_IE_EXCEPTION << "Cannot create implementation for operation with incorrect number of inputs or outputs!";
|
||||
if (castedNode->get_input_partial_shape(0).is_dynamic() || castedNode->get_output_partial_shape(0).is_dynamic())
|
||||
THROW_IE_EXCEPTION << "Cannot create implementation for op with dynamic shapes!";
|
||||
if (castedNode->get_input_element_type(0) != ngraph::element::f32 || castedNode->get_output_element_type(0) != ngraph::element::f32)
|
||||
THROW_IE_EXCEPTION << "Operation supports only FP32 tensors.";
|
||||
inpShape = castedNode->get_input_shape(0);
|
||||
outShape = castedNode->get_output_shape(0);
|
||||
inverse = castedNode->inverse;
|
||||
}
|
||||
|
||||
InferenceEngine::StatusCode FFTImpl::getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig> &conf,
|
||||
InferenceEngine::ResponseDesc *resp) noexcept {
|
||||
std::vector<InferenceEngine::DataConfig> inDataConfig;
|
||||
std::vector<InferenceEngine::DataConfig> outDataConfig;
|
||||
InferenceEngine::SizeVector order(inpShape.size());
|
||||
std::iota(order.begin(), order.end(), 0);
|
||||
|
||||
// Allow any offset before data
|
||||
size_t offset((std::numeric_limits<size_t>::max)());
|
||||
|
||||
// Input shape
|
||||
InferenceEngine::DataConfig inpConf;
|
||||
inpConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, inpShape, {inpShape, order, offset});
|
||||
inDataConfig.push_back(inpConf);
|
||||
|
||||
// Output shape
|
||||
InferenceEngine::DataConfig outConf;
|
||||
outConf.desc = InferenceEngine::TensorDesc(InferenceEngine::Precision::FP32, outShape, {outShape, order, offset});
|
||||
outDataConfig.push_back(outConf);
|
||||
|
||||
InferenceEngine::LayerConfig layerConfig;
|
||||
layerConfig.inConfs = inDataConfig;
|
||||
layerConfig.outConfs = outDataConfig;
|
||||
|
||||
conf.push_back(layerConfig);
|
||||
return InferenceEngine::StatusCode::OK;
|
||||
}
|
||||
|
||||
InferenceEngine::StatusCode FFTImpl::init(InferenceEngine::LayerConfig &config, InferenceEngine::ResponseDesc *resp) noexcept {
|
||||
try {
|
||||
if (config.inConfs.size() != 1 || config.outConfs.size() != 1) {
|
||||
THROW_IE_EXCEPTION << "Operation cannot be initialized with incorrect number of inputs/outputs!";
|
||||
}
|
||||
|
||||
if (config.outConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32 ||
|
||||
config.inConfs[0].desc.getPrecision() != InferenceEngine::Precision::FP32) {
|
||||
THROW_IE_EXCEPTION << "Operation supports only FP32 precisions!";
|
||||
}
|
||||
} catch (InferenceEngine::details::InferenceEngineException& ex) {
|
||||
if (resp) {
|
||||
strncpy(resp->msg, error.c_str(), sizeof(resp->msg) - 1);
|
||||
resp->msg[sizeof(resp->msg)-1] = 0;
|
||||
}
|
||||
return InferenceEngine::GENERAL_ERROR;
|
||||
}
|
||||
return InferenceEngine::OK;
|
||||
}
|
||||
|
||||
static cv::Mat infEngineBlobToMat(const InferenceEngine::Blob::Ptr& blob)
|
||||
{
|
||||
// NOTE: Inference Engine sizes are reversed.
|
||||
std::vector<size_t> dims = blob->getTensorDesc().getDims();
|
||||
std::vector<int> size(dims.begin(), dims.end());
|
||||
auto precision = blob->getTensorDesc().getPrecision();
|
||||
CV_Assert(precision == InferenceEngine::Precision::FP32);
|
||||
return cv::Mat(size, CV_32F, (void*)blob->buffer());
|
||||
}
|
||||
|
||||
InferenceEngine::StatusCode FFTImpl::execute(std::vector<InferenceEngine::Blob::Ptr> &inputs,
|
||||
std::vector<InferenceEngine::Blob::Ptr> &outputs,
|
||||
InferenceEngine::ResponseDesc *resp) noexcept {
|
||||
cv::Mat inp = infEngineBlobToMat(inputs[0]);
|
||||
cv::Mat out = infEngineBlobToMat(outputs[0]);
|
||||
|
||||
const int n = inp.size[0];
|
||||
const int h = inp.size[2];
|
||||
const int w = inp.size[3];
|
||||
cv::Mat complex(h, w, CV_32FC2), interleavedOut(h, w, CV_32FC2);
|
||||
for (int i = 0; i < n; ++i) {
|
||||
std::vector<cv::Mat> components = {
|
||||
cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 0)),
|
||||
cv::Mat(h, w, CV_32F, inp.ptr<float>(i, 1))
|
||||
};
|
||||
cv::merge(components, complex);
|
||||
|
||||
if (!inverse)
|
||||
cv::dft(complex, interleavedOut);
|
||||
else
|
||||
cv::idft(complex, interleavedOut, cv::DFT_SCALE);
|
||||
|
||||
components = {
|
||||
cv::Mat(h, w, CV_32F, out.ptr<float>(i, 0)),
|
||||
cv::Mat(h, w, CV_32F, out.ptr<float>(i, 1))
|
||||
};
|
||||
cv::split(interleavedOut, components);
|
||||
}
|
||||
return InferenceEngine::OK;
|
||||
}
|
||||
//! [fft_kernel:implementation]
|
||||
|
||||
32
docs/HOWTO/ie_cpu_extension/fft_kernel.hpp
Normal file
32
docs/HOWTO/ie_cpu_extension/fft_kernel.hpp
Normal file
@@ -0,0 +1,32 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
// source: https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension
|
||||
|
||||
//! [fft_kernel:header]
|
||||
#pragma once
|
||||
|
||||
#include <ie_iextension.h>
|
||||
#include <ngraph/ngraph.hpp>
|
||||
|
||||
namespace FFTExtension {
|
||||
|
||||
class FFTImpl : public InferenceEngine::ILayerExecImpl {
|
||||
public:
|
||||
explicit FFTImpl(const std::shared_ptr<ngraph::Node>& node);
|
||||
InferenceEngine::StatusCode getSupportedConfigurations(std::vector<InferenceEngine::LayerConfig> &conf,
|
||||
InferenceEngine::ResponseDesc *resp) noexcept override;
|
||||
InferenceEngine::StatusCode init(InferenceEngine::LayerConfig &config,
|
||||
InferenceEngine::ResponseDesc *resp) noexcept override;
|
||||
InferenceEngine::StatusCode execute(std::vector<InferenceEngine::Blob::Ptr> &inputs,
|
||||
std::vector<InferenceEngine::Blob::Ptr> &outputs,
|
||||
InferenceEngine::ResponseDesc *resp) noexcept override;
|
||||
private:
|
||||
ngraph::Shape inpShape;
|
||||
ngraph::Shape outShape;
|
||||
bool inverse;
|
||||
std::string error;
|
||||
};
|
||||
|
||||
}
|
||||
//! [fft_kernel:header]
|
||||
34
docs/HOWTO/ie_cpu_extension/fft_op.cpp
Normal file
34
docs/HOWTO/ie_cpu_extension/fft_op.cpp
Normal file
@@ -0,0 +1,34 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
//! [fft_op:implementation]
|
||||
#include "fft_op.hpp"
|
||||
|
||||
using namespace FFTExtension;
|
||||
|
||||
constexpr ngraph::NodeTypeInfo FFTOp::type_info;
|
||||
|
||||
FFTOp::FFTOp(const ngraph::Output<ngraph::Node>& inp, bool _inverse) : Op({inp}) {
|
||||
constructor_validate_and_infer_types();
|
||||
inverse = _inverse;
|
||||
}
|
||||
|
||||
void FFTOp::validate_and_infer_types() {
|
||||
auto outShape = get_input_partial_shape(0);
|
||||
set_output_type(0, get_input_element_type(0), outShape);
|
||||
}
|
||||
|
||||
std::shared_ptr<ngraph::Node> FFTOp::clone_with_new_inputs(const ngraph::OutputVector &new_args) const {
|
||||
if (new_args.size() != 1) {
|
||||
throw ngraph::ngraph_error("Incorrect number of new arguments");
|
||||
}
|
||||
return std::make_shared<FFTOp>(new_args.at(0), inverse);
|
||||
}
|
||||
|
||||
bool FFTOp::visit_attributes(ngraph::AttributeVisitor &visitor) {
|
||||
visitor.on_attribute("inverse", inverse);
|
||||
return true;
|
||||
}
|
||||
//! [fft_op:implementation]
|
||||
|
||||
28
docs/HOWTO/ie_cpu_extension/fft_op.hpp
Normal file
28
docs/HOWTO/ie_cpu_extension/fft_op.hpp
Normal file
@@ -0,0 +1,28 @@
|
||||
// Copyright (C) 2020 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
//! [fft_op:header]
|
||||
#pragma once
|
||||
|
||||
#include <ngraph/ngraph.hpp>
|
||||
|
||||
namespace FFTExtension {
|
||||
|
||||
class FFTOp : public ngraph::op::Op {
|
||||
public:
|
||||
static constexpr ngraph::NodeTypeInfo type_info{"FFT", 0};
|
||||
const ngraph::NodeTypeInfo& get_type_info() const override { return type_info; }
|
||||
|
||||
FFTOp() = default;
|
||||
FFTOp(const ngraph::Output<ngraph::Node>& inp, bool inverse);
|
||||
void validate_and_infer_types() override;
|
||||
std::shared_ptr<ngraph::Node> clone_with_new_inputs(const ngraph::OutputVector& new_args) const override;
|
||||
bool visit_attributes(ngraph::AttributeVisitor& visitor) override;
|
||||
|
||||
bool inverse;
|
||||
};
|
||||
|
||||
}
|
||||
//! [fft_op:header]
|
||||
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:c2f362a39ae6c2af080e4f055b6fdba4954f918f85731545d1df3d687d9213d5
|
||||
size 421056
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:cb5c700d003936779455353bfa4ed9432410c0975c46e2dfd30c6a1abccd1727
|
||||
size 23320
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:99d6b5146be85fa408dc5432883c3e2745cffe890133854a97dcf22f5c5962d4
|
||||
size 47564
|
||||
3
docs/HOWTO/img/converted_subgraph.png
Normal file
3
docs/HOWTO/img/converted_subgraph.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f7c8ab4f15874d235968471bcf876c89c795d601e69891208107b8b72aa58eb1
|
||||
size 70014
|
||||
@@ -1,3 +0,0 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:0a4de6e502cae7542f1f311bcdbea6bb145f960f0d27d86a03160d1a60133778
|
||||
size 301310
|
||||
3
docs/HOWTO/img/unsupported_subgraph.png
Normal file
3
docs/HOWTO/img/unsupported_subgraph.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:3d5ccf51fe1babb93d96d042494695a6a6e055d1f8ebf7eef5083d54d8987a23
|
||||
size 58789
|
||||
57
docs/HOWTO/mo_extensions/front/tf/Complex.py
Normal file
57
docs/HOWTO/mo_extensions/front/tf/Complex.py
Normal file
@@ -0,0 +1,57 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
#! [complex:transformation]
|
||||
import logging as log
|
||||
|
||||
import numpy as np
|
||||
|
||||
from mo.front.common.replacement import FrontReplacementSubgraph
|
||||
from mo.graph.graph import Graph
|
||||
|
||||
|
||||
class Complex(FrontReplacementSubgraph):
|
||||
enabled = True
|
||||
|
||||
def pattern(self):
|
||||
return dict(
|
||||
nodes=[
|
||||
('strided_slice_real', dict(op='StridedSlice')),
|
||||
('strided_slice_imag', dict(op='StridedSlice')),
|
||||
('complex', dict(op='Complex')),
|
||||
],
|
||||
edges=[
|
||||
('strided_slice_real', 'complex', {'in': 0}),
|
||||
('strided_slice_imag', 'complex', {'in': 1}),
|
||||
])
|
||||
|
||||
@staticmethod
|
||||
def replace_sub_graph(graph: Graph, match: dict):
|
||||
strided_slice_real = match['strided_slice_real']
|
||||
strided_slice_imag = match['strided_slice_imag']
|
||||
complex_node = match['complex']
|
||||
|
||||
# make sure that both strided slice operations get the same data as input
|
||||
assert strided_slice_real.in_port(0).get_source() == strided_slice_imag.in_port(0).get_source()
|
||||
|
||||
# identify the output port of the operation producing datat for strided slice nodes
|
||||
input_node_output_port = strided_slice_real.in_port(0).get_source()
|
||||
input_node_output_port.disconnect()
|
||||
|
||||
# change the connection so now all consumers of "complex_node" get data from input node of strided slice nodes
|
||||
complex_node.out_port(0).get_connection().set_source(input_node_output_port)
|
||||
#! [complex:transformation]
|
||||
|
||||
40
docs/HOWTO/mo_extensions/front/tf/ComplexAbs.py
Normal file
40
docs/HOWTO/mo_extensions/front/tf/ComplexAbs.py
Normal file
@@ -0,0 +1,40 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
#! [complex_abs:transformation]
|
||||
import numpy as np
|
||||
|
||||
from extensions.ops.elementwise import Pow
|
||||
from extensions.ops.ReduceOps import ReduceSum
|
||||
from mo.front.common.replacement import FrontReplacementOp
|
||||
from mo.graph.graph import Graph, Node
|
||||
from mo.ops.const import Const
|
||||
|
||||
|
||||
class ComplexAbs(FrontReplacementOp):
|
||||
op = "ComplexAbs"
|
||||
enabled = True
|
||||
|
||||
def replace_op(self, graph: Graph, node: Node):
|
||||
pow_2 = Const(graph, {'value': np.float32(2.0)}).create_node()
|
||||
reduce_axis = Const(graph, {'value': np.int32(-1)}).create_node()
|
||||
pow_0_5 = Const(graph, {'value': np.float32(0.5)}).create_node()
|
||||
|
||||
sq = Pow(graph, dict(name=node.in_node(0).name + '/sq', power=2.0)).create_node([node.in_node(0), pow_2])
|
||||
sum = ReduceSum(graph, dict(name=sq.name + '/sum')).create_node([sq, reduce_axis])
|
||||
sqrt = Pow(graph, dict(name=sum.name + '/sqrt', power=0.5)).create_node([sum, pow_0_5])
|
||||
return [sqrt.id]
|
||||
#! [complex_abs:transformation]
|
||||
47
docs/HOWTO/mo_extensions/front/tf/FFT_ext.py
Normal file
47
docs/HOWTO/mo_extensions/front/tf/FFT_ext.py
Normal file
@@ -0,0 +1,47 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
# ! [fft_ext:extractor]
|
||||
from ...ops.FFT import FFT
|
||||
from mo.front.extractor import FrontExtractorOp
|
||||
from mo.utils.error import Error
|
||||
|
||||
|
||||
class FFT2DFrontExtractor(FrontExtractorOp):
|
||||
op = 'FFT2D'
|
||||
enabled = True
|
||||
|
||||
@classmethod
|
||||
def extract(cls, node):
|
||||
attrs = {
|
||||
'inverse': 0
|
||||
}
|
||||
FFT.update_node_stat(node, attrs)
|
||||
return cls.enabled
|
||||
|
||||
|
||||
class IFFT2DFrontExtractor(FrontExtractorOp):
|
||||
op = 'IFFT2D'
|
||||
enabled = True
|
||||
|
||||
@classmethod
|
||||
def extract(cls, node):
|
||||
attrs = {
|
||||
'inverse': 1
|
||||
}
|
||||
FFT.update_node_stat(node, attrs)
|
||||
return cls.enabled
|
||||
# ! [fft_ext:extractor]
|
||||
40
docs/HOWTO/mo_extensions/ops/FFT.py
Normal file
40
docs/HOWTO/mo_extensions/ops/FFT.py
Normal file
@@ -0,0 +1,40 @@
|
||||
"""
|
||||
Copyright (C) 2018-2020 Intel Corporation
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
"""
|
||||
|
||||
#! [fft:operation]
|
||||
from mo.front.common.partial_infer.elemental import copy_shape_infer
|
||||
from mo.graph.graph import Node, Graph
|
||||
from mo.ops.op import Op
|
||||
|
||||
|
||||
class FFT(Op):
|
||||
op = 'FFT'
|
||||
enabled = False
|
||||
|
||||
def __init__(self, graph: Graph, attrs: dict):
|
||||
super().__init__(graph, {
|
||||
'type': self.op,
|
||||
'op': self.op,
|
||||
'version': 'fft_extension',
|
||||
'inverse': None,
|
||||
'in_ports_count': 1,
|
||||
'out_ports_count': 1,
|
||||
'infer': copy_shape_infer
|
||||
}, attrs)
|
||||
|
||||
def backend_attrs(self):
|
||||
return ['inverse']
|
||||
#! [fft:operation]
|
||||
@@ -2,6 +2,29 @@
|
||||
|
||||
The sections below contain detailed list of changes made to the Inference Engine API in recent releases.
|
||||
|
||||
## 2021.2
|
||||
|
||||
### New API
|
||||
|
||||
**State API**
|
||||
|
||||
* InferenceEngine::InferRequest::QueryState query state value of network on current infer request
|
||||
* InferenceEngine::IVariableState class instead of IMemoryState (rename)
|
||||
* InferenceEngine::IVariableState::GetState instead of IMemoryState::GetLastState (rename)
|
||||
|
||||
**BatchedBlob** - represents a InferenceEngine::BatchedBlob containing other blobs - one per batch.
|
||||
|
||||
**Transformations API** - added a new header `ie_transformations.hpp` which contains transformations for InferenceEngine::CNNNetwork object. Such transformations can be called prior to loading network for compilation for particular device:
|
||||
|
||||
* InferenceEngine::LowLatency
|
||||
|
||||
### Deprecated API
|
||||
|
||||
**State API**
|
||||
|
||||
* InferenceEngine::ExecutableNetwork::QueryState - use InferenceEngine::InferRequest::QueryState
|
||||
* InferenceEngine::IVariableState::GetLastState - use InferenceEngine::IVariableState::GetState
|
||||
|
||||
## 2021.1
|
||||
|
||||
### Deprecated API
|
||||
|
||||
@@ -20,7 +20,7 @@ There are two ways to check if CPU device can support bfloat16 computations for
|
||||
1. Query the instruction set via system `lscpu | grep avx512_bf16` or `cat /proc/cpuinfo | grep avx512_bf16`.
|
||||
2. Use [Query API](InferenceEngine_QueryAPI.md) with `METRIC_KEY(OPTIMIZATION_CAPABILITIES)`, which should return `BF16` in the list of CPU optimization options:
|
||||
|
||||
@snippet openvino/docs/snippets/Bfloat16Inference0.cpp part0
|
||||
@snippet snippets/Bfloat16Inference0.cpp part0
|
||||
|
||||
Current Inference Engine solution for bfloat16 inference uses Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) and supports inference of the following layers in BF16 computation mode:
|
||||
* Convolution
|
||||
@@ -46,11 +46,11 @@ Bfloat16 data usage provides the following benefits that increase performance:
|
||||
For default optimization on CPU, source model converts from FP32 or FP16 to BF16 and executes internally on platforms with native BF16 support. In that case, `KEY_ENFORCE_BF16` is set to `YES`.
|
||||
The code below demonstrates how to check if the key is set:
|
||||
|
||||
@snippet openvino/docs/snippets/Bfloat16Inference1.cpp part1
|
||||
@snippet snippets/Bfloat16Inference1.cpp part1
|
||||
|
||||
To disable BF16 internal transformations, set the `KEY_ENFORCE_BF16` to `NO`. In this case, the model infers AS IS without modifications with precisions that were set on each layer edge.
|
||||
|
||||
@snippet openvino/docs/snippets/Bfloat16Inference2.cpp part2
|
||||
@snippet snippets/Bfloat16Inference2.cpp part2
|
||||
|
||||
An exception with message `Platform doesn't support BF16 format` is formed in case of setting `KEY_ENFORCE_BF16` to `YES` on CPU without native BF16 support.
|
||||
|
||||
|
||||
@@ -18,7 +18,7 @@ The batch size that was set in passed <code>CNNNetwork</code> object will be use
|
||||
|
||||
Here is a code example:
|
||||
|
||||
@snippet openvino/docs/snippets/DynamicBatching.cpp part0
|
||||
@snippet snippets/DynamicBatching.cpp part0
|
||||
|
||||
|
||||
## Limitations
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# Add Custom nGraph Operations {#openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps}
|
||||
# Custom nGraph Operation {#openvino_docs_IE_DG_Extensibility_DG_AddingNGraphOps}
|
||||
|
||||
Inference Engine Extension API allows to register operation sets (opsets) with custom nGraph operations, it allows to support Networks with unknown operations.
|
||||
|
||||
@@ -20,7 +20,7 @@ To add your custom nGraph operation, create a new class that extends `ngraph::Op
|
||||
|
||||
Based on that, declaration of a operation class can look as follows:
|
||||
|
||||
@snippet op.hpp op:header
|
||||
@snippet template_extension/op.hpp op:header
|
||||
|
||||
### Class Fields
|
||||
|
||||
@@ -33,37 +33,37 @@ The provided implementation has several fields:
|
||||
|
||||
nGraph operation contains two constructors: a default constructor, which allows to create operation without attributes and a constructor that creates and validates operation with specified inputs and attributes.
|
||||
|
||||
@snippet op.cpp op:ctor
|
||||
@snippet template_extension/op.cpp op:ctor
|
||||
|
||||
### `validate_and_infer_types()`
|
||||
|
||||
`ngraph::Node::validate_and_infer_types` method validates operation attributes and calculates output shapes using attributes of operation.
|
||||
|
||||
@snippet op.cpp op:validate
|
||||
@snippet template_extension/op.cpp op:validate
|
||||
|
||||
### `clone_with_new_inputs()`
|
||||
|
||||
`ngraph::Node::clone_with_new_inputs` method creates a copy of nGraph operation with new inputs.
|
||||
|
||||
@snippet op.cpp op:copy
|
||||
@snippet template_extension/op.cpp op:copy
|
||||
|
||||
### `visit_attributes()`
|
||||
|
||||
`ngraph::Node::visit_attributes` method allows to visit all operation attributes.
|
||||
|
||||
@snippet op.cpp op:visit_attributes
|
||||
@snippet template_extension/op.cpp op:visit_attributes
|
||||
|
||||
### `evaluate()`
|
||||
|
||||
`ngraph::Node::evaluate` method allows to apply constant folding to an operation.
|
||||
|
||||
@snippet op.cpp op:evaluate
|
||||
@snippet template_extension/op.cpp op:evaluate
|
||||
|
||||
## Register Custom Operations in Extension Class
|
||||
|
||||
To add custom operations to the [Extension](Extension.md) class, create an operation set with custom operations and implement the `InferenceEngine::IExtension::getOpSets` method:
|
||||
|
||||
@snippet extension.cpp extension:getOpSets
|
||||
@snippet template_extension/extension.cpp extension:getOpSets
|
||||
|
||||
This method returns a map of opsets that exist in the extension library.
|
||||
|
||||
@@ -71,10 +71,9 @@ nGraph provides opsets mechanism for operation versioning. Different opsets dist
|
||||
|
||||
When specifying opset names, follow the rules below:
|
||||
* Use unique opset names.
|
||||
* Do not use the following built-in opset names: `extension`, `experimental`, `opset1`, `opest2`.
|
||||
* Do not use the following built-in opset names: `extension`, `experimental`, `opset1`, `opset2`, `opset3`, ... , `opsetN`.
|
||||
* Make sure that the Model Optimizer and your extension use the same opset names.
|
||||
* IR v10 layers have the mandatory `version` attribute specifying the opset.
|
||||
* `opset1` is the name of default operations set.
|
||||
* IR v10 operations have the mandatory `version` attribute specifying the opset.
|
||||
Operations from the default opset cannot be redefined.
|
||||
|
||||
Use a custom opset to create a new operation or extend functionality of an existing operation from another opset.
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
# How to Implement Custom CPU Layers {#openvino_docs_IE_DG_Extensibility_DG_CPU_Kernel}
|
||||
# How to Implement Custom CPU Operations {#openvino_docs_IE_DG_Extensibility_DG_CPU_Kernel}
|
||||
|
||||
The primary vehicle for the performance of the CPU codepath in the Inference Engine is the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), and new CPU kernels extend the Inference Engine plugin for the Intel MKL-DNN. Implementing the InferenceEngine::ILayerExecImpl defines a general CPU-side extension. There are no Intel MKL-DNN specifics in the way you need to implement a kernel.
|
||||
|
||||
@@ -7,7 +7,7 @@ The primary vehicle for the performance of the CPU codepath in the Inference Eng
|
||||
All custom kernels for the CPU plugin should be inherited from the InferenceEngine::ILayerExecImpl interface.
|
||||
Based on that, declaration of a kernel implementation class can look as follows:
|
||||
|
||||
@snippet cpu_kernel.hpp cpu_implementation:header
|
||||
@snippet template_extension/cpu_kernel.hpp cpu_implementation:header
|
||||
|
||||
### Class Fields
|
||||
|
||||
@@ -22,25 +22,25 @@ The provided implementation has several fields:
|
||||
|
||||
An implementation constructor checks parameters of nGraph operation, stores needed attributes, and stores an error message in the case of an error.
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:ctor
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:ctor
|
||||
|
||||
### `getSupportedConfigurations`
|
||||
|
||||
InferenceEngine::ILayerExecImpl::getSupportedConfigurations method returns all supported configuration formats (input/output tensor layouts) for your implementation. To specify formats of data, use InferenceEngine::TensorDesc. Refer to the [Memory Primitives](../Memory_primitives.md) section for instructions on how to do it.
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:getSupportedConfigurations
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:getSupportedConfigurations
|
||||
|
||||
### `init`
|
||||
|
||||
InferenceEngine::ILayerExecImpl::init method gets a runtime-selected configuration from a vector that is populated from the `getSupportedConfigurations` method and checks the parameters:
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:init
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:init
|
||||
|
||||
### `execute`
|
||||
|
||||
InferenceEngine::ILayerExecImpl::execute method accepts and processes the actual tenors as input/output blobs:
|
||||
|
||||
@snippet cpu_kernel.cpp cpu_implementation:execute
|
||||
@snippet template_extension/cpu_kernel.cpp cpu_implementation:execute
|
||||
|
||||
## Register Implementation in `Extension` Class
|
||||
|
||||
@@ -52,18 +52,18 @@ To register custom kernel implementation in the [Extension](Extension.md) class,
|
||||
|
||||
InferenceEngine::IExtension::getImplTypes returns a vector of implementation types for an operation.
|
||||
|
||||
@snippet extension.cpp extension:getImplTypes
|
||||
@snippet template_extension/extension.cpp extension:getImplTypes
|
||||
|
||||
### <a name="getImplementation"><code>getImplementation</code></a>
|
||||
|
||||
InferenceEngine::IExtension::getImplementation returns the kernel implementation with a specified type for an operation.
|
||||
|
||||
@snippet extension.cpp extension:getImplementation
|
||||
@snippet template_extension/extension.cpp extension:getImplementation
|
||||
|
||||
|
||||
## Load Extension with Executable Kernels to Plugin
|
||||
|
||||
Use the `AddExtension` method of the general plugin interface to load your primitives:
|
||||
|
||||
@snippet openvino/docs/snippets/CPU_Kernel.cpp part0
|
||||
@snippet snippets/CPU_Kernel.cpp part0
|
||||
|
||||
|
||||
@@ -24,11 +24,11 @@ The `ngraph::onnx_import::Node` class represents a node in ONNX model. It provid
|
||||
New operator registration must happen before the ONNX model is read, for example, if an ONNX model uses the 'CustomRelu' operator, `register_operator("CustomRelu", ...)` must be called before InferenceEngine::Core::ReadNetwork.
|
||||
Re-registering ONNX operators within the same process is supported. During registration of the existing operator, a warning is printed.
|
||||
|
||||
The example below demonstrates an examplary model that requires previously created 'CustomRelu' operator:
|
||||
The example below demonstrates an exemplary model that requires previously created 'CustomRelu' operator:
|
||||
@snippet onnx_custom_op/onnx_custom_op.cpp onnx_custom_op:model
|
||||
|
||||
|
||||
For a reference on how to create a graph with nGraph operations, visit [nGraph tutorial](../nGraphTutorial.md).
|
||||
For a reference on how to create a graph with nGraph operations, visit [Custom nGraph Operation](AddingNGraphOps.md).
|
||||
For a complete list of predefined nGraph operators, visit [available operations sets](../../ops/opset.md).
|
||||
|
||||
If operator is no longer needed, it can be unregistered by calling `unregister_operator`. The function takes three arguments `op_type`, `version`, and `domain`.
|
||||
@@ -38,12 +38,12 @@ If operator is no longer needed, it can be unregistered by calling `unregister_o
|
||||
|
||||
The same principles apply when registering custom ONNX operator based on custom nGraph operations.
|
||||
This example shows how to register custom ONNX operator based on `Operation` presented in [this tutorial](AddingNGraphOps.md), which is used in [TemplateExtension](Extension.md).
|
||||
@snippet extension.cpp extension:ctor
|
||||
@snippet template_extension/extension.cpp extension:ctor
|
||||
|
||||
Here, the `register_operator` function is called in Extension's constructor, which makes sure that it is called before InferenceEngine::Core::ReadNetwork (since InferenceEngine::Core::AddExtension must be called before a model with custom operator is read).
|
||||
|
||||
The example below demonstrates how to unregister operator from Extension's destructor:
|
||||
@snippet extension.cpp extension:dtor
|
||||
@snippet template_extension/extension.cpp extension:dtor
|
||||
Note that it is mandatory to unregister custom ONNX operator if it is defined in dynamic shared library.
|
||||
|
||||
## Requirements for building with CMake
|
||||
|
||||
@@ -5,11 +5,11 @@ All extension libraries should be inherited from this interface.
|
||||
|
||||
Based on that, declaration of an extension class can look as follows:
|
||||
|
||||
@snippet extension.hpp extension:header
|
||||
@snippet template_extension/extension.hpp extension:header
|
||||
|
||||
The extension library should contain and export the method InferenceEngine::CreateExtension, which creates an `Extension` class:
|
||||
|
||||
@snippet extension.cpp extension:CreateExtension
|
||||
@snippet template_extension/extension.cpp extension:CreateExtension
|
||||
|
||||
Also, an `Extension` object should implement the following methods:
|
||||
|
||||
@@ -17,7 +17,7 @@ Also, an `Extension` object should implement the following methods:
|
||||
|
||||
* InferenceEngine::IExtension::GetVersion returns information about version of the library
|
||||
|
||||
@snippet extension.cpp extension:GetVersion
|
||||
@snippet template_extension/extension.cpp extension:GetVersion
|
||||
|
||||
Implement the InferenceEngine::IExtension::getOpSets method if the extension contains custom layers.
|
||||
Read the [guide about custom operations](AddingNGraphOps.md) for more information.
|
||||
|
||||
@@ -1,16 +1,16 @@
|
||||
# How to Implement Custom GPU Layers {#openvino_docs_IE_DG_Extensibility_DG_GPU_Kernel}
|
||||
# How to Implement Custom GPU Operations {#openvino_docs_IE_DG_Extensibility_DG_GPU_Kernel}
|
||||
|
||||
The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the layer.
|
||||
The GPU codepath abstracts many details about OpenCL™. You need to provide the kernel code in OpenCL C and the configuration file that connects the kernel and its parameters to the parameters of the operation.
|
||||
|
||||
There are two options of using custom layer configuration file:
|
||||
There are two options of using custom operation configuration file:
|
||||
|
||||
* Include a section with your kernels into the global automatically-loaded `cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file, which is hosted in the `<INSTALL_DIR>/deployment_tools/inference_engine/bin/intel64/{Debug/Release}` folder
|
||||
* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom layers to the plugin:
|
||||
* Call the `InferenceEngine::Core::SetConfig()` method from your application with the `InferenceEngine::PluginConfigParams::KEY_CONFIG_FILE` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_Kernel.cpp part0
|
||||
@snippet snippets/GPU_Kernel.cpp part0
|
||||
|
||||
All Inference Engine samples, except trivial `hello_classification`,
|
||||
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom layers for the classification sample, run the command below:
|
||||
feature a dedicated command-line option `-c` to load custom kernels. For example, to load custom operations for the classification sample, run the command below:
|
||||
```sh
|
||||
$ ./classification_sample -m <path_to_model>/bvlc_alexnet_fp16.xml -i ./validation_set/daily/227x227/apron.bmp -d GPU
|
||||
-c <absolute_path_to_config>/custom_layer_example.xml
|
||||
@@ -19,7 +19,7 @@ $ ./classification_sample -m <path_to_model>/bvlc_alexnet_fp16.xml -i ./validati
|
||||
## Configuration File Format <a name="config-file-format"></a>
|
||||
|
||||
The configuration file is expected to follow the `.xml` file structure
|
||||
with a node of the type `CustomLayer` for every custom layer you provide.
|
||||
with a node of the type `CustomLayer` for every custom operation you provide.
|
||||
|
||||
The definitions described in the sections below use the following notations:
|
||||
|
||||
@@ -32,14 +32,13 @@ Notation | Description
|
||||
|
||||
### CustomLayer Node and Sub-node Structure
|
||||
|
||||
`CustomLayer` node contains the entire configuration for a single custom
|
||||
layer.
|
||||
`CustomLayer` node contains the entire configuration for a single custom operation.
|
||||
|
||||
| Attribute Name |\# | Description |
|
||||
|-----|-----|-----|
|
||||
| `name` | (1) | The name of the layer type to be used. This name should be identical to the type used in the IR.|
|
||||
| `type` | (1) | Must be `SimpleGPU`. |
|
||||
| `version` | (1) | Must be `1`. |
|
||||
| `name` | (1) | The name of the operation type to be used. This name should be identical to the type used in the IR.|
|
||||
| `type` | (1) | Must be `SimpleGPU`. |
|
||||
| `version` | (1) | Must be `1`. |
|
||||
|
||||
**Sub-nodes**: `Kernel` (1), `Buffers` (1), `CompilerOptions` (0+),
|
||||
`WorkSizes` (0/1)
|
||||
@@ -69,9 +68,9 @@ the sources during compilation (JIT).
|
||||
| Attribute Name | \# | Description |
|
||||
|------|-------|------|
|
||||
| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well (taken as a string). |
|
||||
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
|
||||
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
|
||||
| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. |
|
||||
| `default` | (0/1) | The default value to be used if the specified parameters is missing from the layer in the IR. |
|
||||
| `default` | (0/1) | The default value to be used if the specified parameters is missing from the operation in the IR. |
|
||||
|
||||
**Sub-nodes:** None
|
||||
|
||||
@@ -92,7 +91,7 @@ weights or biases).
|
||||
|
||||
| Attribute Name | \# | Description |
|
||||
|----|-----|------|
|
||||
| `name` | (1) | Name of a blob attached to a layer in the IR |
|
||||
| `name` | (1) | Name of a blob attached to a operation in the IR |
|
||||
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to |
|
||||
|
||||
**Sub-nodes**: None
|
||||
@@ -105,7 +104,7 @@ weights or biases).
|
||||
|------|-------|-------|
|
||||
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
|
||||
| `type` | (1) | `input` or `output` |
|
||||
| `port-index` | (1) | 0-based index in the layer’s input/output ports in the IR |
|
||||
| `port-index` | (1) | 0-based index in the operation input/output ports in the IR |
|
||||
| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB` (also in all lowercase). Default value: `BFYX` |
|
||||
|
||||
### CompilerOptions Node and Sub-node Structure
|
||||
@@ -178,7 +177,7 @@ For an example, see [Example Kernel](#example-kernel).
|
||||
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array |
|
||||
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element (bypassing the lower padding) |
|
||||
All `<TENSOR>` values are automatically defined for every tensor
|
||||
bound to this layer (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown
|
||||
bound to this operation (`INPUT0`, `INPUT1`, `OUTPUT0`, and so on), as shown
|
||||
in the following for example:
|
||||
|
||||
```sh
|
||||
@@ -227,7 +226,7 @@ floating-point, and integer kernel parameters. To get the dump, add the
|
||||
following line to your code that configures the GPU plugin to output the
|
||||
custom kernels:
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_Kernel.cpp part1
|
||||
@snippet snippets/GPU_Kernel.cpp part1
|
||||
|
||||
When the Inference Engine compiles the kernels for the specific network,
|
||||
it also outputs the resulting code for the custom kernels. In the
|
||||
|
||||
@@ -2,19 +2,22 @@
|
||||
|
||||
Inference Engine Extensibility API allows to add support of custom operations to the Inference Engine.
|
||||
Extension should contain operation sets with custom operations and execution kernels for custom operations.
|
||||
Physically, an extension library can be represented as a dynamic library exporting the single `CreateExtension` function that allows to create a new extension instance.
|
||||
Physically, an extension library can be represented as a dynamic library exporting the single `CreateExtension` function
|
||||
that allows to create a new extension instance.
|
||||
|
||||
Extensibility library can be loaded to the InferenceEngine::Core object using the InferenceEngine::Core::AddExtension method.
|
||||
Extensibility library can be loaded to the `InferenceEngine::Core` object using the
|
||||
`InferenceEngine::Core::AddExtension` method.
|
||||
|
||||
## Inference Engine Extension Library
|
||||
|
||||
Inference Engine Extension dynamic library contains several main components:
|
||||
Inference Engine Extension dynamic library contains several components:
|
||||
|
||||
* [Extension class](Extension.md):
|
||||
* [Extension Library](Extension.md):
|
||||
- Contains custom operation sets
|
||||
- Provides CPU implementations for custom operations
|
||||
* [Custom operations](Intro.md):
|
||||
- Allows to use InferenceEngine::Core::ReadNetwork to read Intermediate Representation (IR) with unsupported operations
|
||||
* [Custom nGraph Operation](AddingNGraphOps.md):
|
||||
- Allows to use `InferenceEngine::Core::ReadNetwork` to read Intermediate Representation (IR) with unsupported
|
||||
operations
|
||||
- Allows to create `ngraph::Function` with unsupported operations
|
||||
- Provides shape inference mechanism for custom operations
|
||||
|
||||
@@ -26,13 +29,13 @@ at `<dldt source tree>/docs/template_extension`.
|
||||
|
||||
The Inference Engine workflow involves the creation of custom kernels and either custom or existing operations.
|
||||
|
||||
An _Operation_ is a Network building block implemented in the training framework, for example, `Convolution` in Caffe*.
|
||||
An _Operation_ is a network building block implemented in the training framework, for example, `Convolution` in Caffe*.
|
||||
A _Kernel_ is defined as the corresponding implementation in the Inference Engine.
|
||||
|
||||
Refer to the [Custom Layers in the Model Optimizer](../../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) section for details on how
|
||||
mapping between framework layers and Inference Engine kernels is registered.
|
||||
Refer to the [Model Optimizer Extensibility](../../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
for details on how a mapping between framework operations and Inference Engine kernels is registered.
|
||||
|
||||
In short, you can plug your own kernel implementations into the Inference Engine and map them to the layers in the original framework.
|
||||
In short, you can plug your own kernel implementations into the Inference Engine and map them to the operations in the original framework.
|
||||
|
||||
The following pages describe how to integrate custom _kernels_ into the Inference Engine:
|
||||
|
||||
|
||||
@@ -30,7 +30,7 @@ File with tuned data is the result of this step.
|
||||
|
||||
The example below shows how to set and use the key files:
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_Kernels_Tuning.cpp part0
|
||||
@snippet snippets/GPU_Kernels_Tuning.cpp part0
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -72,7 +72,7 @@ Glossary of terms used in the Inference Engine
|
||||
| <code>InferenceEngineProfileInfo</code> | Represents basic inference profiling information per layer |
|
||||
| Inference Engine | A C++ library with a set of classes that you can use in your application to infer input data (images) and get the result |
|
||||
| Inference Engine API | The basic default API for all supported devices, which allows you to load a model from Intermediate Representation, set input and output formats and execute the model on various devices |
|
||||
| Inference Engine <code>Core<code> | Inference Engine Core is a software component that manages inference on certain Intel(R) hardware devices: CPU, GPU, MYRIAD, GNA, etc. |
|
||||
| Inference Engine <code>Core</code> | Inference Engine Core is a software component that manages inference on certain Intel(R) hardware devices: CPU, GPU, MYRIAD, GNA, etc. |
|
||||
| Layer catalog or Operations specification | A list of supported layers or operations and its parameters. Sets of supported layers are different for different plugins, please check the documentation on plugins to verify if the Inference Engine supports certain layer on the dedicated hardware |
|
||||
| <code>Layout</code> | Image data layout refers to the representation of images batch. Layout shows a sequence of 4D or 5D tensor data in memory. A typical NCHW format represents pixel in horizontal direction, rows by vertical dimension, planes by channel and images into batch |
|
||||
| <code>OutputsDataMap</code> | Structure which contains information about output precisions and layouts |
|
||||
|
||||
@@ -23,7 +23,7 @@ The `InferenceEngine::ExecutableNetwork` class is also extended to support the Q
|
||||
|
||||
### GetAvailableDevices
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI0.cpp part0
|
||||
@snippet snippets/InferenceEngine_QueryAPI0.cpp part0
|
||||
|
||||
The function returns list of available devices, for example:
|
||||
```
|
||||
@@ -32,7 +32,8 @@ MYRIAD.1.4-ma2480
|
||||
FPGA.0
|
||||
FPGA.1
|
||||
CPU
|
||||
GPU
|
||||
GPU.0
|
||||
GPU.1
|
||||
...
|
||||
```
|
||||
|
||||
@@ -46,7 +47,7 @@ Each device name can then be passed to:
|
||||
|
||||
The code below demonstrates how to understand whether `HETERO` device dumps `.dot` files with split graphs during the split stage:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI1.cpp part1
|
||||
@snippet snippets/InferenceEngine_QueryAPI1.cpp part1
|
||||
|
||||
For documentation about common configuration keys, refer to `ie_plugin_config.hpp`. Device specific configuration keys can be found in corresponding plugin folders.
|
||||
|
||||
@@ -54,7 +55,7 @@ For documentation about common configuration keys, refer to `ie_plugin_config.hp
|
||||
|
||||
* To extract device properties such as available device, device name, supported configuration keys, and others, use the `InferenceEngine::Core::GetMetric` method:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI2.cpp part2
|
||||
@snippet snippets/InferenceEngine_QueryAPI2.cpp part2
|
||||
|
||||
A returned value looks as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`.
|
||||
|
||||
@@ -66,17 +67,17 @@ A returned value looks as follows: `Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz`.
|
||||
|
||||
The method is used to get executable network specific metric such as `METRIC_KEY(OPTIMAL_NUMBER_OF_INFER_REQUESTS)`:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI3.cpp part3
|
||||
@snippet snippets/InferenceEngine_QueryAPI3.cpp part3
|
||||
|
||||
Or the current temperature of `MYRIAD` device:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI4.cpp part4
|
||||
@snippet snippets/InferenceEngine_QueryAPI4.cpp part4
|
||||
|
||||
### GetConfig()
|
||||
|
||||
The method is used to get information about configuration values the executable network has been created with:
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_QueryAPI5.cpp part5
|
||||
@snippet snippets/InferenceEngine_QueryAPI5.cpp part5
|
||||
|
||||
### SetConfig()
|
||||
|
||||
|
||||
@@ -29,20 +29,20 @@ Integration process includes the following steps:
|
||||
|
||||
1) **Create Inference Engine Core** to manage available devices and read network objects:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part0
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part0
|
||||
|
||||
2) **Read a model IR** created by the Model Optimizer (.xml is supported format):
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part1
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part1
|
||||
|
||||
**Or read the model from ONNX format** (.onnx and .prototxt are supported formats). You can find more information about the ONNX format support in the document [ONNX format support in the OpenVINO™](./ONNX_Support.md).
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part2
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part2
|
||||
|
||||
3) **Configure input and output**. Request input and output information using `InferenceEngine::CNNNetwork::getInputsInfo()`, and `InferenceEngine::CNNNetwork::getOutputsInfo()`
|
||||
methods:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part3
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part3
|
||||
|
||||
Optionally, set the number format (precision) and memory layout for inputs and outputs. Refer to the
|
||||
[Supported configurations](supported_plugins/Supported_Devices.md) chapter to choose the relevant configuration.
|
||||
@@ -67,7 +67,7 @@ methods:
|
||||
|
||||
You can use the following code snippet to configure input and output:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part4
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part4
|
||||
|
||||
> **NOTE**: NV12 input color format pre-processing differs from other color conversions. In case of NV12,
|
||||
> Inference Engine expects two separate image planes (Y and UV). You must use a specific
|
||||
@@ -91,31 +91,31 @@ methods:
|
||||
|
||||
4) **Load the model** to the device using `InferenceEngine::Core::LoadNetwork()`:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part5
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part5
|
||||
|
||||
It creates an executable network from a network object. The executable network is associated with single hardware device.
|
||||
It is possible to create as many networks as needed and to use them simultaneously (up to the limitation of the hardware resources).
|
||||
Third parameter is a configuration for plugin. It is map of pairs: (parameter name, parameter value). Choose device from
|
||||
[Supported devices](supported_plugins/Supported_Devices.md) page for more details about supported configuration parameters.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part6
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part6
|
||||
|
||||
5) **Create an infer request**:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part7
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part7
|
||||
|
||||
6) **Prepare input**. You can use one of the following options to prepare input:
|
||||
* **Optimal way for a single network.** Get blobs allocated by an infer request using `InferenceEngine::InferRequest::GetBlob()`
|
||||
and feed an image and the input data to the blobs. In this case, input data must be aligned (resized manually) with a
|
||||
given blob size and have a correct color format.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part8
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part8
|
||||
|
||||
* **Optimal way for a cascade of networks (output of one network is input for another).** Get output blob from the first
|
||||
request using `InferenceEngine::InferRequest::GetBlob()` and set it as input for the second request using
|
||||
`InferenceEngine::InferRequest::SetBlob()`.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part9
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part9
|
||||
|
||||
* **Optimal way to handle ROI (a ROI object located inside of input of one network is input for another).** It is
|
||||
possible to re-use shared input by several networks. You do not need to allocate separate input blob for a network if
|
||||
@@ -126,7 +126,7 @@ methods:
|
||||
ROI without allocation of new memory using `InferenceEngine::make_shared_blob()` with passing of
|
||||
`InferenceEngine::Blob::Ptr` and `InferenceEngine::ROI` as parameters.
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part10
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part10
|
||||
|
||||
Make sure that shared input is kept valid during execution of each network. Otherwise, ROI blob may be corrupted if the
|
||||
original input blob (that ROI is cropped from) has already been rewritten.
|
||||
@@ -134,7 +134,7 @@ methods:
|
||||
* Allocate input blobs of the appropriate types and sizes, feed an image and the input data to the blobs, and call
|
||||
`InferenceEngine::InferRequest::SetBlob()` to set these blobs for an infer request:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part11
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part11
|
||||
|
||||
A blob can be filled before and after `SetBlob()`.
|
||||
|
||||
@@ -157,11 +157,11 @@ methods:
|
||||
7) **Do inference** by calling the `InferenceEngine::InferRequest::StartAsync` and `InferenceEngine::InferRequest::Wait`
|
||||
methods for asynchronous request:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part12
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part12
|
||||
|
||||
or by calling the `InferenceEngine::InferRequest::Infer` method for synchronous request:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part13
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part13
|
||||
|
||||
`StartAsync` returns immediately and starts inference without blocking main thread, `Infer` blocks
|
||||
main thread and returns when inference is completed.
|
||||
@@ -185,7 +185,7 @@ exception.
|
||||
Note that casting `Blob` to `TBlob` via `std::dynamic_pointer_cast` is not recommended way,
|
||||
better to access data via `buffer()` and `as()` methods as follows:
|
||||
|
||||
@snippet openvino/docs/snippets/Integrate_with_customer_application_new_API.cpp part14
|
||||
@snippet snippets/Integrate_with_customer_application_new_API.cpp part14
|
||||
|
||||
## Build Your Application
|
||||
|
||||
|
||||
@@ -116,7 +116,7 @@ For Intel® Distribution of OpenVINO™ toolkit, the Inference Engine package co
|
||||
[sample console applications](Samples_Overview.md) demonstrating how you can use
|
||||
the Inference Engine in your applications.
|
||||
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md">Inference Engine Build Instructions</a>.
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode">Inference Engine Build Instructions</a>.
|
||||
## See Also
|
||||
- [Inference Engine Samples](Samples_Overview.md)
|
||||
- [Intel® Deep Learning Deployment Toolkit Web Page](https://software.intel.com/en-us/computer-vision-sdk)
|
||||
|
||||
@@ -27,44 +27,44 @@ Common migration process includes the following steps:
|
||||
|
||||
1. Migrate from the `InferenceEngine::InferencePlugin` initialization:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part0
|
||||
@snippet snippets/Migration_CoreAPI.cpp part0
|
||||
|
||||
to the `InferenceEngine::Core` class initialization:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part1
|
||||
@snippet snippets/Migration_CoreAPI.cpp part1
|
||||
|
||||
2. Instead of using `InferenceEngine::CNNNetReader` to read IR:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part2
|
||||
@snippet snippets/Migration_CoreAPI.cpp part2
|
||||
|
||||
read networks using the Core class:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part3
|
||||
@snippet snippets/Migration_CoreAPI.cpp part3
|
||||
|
||||
The Core class also allows reading models from the ONNX format (more information is [here](./ONNX_Support.md)):
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part4
|
||||
@snippet snippets/Migration_CoreAPI.cpp part4
|
||||
|
||||
3. Instead of adding CPU device extensions to the plugin:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part5
|
||||
@snippet snippets/Migration_CoreAPI.cpp part5
|
||||
|
||||
add extensions to CPU device using the Core class:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part6
|
||||
@snippet snippets/Migration_CoreAPI.cpp part6
|
||||
|
||||
4. Instead of setting configuration keys to a particular plugin, set (key, value) pairs via `InferenceEngine::Core::SetConfig`
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part7
|
||||
@snippet snippets/Migration_CoreAPI.cpp part7
|
||||
|
||||
> **NOTE**: If `deviceName` is omitted as the last argument, configuration is set for all Inference Engine devices.
|
||||
|
||||
5. Migrate from loading the network to a particular plugin:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part8
|
||||
@snippet snippets/Migration_CoreAPI.cpp part8
|
||||
|
||||
to `InferenceEngine::Core::LoadNetwork` to a particular device:
|
||||
|
||||
@snippet openvino/docs/snippets/Migration_CoreAPI.cpp part9
|
||||
@snippet snippets/Migration_CoreAPI.cpp part9
|
||||
|
||||
After you have an instance of `InferenceEngine::ExecutableNetwork`, all other steps are as usual.
|
||||
|
||||
@@ -18,7 +18,7 @@ Two categories of API functions:
|
||||
To list all supported ONNX ops in a specific version and domain, use the `get_supported_operators`
|
||||
as shown in the example below:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial0.cpp part0
|
||||
@snippet snippets/OnnxImporterTutorial0.cpp part0
|
||||
|
||||
The above code produces a list of all the supported operators for the `version` and `domain` you specified and outputs a list similar to this:
|
||||
```cpp
|
||||
@@ -30,7 +30,7 @@ Xor
|
||||
|
||||
To determine whether a specific ONNX operator in a particular version and domain is supported by the importer, use the `is_operator_supported` function as shown in the example below:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial1.cpp part1
|
||||
@snippet snippets/OnnxImporterTutorial1.cpp part1
|
||||
|
||||
## Import ONNX Model
|
||||
|
||||
@@ -55,13 +55,13 @@ As it was shown in [Build a Model with nGraph Library](../nGraph_DG/build_functi
|
||||
|
||||
The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the stream as an input:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial2.cpp part2
|
||||
@snippet snippets/OnnxImporterTutorial2.cpp part2
|
||||
|
||||
### <a name="path">Filepath as Input</a>
|
||||
|
||||
The code below shows how to convert the ONNX ResNet50 model to the nGraph function using `import_onnx_model` with the filepath as an input:
|
||||
|
||||
@snippet openvino/docs/snippets/OnnxImporterTutorial3.cpp part3
|
||||
@snippet snippets/OnnxImporterTutorial3.cpp part3
|
||||
|
||||
[onnx_header]: https://github.com/NervanaSystems/ngraph/blob/master/src/ngraph/frontend/onnx_import/onnx.hpp
|
||||
[onnx_model_zoo]: https://github.com/onnx/models
|
||||
|
||||
@@ -35,13 +35,15 @@ Inference Engine sample applications include the following:
|
||||
- [Object Detection for SSD C Sample](../../inference-engine/ie_bridges/c/samples/object_detection_sample_ssd/README.md)
|
||||
- [Object Detection for SSD Python* Sample](../../inference-engine/ie_bridges/python/sample/object_detection_sample_ssd/README.md)
|
||||
|
||||
> **NOTE**: All samples support input paths containing only ASCII characters, except the Hello Classification Sample, that supports Unicode.
|
||||
|
||||
## Media Files Available for Samples
|
||||
|
||||
To run the sample applications, you can use images and videos from the media files collection available at https://github.com/intel-iot-devkit/sample-videos.
|
||||
|
||||
## Samples that Support Pre-Trained Models
|
||||
|
||||
You can download the [pre-trained models](@ref omz_models_intel_index) using the OpenVINO [Model Downloader](@ref omz_tools_downloader_README) or from [https://download.01.org/opencv/](https://download.01.org/opencv/).
|
||||
To run the sample, you can use [public](@ref omz_models_public_index) or [Intel's](@ref omz_models_intel_index) pre-trained models from the Open Model Zoo. The models can be downloaded using the [Model Downloader](@ref omz_tools_downloader_README).
|
||||
|
||||
## Build the Sample Applications
|
||||
|
||||
@@ -53,7 +55,7 @@ The officially supported Linux* build environment is the following:
|
||||
* GCC* 7.5.0 (for Ubuntu* 18.04) or GCC* 4.8.5 (for CentOS* 7.6)
|
||||
* CMake* version 3.10 or higher
|
||||
|
||||
> **NOTE**: For building samples from the open-source version of OpenVINO™ toolkit, see the [build instructions on GitHub](https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md).
|
||||
> **NOTE**: For building samples from the open-source version of OpenVINO™ toolkit, see the [build instructions on GitHub](https://github.com/openvinotoolkit/openvino/wiki/BuildingCode).
|
||||
|
||||
To build the C or C++ sample applications for Linux, go to the `<INSTALL_DIR>/inference_engine/samples/c` or `<INSTALL_DIR>/inference_engine/samples/cpp` directory, respectively, and run the `build_samples.sh` script:
|
||||
```sh
|
||||
|
||||
@@ -1,6 +1,36 @@
|
||||
Using Shape Inference {#openvino_docs_IE_DG_ShapeInference}
|
||||
==========================================
|
||||
|
||||
OpenVINO™ provides the following methods for runtime model reshaping:
|
||||
|
||||
* **Set a new input shape** with the `InferenceEngine::CNNNetwork::reshape` method.<br>
|
||||
The `InferenceEngine::CNNNetwork::reshape` method updates input shapes and propagates them down to the outputs of the model through all intermediate layers.
|
||||
|
||||
> **NOTES**:
|
||||
> - Starting with the 2021.1 release, the Model Optimizer converts topologies keeping shape-calculating sub-graphs by default, which enables correct shape propagation during reshaping in most cases.
|
||||
> - Older versions of IRs are not guaranteed to reshape successfully. Please regenerate them with the Model Optimizer of the latest version of OpenVINO™.<br>
|
||||
> - If an ONNX model does not have a fully defined input shape and the model was imported with the ONNX importer, reshape the model before loading it to the plugin.
|
||||
|
||||
* **Set a new batch dimension value** with the `InferenceEngine::CNNNetwork::setBatchSize` method.<br>
|
||||
The meaning of a model batch may vary depending on the model design.
|
||||
This method does not deduce batch placement for inputs from the model architecture.
|
||||
It assumes that the batch is placed at the zero index in the shape for all inputs and uses the `InferenceEngine::CNNNetwork::reshape` method to propagate updated shapes through the model.
|
||||
|
||||
The method transforms the model before a new shape propagation to relax a hard-coded batch dimension in the model, if any.
|
||||
|
||||
Use `InferenceEngine::CNNNetwork::reshape` instead of `InferenceEngine::CNNNetwork::setBatchSize` to set new input shapes for the model in case the model has:
|
||||
* Multiple inputs with different zero-index dimension meanings
|
||||
* Input without a batch dimension
|
||||
* 0D, 1D, or 3D shape
|
||||
|
||||
The `InferenceEngine::CNNNetwork::setBatchSize` method is a high-level API method that wraps the `InferenceEngine::CNNNetwork::reshape` method call and works for trivial models from the batch placement standpoint.
|
||||
Use `InferenceEngine::CNNNetwork::reshape` for other models.
|
||||
|
||||
Using the `InferenceEngine::CNNNetwork::setBatchSize` method for models with a non-zero index batch placement or for models with inputs that do not have a batch dimension may lead to undefined behaviour.
|
||||
|
||||
You can change input shapes multiple times using the `InferenceEngine::CNNNetwork::reshape` and `InferenceEngine::CNNNetwork::setBatchSize` methods in any order.
|
||||
If a model has a hard-coded batch dimension, use `InferenceEngine::CNNNetwork::setBatchSize` first to change the batch, then call `InferenceEngine::CNNNetwork::reshape` to update other dimensions, if needed.
|
||||
|
||||
Inference Engine takes three kinds of a model description as an input, which are converted into an `InferenceEngine::CNNNetwork` object:
|
||||
1. [Intermediate Representation (IR)](../MO_DG/IR_and_opsets.md) through `InferenceEngine::Core::ReadNetwork`
|
||||
2. [ONNX model](../IE_DG/OnnxImporterTutorial.md) through `InferenceEngine::Core::ReadNetwork`
|
||||
@@ -23,33 +53,7 @@ for (const auto & parameter : parameters) {
|
||||
|
||||
To feed input data of a shape that is different from the model input shape, reshape the model first.
|
||||
|
||||
OpenVINO™ provides the following methods for runtime model reshaping:
|
||||
|
||||
* **Set a new input shape** with the `InferenceEngine::CNNNetwork::reshape` method.<br>
|
||||
The `InferenceEngine::CNNNetwork::reshape` method updates input shapes and propagates them down to the outputs of the model through all intermediate layers.
|
||||
You can reshape a model multiple times like in this application scheme:
|
||||
```
|
||||
ReadNetwork -> reshape(input_1_shape) -> LoadNetwork -> infer(input_1)
|
||||
\
|
||||
-> reshape(input_2_shape) -> LoadNetwork -> infer(input_2)
|
||||
```
|
||||
> **NOTES**:
|
||||
> - Starting with the 2021.1 release, the Model Optimizer converts topologies keeping shape-calculating sub-graphs by default, which enables correct shape propagation during reshaping.
|
||||
> - Older versions of IRs are not guaranteed to reshape successfully. Please regenerate them with the Model Optimizer of the latest version of OpenVINO™.<br>
|
||||
> - If an ONNX model does not have a fully defined input shape and the model was imported with the ONNX importer, reshape the model before loading it to the plugin.
|
||||
* **Set a new batch dimension value** with the `InferenceEngine::CNNNetwork::setBatchSize` method.<br>
|
||||
The meaning of a model batch may vary depending on the model design.
|
||||
The `InferenceEngine::CNNNetwork::setBatchSize` method deduces the index of a batch dimension based only on the input rank.
|
||||
This method does not work for models with a non-zero index batch placement or models with inputs without a batch dimension.
|
||||
The batch-setting algorithm does not involve the shape inference mechanism.
|
||||
Batch of input and output shapes for all layers is set to a new batch value without layer validation.
|
||||
It may cause both positive and negative side effects.
|
||||
Due to the limitations described above, the current method is not recommended to use.
|
||||
If you need to set a new batch size for the model, use the `CNNNetwork::reshape` method instead.
|
||||
|
||||
Do not use runtime reshaping methods simultaneously, especially do not call the `CNNNetwork::reshape` method after you use `InferenceEngine::CNNNetwork::setBatchSize`.
|
||||
The `InferenceEngine::CNNNetwork::setBatchSize` method causes irreversible conversion of the internal model representation into the legacy model representation.
|
||||
The method does not use nGraph for shape inference which leads to reduced reshape opportunities and may affect the performance of the model.
|
||||
Once the input shape of `InferenceEngine::CNNNetwork` is set, call the `InferenceEngine::Core::LoadNetwork` method to get an `InferenceEngine::ExecutableNetwork` object for inference with updated shapes.
|
||||
|
||||
There are other approaches to reshape the model during the stage of <a href="_docs_MO_DG_prepare_model_convert_model_Converting_Model_General.html#when_to_specify_input_shapes">IR generation</a> or [nGraph::Function creation](../nGraph_DG/build_function.md).
|
||||
|
||||
@@ -62,8 +66,8 @@ Shape collision during shape propagation may be a sign that a new shape does not
|
||||
Changing the model input shape may result in intermediate operations shape collision.
|
||||
|
||||
Examples of such operations:
|
||||
- <a href="_docs_MO_DG_prepare_model_convert_model_IR_V10_opset1.html#Reshape">`Reshape` operation</a> with a hard-coded output shape value
|
||||
- <a href="_docs_MO_DG_prepare_model_convert_model_IR_V10_opset1.html#MatMul">`MatMul` operation</a> with the `Const` second input cannot be resized by spatial dimensions due to operation semantics
|
||||
- [`Reshape` operation](../ops/shape/Reshape_1.md) with a hard-coded output shape value
|
||||
- [`MatMul` operation](../ops/matrix/MatMul_1.md) with the `Const` second input cannot be resized by spatial dimensions due to operation semantics
|
||||
|
||||
Model structure and logic should not change significantly after model reshaping.
|
||||
- The Global Pooling operation is commonly used to reduce output feature map of classification models output.
|
||||
@@ -94,7 +98,7 @@ The algorithm for resizing network is the following:
|
||||
|
||||
Here is a code example:
|
||||
|
||||
@snippet openvino/docs/snippets/ShapeInference.cpp part0
|
||||
@snippet snippets/ShapeInference.cpp part0
|
||||
|
||||
Shape Inference feature is used in [Smart classroom sample](@ref omz_demos_smart_classroom_demo_README).
|
||||
|
||||
|
||||
3
docs/IE_DG/img/applying_low_latency.png
Executable file
3
docs/IE_DG/img/applying_low_latency.png
Executable file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f4f6e9d35869fa2c414e58914aaec1607eb7d4768b69c0cbcce5d5fa3ceddba3
|
||||
size 56444
|
||||
3
docs/IE_DG/img/low_latency_limitation_1.png
Executable file
3
docs/IE_DG/img/low_latency_limitation_1.png
Executable file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:28f4e7ee50785e9c571725942e67c899d08e87af3802f6bea4721c64bfdb2bac
|
||||
size 21722
|
||||
3
docs/IE_DG/img/low_latency_limitation_2.png
Executable file
3
docs/IE_DG/img/low_latency_limitation_2.png
Executable file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:0923af3acfb69dd0b88a5edf097e60c2655828b643d8e328561b13b0196c0850
|
||||
size 47997
|
||||
3
docs/IE_DG/img/state_network_example.png
Normal file
3
docs/IE_DG/img/state_network_example.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:9976341ca931f3ab4e4fbccea26844b738adb27b091149a4c6231eda841ab867
|
||||
size 144541
|
||||
@@ -7,26 +7,26 @@ Inference Engine is a set of C++ libraries providing a common API to deliver inf
|
||||
|
||||
For Intel® Distribution of OpenVINO™ toolkit, Inference Engine binaries are delivered within release packages.
|
||||
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md">Inference Engine Build Instructions</a>.
|
||||
The open source version is available in the [OpenVINO™ toolkit GitHub repository](https://github.com/openvinotoolkit/openvino) and can be built for supported platforms using the <a href="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode">Inference Engine Build Instructions</a>.
|
||||
|
||||
To learn about how to use the Inference Engine API for your application, see the [Integrating Inference Engine in Your Application](Integrate_with_customer_application_new_API.md) documentation.
|
||||
|
||||
For complete API Reference, see the [API Reference](usergroup29.html) section.
|
||||
For complete API Reference, see the [Inference Engine API References](./api_references.html) section.
|
||||
|
||||
Inference Engine uses a plugin architecture. Inference Engine plugin is a software component that contains complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, etc. Each plugin implements the unified API and provides additional hardware-specific APIs.
|
||||
|
||||
Modules in the Inference Engine component
|
||||
---------------------------------------
|
||||
-----------------------------------------
|
||||
|
||||
### Core Inference Engine Libraries ###
|
||||
|
||||
Your application must link to the core Inference Engine libraries:
|
||||
* Linux* OS:
|
||||
- `libinference_engine.so`, which depends on `libinference_engine_transformations.so` and `libngraph.so`
|
||||
- `libinference_engine_legacy.so`, which depends on `libtbb.so`
|
||||
- `libinference_engine.so`, which depends on `libinference_engine_transformations.so`, `libtbb.so`, `libtbbmalloc.so` and `libngraph.so`
|
||||
* Windows* OS:
|
||||
- `inference_engine.dll`, which depends on `inference_engine_transformations.dll` and `ngraph.dll`
|
||||
- `inference_engine_legacy.dll`, which depends on `tbb.dll`
|
||||
- `inference_engine.dll`, which depends on `inference_engine_transformations.dll`, `tbb.dll`, `tbbmalloc.dll` and `ngraph.dll`
|
||||
* macOS*:
|
||||
- `libinference_engine.dylib`, which depends on `libinference_engine_transformations.dylib`, `libtbb.dylib`, `libtbbmalloc.dylib` and `libngraph.dylib`
|
||||
|
||||
The required C++ header files are located in the `include` directory.
|
||||
|
||||
@@ -49,26 +49,26 @@ Starting from 2020.4 release, Inference Engine introduced a concept of `CNNNetwo
|
||||
|
||||
For each supported target device, Inference Engine provides a plugin — a DLL/shared library that contains complete implementation for inference on this particular device. The following plugins are available:
|
||||
|
||||
| Plugin | Device Type |
|
||||
| ------------- | ------------- |
|
||||
|CPU| Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE |
|
||||
|GPU| Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics
|
||||
|MYRIAD| Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X|
|
||||
|GNA| Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor
|
||||
|HETERO|Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers|
|
||||
|MULTI| Simultaneous inference of the same network on several devices in parallel|
|
||||
| Plugin | Device Type |
|
||||
| ------- | ----------------------------- |
|
||||
|CPU | Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE |
|
||||
|GPU | Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics |
|
||||
|MYRIAD | Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X |
|
||||
|GNA | Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor |
|
||||
|HETERO | Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers|
|
||||
|MULTI | Simultaneous inference of the same network on several devices in parallel|
|
||||
|
||||
The table below shows the plugin libraries and additional dependencies for Linux and Windows platforms.
|
||||
The table below shows the plugin libraries and additional dependencies for Linux, Windows and macOS platforms.
|
||||
|
||||
| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows |
|
||||
|--------|------------------------|-------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------|
|
||||
| CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` |
|
||||
| GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| MYRIAD | `libmyriadPlugin.so` | `libusb.so`, `libinference_engine_lp_transformations.so` | `myriadPlugin.dll` | `usb.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so`, `libinference_engine_lp_transformations.so`| `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| GNA | `libGNAPlugin.so` | `libgna.so`, `libinference_engine_lp_transformations.so` | `GNAPlugin.dll` | `gna.dll`, `inference_engine_lp_transformations.dll` |
|
||||
| HETERO | `libHeteroPlugin.so` | Same as for selected plugins | `HeteroPlugin.dll` | Same as for selected plugins |
|
||||
| MULTI | `libMultiDevicePlugin.so` | Same as for selected plugins | `MultiDevicePlugin.dll` | Same as for selected plugins |
|
||||
| Plugin | Library name for Linux | Dependency libraries for Linux | Library name for Windows | Dependency libraries for Windows | Library name for macOS | Dependency libraries for macOS |
|
||||
|--------|-----------------------------|-------------------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------|------------------------------|---------------------------------------------|
|
||||
| CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` | `libMKLDNNPlugin.dylib` | `inference_engine_lp_transformations.dylib` |
|
||||
| GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` | Is not supported | - |
|
||||
| MYRIAD | `libmyriadPlugin.so` | `libusb.so`, | `myriadPlugin.dll` | `usb.dll` | `libmyriadPlugin.dylib` | `libusb.dylib` |
|
||||
| HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so` | `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll` | Is not supported | - |
|
||||
| GNA | `libGNAPlugin.so` | `libgna.so`, | `GNAPlugin.dll` | `gna.dll` | Is not supported | - |
|
||||
| HETERO | `libHeteroPlugin.so` | Same as for selected plugins | `HeteroPlugin.dll` | Same as for selected plugins | `libHeteroPlugin.dylib` | Same as for selected plugins |
|
||||
| MULTI | `libMultiDevicePlugin.so` | Same as for selected plugins | `MultiDevicePlugin.dll` | Same as for selected plugins | `libMultiDevicePlugin.dylib` | Same as for selected plugins |
|
||||
|
||||
> **NOTE**: All plugin libraries also depend on core Inference Engine libraries.
|
||||
|
||||
@@ -76,15 +76,16 @@ Make sure those libraries are in your computer's path or in the place you pointe
|
||||
|
||||
* Linux: `LD_LIBRARY_PATH`
|
||||
* Windows: `PATH`
|
||||
* macOS: `DYLD_LIBRARY_PATH`
|
||||
|
||||
On Linux, use the script `bin/setupvars.sh` to set the environment variables.
|
||||
On Linux and macOS, use the script `bin/setupvars.sh` to set the environment variables.
|
||||
|
||||
On Windows, run the `bin\setupvars.bat` batch file to set the environment variables.
|
||||
|
||||
To learn more about supported devices and corresponding plugins, see the [Supported Devices](supported_plugins/Supported_Devices.md) chapter.
|
||||
|
||||
Common Workflow for Using the Inference Engine API
|
||||
---------------------------
|
||||
--------------------------------------------------
|
||||
The common workflow contains the following steps:
|
||||
|
||||
1. **Create Inference Engine Core object** - Create an `InferenceEngine::Core` object to work with different devices, all device plugins are managed internally by the `Core` object. Register extensions with custom nGraph operations (`InferenceEngine::Core::AddExtension`).
|
||||
|
||||
275
docs/IE_DG/network_state_intro.md
Normal file
275
docs/IE_DG/network_state_intro.md
Normal file
@@ -0,0 +1,275 @@
|
||||
Introduction to OpenVINO state API {#openvino_docs_IE_DG_network_state_intro}
|
||||
==============================
|
||||
|
||||
This section describes how to work with stateful networks in OpenVINO toolkit, specifically:
|
||||
* How stateful networks are represented in IR and nGraph
|
||||
* How operations with state can be done
|
||||
|
||||
The section additionally provides small examples of stateful network and code to infer it.
|
||||
|
||||
## What is a stateful network
|
||||
|
||||
Several use cases require processing of data sequences. When length of a sequence is known and small enough,
|
||||
we can process it with RNN like networks that contain a cycle inside. But in some cases, like online speech recognition of time series
|
||||
forecasting, length of data sequence is unknown. Then data can be divided in small portions and processed step-by-step. But dependency
|
||||
between data portions should be addressed. For that, networks save some data between inferences - state. When one dependent sequence is over,
|
||||
state should be reset to initial value and new sequence can be started.
|
||||
|
||||
Several frameworks have special API for states in networks. For example, Keras have special option for RNNs `stateful` that turns on saving state
|
||||
between inferences. Kaldi contains special specifier `Offset` to define time offset in a network.
|
||||
|
||||
OpenVINO also contains special API to simplify work with networks with states. State is automatically saved between inferences,
|
||||
and there is a way to reset state when needed. You can also read state or set it to some new value between inferences.
|
||||
|
||||
## OpenVINO state representation
|
||||
|
||||
OpenVINO contains special abstraction variable to represent state in a network. There are two operations to work with state:
|
||||
* `Assign` to save value in state
|
||||
* `ReadValue` to read value saved on previous iteration
|
||||
|
||||
You can find more details on these operations in [ReadValue specification](../ops/infrastructure/ReadValue_3.md) and
|
||||
[Assign specification](../ops/infrastructure/Assign_3.md).
|
||||
|
||||
## Examples of representation of a network with states
|
||||
|
||||
To get a model with states ready for inference, you can convert a model from another framework to IR with Model Optimizer or create an nGraph function
|
||||
(details can be found in [Build nGraph Function section](../nGraph_DG/build_function.md)).
|
||||
Let's represent the following graph in both forms:
|
||||
![state_network_example]
|
||||
|
||||
### Example of IR with state
|
||||
|
||||
The `bin` file for this graph should contain float 0 in binary form. Content of `xml` is the following.
|
||||
|
||||
```xml
|
||||
<?xml version="1.0" ?>
|
||||
<net name="summator" version="10">
|
||||
<layers>
|
||||
<layer id="0" name="init_value" type="Const" version="opset5">
|
||||
<data element_type="f32" offset="0" shape="1,1" size="4"/>
|
||||
<output>
|
||||
<port id="1" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="1" name="read" type="ReadValue" version="opset5">
|
||||
<data variable_id="id"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="1" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="2" name="input" type="Parameter" version="opset5">
|
||||
<data element_type="f32" shape="1,1"/>
|
||||
<output>
|
||||
<port id="0" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="3" name="add_sum" type="Add" version="opset5">
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="4" name="save" type="Assign" version="opset5">
|
||||
<data variable_id="id"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
<layer id="10" name="add" type="Add" version="opset5">
|
||||
<data axis="1"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="2" precision="FP32">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
<layer id="5" name="output/sink_port_0" type="Result" version="opset5">
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>1</dim>
|
||||
</port>
|
||||
</input>
|
||||
</layer>
|
||||
</layers>
|
||||
<edges>
|
||||
<edge from-layer="0" from-port="1" to-layer="1" to-port="0"/>
|
||||
<edge from-layer="2" from-port="0" to-layer="3" to-port="1"/>
|
||||
<edge from-layer="1" from-port="1" to-layer="3" to-port="0"/>
|
||||
<edge from-layer="3" from-port="2" to-layer="4" to-port="0"/>
|
||||
<edge from-layer="3" from-port="2" to-layer="10" to-port="0"/>
|
||||
<edge from-layer="1" from-port="1" to-layer="10" to-port="1"/>
|
||||
<edge from-layer="10" from-port="2" to-layer="5" to-port="0"/>
|
||||
</edges>
|
||||
<meta_data>
|
||||
<MO_version value="unknown version"/>
|
||||
<cli_parameters>
|
||||
</cli_parameters>
|
||||
</meta_data>
|
||||
</net>
|
||||
```
|
||||
|
||||
### Example of creating model nGraph API
|
||||
|
||||
```cpp
|
||||
auto arg = make_shared<op::Parameter>(element::f32, Shape{1, 1});
|
||||
auto init_const = op::Constant::create(element::f32, Shape{1, 1}, {0});
|
||||
auto read = make_shared<op::ReadValue>(init_const, "v0");
|
||||
std::vector<shared_ptr<Node>> args = {arg, read};
|
||||
auto add = make_shared<op::Add>(arg, read);
|
||||
auto assign = make_shared<op::Assign>(add, "v0");
|
||||
auto add2 = make_shared<op::Add>(add, read);
|
||||
auto res = make_shared<op::Result>(add2);
|
||||
|
||||
auto f = make_shared<Function>(ResultVector({res}), ParameterVector({arg}), SinkVector({assign}));
|
||||
```
|
||||
|
||||
In this example, `SinkVector` is used to create `ngraph::Function`. For network with states, except inputs and outputs, `Assign` nodes should also point to `Function`
|
||||
to avoid deleting it during graph transformations. You can do it with the constructor, as shown in the example, or with the special method `add_sinks(const SinkVector& sinks)`. Also you can delete
|
||||
sink from `ngraph::Function` after deleting the node from graph with the `delete_sink()` method.
|
||||
|
||||
## OpenVINO state API
|
||||
|
||||
Inference Engine has the `InferRequest::QueryState` method to get the list of states from a network and `IVariableState` interface to operate with states. Below you can find brief description of methods and the workable example of how to use this interface.
|
||||
is below and next section contains small workable example how this interface can be used.
|
||||
|
||||
* `std::string GetName() const`
|
||||
returns name(variable_id) of according Variable
|
||||
* `void Reset()`
|
||||
reset state to default value
|
||||
* `void SetState(Blob::Ptr newState)`
|
||||
set new value for state
|
||||
* `Blob::CPtr GetState() const`
|
||||
returns current value of state
|
||||
|
||||
## Example of stateful network inference
|
||||
|
||||
Let's take an IR from the previous section example. The example below demonstrates inference of two independent sequences of data. State should be reset between these sequences.
|
||||
|
||||
One infer request and one thread
|
||||
will be used in this example. Using several threads is possible if you have several independent sequences. Then each sequence can be processed in its own infer
|
||||
request. Inference of one sequence in several infer requests is not recommended. In one infer request state will be saved automatically between inferences, but
|
||||
if the first step is done in one infer request and the second in another, state should be set in new infer request manually (using `IVariableState::SetState` method).
|
||||
|
||||
@snippet openvino/docs/snippets/InferenceEngine_network_with_state_infer.cpp part1
|
||||
|
||||
You can find more powerful examples demonstrating how to work with networks with states in speech sample and demo.
|
||||
Decsriptions can be found in [Samples Overview](./Samples_Overview.md)
|
||||
|
||||
[state_network_example]: ./img/state_network_example.png
|
||||
|
||||
|
||||
## LowLatency transformation
|
||||
|
||||
If the original framework does not have a special API for working with states, after importing the model, OpenVINO representation will not contain Assign/ReadValue layers. For example, if the original ONNX model contains RNN operations, IR will contain TensorIterator operations and the values will be obtained only after the execution of whole TensorIterator primitive, intermediate values from each iteration will not be available. To be able to work with these intermediate values of each iteration and receive them with a low latency after each infer request, a special LowLatency transformation was introduced.
|
||||
|
||||
LowLatency transformation changes the structure of the network containing [TensorIterator](../ops/infrastructure/TensorIterator_1.md) by adding the ability to work with state, inserting Assign/ReadValue layers as it is shown in the picture below.
|
||||
|
||||

|
||||
|
||||
### Steps to apply LowLatency transformation
|
||||
|
||||
1. Get CNNNetwork. Any way is acceptable:
|
||||
|
||||
* [from IR or ONNX model](Integrate_with_customer_application_new_API.md#integration-steps)
|
||||
* [from nGraph Function](../nGraph_DG/build_function.md)
|
||||
|
||||
2. [Reshape](ShapeInference) CNNNetwork network if necessary
|
||||
**Necessary case:** the sequence_lengths dimention of input > 1, it means TensorIterator layer will have number_iterations > 1. We should reshape the inputs of the network to set sequence_dimension exactly to 1.
|
||||
```cpp
|
||||
|
||||
// Network before reshape: Parameter (name: X, shape: [2 (sequence_lengths), 1, 16]) -> TensorIterator (num_iteration = 2, axis = 0) -> ...
|
||||
|
||||
cnnNetwork.reshape({"X" : {1, 1, 16});
|
||||
|
||||
// Network after reshape: Parameter (name: X, shape: [1 (sequence_lengths), 1, 16]) -> TensorIterator (num_iteration = 1, axis = 0) -> ...
|
||||
|
||||
```
|
||||
|
||||
3. Apply LowLatency transformation
|
||||
```cpp
|
||||
#include "ie_transformations.hpp"
|
||||
|
||||
...
|
||||
|
||||
InferenceEngine::LowLatency(cnnNetwork);
|
||||
```
|
||||
**State naming rule:** a name of state is a concatenation of names: original TensorIterator operation, Parameter of the body, and additional suffix "variable_" + id (0-base indexing, new indexing for each TensorIterator), for example:
|
||||
```
|
||||
tensor_iterator_name = "TI_name"
|
||||
body_parameter_name = "param_name"
|
||||
|
||||
state_name = "TI_name/param_name/variable_0"
|
||||
```
|
||||
4. [Use state API](#openvino-state-api)
|
||||
|
||||
|
||||
### Known limitations
|
||||
1. Parameters are directly connected to States (ReadValues).
|
||||
|
||||
Removing Parameters from `ngraph::Function` is not possible.
|
||||
|
||||

|
||||
|
||||
**Current solution:** replace Parameter with Constant (freeze) with the value [0, 0, 0 … 0] via [ModelOptimizer CLI](../MO_DG/prepare_model/convert_model/Converting_Model_General.md) `--input` or `--freeze_placeholder_with_value`.
|
||||
|
||||
2. Non-reshapable network.
|
||||
|
||||
Value of shapes is hard-coded somewhere in the network.
|
||||
|
||||

|
||||
|
||||
**Current solution:** trim non-reshapable layers via [ModelOptimizer CLI](../MO_DG/prepare_model/convert_model/Converting_Model_General.md) `--input`, `--output` or via nGraph.
|
||||
|
||||
```cpp
|
||||
// nGraph example:
|
||||
auto func = cnnNetwork.getFunction();
|
||||
auto new_const = std::make_shared<ngraph::opset5::Constant>(); // type, shape, value
|
||||
for (const auto& node : func->get_ops()) {
|
||||
if (node->get_friendly_name() == "name_of_non_reshapable_const") {
|
||||
auto bad_const = std::dynamic_pointer_cast<ngraph::opset5::Constant>(node);
|
||||
ngraph::replace_node(bad_const, new_const); // replace constant
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -33,7 +33,7 @@ a temporary memory block for model decryption, and use
|
||||
For more information, see the `InferenceEngine::Core` Class
|
||||
Reference Documentation.
|
||||
|
||||
@snippet openvino/docs/snippets/protecting_model_guide.cpp part0
|
||||
@snippet snippets/protecting_model_guide.cpp part0
|
||||
|
||||
Hardware-based protection, such as Intel® Software Guard Extensions
|
||||
(Intel® SGX), can be utilized to protect decryption operation secrets and
|
||||
@@ -47,7 +47,7 @@ Currently there are no possibility to read external weights from memory for ONNX
|
||||
The `ReadNetwork(const std::string& model, const Blob::CPtr& weights)` function
|
||||
should be called with `weights` passed as an empty `Blob`.
|
||||
|
||||
@snippet openvino/docs/snippets/protecting_model_guide.cpp part1
|
||||
@snippet snippets/protecting_model_guide.cpp part1
|
||||
|
||||
[deploy_encrypted_model]: img/deploy_encrypted_model.png
|
||||
|
||||
@@ -57,7 +57,7 @@ should be called with `weights` passed as an empty `Blob`.
|
||||
- OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org)
|
||||
- Model Optimizer Developer Guide: [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
|
||||
- Inference Engine Developer Guide: [Inference Engine Developer Guide](Deep_Learning_Inference_Engine_DevGuide.md)
|
||||
- For more information on Sample Applications, see the [Inference Engine Samples Overview](Samples_Overview.html)
|
||||
- For more information on Sample Applications, see the [Inference Engine Samples Overview](Samples_Overview.md)
|
||||
- For information on a set of pre-trained models, see the [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index)
|
||||
- For information on Inference Engine Tutorials, see the [Inference Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic)
|
||||
- For IoT Libraries and Code Samples see the [Intel® IoT Developer Kit](https://github.com/intel-iot-devkit).
|
||||
|
||||
@@ -1,9 +1,30 @@
|
||||
GPU Plugin {#openvino_docs_IE_DG_supported_plugins_CL_DNN}
|
||||
=======
|
||||
|
||||
The GPU plugin uses the Intel® Compute Library for Deep Neural Networks ([clDNN](https://01.org/cldnn)) to infer deep neural networks.
|
||||
clDNN is an open source performance library for Deep Learning (DL) applications intended for acceleration of Deep Learning Inference on Intel® Processor Graphics including Intel® HD Graphics and Intel® Iris® Graphics.
|
||||
For an in-depth description of clDNN, see: [clDNN sources](https://github.com/intel/clDNN) and [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics).
|
||||
The GPU plugin uses the Intel® Compute Library for Deep Neural Networks (clDNN) to infer deep neural networks.
|
||||
clDNN is an open source performance library for Deep Learning (DL) applications intended for acceleration of Deep Learning Inference on Intel® Processor Graphics including Intel® HD Graphics, Intel® Iris® Graphics, Intel® Iris® Xe Graphics, and Intel® Iris® Xe MAX graphics.
|
||||
For an in-depth description of clDNN, see [Inference Engine source files](https://github.com/openvinotoolkit/openvino/tree/master/inference-engine/src/cldnn_engine) and [Accelerate Deep Learning Inference with Intel® Processor Graphics](https://software.intel.com/en-us/articles/accelerating-deep-learning-inference-with-intel-processor-graphics).
|
||||
|
||||
## Device Naming Convention
|
||||
* Devices are enumerated as "GPU.X" where `X={0, 1, 2,...}`. Only Intel® GPU devices are considered.
|
||||
* If the system has an integrated GPU, it always has id=0 ("GPU.0").
|
||||
* Other GPUs have undefined order that depends on the GPU driver.
|
||||
* "GPU" is an alias for "GPU.0"
|
||||
* If the system doesn't have an integrated GPU, then devices are enumerated starting from 0.
|
||||
|
||||
For demonstration purposes, see the [Hello Query Device C++ Sample](../../../inference-engine/samples/hello_query_device/README.md) that can print out the list of available devices with associated indices. Below is an example output (truncated to the device names only):
|
||||
|
||||
```sh
|
||||
./hello_query_device
|
||||
Available devices:
|
||||
Device: CPU
|
||||
...
|
||||
Device: GPU.0
|
||||
...
|
||||
Device: GPU.1
|
||||
...
|
||||
Device: HDDL
|
||||
```
|
||||
|
||||
## Optimizations
|
||||
|
||||
@@ -92,7 +113,7 @@ When specifying key values as raw strings (that is, when using Python API), omit
|
||||
| `KEY_CLDNN_PLUGIN_THROTTLE` | `<0-3>` | `0` | OpenCL queue throttling (before usage, make sure your OpenCL driver supports appropriate extension)<br> Lower value means lower driver thread priority and longer sleep time for it. 0 disables the setting. |
|
||||
| `KEY_CLDNN_GRAPH_DUMPS_DIR` | `"<dump_dir>"` | `""` | clDNN graph optimizer stages dump output directory (in GraphViz format) |
|
||||
| `KEY_CLDNN_SOURCES_DUMPS_DIR` | `"<dump_dir>"` | `""` | Final optimized clDNN OpenCL sources dump output directory |
|
||||
| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).<br>This option is can be used to decrease GPU stall time by providing more effective load from several streams. Increasing the number of streams usually is more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_CLDNN_PLUGIN_THROTTLE` option value (see above). If your target system has relatively weak CPU, keep throttling low. <br>The default value is 1, which implies latency-oriented behaviour.<br>`KEY_GPU_THROUGHPUT_AUTO` creates bare minimum of streams to improve the performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams). <br> A positive integer value creates the requested number of streams. |
|
||||
| `KEY_GPU_THROUGHPUT_STREAMS` | `KEY_GPU_THROUGHPUT_AUTO`, or positive integer| 1 | Specifies a number of GPU "execution" streams for the throughput mode (upper bound for a number of inference requests that can be executed simultaneously).<br>This option is can be used to decrease GPU stall time by providing more effective load from several streams. Increasing the number of streams usually is more effective for smaller topologies or smaller input sizes. Note that your application should provide enough parallel slack (e.g. running many inference requests) to leverage full GPU bandwidth. Additional streams consume several times more GPU memory, so make sure the system has enough memory available to suit parallel stream execution. Multiple streams might also put additional load on CPU. If CPU load increases, it can be regulated by setting an appropriate `KEY_CLDNN_PLUGIN_THROTTLE` option value (see above). If your target system has relatively weak CPU, keep throttling low. <br>The default value is 1, which implies latency-oriented behavior.<br>`KEY_GPU_THROUGHPUT_AUTO` creates bare minimum of streams to improve the performance; this is the most portable option if you are not sure how many resources your target machine has (and what would be the optimal number of streams). <br> A positive integer value creates the requested number of streams. |
|
||||
| `KEY_EXCLUSIVE_ASYNC_REQUESTS` | `YES` / `NO` | `NO` | Forces async requests (also from different executable networks) to execute serially.|
|
||||
|
||||
## Note on Debug Capabilities of the GPU Plugin
|
||||
|
||||
@@ -2,98 +2,98 @@
|
||||
|
||||
## Introducing the GNA Plugin
|
||||
|
||||
Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.
|
||||
Intel® Gaussian & Neural Accelerator is a low-power neural coprocessor for continuous inference at the edge.
|
||||
|
||||
Intel® GNA is not intended to replace classic inference devices such as
|
||||
CPU, graphics processing unit (GPU), or vision processing unit (VPU) . It is designed for offloading
|
||||
Intel® GNA is not intended to replace classic inference devices such as
|
||||
CPU, graphics processing unit (GPU), or vision processing unit (VPU). It is designed for offloading
|
||||
continuous inference workloads including but not limited to noise reduction or speech recognition
|
||||
to save power and free CPU resources.
|
||||
|
||||
The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU.
|
||||
The GNA plugin provides a way to run inference on Intel® GNA, as well as in the software execution mode on CPU.
|
||||
|
||||
## Devices with Intel® GNA
|
||||
## Devices with Intel® GNA
|
||||
|
||||
Devices with Intel® GNA support:
|
||||
Devices with Intel® GNA support:
|
||||
|
||||
* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html)
|
||||
* [Intel® Speech Enabling Developer Kit](https://www.intel.com/content/www/us/en/support/articles/000026156/boards-and-kits/smart-home.html)
|
||||
|
||||
* [Amazon Alexa* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice)
|
||||
* [Amazon Alexa\* Premium Far-Field Developer Kit](https://developer.amazon.com/en-US/alexa/alexa-voice-service/dev-kits/amazon-premium-voice)
|
||||
|
||||
* [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html):
|
||||
- Intel® Pentium® Silver J5005 Processor
|
||||
- Intel® Pentium® Silver N5000 Processor
|
||||
- Intel® Celeron® J4005 Processor
|
||||
- Intel® Celeron® J4105 Processor
|
||||
- Intel® Celeron® Processor N4100
|
||||
- Intel® Celeron® Processor N4000
|
||||
* [Intel® Pentium® Silver Processors N5xxx, J5xxx and Intel® Celeron® Processors N4xxx, J4xxx](https://ark.intel.com/content/www/us/en/ark/products/codename/83915/gemini-lake.html):
|
||||
- Intel® Pentium® Silver J5005 Processor
|
||||
- Intel® Pentium® Silver N5000 Processor
|
||||
- Intel® Celeron® J4005 Processor
|
||||
- Intel® Celeron® J4105 Processor
|
||||
- Intel® Celeron® Processor N4100
|
||||
- Intel® Celeron® Processor N4000
|
||||
|
||||
* [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html):
|
||||
Intel® Core™ i3-8121U Processor
|
||||
* [Intel® Core™ Processors (formerly codenamed Cannon Lake)](https://ark.intel.com/content/www/us/en/ark/products/136863/intel-core-i3-8121u-processor-4m-cache-up-to-3-20-ghz.html):
|
||||
Intel® Core™ i3-8121U Processor
|
||||
|
||||
* [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html):
|
||||
- Intel® Core™ i7-1065G7 Processor
|
||||
- Intel® Core™ i7-1060G7 Processor
|
||||
- Intel® Core™ i5-1035G4 Processor
|
||||
- Intel® Core™ i5-1035G7 Processor
|
||||
- Intel® Core™ i5-1035G1 Processor
|
||||
- Intel® Core™ i5-1030G7 Processor
|
||||
- Intel® Core™ i5-1030G4 Processor
|
||||
- Intel® Core™ i3-1005G1 Processor
|
||||
- Intel® Core™ i3-1000G1 Processor
|
||||
- Intel® Core™ i3-1000G4 Processor
|
||||
* [10th Generation Intel® Core™ Processors (formerly codenamed Ice Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/74979/ice-lake.html):
|
||||
- Intel® Core™ i7-1065G7 Processor
|
||||
- Intel® Core™ i7-1060G7 Processor
|
||||
- Intel® Core™ i5-1035G4 Processor
|
||||
- Intel® Core™ i5-1035G7 Processor
|
||||
- Intel® Core™ i5-1035G1 Processor
|
||||
- Intel® Core™ i5-1030G7 Processor
|
||||
- Intel® Core™ i5-1030G4 Processor
|
||||
- Intel® Core™ i3-1005G1 Processor
|
||||
- Intel® Core™ i3-1000G1 Processor
|
||||
- Intel® Core™ i3-1000G4 Processor
|
||||
|
||||
* All [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html).
|
||||
* All [11th Generation Intel® Core™ Processors (formerly codenamed Tiger Lake)](https://ark.intel.com/content/www/us/en/ark/products/codename/88759/tiger-lake.html).
|
||||
|
||||
> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only.
|
||||
> **NOTE**: On platforms where Intel® GNA is not enabled in the BIOS, the driver cannot be installed, so the GNA plugin uses the software emulation mode only.
|
||||
|
||||
## Drivers and Dependencies
|
||||
|
||||
Intel® GNA hardware requires a driver to be installed on the system.
|
||||
Intel® GNA hardware requires a driver to be installed on the system.
|
||||
|
||||
* Linux\* OS:
|
||||
[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.0+)](https://download.01.org/opencv/drivers/gna/)
|
||||
[Download Intel® GNA driver for Ubuntu Linux 18.04.3 LTS (with HWE Kernel version 5.0+)](https://download.01.org/opencv/drivers/gna/)
|
||||
|
||||
* Windows\* OS:
|
||||
Intel® GNA driver for Windows is available through Windows Update\*
|
||||
Intel® GNA driver for Windows is available through Windows Update\*
|
||||
|
||||
## Models and Layers Limitations
|
||||
|
||||
Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations.
|
||||
For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted for the GNA Plugin, because the plugin does not fully support
|
||||
2D convolutions.
|
||||
Because of specifics of hardware architecture, Intel® GNA supports a limited set of layers, their kinds and combinations.
|
||||
For example, you should not expect the GNA Plugin to be able to run computer vision models, except those specifically adapted
|
||||
for the GNA Plugin, because the plugin does not fully support 2D convolutions.
|
||||
|
||||
For the list of supported layers, see the **GNA** column of the **Supported Layers** section in [Supported Devices](Supported_Devices.md).
|
||||
|
||||
The list of supported layers can be found
|
||||
[here](Supported_Devices.md) (see the GNA column of Supported Layers section).
|
||||
Limitations include:
|
||||
|
||||
- Only 1D convolutions are natively supported in the models converted from:
|
||||
- [Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) framework;
|
||||
- [TensorFlow](../../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) framework; note that for TensorFlow models, the option `--disable_nhwc_to_nchw` must be used when running the Model Optimizer.
|
||||
- The number of output channels for convolutions must be a multiple of 4
|
||||
- Permute layer support is limited to the cases where no data reordering is needed, or when reordering is happening for 2 dimensions, at least one of which is not greater than 8
|
||||
- [Kaldi](../../MO_DG/prepare_model/convert_model/Convert_Model_From_Kaldi.md) framework
|
||||
- [TensorFlow](../../MO_DG/prepare_model/convert_model/Convert_Model_From_TensorFlow.md) framework. For TensorFlow models, use the `--disable_nhwc_to_nchw` option when running the Model Optimizer.
|
||||
- The number of output channels for convolutions must be a multiple of 4.
|
||||
- Permute layer support is limited to the cases where no data reordering is needed or when reordering is happening for two dimensions, at least one of which is not greater than 8.
|
||||
|
||||
#### Experimental Support for 2D Convolutions
|
||||
|
||||
The Intel® GNA hardware natively supports only 1D convolution.
|
||||
The Intel® GNA hardware natively supports only 1D convolution.
|
||||
|
||||
However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. Such a transformation is performed by the GNA Plugin for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts a `NHWC` input and produces `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, it may be necessary to insert `Permute` layers before or after convolutions.
|
||||
However, 2D convolutions can be mapped to 1D when a convolution kernel moves in a single direction. GNA Plugin performs such a transformation for Kaldi `nnet1` convolution. From this perspective, the Intel® GNA hardware convolution operation accepts an `NHWC` input and produces an `NHWC` output. Because OpenVINO™ only supports the `NCHW` layout, you may need to insert `Permute` layers before or after convolutions.
|
||||
|
||||
For example, the Kaldi model optimizer inserts such a permute after convolution for the [rm_cnn4a network](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi/rm_cnn4a_smbr/). This `Permute` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result.
|
||||
For example, the Kaldi model optimizer inserts such a permute after convolution for the [rm_cnn4a network](https://download.01.org/openvinotoolkit/models_contrib/speech/kaldi/rm_cnn4a_smbr/). This `Permute` layer is automatically removed by the GNA Plugin, because the Intel® GNA hardware convolution layer already produces the required `NHWC` result.
|
||||
|
||||
## Operation Precision
|
||||
|
||||
Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations, so compared to 32-bit floating point (`FP32`) results – for example, calculated on CPU using Inference Engine [CPU Plugin](CPU.md) – outputs calculated using reduced integer precision are different from the scores calculated using floating point.
|
||||
Intel® GNA essentially operates in the low-precision mode, which represents a mix of 8-bit (`I8`), 16-bit (`I16`), and 32-bit (`I32`) integer computations. Outputs calculated using a reduced integer precision are different from the scores calculated using the floating point format, for example, `FP32` outputs calculated on CPU using the Inference Engine [CPU Plugin](CPU.md).
|
||||
|
||||
Unlike other plugins supporting low-precision execution, the GNA plugin calculates quantization factors at the model loading time, so a model can run without calibration.
|
||||
Unlike other plugins supporting low-precision execution, the GNA plugin calculates quantization factors at the model loading time, so you can run a model without calibration.
|
||||
|
||||
## <a name="execution-models">Execution Modes</a>
|
||||
## <a name="execution-modes">Execution Modes</a>
|
||||
|
||||
| Mode | Description |
|
||||
| :---------------------------------| :---------------------------------------------------------|
|
||||
| `GNA_AUTO` | Uses Intel® GNA if available, otherwise uses software execution mode on CPU. |
|
||||
| `GNA_HW` | Uses Intel® GNA if available, otherwise raises an error. |
|
||||
| `GNA_SW` | *Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode. |
|
||||
| `GNA_SW_EXACT` | Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode. |
|
||||
| `GNA_AUTO` | Uses Intel® GNA if available, otherwise uses software execution mode on CPU. |
|
||||
| `GNA_HW` | Uses Intel® GNA if available, otherwise raises an error. |
|
||||
| `GNA_SW` | *Deprecated*. Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA, but not in the bit-exact mode. |
|
||||
| `GNA_SW_EXACT` | Executes the GNA-compiled graph on CPU performing calculations in the same precision as the Intel® GNA in the bit-exact mode. |
|
||||
| `GNA_SW_FP32` | Executes the GNA-compiled graph on CPU but substitutes parameters and calculations from low precision to floating point (`FP32`). |
|
||||
|
||||
## Supported Configuration Parameters
|
||||
@@ -101,42 +101,42 @@ Unlike other plugins supporting low-precision execution, the GNA plugin calculat
|
||||
The plugin supports the configuration parameters listed below.
|
||||
The parameters are passed as `std::map<std::string, std::string>` on `InferenceEngine::Core::LoadNetwork` or `InferenceEngine::SetConfig`.
|
||||
|
||||
The parameter `KEY_GNA_DEVICE_MODE` can also be changed at run time using `InferenceEngine::ExecutableNetwork::SetConfig` (for any values excluding `GNA_SW_FP32`). This allows switching the
|
||||
You can change the `KEY_GNA_DEVICE_MODE` parameter at run time using `InferenceEngine::ExecutableNetwork::SetConfig`, which works for any value excluding `GNA_SW_FP32`. This enables you to switch the
|
||||
execution between software emulation mode and hardware emulation mode after the model is loaded.
|
||||
|
||||
The parameter names below correspond to their usage through API keys, such as `GNAConfigParams::KEY_GNA_DEVICE_MODE` or `PluginConfigParams::KEY_PERF_COUNT`.
|
||||
When specifying key values as raw strings (that is, when using Python API), omit the `KEY_` prefix.
|
||||
When specifying key values as raw strings, that is, when using Python API, omit the `KEY_` prefix.
|
||||
|
||||
| Parameter Name | Parameter Values | Default Value | Description |
|
||||
| :---------------------------------| :---------------------------------------------------------| :-----------| :------------------------------------------------------------------------|
|
||||
| `KEY_GNA_COMPACT_MODE` | `YES`/`NO` | `NO` | Reuse I/O buffers to save space (makes debugging harder) |
|
||||
| `KEY_GNA_SCALE_FACTOR` | `FP32` number | 1.0 | Scale factor to use for input quantization |
|
||||
| `KEY_GNA_DEVICE_MODE` | `GNA_AUTO`/`GNA_HW`/`GNA_SW_EXACT`/`GNA_SW_FP32` | `GNA_AUTO` | One of the modes described <a name="execution-models">Execution Models</a> |
|
||||
| `KEY_GNA_FIRMWARE_MODEL_IMAGE` | `std::string` | `""` | Name for embedded model binary dump file |
|
||||
| `KEY_GNA_PRECISION` | `I16`/`I8` | `I16` | Hint to GNA plugin: preferred integer weight resolution for quantization |
|
||||
| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Turn on performance counters reporting |
|
||||
| `KEY_GNA_LIB_N_THREADS` | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes
|
||||
| `KEY_GNA_COMPACT_MODE` | `YES`/`NO` | `YES` | Enables I/O buffers reuse to save space. Makes debugging harder. |
|
||||
| `KEY_GNA_SCALE_FACTOR` | `FP32` number | 1.0 | Sets the scale factor to use for input quantization. |
|
||||
| `KEY_GNA_DEVICE_MODE` | `GNA_AUTO`/`GNA_HW`/`GNA_SW_EXACT`/`GNA_SW_FP32` | `GNA_AUTO` | One of the modes described in <a href="#execution-modes">Execution Modes</a> |
|
||||
| `KEY_GNA_FIRMWARE_MODEL_IMAGE` | `std::string` | `""` | Sets the name for the embedded model binary dump file. |
|
||||
| `KEY_GNA_PRECISION` | `I16`/`I8` | `I16` | Sets the preferred integer weight resolution for quantization. |
|
||||
| `KEY_PERF_COUNT` | `YES`/`NO` | `NO` | Turns on performance counters reporting. |
|
||||
| `KEY_GNA_LIB_N_THREADS` | 1-127 integer number | 1 | Sets the number of GNA accelerator library worker threads used for inference computation in software modes.
|
||||
|
||||
## How to Interpret Performance Counters
|
||||
|
||||
As a result of collecting performance counters using `InferenceEngine::InferRequest::GetPerformanceCounts`, you can find various performance data about execution on GNA.
|
||||
Returned map stores a counter description as a key, counter value is stored in the `realTime_uSec` field of the `InferenceEngineProfileInfo` structure. Current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information. API allows to retrieve counter units in cycles, but they can be converted to seconds as follows:
|
||||
Returned map stores a counter description as a key, and a counter value in the `realTime_uSec` field of the `InferenceEngineProfileInfo` structure. Current GNA implementation calculates counters for the whole utterance scoring and does not provide per-layer information. The API enables you to retrieve counter units in cycles, you can convert cycles to seconds as follows:
|
||||
|
||||
```
|
||||
seconds = cycles / frequency
|
||||
```
|
||||
|
||||
Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor.
|
||||
Processor | Frequency of Intel® GNA
|
||||
Refer to the table below to learn about the frequency of Intel® GNA inside a particular processor.
|
||||
Processor | Frequency of Intel® GNA
|
||||
---|---
|
||||
Intel® Ice Lake processors| 400MHz
|
||||
Intel® Core™ i3-8121U processor| 400MHz
|
||||
Intel® Gemini Lake processors | 200MHz
|
||||
Intel® Ice Lake processors| 400MHz
|
||||
Intel® Core™ i3-8121U processor| 400MHz
|
||||
Intel® Gemini Lake processors | 200MHz
|
||||
|
||||
Performance counters provided for the time being:
|
||||
|
||||
* Scoring request performance results
|
||||
* Number of total cycles spent on scoring in hardware (including compute and memory stall cycles)
|
||||
* Number of total cycles spent on scoring in hardware including compute and memory stall cycles
|
||||
* Number of stall cycles spent in hardware
|
||||
|
||||
## Multithreading Support in GNA Plugin
|
||||
@@ -151,40 +151,40 @@ The GNA plugin supports the following configuration parameters for multithreadin
|
||||
|
||||
## Network Batch Size
|
||||
|
||||
Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one
|
||||
Intel® GNA plugin supports the processing of context-windowed speech frames in batches of 1-8 frames in one
|
||||
input blob using `InferenceEngine::ICNNNetwork::setBatchSize`. Increasing batch size only improves efficiency of `Fully Connected` layers.
|
||||
|
||||
> **NOTE**: For networks with `Convolutional`, `LSTM`, or `Memory` layers, the only supported batch size is 1.
|
||||
|
||||
## Compatibility with Heterogeneous Plugin
|
||||
|
||||
Heterogeneous plugin was tested with the Intel® GNA as a primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin (for example, Softmax), use the Heterogeneous plugin with the `HETERO:GNA,CPU` configuration. For the list of supported networks, see the [Supported Frameworks](#supported-frameworks).
|
||||
Heterogeneous plugin was tested with the Intel® GNA as a primary device and CPU as a secondary device. To run inference of networks with layers unsupported by the GNA plugin, such as Softmax, use the Heterogeneous plugin with the `HETERO:GNA,CPU` configuration.
|
||||
|
||||
> **NOTE:** Due to limitation of the Intel® GNA backend library, heterogeneous support is limited to cases where in the resulted sliced graph, only one subgraph is scheduled to run on GNA\_HW or GNA\_SW devices.
|
||||
> **NOTE:** Due to limitation of the Intel® GNA backend library, heterogenous support is limited to cases where in the resulted sliced graph, only one subgraph is scheduled to run on GNA\_HW or GNA\_SW devices.
|
||||
|
||||
## Recovery from interruption by high-priority Windows audio processes\*
|
||||
## Recovery from Interruption by High-Priority Windows Audio Processes\*
|
||||
|
||||
As noted in the introduction, GNA is designed for real-time workloads such as noise reduction.
|
||||
GNA is designed for real-time workloads such as noise reduction.
|
||||
For such workloads, processing should be time constrained, otherwise extra delays may cause undesired effects such as
|
||||
audio "glitches". To make sure that processing can satisfy real time requirements, the GNA driver provides a QoS
|
||||
(Quality of Service) mechanism which interrupts requests that might cause high-priority Windows audio processes to miss
|
||||
schedule, thereby causing long running GNA tasks to terminate early.
|
||||
*audio glitches*. To make sure that processing can satisfy real-time requirements, the GNA driver provides a Quality of Service
|
||||
(QoS) mechanism, which interrupts requests that might cause high-priority Windows audio processes to miss
|
||||
the schedule, thereby causing long running GNA tasks to terminate early.
|
||||
|
||||
Applications should be prepared for this situation.
|
||||
If an inference (in `GNA_HW` mode) cannot be executed because of such an interruption, then `InferRequest::Wait()` will return status code
|
||||
`StatusCode::INFER_NOT_STARTED` (note that it will be changed to a more meaningful status code in future releases).
|
||||
If an inference in the `GNA_HW` mode cannot be executed because of such an interruption, then `InferRequest::Wait()` returns status code
|
||||
`StatusCode::INFER_NOT_STARTED`. In future releases, it will be changed to a more meaningful status code.
|
||||
|
||||
Any application working with GNA must properly react if it receives this code. Various strategies are possible.
|
||||
One of the options is to immediately switch to GNA SW emulation mode:
|
||||
Any application working with GNA must properly react to this code.
|
||||
One of the strategies to adapt an application:
|
||||
|
||||
1. Immediately switch to the GNA_SW emulation mode:
|
||||
```cpp
|
||||
std::map<std::string, Parameter> newConfig;
|
||||
newConfig[GNAConfigParams::KEY_GNA_DEVICE_MODE] = Parameter("GNA_SW_EXACT");
|
||||
executableNet.SetConfig(newConfig);
|
||||
|
||||
```
|
||||
|
||||
then resubmit and switch back to GNA_HW after some time hoping that the competing application has finished.
|
||||
2. Resubmit and switch back to GNA_HW expecting that the competing application has finished.
|
||||
|
||||
## See Also
|
||||
|
||||
|
||||
@@ -102,15 +102,15 @@ Refer to the sections below to see pseudo-code of usage examples.
|
||||
|
||||
This example uses the OpenCL context obtained from an executable network object.
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_RemoteBlob_API0.cpp part0
|
||||
@snippet snippets/GPU_RemoteBlob_API0.cpp part0
|
||||
|
||||
### Running GPU Plugin Inference within User-Supplied Shared Context
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_RemoteBlob_API1.cpp part1
|
||||
@snippet snippets/GPU_RemoteBlob_API1.cpp part1
|
||||
|
||||
### Direct Consuming of the NV12 VAAPI Video Decoder Surface on Linux
|
||||
|
||||
@snippet openvino/docs/snippets/GPU_RemoteBlob_API2.cpp part2
|
||||
@snippet snippets/GPU_RemoteBlob_API2.cpp part2
|
||||
|
||||
## See Also
|
||||
|
||||
|
||||
@@ -28,17 +28,17 @@ Default fallback policy decides which layer goes to which device automatically a
|
||||
|
||||
Another way to annotate a network is to set affinity manually using <code>ngraph::Node::get_rt_info</code> with key `"affinity"`:
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO0.cpp part0
|
||||
@snippet snippets/HETERO0.cpp part0
|
||||
|
||||
The fallback policy does not work if even one layer has an initialized affinity. The sequence should be calling of automating affinity settings and then fix manually.
|
||||
|
||||
> **NOTE**: If you set affinity manually, be careful at the current moment Inference Engine plugins don't support constant (`Constant`->`Result`) and empty (`Parameter`->`Result`) networks. Please avoid such subgraphs when you set affinity manually.
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO1.cpp part1
|
||||
@snippet snippets/HETERO1.cpp part1
|
||||
|
||||
If you rely on the default affinity distribution, you can avoid calling <code>InferenceEngine::Core::QueryNetwork</code> and just call <code>InferenceEngine::Core::LoadNetwork</code> instead:
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO2.cpp part2
|
||||
@snippet snippets/HETERO2.cpp part2
|
||||
|
||||
> **NOTE**: `InferenceEngine::Core::QueryNetwork` does not depend on affinities set by a user, but queries for layer support based on device capabilities.
|
||||
|
||||
@@ -74,7 +74,7 @@ Heterogeneous plugin can generate two files:
|
||||
* `hetero_affinity_<network name>.dot` - annotation of affinities per layer. This file is written to the disk only if default fallback policy was executed
|
||||
* `hetero_subgraphs_<network name>.dot` - annotation of affinities per graph. This file is written to the disk during execution of <code>ICNNNetwork::LoadNetwork()</code> for heterogeneous plugin
|
||||
|
||||
@snippet openvino/docs/snippets/HETERO3.cpp part3
|
||||
@snippet snippets/HETERO3.cpp part3
|
||||
|
||||
You can use GraphViz* utility or converters to `.png` formats. On Ubuntu* operating system, you can use the following utilities:
|
||||
* `sudo apt-get install xdot`
|
||||
|
||||
@@ -32,11 +32,11 @@ You can use name of the configuration directly as a string, or use MultiDeviceCo
|
||||
|
||||
Basically, there are three ways to specify the devices to be use by the "MULTI":
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI0.cpp part0
|
||||
@snippet snippets/MULTI0.cpp part0
|
||||
|
||||
Notice that the priorities of the devices can be changed in real-time for the executable network:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI1.cpp part1
|
||||
@snippet snippets/MULTI1.cpp part1
|
||||
|
||||
Finally, there is a way to specify number of requests that the multi-device will internally keep for each device.
|
||||
Say if your original app was running 4 cameras with 4 inference requests now you would probably want to share these 4 requests between 2 devices used in the MULTI. The easiest way is to specify a number of requests for each device using parentheses: "MULTI:CPU(2),GPU(2)" and use the same 4 requests in your app. However, such an explicit configuration is not performance portable and hence not recommended. Instead, the better way is to configure the individual devices and query the resulting number of requests to be used in the application level (see [Configuring the Individual Devices and Creating the Multi-Device On Top](#configuring-the-individual-devices-and-creating-the-multi-device-on-top)).
|
||||
@@ -47,15 +47,17 @@ Inference Engine now features a dedicated API to enumerate devices and their cap
|
||||
```sh
|
||||
./hello_query_device
|
||||
Available devices:
|
||||
Device: CPU
|
||||
Device: CPU
|
||||
...
|
||||
Device: GPU
|
||||
Device: GPU.0
|
||||
...
|
||||
Device: HDDL
|
||||
Device: GPU.1
|
||||
...
|
||||
Device: HDDL
|
||||
```
|
||||
Simple programmatic way to enumerate the devices and use with the multi-device is as follows:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI2.cpp part2
|
||||
@snippet snippets/MULTI2.cpp part2
|
||||
|
||||
Beyond trivial "CPU", "GPU", "HDDL" and so on, when multiple instances of a device are available the names are more qualified.
|
||||
For example this is how two Intel® Movidius™ Myriad™ X sticks are listed with the hello_query_sample:
|
||||
@@ -68,13 +70,13 @@ For example this is how two Intel® Movidius™ Myriad™ X sticks are listed wi
|
||||
So the explicit configuration to use both would be "MULTI:MYRIAD.1.2-ma2480,MYRIAD.1.4-ma2480".
|
||||
Accordingly, the code that loops over all available devices of "MYRIAD" type only is below:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI3.cpp part3
|
||||
@snippet snippets/MULTI3.cpp part3
|
||||
|
||||
|
||||
## Configuring the Individual Devices and Creating the Multi-Device On Top
|
||||
As discussed in the first section, you shall configure each individual device as usual and then just create the "MULTI" device on top:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI4.cpp part4
|
||||
@snippet snippets/MULTI4.cpp part4
|
||||
|
||||
Alternatively, you can combine all the individual device settings into single config and load that, allowing the multi-device plugin to parse and apply that to the right devices. See code example in the next section.
|
||||
|
||||
@@ -84,7 +86,7 @@ See section of the [Using the multi-device with OpenVINO samples and benchmarkin
|
||||
## Querying the Optimal Number of Inference Requests
|
||||
Notice that until R2 you had to calculate number of requests in your application for any device, e.g. you had to know that Intel® Vision Accelerator Design with Intel® Movidius™ VPUs required at least 32 inference requests to perform well. Now you can use the new GetMetric API to query the optimal number of requests. Similarly, when using the multi-device you don't need to sum over included devices yourself, you can query metric directly:
|
||||
|
||||
@snippet openvino/docs/snippets/MULTI5.cpp part5
|
||||
@snippet snippets/MULTI5.cpp part5
|
||||
|
||||
## Using the Multi-Device with OpenVINO Samples and Benchmarking the Performance
|
||||
Notice that every OpenVINO sample that supports "-d" (which stays for "device") command-line option transparently accepts the multi-device.
|
||||
|
||||
@@ -71,6 +71,8 @@ In addition to common parameters, the MYRIAD plugin accepts the following option
|
||||
| `KEY_VPU_MYRIAD_FORCE_RESET` | `YES`/`NO` | `NO` | Enables force reset of all booted devices when new ExecutableNetwork is created.<br />This is a plugin scope option and must be used with the plugin's SetConfig method only.<br />See <a href="#MYRIAD_DEVICE_ALLOC">Device allocation</a> section for details. |
|
||||
| `KEY_VPU_PLATFORM` | empty string/`VPU_2450`/`VPU_2480` | empty string | **Deprecated** Use `KEY_VPU_MYRIAD_PLATFORM` instead. <br />If set, the plugin will use a device with specific platform to allocate a network. |
|
||||
| `KEY_VPU_FORCE_RESET` | `YES`/`NO` | `NO` | **Deprecated** Use `KEY_VPU_MYRIAD_FORCE_RESET` instead. <br />Enables force reset of all booted devices when new ExecutableNetwork is created.<br />This is a plugin scope option and must be used with the plugin's SetConfig method only.<br />See <a href="#MYRIAD_DEVICE_ALLOC">Device allocation</a> section for details. |
|
||||
| `KEY_VPU_MYRIAD_MOVIDIUS_DDR_TYPE` | `VPU_MYRIAD_DDR_AUTO`/ `VPU_MYRIAD_DDR_MICRON_2GB`/ `VPU_MYRIAD_DDR_SAMSUNG_2GB`/ `VPU_MYRIAD_DDR_HYNIX_2GB`/ `VPU_MYRIAD_DDR_MICRON_1GB` | `VPU_MYRIAD_DDR_AUTO` | This option allows setting DDR type for the MyriadX board. |
|
||||
|
||||
|
||||
## Device allocation <a name="MYRIAD_DEVICE_ALLOC"> </a>
|
||||
|
||||
|
||||
@@ -844,11 +844,7 @@ EXCLUDE_SYMLINKS = NO
|
||||
# Note that the wildcards are matched against the file with absolute path, so to
|
||||
# exclude all test directories for example use the pattern */test/*
|
||||
|
||||
EXCLUDE_PATTERNS = cnn_network_ngraph_impl.hpp \
|
||||
ie_imemory_state_internal.hpp \
|
||||
ie_memory_state_internal.hpp \
|
||||
ie_memory_state_base.hpp \
|
||||
generic_ie.hpp \
|
||||
EXCLUDE_PATTERNS = generic_ie.hpp \
|
||||
function_name.hpp \
|
||||
macro_overload.hpp
|
||||
|
||||
|
||||
@@ -92,7 +92,7 @@ Returns a metric value for a metric with the name `name`. A metric is a static
|
||||
|
||||
@snippet src/template_executable_network.cpp executable_network:get_metric
|
||||
|
||||
The IE_SET_METRIC helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
|
||||
The IE_SET_METRIC_RETURN helper macro sets metric value and checks that the actual metric type matches a type of the specified value.
|
||||
|
||||
### `GetConfig()`
|
||||
|
||||
|
||||
@@ -1,11 +1,11 @@
|
||||
# Representation of low-precision models
|
||||
# Representation of low-precision models {#lp_representation}
|
||||
The goal of this document is to describe how optimized models are represented in OpenVINO Intermediate Representation (IR) and provide guidance on interpretation rules for such models at runtime.
|
||||
Currently, there are two groups of optimization methods that can influence on the IR after applying them to the full-precision model:
|
||||
- **Sparsity**. It is represented by zeros inside the weights and this is up to the hardware plugin how to interpret these zeros (use weights as is or apply special compression algorithms and sparse arithmetic). No additional mask is provided with the model.
|
||||
- **Quantization**. The rest of this document is dedicated to the representation of quantized models.
|
||||
|
||||
## Representation of quantized models
|
||||
The OpenVINO Toolkit represents all the quantized models using the so-called FakeQuantize operation (see the description in [this document](../MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md)). This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
|
||||
The OpenVINO Toolkit represents all the quantized models using the so-called FakeQuantize operation (see the description in [this document](@ref openvino_docs_ops_quantization_FakeQuantize_1)). This operation is very expressive and allows mapping values from arbitrary input and output ranges. The whole idea behind that is quite simple: we project (discretize) the input values to the low-precision data type using affine transformation (with clamp and rounding) and then reproject discrete values back to the original range and data type. It can be considered as an emulation of the quantization process which happens at runtime.
|
||||
In order to be able to execute a particular DL operation in low-precision all its inputs should be quantized i.e. should have FakeQuantize between operation and data blobs. The figure below shows an example of quantized Convolution which contains two FakeQuantize nodes: one for weights and one for activations (bias is quantized using the same parameters).
|
||||
![quantized_convolution]
|
||||
<div align="center">Figure 1. Example of quantized Convolution operation.</div>
|
||||
|
||||
@@ -3,13 +3,13 @@
|
||||
One of the feature of Inference Engine is the support of quantized networks with different precisions: INT8, INT4, etc.
|
||||
However, it is up to the plugin to define what exact precisions are supported by the particular HW.
|
||||
All quantized networks which can be expressed in IR have a unified representation by means of *FakeQuantize* operation.
|
||||
For more details about low-precision model representation please refer to this [document](LowPrecisionModelRepresentation.md).
|
||||
For more details about low-precision model representation please refer to this [document](@ref lp_representation).
|
||||
|
||||
### Interpreting FakeQuantize at runtime
|
||||
During the model load each plugin can interpret quantization rules expressed in *FakeQuantize* operations:
|
||||
- Independently based on the definition of *FakeQuantize* operation.
|
||||
- Using a special library of low-precision transformations (LPT) which applies common rules for generic operations,
|
||||
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](../IE_DG/Int8Inference.md).
|
||||
such as Convolution, Fully-Connected, Eltwise, etc., and translates "fake-quantized" models into the models with low-precision operations. For more information about low-precision flow please refer to the following [document](@ref openvino_docs_IE_DG_Int8Inference).
|
||||
|
||||
Here we provide only a high-level overview of the interpretation rules of FakeQuantize.
|
||||
At runtime each FakeQuantize can be split into two independent operations: **Quantize** and **Dequantize**.
|
||||
|
||||
@@ -17,8 +17,10 @@
|
||||
</tab>
|
||||
<!-- API References -->
|
||||
<tab type="usergroup" title="API REFERENCE">
|
||||
<!-- IE Developer Package -->
|
||||
<tab type="modules" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Plugin API -->
|
||||
<tab type="user" url="group__ie__dev__api.html" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Transformations API -->
|
||||
<tab type="user" url="group__ie__transformation__api.html" visible="yes" title="Inference Engine Transformations API Reference"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="MAIN OPENVINO™ DOCS" url="../index.html"/>
|
||||
</navindex>
|
||||
|
||||
@@ -4,9 +4,7 @@ This software and the related documents are Intel copyrighted materials, and you
|
||||
|
||||
This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting [www.intel.com/design/literature.htm](https://www.intel.com/design/literature.htm).
|
||||
|
||||
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
|
||||
|
||||
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit [www.intel.com/benchmarks](https://www.intel.com/benchmarks).
|
||||
Performance varies by use, configuration and other factors. Learn more at [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex).
|
||||
|
||||
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
|
||||
|
||||
@@ -14,7 +12,7 @@ Your costs and results may vary.
|
||||
|
||||
Intel technologies may require enabled hardware, software or service activation.
|
||||
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. \*Other names and brands may be claimed as the property of others.
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. \*Other names and brands may be claimed as the property of others.
|
||||
|
||||
## OpenVINO™ Logo
|
||||
To build equity around the project, the OpenVINO logo was created for both Intel and community usage. The logo may only be used to represent the OpenVINO toolkit and offerings built using the OpenVINO toolkit.
|
||||
|
||||
@@ -12,51 +12,39 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi
|
||||
|
||||
* <code>.bin</code> - Contains the weights and biases binary data.
|
||||
|
||||
> **TIP**: You also can work with the Model Optimizer inside the OpenVINO™ [Deep Learning Workbench](@ref workbench_docs_Workbench_DG_Introduction) (DL Workbench).
|
||||
> [DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is a platform built upon OpenVINO™ and provides a web-based graphical environment that enables you to optimize, fine-tune, analyze, visualize, and compare
|
||||
> performance of deep learning models on various Intel® architecture
|
||||
> configurations. In the DL Workbench, you can use most of OpenVINO™ toolkit components.
|
||||
> <br>
|
||||
> Proceed to an [easy installation from Docker](@ref workbench_docs_Workbench_DG_Install_from_Docker_Hub) to get started.
|
||||
|
||||
## What's New in the Model Optimizer in this Release?
|
||||
|
||||
* Common changes:
|
||||
* Implemented several optimization transformations to replace sub-graphs of operations with HSwish, Mish, Swish and SoftPlus operations.
|
||||
* Model Optimizer generates IR keeping shape-calculating sub-graphs **by default**. Previously, this behavior was triggered if the "--keep_shape_ops" command line parameter was provided. The key is ignored in this release and will be deleted in the next release. To trigger the legacy behavior to generate an IR for a fixed input shape (folding ShapeOf operations and shape-calculating sub-graphs to Constant), use the "--static_shape" command line parameter. Changing model input shape using the Inference Engine API in runtime may fail for such an IR.
|
||||
* Fixed Model Optimizer conversion issues resulted in non-reshapeable IR using the Inference Engine reshape API.
|
||||
* Enabled transformations to fix non-reshapeable patterns in the original networks:
|
||||
* Hardcoded Reshape
|
||||
* In Reshape(2D)->MatMul pattern
|
||||
* Reshape->Transpose->Reshape when the pattern can be fused to the ShuffleChannels or DepthToSpace operation
|
||||
* Hardcoded Interpolate
|
||||
* In Interpolate->Concat pattern
|
||||
* Added a dedicated requirements file for TensorFlow 2.X as well as the dedicated install prerequisites scripts.
|
||||
* Replaced the SparseToDense operation with ScatterNDUpdate-4.
|
||||
* Updated requirements for the numpy component to avoid compatibility issues with TensorFlow 1.x.
|
||||
* Improved reshape-ability of models with eltwise and CTCGreedyDecoder operations
|
||||
* ONNX*:
|
||||
* Enabled an ability to specify the model output **tensor** name using the "--output" command line parameter.
|
||||
* Added support for the following operations:
|
||||
* Acosh
|
||||
* Asinh
|
||||
* Atanh
|
||||
* DepthToSpace-11, 13
|
||||
* DequantizeLinear-10 (zero_point must be constant)
|
||||
* HardSigmoid-1,6
|
||||
* QuantizeLinear-10 (zero_point must be constant)
|
||||
* ReduceL1-11, 13
|
||||
* ReduceL2-11, 13
|
||||
* Resize-11, 13 (except mode="nearest" with 5D+ input, mode="tf_crop_and_resize", and attributes exclude_outside and extrapolation_value with non-zero values)
|
||||
* ScatterND-11, 13
|
||||
* SpaceToDepth-11, 13
|
||||
* Loop-11, 13
|
||||
* Round-11
|
||||
* GatherND-11, 12, 13
|
||||
* TensorFlow*:
|
||||
* Added support for the TensorFlow Object Detection API models with pre-processing block when mean/scale values are applied prior to resizing of the image. Previously only the case when mean/scale values are applied after the resize was supported.
|
||||
* Aligned FakeQuantized limits adjustment with TensorFlow approach
|
||||
* Added support for the following operations:
|
||||
* Acosh
|
||||
* Asinh
|
||||
* Atanh
|
||||
* CTCLoss
|
||||
* EuclideanNorm
|
||||
* ExtractImagePatches
|
||||
* FloorDiv
|
||||
* GatherND
|
||||
* Round
|
||||
* NonMaxSuppression
|
||||
* LogSoftmax
|
||||
* FakeQuantWithMinMaxVarsPerChannel
|
||||
* MXNet*:
|
||||
* Added support for the following operations:
|
||||
* Acosh
|
||||
* Asinh
|
||||
* Atanh
|
||||
* GatherND
|
||||
* Round
|
||||
* Kaldi*:
|
||||
* Fixed bug with ParallelComponent support. Now it is fully supported with no restrictions.
|
||||
* Added support for the following operations:
|
||||
* TdnnComponent
|
||||
|
||||
> **NOTE:**
|
||||
> [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019).
|
||||
@@ -77,7 +65,6 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi
|
||||
* [Converting DeepSpeech from TensorFlow](prepare_model/convert_model/tf_specific/Convert_DeepSpeech_From_Tensorflow.md)
|
||||
* [Converting Language Model on One Billion Word Benchmark from TensorFlow](prepare_model/convert_model/tf_specific/Convert_lm_1b_From_Tensorflow.md)
|
||||
* [Converting Neural Collaborative Filtering Model from TensorFlow*](prepare_model/convert_model/tf_specific/Convert_NCF_From_Tensorflow.md)
|
||||
|
||||
* [Converting TensorFlow* Object Detection API Models](prepare_model/convert_model/tf_specific/Convert_Object_Detection_API_Models.md)
|
||||
* [Converting TensorFlow*-Slim Image Classification Model Library Models](prepare_model/convert_model/tf_specific/Convert_Slim_Library_Models.md)
|
||||
* [Converting CRNN Model from TensorFlow*](prepare_model/convert_model/tf_specific/Convert_CRNN_From_Tensorflow.md)
|
||||
@@ -90,17 +77,15 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi
|
||||
* [Model Optimizations Techniques](prepare_model/Model_Optimization_Techniques.md)
|
||||
* [Cutting parts of the model](prepare_model/convert_model/Cutting_Model.md)
|
||||
* [Sub-graph Replacement in Model Optimizer](prepare_model/customize_model_optimizer/Subgraph_Replacement_Model_Optimizer.md)
|
||||
* [(Deprecated) Case-Study: Converting SSD models created with the TensorFlow* Object Detection API](prepare_model/customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md)
|
||||
* [(Deprecated) Case-Study: Converting Faster R-CNN models created with the TensorFlow* Object Detection API](prepare_model/customize_model_optimizer/TensorFlow_Faster_RCNN_ObjectDetection_API.md)
|
||||
* [Supported Framework Layers](prepare_model/Supported_Frameworks_Layers.md)
|
||||
* [Intermediate Representation and Operation Sets](IR_and_opsets.md)
|
||||
* [Operations Specification](../ops/opset.md)
|
||||
* [Intermediate Representation suitable for INT8 inference](prepare_model/convert_model/IR_suitable_for_INT8_inference.md)
|
||||
|
||||
* [Custom Layers in Model Optimizer](prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Model Optimizer Extensibility](prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Extending Model Optimizer with New Primitives](prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_New_Primitives.md)
|
||||
* [Extending Model Optimizer with Caffe Python Layers](prepare_model/customize_model_optimizer/Extending_Model_Optimizer_with_Caffe_Python_Layers.md)
|
||||
* [Extending Model Optimizer with Custom MXNet* Operations](prepare_model/customize_model_optimizer/Extending_MXNet_Model_Optimizer_with_New_Primitives.md)
|
||||
* [Legacy Mode for Caffe* Custom Layers](prepare_model/customize_model_optimizer/Legacy_Mode_for_Caffe_Custom_Layers.md)
|
||||
|
||||
* [Model Optimizer Frequently Asked Questions](prepare_model/Model_Optimizer_FAQ.md)
|
||||
|
||||
* [Known Issues](Known_Issues_Limitations.md)
|
||||
|
||||
@@ -114,6 +114,7 @@ Where `HEIGHT` and `WIDTH` are the input images height and width for which the m
|
||||
| Unet | [Repo](https://github.com/kkweon/UNet-in-Tensorflow) |
|
||||
| Keras-TCN | [Repo](https://github.com/philipperemy/keras-tcn) |
|
||||
| PRNet | [Repo](https://github.com/YadiraF/PRNet) |
|
||||
| YOLOv4 | [Repo](https://github.com/Ma-Dan/keras-yolo4) |
|
||||
|
||||
* YOLO topologies from DarkNet* can be converted using [instruction](tf_specific/Convert_YOLO_From_Tensorflow.md),
|
||||
* FaceNet topologies can be converted using [instruction](tf_specific/Convert_FaceNet_From_Tensorflow.md).
|
||||
|
||||
@@ -38,5 +38,5 @@ Framework-specific parameters for:
|
||||
## See Also
|
||||
* [Configuring the Model Optimizer](../Config_Model_Optimizer.md)
|
||||
* [IR Notation Reference](../../IR_and_opsets.md)
|
||||
* [Custom Layers in Model Optimizer](../customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Model Cutting](Cutting_Model.md)
|
||||
* [Model Optimizer Extensibility](../customize_model_optimizer/Customize_Model_Optimizer.md)
|
||||
* [Model Cutting](Cutting_Model.md)
|
||||
|
||||
@@ -9,7 +9,6 @@ The following examples are the situations when model cutting is useful or even r
|
||||
* model has pre- or post-processing parts that cannot be translated to existing Inference Engine layers.
|
||||
* model has a training part that is convenient to be kept in the model, but not used during inference.
|
||||
* model is too complex (contains lots of unsupported operations that cannot be easily implemented as custom layers), so the complete model cannot be converted in one shot.
|
||||
* model is one of the supported [SSD models](../customize_model_optimizer/TensorFlow_SSD_ObjectDetection_API.md). In this case, you need to cut a post-processing part off.
|
||||
* problem with model conversion in the Model Optimizer or inference in the Inference Engine occurred. To localize the issue, limit the scope for conversion by iteratively searching for problematic places in the model.
|
||||
* single custom layer or a combination of custom layers is isolated for debugging purposes.
|
||||
|
||||
@@ -389,4 +388,4 @@ In this case, when `--input_shape` is specified and the node contains multiple i
|
||||
The correct command line is:
|
||||
```sh
|
||||
python3 mo.py --input_model=inception_v1.pb --input=0:InceptionV1/InceptionV1/Conv2d_1a_7x7/convolution --input_shape=[1,224,224,3]
|
||||
```
|
||||
```
|
||||
|
||||
@@ -1582,9 +1582,9 @@ OI, which means that Input changes the fastest, then Output.
|
||||
|
||||
**Mathematical Formulation**
|
||||
|
||||
\f[
|
||||
output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:]
|
||||
\f]
|
||||
\f[
|
||||
output[:, ... ,:, i, ... , j,:, ... ,:] = input2[:, ... ,:, input1[i, ... ,j],:, ... ,:]
|
||||
\f]
|
||||
|
||||
|
||||
**Inputs**
|
||||
@@ -5086,7 +5086,9 @@ t \in \left ( 0, \quad tiles \right )
|
||||
|
||||
Output tensor is populated by values computes in the following way:
|
||||
|
||||
output[i1, ..., i(axis-1), j, i(axis+1) ..., iN] = top_k(input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]), k, sort, mode)
|
||||
\f[
|
||||
output[i1, ..., i(axis-1), j, i(axis+1) ..., iN] = top_k(input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]), k, sort, mode)
|
||||
\f]
|
||||
|
||||
So for each slice `input[i1, ...., i(axis-1), :, i(axis+1), ..., iN]` which represents 1D array, top_k value is computed individually. Sorting and minimum/maximum are controlled by `sort` and `mode` attributes.
|
||||
|
||||
|
||||
@@ -45,6 +45,10 @@ python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weig
|
||||
```sh
|
||||
python3 convert_weights_pb.py --class_names coco.names --data_format NHWC --weights_file yolov3-tiny.weights --tiny
|
||||
```
|
||||
At this step, you may receive a warning like `WARNING:tensorflow:Entity <...> could not be transformed and will be executed as-is.`. To workaround this issue, switch to gast 0.2.2 with the following command:
|
||||
```sh
|
||||
pip3 install --user gast==0.2.2
|
||||
```
|
||||
|
||||
If you have YOLOv3 weights trained for an input image with the size different from 416 (320, 608 or your own), please provide the `--size` key with the size of your image specified while running the converter. For example, run the following command for an image with size 608:
|
||||
```sh
|
||||
@@ -106,7 +110,7 @@ where:
|
||||
|
||||
> **NOTE:** The color channel order (RGB or BGR) of an input data should match the channel order of the model training dataset. If they are different, perform the `RGB<->BGR` conversion specifying the command-line parameter: `--reverse_input_channels`. Otherwise, inference results may be incorrect. For more information about the parameter, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](../Converting_Model_General.md).
|
||||
|
||||
OpenVINO™ toolkit provides a demo that uses YOLOv3 model. For more information, refer to [Object Detection YOLO* V3 Demo, Async API Performance Showcase](@ref omz_demos_object_detection_demo_yolov3_async_README).
|
||||
OpenVINO™ toolkit provides a demo that uses YOLOv3 model. For more information, refer to [Object Detection C++ Demo](@ref omz_demos_object_detection_demo_ssd_async_README).
|
||||
|
||||
## Convert YOLOv1 and YOLOv2 Models to the IR
|
||||
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
@@ -1,45 +1,41 @@
|
||||
# Extending the MXNet Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives}
|
||||
# Extending Model Optimizer for Custom MXNet* Operations {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives}
|
||||
|
||||
This section describes how you can create a Model Optimizer extension for a custom layer from your MXNet* model. It supplements the main document [Extending Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md) and provides a step-by-step procedure. To create an extension for a particular layer, perform the following steps:
|
||||
This section provides instruction on how to support a custom MXNet operation (or as it called in the MXNet documentation
|
||||
"operator" or "layer") which is not a part of the MXNet operation set. For example, if the operator is implemented using
|
||||
the following [guide](https://mxnet.apache.org/versions/1.7.0/api/faq/new_op.html).
|
||||
|
||||
This section describes a procedure on how to extract operator attributes in the Model Optimizer. The rest of the
|
||||
operation enabling pipeline and documentation on how to support MXNet operations from standard MXNet operation set is
|
||||
described in the main document [Customize_Model_Optimizer](Customize_Model_Optimizer.md).
|
||||
|
||||
## Writing Extractor for Custom MXNet Operation
|
||||
Custom MXNet operations have an attribute `op` (defining the type of the operation) equal to `Custom` and attribute
|
||||
`op_type` which is an operation type defined by an user. Implement extractor class inherited from the
|
||||
`MXNetCustomFrontExtractorOp` class instead of `FrontExtractorOp` class used for standard framework operations in order
|
||||
to extract attributes for such kind of operations. The `op` class attribute value should be set to the `op_type` value
|
||||
so the extractor is triggered for this kind of operation.
|
||||
|
||||
There is the example of the extractor for the custom operation registered with type (`op_type` value) equal to
|
||||
`MyCustomOp` having attribute `my_attribute` of the floating point type with default value `5.6`. In this sample we
|
||||
assume that we have already created the `CustomOp` class (inherited from `Op` class) for the Model Optimizer operation
|
||||
for this MXNet custom operation as described in the [Customize_Model_Optimizer](Customize_Model_Optimizer.md).
|
||||
|
||||
1. Create the file `custom_proposal_ext.py` in the folder `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/mxnet`
|
||||
If your MXNet layer has op `Custom`, create the `CustomProposalFrontExtractor` class inherited from `MXNetCustomFrontExtractorOp`:
|
||||
```py
|
||||
from mo.front.extractor import MXNetCustomFrontExtractorOp
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
Otherwise, for layers that are not standard MXNet layers, create the `ProposalFrontExtractor` class inherited from `FrontExtractorOp`:
|
||||
```py
|
||||
from mo.front.extractor import FrontExtractorOp
|
||||
class ProposalFrontExtractor(FrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
2. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
|
||||
```py
|
||||
from mo.front.extractor import MXNetCustomFrontExtractorOp
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
|
||||
op = '_contrib_Proposal'
|
||||
enabled = True
|
||||
```
|
||||
3. Register a mapping rule between the original model and the `PythonProposalOp` attributes by overriding the following function:
|
||||
```py
|
||||
from extension.ops.custom_op import CustomOp # implementation of the MO operation class
|
||||
from mo.front.mxnet.extractors.utils import get_mxnet_layer_attrs
|
||||
from mo.front.extractor import MXNetCustomFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp):
|
||||
op = '_contrib_Proposal'
|
||||
enabled = True
|
||||
class CustomProposalFrontExtractor(MXNetCustomFrontExtractorOp): # inherit from specific base class
|
||||
op = 'MyCustomOp' # the value corresponding to the `op_type` value of the MXNet operation
|
||||
enabled = True # the extractor is enabled
|
||||
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
attrs = get_mxnet_layer_attrs(node.symbol_dict)
|
||||
attrs = get_mxnet_layer_attrs(node.symbol_dict) # parse the attributes to a dictionary with string values
|
||||
node_attrs = {
|
||||
'feat_stride': attrs.float('feat_stride', 16)
|
||||
'my_attribute': attrs.float('my_attribute', 5.6)
|
||||
}
|
||||
|
||||
# update the attributes of the node
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, node_attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before
|
||||
return __class__.enabled
|
||||
```
|
||||
|
||||
CustomOp.update_node_stat(node, node_attrs) # update the attributes of the node
|
||||
return self.enabled
|
||||
```
|
||||
|
||||
@@ -0,0 +1,89 @@
|
||||
# Extending Model Optimizer with Caffe* Python Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers}
|
||||
|
||||
This section provides instruction on how to support a custom Caffe operation written only in Python. For example, the
|
||||
[Faster-R-CNN model]((http://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0)) implemented in
|
||||
Caffe contains a custom layer Proposal written in Python. The layer is described in the
|
||||
[Faster-R-CNN protoxt](https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt)
|
||||
the following way:
|
||||
```sh
|
||||
layer {
|
||||
name: 'proposal'
|
||||
type: 'Python'
|
||||
bottom: 'rpn_cls_prob_reshape'
|
||||
bottom: 'rpn_bbox_pred'
|
||||
bottom: 'im_info'
|
||||
top: 'rois'
|
||||
python_param {
|
||||
module: 'rpn.proposal_layer'
|
||||
layer: 'ProposalLayer'
|
||||
param_str: "'feat_stride': 16"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This section describes only a procedure on how to extract operator attributes in the Model Optimizer. The rest of the
|
||||
operation enabling pipeline and documentation on how to support other Caffe operations (written in C++) is described in
|
||||
the main document [Customize_Model_Optimizer](Customize_Model_Optimizer.md).
|
||||
|
||||
## Writing Extractor for Caffe Python Layer
|
||||
Custom Caffe Python layers have an attribute `type` (defining the type of the operation) equal to `Python` and two
|
||||
mandatory attributes `module` and `layer` in the `python_param` dictionary. The `module` defines the Python module name
|
||||
with the layer implementation, while `layer` value is an operation type defined by an user. In order to extract
|
||||
attributes for such an operation it is necessary to implement extractor class inherited from the
|
||||
`CaffePythonFrontExtractorOp` class instead of `FrontExtractorOp` class used for standard framework layers. The `op`
|
||||
class attribute value should be set to the `module + "." + layer` value so the extractor is triggered for this kind of
|
||||
operation.
|
||||
|
||||
Here is a simplified example of the extractor for the custom operation Proposal from Faster-R-CNN model mentioned above.
|
||||
The full code with additional checks is provided in the
|
||||
`<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe/proposal_python_ext.py`. The sample code uses
|
||||
operation `ProposalOp` which corresponds to `Proposal` operation described in the [Available Operations Sets](../../../ops/opset.md)
|
||||
document. Refer to the source code below for a detailed explanation of the extractor.
|
||||
|
||||
```py
|
||||
from extensions.ops.proposal import ProposalOp
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
|
||||
|
||||
class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer' # module + "." + layer
|
||||
enabled = True # extractor is enabled
|
||||
|
||||
@staticmethod
|
||||
def extract_proposal_params(node, defaults):
|
||||
param = node.pb.python_param # get the protobuf message representation of the layer attributes
|
||||
# parse attributes from the layer protobuf message to a Python dictionary
|
||||
attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
|
||||
update_attrs = defaults
|
||||
|
||||
# the operation expects ratio and scale values to be called "ratio" and "scale" while Caffe uses different names
|
||||
if 'ratios' in attrs:
|
||||
attrs['ratio'] = attrs['ratios']
|
||||
del attrs['ratios']
|
||||
if 'scales' in attrs:
|
||||
attrs['scale'] = attrs['scales']
|
||||
del attrs['scales']
|
||||
|
||||
update_attrs.update(attrs)
|
||||
ProposalOp.update_node_stat(node, update_attrs) # update the node attributes
|
||||
|
||||
@classmethod
|
||||
def extract(cls, node):
|
||||
# define default values for the Proposal layer attributes
|
||||
defaults = {
|
||||
'feat_stride': 16,
|
||||
'base_size': 16,
|
||||
'min_size': 16,
|
||||
'ratio': [0.5, 1, 2],
|
||||
'scale': [8, 16, 32],
|
||||
'pre_nms_topn': 6000,
|
||||
'post_nms_topn': 300,
|
||||
'nms_thresh': 0.7
|
||||
}
|
||||
cls.extract_proposal_params(node, defaults)
|
||||
return cls.enabled
|
||||
```
|
||||
|
||||
## See Also
|
||||
* [Customize_Model_Optimizer](Customize_Model_Optimizer.md)
|
||||
* [Legacy Mode for Caffe* Custom Layers](Legacy_Mode_for_Caffe_Custom_Layers.md)
|
||||
@@ -1,476 +1,3 @@
|
||||
# Extending the Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives}
|
||||
# Extending Model Optimizer with New Primitives {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives}
|
||||
|
||||
This section explains how to register a custom layer in the Model Optimizer, including how to register Proposal as a custom layer. This section also demonstrates how `Proposal` works as a custom layer.
|
||||
|
||||
Model Optimizer loads the model, goes through the topology, and tries to find each layer type in the list of known layers. If the Model Optimizer does not find a layer in that list, it looks for the layer in the list of custom layers. If the Model Optimizer fails to find the layer among the defined custom layers, it registers a Caffe\* fallback for for the output shape inference. If the Model Optimizer does not find Caffe and cannot infer shapes, the Model Optimizer fails with an appropriate message.
|
||||
|
||||
You must know two things about custom layers with the Model Optimizer:
|
||||
|
||||
* How to map a subgraph in a FW model to a subgraph consisting of Inference Engine layers. For Caffe, the subgraph is a 1-to-1 mapping of a Caffe layer to an Inference Engine layer.
|
||||
* How to infer shapes for unknown subgraphs. This can be either for a step in which the internal representation consists of framework-specific layers, or for a step in which the internal representation consists of Inference Engine layers.
|
||||
|
||||
You also have the option of a framework fallback for unknown subgraphs, for when the original framework is used for inference of output shapes of operations. The example below demonstrates the case in which the framework is not available or should not be used.
|
||||
|
||||
## Preparing an Example Topology
|
||||
|
||||
> **NOTE**: Skip this section if you have a topology with a layer that is not known to the Model Optimizer.
|
||||
|
||||
The information in this section prepares a Caffe\* model with the provided, deployment-ready `prototxt` for a
|
||||
well-known topology called
|
||||
[Faster-R-CNN protoxt](https://raw.githubusercontent.com/rbgirshick/py-faster-rcnn/master/models/pascal_voc/VGG16/faster_rcnn_end2end/test.prototxt)
|
||||
to demonstrate the workflow. To use this example, you must have
|
||||
[weights and biases](http://dl.dropboxusercontent.com/s/o6ii098bu51d139/faster_rcnn_models.tgz?dl=0) for inference,
|
||||
because `prototxt` just describes the structure of the topology.
|
||||
|
||||
1. Download the `.caffemodel` and `.prototxt` files
|
||||
2. Run the Model Optimizer on the `.caffemodel` and `.prototxt` files:
|
||||
```shell
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
|
||||
```
|
||||
You will likely see the error message:
|
||||
```shell
|
||||
Error parsing text-format caffe.NetParameter: 196:16: Message type "caffe.DropoutParameter" has no field named "scale_train".
|
||||
```
|
||||
Whether you see the error depends on your Caffe version. For example, BVLC Caffe does not support the boolean parameter `scale_train` for the `dropout` layer. The error message does not matter, because the dropout layer is needed only for training, and the Model Optimizer removes it.
|
||||
3. To proceed, comment out these lines in `test.prototxt`:
|
||||
```sh
|
||||
...
|
||||
layer {
|
||||
name: "drop6"
|
||||
type: "Dropout"
|
||||
bottom: "fc6"
|
||||
top: "fc6"
|
||||
dropout_param {
|
||||
dropout_ratio: 0.5
|
||||
# scale_train: false # <-------------- comment out this line
|
||||
}
|
||||
}
|
||||
...
|
||||
layer {
|
||||
name: "drop7"
|
||||
type: "Dropout"
|
||||
bottom: "fc7"
|
||||
top: "fc7"
|
||||
dropout_param {
|
||||
dropout_ratio: 0.5
|
||||
# scale_train: false # <-------------- comment out this line
|
||||
}
|
||||
}
|
||||
...
|
||||
```
|
||||
4. Run the Model Optimizer on this model again:
|
||||
```shell
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
|
||||
```
|
||||
You get the model successfully converted to Intermediate Representation, and you can infer it with the Inference Engine.
|
||||
|
||||
However, the aim of this tutorial is to demonstrate the way of supporting custom layers not yet supported by the Model Optimizer.
|
||||
If you want to understand better how Model Optimizer works, remove the extension for layer `Proposal` and follow all steps of this tutorial.
|
||||
|
||||
5. Remove the extension for layer `Proposal`:
|
||||
```sh
|
||||
mkdir extensions/old
|
||||
mv extensions/front/caffe/proposal_python_ext.py extensions/old/proposal_python_ext_old.py
|
||||
mv extensions/ops/proposal_python_example.py extensions/old/proposal_python__example_old.py
|
||||
```
|
||||
6. Now you can run the Model Optimizer on this model once again:
|
||||
```sh
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt
|
||||
```
|
||||
You will see the message:
|
||||
```shell
|
||||
[ ERROR ] Found custom layer proposal. Model Optimizer does not support this layer.
|
||||
Please, register it in CustomLayersMapping.xml or implement extension.
|
||||
For more information please refer to Model Optimizer FAQ, question #FAQ45.
|
||||
```
|
||||
This message means the Model Optimizer can load the model, but is unable to infer the shape and handle the custom layer properties.
|
||||
|
||||
## Registering a Custom Layer as a Model Optimizer Extension
|
||||
|
||||
In the following sections, you will learn how to make the Model Optimizer independent from Caffe\* when processing a
|
||||
model that has a custom layer. In this example, the custom layer is referred to as the Proposal layer.
|
||||
|
||||
Use this section to implement the mapping rules for the `Proposal` layer attributes and the output shape calculation. As part of these steps, you must first create a class for the `Proposal` layer and inherit it from general-purpose Op that defines the interface of every new custom layer.
|
||||
|
||||
In this section, it is important to understand the `Op` class and its function. The implementation of this class shows that it expects a graph and attributes to be passed when initializing. The graph and attributes are in `<INSTALL_DIR>/deployment_tools/model_optimizer/mo/ops/op.py`
|
||||
|
||||
`Op` keeps the attributes for each operation and contains logic for handling node creation for internal model representation. `Op` is responsible for dumping each particular operation to the `.xml` format for the Intermediate Representation. By inheriting from it, the technical items are complete and you concentrate on the specificity of this layer: the attributes it supports and the rules on computing its output shape.
|
||||
|
||||
Follow these steps:
|
||||
|
||||
1. Create the file `python_proposal.py` in the directory `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops`:
|
||||
```python
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
pass
|
||||
```
|
||||
2. Define the name of the operation and make a stub constructor:
|
||||
```python
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
op = 'Proposal'
|
||||
def __init__(self, graph, attrs):
|
||||
super().__init__(graph)
|
||||
```
|
||||
3. Every `Op` must have three specific fields defined: `type`, `op`, and `infer`. In most cases, the `type` and `op` names are the same, and `infer` is defined as a function to compute the output shape. Reflect these fields in your constructor:
|
||||
```python
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
op = 'Proposal'
|
||||
def __init__(self, graph, attrs):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'infer': None
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
```
|
||||
According to the Intermediate Representation catalog, Proposal layer has the following attributes:
|
||||
|
||||
* `pre_nms_topn`
|
||||
* `post_nms_topn`
|
||||
* `nms_thresh`
|
||||
* `feat_stride`
|
||||
* `min_size`
|
||||
* `base_size`
|
||||
* `ratio`
|
||||
* `scale`
|
||||
4. In defining supported attribute names, it is best to use the same names as in the original models. The names are similar to parameters and have no connection with the model layer properties. For clarity, you can use the name `my_ratio` for `ratio`. Other than defining the list of supported parameters, you can define only the parameters that appear in the Intermediate Representation in the `backend_attrs` method.
|
||||
Define your attributes:
|
||||
```python
|
||||
class PythonProposalOp(Op):
|
||||
# ... constructor
|
||||
def supported_attrs(self):
|
||||
return [
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh',
|
||||
'feat_stride',
|
||||
'min_size',
|
||||
'base_size',
|
||||
'ratio',
|
||||
'scale'
|
||||
]
|
||||
```
|
||||
5. Model Optimizer now knows how to create the layer called Proposal when it is in the topology and what attributes this layer has. However, the Model Optimizer does not know how to calculate the output shape of this operation. Define a rule to calculate the output shape:
|
||||
```python
|
||||
import numpy as np
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalOp(Op):
|
||||
def __init__(self, graph, attrs):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'infer': PythonProposalOp.calculate_output_shape
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
# ... supported attrs
|
||||
@staticmethod
|
||||
def calculate_output_shape(node: Node):
|
||||
node.out_node().shape = (1, 1, 1, 1) # any Proposal now has always the same output
|
||||
```
|
||||
6. According to the Intermediate Representation catalog, Proposal layer has the following output calculation formula, where shape dynamically depends on the `post_nms_topn` parameter.
|
||||
Implement the output calculation formula in Python\*:
|
||||
```python
|
||||
import numpy as np
|
||||
class PythonProposalOp(Op):
|
||||
# ... static fields
|
||||
# ... constructor
|
||||
# ... supported attrs
|
||||
@staticmethod
|
||||
def calculate_output_shape(node: Node):
|
||||
input_shape = node.in_node(0).shape
|
||||
out_shape = np.array([0, 0], dtype=np.int64)
|
||||
# rois blob: holds R regions of interest, each is a 5 - tuple
|
||||
# (n, x1, y1, x2, y2) specifying an image batch index n and a
|
||||
# rectangle(x1, y1, x2, y2)
|
||||
out_shape[0] = input_shape[0] * node.post_nms_topn
|
||||
out_shape[1] = 5
|
||||
node.out_node(0).shape = out_shape
|
||||
```
|
||||
The node does not contain this parameter because it should be initialized in the constructor and in other parameters. The Inference Engine contains the implementation of a Caffe\*-like Proposal layer and works well with the default values from `caffe.proto`:
|
||||
```
|
||||
// Message that stores parameters used by ProposalLayer message ProposalParameter { optional uint32 feat_stride = 1 [default = 16]; optional uint32 base_size = 2 [default = 16]; optional uint32 min_size = 3 [default = 16]; repeated float ratio = 4; repeated float scale = 5; optional uint32 pre_nms_topn = 6 [default = 6000]; optional uint32 post_nms_topn = 7 [default = 300]; optional float nms_thresh = 8 [default = 0.7]; }
|
||||
```
|
||||
7. Change the constructor as follows:
|
||||
```python
|
||||
class PythonProposalOp(Op):
|
||||
# ... static fields
|
||||
def __init__(self, graph, attrs):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'feat_stride': 16,
|
||||
'base_size': 16,
|
||||
'min_size': 16,
|
||||
'ratio': [0.5, 1, 2],
|
||||
'scale': [8, 16, 32],
|
||||
'pre_nms_topn': 6000,
|
||||
'post_nms_topn': 300,
|
||||
'nms_thresh': 0.7,
|
||||
'infer': PythonProposalOp.calculate_output_shape
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
# ... supported attrs
|
||||
# ... calculate output shape
|
||||
|
||||
```
|
||||
|
||||
It is mandatory to call two functions right after the implementation of that class:
|
||||
|
||||
```
|
||||
class ProposalPythonOp(Op):
|
||||
...
|
||||
|
||||
register_caffe_python_extractor(ProposalPythonOp, 'rpn.proposal_layer.ProposalLayer')
|
||||
Op.excluded_classes.append(ProposalPythonOp)
|
||||
```
|
||||
|
||||
Note that the first call <code>register_caffe_python_extractor(ProposalPythonOp, 'rpn.proposal_layer.ProposalLayer')</code> registers the extension of the layer in the Model Optimizer that will be found by a specific name (it is mandatory to join module name and layer name): <code>'rpn.proposal_layer.ProposalLayer'</code>.
|
||||
|
||||
The second call prevents the Model Optimizer from using this extension as if it is an extension for a layer with type `Proposal`. Otherwise, this layer can be chosen as an implementation of extension that can lead to potential issues.
|
||||
|
||||
**Summary**
|
||||
|
||||
In this section you implemented support for a custom layer with type `Python` that is `Proposal` layer in the topology. You learned how to calculate output shape of this layer.
|
||||
|
||||
The values of attributes are hardcoded, and in the next section you will learn how to extract these values from original framework model (Caffe model in this case).
|
||||
|
||||
## Registering Rules to Pass Extension Layer Properties from a Caffe\* Model to the Intermediate Representation
|
||||
|
||||
Model Optimizer now knows how to set the shape of the `PythonProposalOp` operation, but it is incorrect to initialize attributes with same values for every operation. Instead, the values should be extracted from the original topology. Model Optimizer does not know how to map the custom layer properties to the `PythonProposalOp`. For this, you must register the `FrontExtractorOp` instance.
|
||||
|
||||
> **NOTE**: This step is required only if the layer requires parameters from the original model.
|
||||
|
||||
1. Remove call functions `register_caffe_python_extractor` and `Op.excluded_classes.append` from the file with `op`, because you will implement extracted attributes from prototxt by yourself.
|
||||
There are multiple types of layers in Caffe: for example, `Convolution` and `Pooling`. Also, there is a specific type for custom Python\* layers called `Python`. Therefore, it is necessary to distinguish between those 'usual' types of layers and custom ones. If you want to implement extensions for a layer with type different to `Python`, you need to inherit your class of operation (for example, `ProposalFrontExtractor`) from `FrontExtractorOp`. Otherwise, inherit your class of operation from `CaffePythonFrontExtractorOp`.
|
||||
2. Create a file `python_proposal_ext.py` in the folder `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe`
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
For other layers types, inherit from `FrontExtractorOp`:
|
||||
```py
|
||||
from mo.front.extractor import FrontExtractorOp
|
||||
class ProposalFrontExtractor(FrontExtractorOp):
|
||||
pass
|
||||
```
|
||||
You will implement extractor for layer with type `Python`, however, the steps are generally the same for layers with other types.
|
||||
3. Specify the operation that the extractor refers to and a specific flag. The flag represents whether the operation should be used by the Model Optimizer or should be excluded from processing:
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
```
|
||||
4. Register a mapping rule between the original model and the `PythonProposalOp` attributes by overriding the following function:
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
proto_layer = node.pb
|
||||
param = proto_layer.python_param # each layer has a specific parameter, take a look at caffe.proto
|
||||
python_params = str(param.param_str) # for Python layers, all params are in param_str
|
||||
attrs = {
|
||||
'feat_stride': int(python_params.split(':')[-1])
|
||||
}
|
||||
# update the attributes of the node
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs) # <------ here goes the name ('Proposal') of the Operation that was implemented before
|
||||
return __class__.enabled
|
||||
```
|
||||
> **NOTE:** if you implement extension for layer with type different to `Python`, change the following line: <code>Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs)</code> to this line: <code>Op.get_op_class_by_name(__class__.op).update_node_stat(node, mapping_rule)</code>.
|
||||
You have successfully extracted the parameter `feat_stride` from `prototxt`, assuming it is the only parameter in this layer.
|
||||
5. To increase the implementation flexibility:
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
class PythonProposalFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
param = node.pb.python_param
|
||||
attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs)
|
||||
return ProposalPythonFrontExtractor.enabled
|
||||
```
|
||||
|
||||
You can successfully convert the model. Open the `.xml` file and view your code:
|
||||
```xml
|
||||
...
|
||||
<layer id="42" name="proposal" precision="FP32" type="Python">
|
||||
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="[0.5, 1, 2]" scale="[8, 16, 32]"/>
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>18</dim>
|
||||
<dim>15</dim>
|
||||
<dim>15</dim>
|
||||
</port>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>36</dim>
|
||||
<dim>15</dim>
|
||||
<dim>15</dim>
|
||||
</port>
|
||||
<port id="2">
|
||||
<dim>1</dim>
|
||||
<dim>3</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="3">
|
||||
<dim>300</dim>
|
||||
<dim>5</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
...
|
||||
```
|
||||
|
||||
Look at the output shape of the custom layer you implemented. The shape was calculated according to the rules specified in `PythonProposalOp`. The `ratio` and `scale` properties have the value `[0.5, 1, 2]` and `[8, 16, 32]`. They have square brackets because they are originally a repeated parameter. You converted the parameter to a list in `PythonProposalOp`. Model Optimizer cast the value to a string. According to Python\* rules, a list has a string representation of opening and closing square brackets and values joined by commas.
|
||||
|
||||
This is not a valid notation for the Intermediate Representation specification, because repeated parameters must be separated by a comma but without the brackets. Therefore, you must override the Model Optimizer default behavior regarding how it handles those parameters during the Intermediate Representation emitting stage, after the optimizations are complete. To do so, implement `backend_attrs()` in the `PythonProposalOp` class:
|
||||
```python
|
||||
class PythonProposalOp(Op):
|
||||
... other methods
|
||||
def backend_attrs(self) -> list:
|
||||
"""
|
||||
Gets list of attributes that should appear in resulting IR
|
||||
Returns:
|
||||
list of attributes names or list of tuples (name of attribute, pre-processing rule)
|
||||
"""
|
||||
return [
|
||||
( # a tuple per attribute
|
||||
'ratio', # name of attribute
|
||||
# pre-processing rule in a form of lambda
|
||||
# lambda takes a PythonProposalOp node with all defined properties
|
||||
# it translates [1,2,3] -> "1,2,3"
|
||||
lambda node: ','.join(map(str, node['ratio']))
|
||||
),
|
||||
(
|
||||
'scale',
|
||||
lambda node: ','.join(map(str, node['scale']))
|
||||
),
|
||||
'feat_stride',
|
||||
'base_size',
|
||||
'min_size',
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh'
|
||||
]
|
||||
```
|
||||
The model can now be successfully converted.
|
||||
|
||||
Open the `.xml` file. `ratio` and `scale` have the expected correct values `0.5,1,2` and `8,16,32`:
|
||||
```xml
|
||||
...
|
||||
|
||||
<layer id="33" name="proposal" precision="FP32" type="Python">
|
||||
<data base_size="16" feat_stride="16" min_size="16" nms_thresh="0.7" post_nms_topn="300" pre_nms_topn="6000" ratio="0.5,1,2" scale="8,16,32"/>
|
||||
<input>
|
||||
...
|
||||
</input>
|
||||
<output>
|
||||
...
|
||||
</output>
|
||||
</layer>
|
||||
|
||||
...
|
||||
```
|
||||
|
||||
> **NOTE**: Model Optimizer supports the Faster-R-CNN topology. Run the following command for the same Intermediate Representation:
|
||||
|
||||
```sh
|
||||
python mo.py --input_model VGG16_faster_rcnn_final.caffemodel --input_proto test.prototxt --extensions <INSTALL_DIR>/deployment_tools/inference-engine/samples/object_detection_sample/fasterrcnn_extensions
|
||||
```
|
||||
|
||||
**Summary**
|
||||
|
||||
In this section you learned how to:
|
||||
|
||||
1. Create a framework-independent extension implementation of the Intermediate Representation custom layer with unified logic for calculating output shapes, specified set of attributes
|
||||
2. Use the Framework-Specific property extractor to map original model custom layer properties to the expected properties of the Framework-Independent extension
|
||||
3. Manipulate the custom layer properties representation in the resulting Intermediate Representation
|
||||
|
||||
Files used in this section:
|
||||
|
||||
* `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/ops/python_proposal.py`:
|
||||
|
||||
```py
|
||||
import networkx as nx
|
||||
import numpy as np
|
||||
from mo.front.extractor import attr_getter
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.op import Op
|
||||
|
||||
class ProposalOp(Op):
|
||||
op = 'Proposal'
|
||||
|
||||
def __init__(self, graph: nx.MultiDiGraph, attrs: dict):
|
||||
mandatory_props = {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
'post_nms_topn': 300, # default in caffe-shared
|
||||
'infer': ProposalOp.proposal_infer
|
||||
}
|
||||
super().__init__(graph, mandatory_props, attrs)
|
||||
|
||||
def supported_attrs(self):
|
||||
return [
|
||||
'feat_stride',
|
||||
'base_size',
|
||||
'min_size',
|
||||
'ratio',
|
||||
'scale',
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh'
|
||||
]
|
||||
|
||||
def backend_attrs(self):
|
||||
return [
|
||||
'feat_stride',
|
||||
'base_size',
|
||||
'min_size',
|
||||
('ratio', lambda node: attr_getter(node, 'ratio')),
|
||||
('scale', lambda node: attr_getter(node, 'scale')),
|
||||
'pre_nms_topn',
|
||||
'post_nms_topn',
|
||||
'nms_thresh',
|
||||
]
|
||||
|
||||
@staticmethod
|
||||
def proposal_infer(node: Node):
|
||||
input_shape = node.in_node(0).shape
|
||||
out_shape = np.array([0, 0], dtype=np.int64)
|
||||
# rois blob: holds R regions of interest, each is a 5 - tuple
|
||||
# (n, x1, y1, x2, y2) specifying an image batch index n and a
|
||||
# rectangle(x1, y1, x2, y2)
|
||||
out_shape[0] = input_shape[0] * node.post_nms_topn
|
||||
out_shape[1] = 5
|
||||
node.out_node(0).shape = out_shape
|
||||
```
|
||||
* `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/caffe/python_proposal_ext.py`:
|
||||
|
||||
```py
|
||||
from mo.front.extractor import CaffePythonFrontExtractorOp
|
||||
from mo.ops.op import Op
|
||||
|
||||
class ProposalPythonFrontExtractor(CaffePythonFrontExtractorOp):
|
||||
op = 'rpn.proposal_layer.ProposalLayer'
|
||||
enabled = True
|
||||
|
||||
@staticmethod
|
||||
def extract(node):
|
||||
param = node.pb.python_param
|
||||
attrs = CaffePythonFrontExtractorOp.parse_param_str(param.param_str)
|
||||
Op.get_op_class_by_name('Proposal').update_node_stat(node, attrs)
|
||||
return ProposalPythonFrontExtractor.enabled
|
||||
```
|
||||
This page is deprecated. Please, refer to [Model Optimizer Extensibility](Customize_Model_Optimizer.md) page for more information.
|
||||
|
||||
@@ -1,10 +1,23 @@
|
||||
# Legacy Mode for Caffe* Custom Layers {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Legacy_Mode_for_Caffe_Custom_Layers}
|
||||
|
||||
> **NOTE**: This functionality is deprecated and will be removed in future releases.
|
||||
> **NOTE**: This functionality is deprecated and will be removed in the future releases.
|
||||
|
||||
Model Optimizer can register custom layers in a way that the output shape is calculated by the Caffe\* framework installed on your system. This chapter covers this option.
|
||||
Model Optimizer can register custom layers in a way that the output shape is calculated by the Caffe\* framework
|
||||
installed on your system. This approach has several limitations:
|
||||
|
||||
> **NOTE**: Caffe Python\* API has an issue when layer name does not correspond to the name of its top. The fix was implemented on [BVLC Caffe\*](https://github.com/BVLC/caffe/commit/35a7b87ad87457291dfc79bf8a7e7cf7ef278cbb). The Caffe framework on your computer must contain this fix. Otherwise, Caffe framework can unexpectedly fail during the fallback procedure.
|
||||
* If your layer output shape depends on dynamic parameters, input data or previous layers parameters, calculation of
|
||||
output shape of the layer via Caffe can be incorrect. For example, `SimplerNMS` is filtering out bounding boxes that do
|
||||
not satisfy the condition. Internally, Caffe fallback forwards the whole net without any meaningful data - just some
|
||||
noise. It is natural to get only one bounding box (0,0,0,0) instead of expected number (for example, 15). There is an
|
||||
option to patch Caffe accordingly, however, it makes success of Intermediate Representation generation on the patched
|
||||
Caffe on the particular machine. To keep the solution independent from Caffe, we recommend to use extensions mechanism
|
||||
for such layers described in the [Model Optimizer Extensibility](Customize_Model_Optimizer.md).
|
||||
* It is not possible to produce Intermediate Representation on a machine that does not have Caffe installed.
|
||||
|
||||
> **NOTE**: Caffe Python\* API has an issue when layer name does not correspond to the name of its top. The fix was
|
||||
> implemented on [BVLC Caffe\*](https://github.com/BVLC/caffe/commit/35a7b87ad87457291dfc79bf8a7e7cf7ef278cbb). The
|
||||
> Caffe framework on your computer must contain this fix. Otherwise, Caffe framework can unexpectedly fail during the
|
||||
> fallback procedure.
|
||||
|
||||
> **NOTE**: The Caffe fallback feature was validated against [this GitHub revision](https://github.com/BVLC/caffe/tree/99466224dac86ddb86296b1e727794fb836bd80f). You may have issues with forks or later Caffe framework versions.
|
||||
|
||||
@@ -25,7 +38,8 @@ Where:
|
||||
|
||||
**Example**:
|
||||
|
||||
1. `Proposal` layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in the `proposal_param` property of the layer:
|
||||
1. `Proposal` layer has parameters, and they appear in the Intermediate Representation. The parameters are stored in
|
||||
the `proposal_param` property of the layer:
|
||||
```shell
|
||||
\<CustomLayer NativeType="Proposal" hasParam ="true" protoParamName = "proposal_param"/\>
|
||||
```
|
||||
@@ -34,16 +48,6 @@ Where:
|
||||
\<CustomLayer NativeType="CustomLayer" hasParam ="false"/\>
|
||||
```
|
||||
|
||||
For this feature, you need an appropriate version of Caffe installed on the computer on which you run the Model Optimizer.
|
||||
|
||||
## Constraints of Using the Caffe Fallback
|
||||
|
||||
Several layers in the Caffe\* framework can have shapes that dynamically depend on the input data, not only the layers that proceed the layer and its parameters. For example, `SimplerNMS` is filtering out bounding boxes that do not satisfy the condition. Internally, Caffe fallback forwards the whole net without any meaningful data - just some noise. It is natural to get only one bounding box (0,0,0,0) instead of expected number (for example, 15). There is an option to patch Caffe accordingly, however, it makes success of Intermediate Representation generation on the patched Caffe on the particular machine. To keep the solution independent from Caffe, we recommend to use extensions mechanism for such layers.
|
||||
|
||||
Known cases like `Proposal`, `DetectionOutput`, `SimplerNMS` are implemented as extensions and can be used out of the box.
|
||||
|
||||
A detailed description of supported layers is in the [Operations Specification](../../../ops/opset.md) document.
|
||||
|
||||
## Building Caffe\*
|
||||
|
||||
1. Build Caffe\* with Python\* 3.5:
|
||||
@@ -68,4 +72,4 @@ python3
|
||||
import caffe
|
||||
```
|
||||
|
||||
If Caffe was installed correctly, the `caffe` module is imported without errors.
|
||||
If Caffe was installed correctly, the `caffe` module is imported without errors.
|
||||
|
||||
@@ -1,363 +1,4 @@
|
||||
# Sub-Graph Replacement in the Model Optimizer {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer}
|
||||
|
||||
Several reasons exist for why the Model Optimizer could not generate an Intermediate Representation for a model. However, in some cases, the Intermediate Representation could be generated after providing certain hints to the tool. The examples of hints below are mostly related to TensorFlow\*, but potentially could be actual for models created in any framework:
|
||||
|
||||
* Topology contains an operation (or a sub-graph of operations) not known for Model Optimizer, but this operation (sub-graph) could be expressed as a combination of known operations. A hint would be a description of this combination to the tool).
|
||||
* Sub-graph of operations in the topology expresses a single layer known to Inference Engine.
|
||||
* TensorFlow and Inference Engine use different layouts of tensors, NHWC and NCHW respectively. If some tensor in NHWC layout is flattened (for example, all the dimensions are squashed into single dim), it is not possible to convert it to NCHW layout required for Inference Engine, so Model Optimizer cannot produce correct Intermediate Representation.
|
||||
|
||||
The detailed solutions for the examples above are given later, the next subsection shows what is common in all three examples.
|
||||
|
||||
## Sub-graph Replacement
|
||||
|
||||
In these cases, the sub-graph (or a single node) of initial graph is replaced with a new sub-graph (single node). The sub-graph replacement consists of the following steps:
|
||||
|
||||
1. Identify an existing sub-graph for replacement
|
||||
|
||||
2. Generate a new sub-graph
|
||||
|
||||
3. Connect a new sub-graph to the graph (create input/output edges to the new sub-graph)
|
||||
|
||||
4. Create output edges out of a new sub-graph to the graph
|
||||
|
||||
5. Do something with the original sub-graph (for example, remove it)
|
||||
|
||||
Model Optimizer provides several ways to perform most of the sub-graph replacement steps. The next subsections describe these methods.
|
||||
|
||||
## Replace a Single Operation with a Sub-graph of Operations
|
||||
|
||||
For example, there is an operation `SquaredDifference` in TensorFlow which calculates \f$(a - b)^2\f$, where \f$a\f$ and \f$b\f$ are input tensors. Inference Engine does not support such operation. However, `SquaredDifference` could be expressed using two `Power` operations and one `Eltwise Add`. The `Power` operation calculates \f$scale * (a ^ {power}) + shift\f$, where \f$a\f$ is a tensor and \f$scale\f$, \f$power\f$ and \f$shift\f$ are float values. The first `Power` operation negates the value of tensor \f$b\f$. The second one is used to square the result of \f$a + (- b)\f$ which is calculated using the `Eltwise Add` operation applied to tensor \f$a\f$ and tensor \f$-b\f$.
|
||||
|
||||
Given that, we can replace all `SquaredDifference` operations in the initial model with two `Power` and one `Eltwise` operations. The replacer is implemented in the following file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/SquaredDifference.py`.
|
||||
```python
|
||||
import networkx as nx
|
||||
from mo.front.common.replacement import FrontReplacementOp
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.eltwise import Eltwise
|
||||
from mo.ops.power import Power
|
||||
class SquaredDifference(FrontReplacementOp):
|
||||
"""
|
||||
Example class illustrating how to implement replacement of a single op in the front-end of the MO pipeline.
|
||||
This class replaces a single op SquaredDifference by a sub-graph consisting of 3 lower-level ops.
|
||||
"""
|
||||
op = "SquaredDifference"
|
||||
enabled = True
|
||||
def replace_op(self, graph: nx.MultiDiGraph, node: Node):
|
||||
negate = Power(graph, dict(scale=-1, name=node.name + '/negate_'))
|
||||
add = Eltwise(graph, dict(operation='sum', name=node.name + '/add_'))
|
||||
squared = Power(graph, dict(power=2, name=node.name + '/squared_'))
|
||||
out_node = squared.create_node([add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])])
|
||||
# Replace edge from out port 0 of the matched node with a edge from node out_node.id with port 0.
|
||||
# The "explicit" version of the return value is: [(out_node.id, 0)])
|
||||
return [out_node.id]
|
||||
```
|
||||
Model Optimizer internal representation of the graph uses the networkx module.
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Line 1: Imports this module.
|
||||
|
||||
* Line 3: Imports class `FrontReplacementOp` that is used to replace operation of particular type with a new sub-graph. This class performs the first step of the sub-graph replacement (identifies an existing sub-graph for replacement). It is important to mention that the replacement happens before shape inference and creation of data nodes representing tensors with values. At this stage of model conversion pipeline, all nodes in the graph are operation nodes or nodes of type `Const` that produce tensor with fixed value embedded into the node.
|
||||
|
||||
* Line 4: Imports class `Node` representing a single node in the computation graph.
|
||||
|
||||
* Lines 5 - 6: Import classes representing operations `Power` and `Eltwise`. These classes are inherited from base class `mo.ops.Op` that represents operation and stores its attributes.
|
||||
|
||||
* Line 9: Defines class `SquaredDifference` inherited from `FrontReplacementOp`. This is a replacer class that is automatically registered and executed by Model Optimizer. Since the class is located in the common (not framework) specific directory `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front`, it is used for replacement for all supported frameworks.
|
||||
|
||||
* Line 15: Defines the class variable `op` that stores the name of the operation to be replaced. In this case, it is `SquaredDifference`.
|
||||
|
||||
* Line 16: Defines class variable `enabled` that controls whether the replacer is enabled or not. The only function that should be implemented in the class is `replace_op`. It gets graph to operate on and an instance of node of desired operation (`SquaredDifference` in this case). This function performs step two and three of the sub-graph replacement (generates a new sub-graph to replace with and connects a new sub-graph to the graph).
|
||||
|
||||
* Lines 19 - 21: Create instances of operations classes with required attributes.
|
||||
|
||||
* Line 23: Creates a sub-graph from the operations defined above. The `create_node` method of the `Op` class generates `Node` from the `Op` and uses single mandatory argument - the list of input nodes (represented as instances of `Node` class) to create input edges to the node being generated. Inputs of the `SquaredDifference` node are retrieved using `node.in_node(0)` and `node.in_node(1)` method calls. The `Eltwise Add` node gets first input as initial first input of `SquaredDifference` node, the second input of `add` is the result of negation of the second input of `SquaredDifference` node: `[add.create_node([node.in_node(0), negate.create_node([node.in_node(1)])])]`. Then the result of `Add` node is squared. `out_node` node performs this calculation.
|
||||
|
||||
The `replace_op` function returns a list of node names used to create output edges of the sub-graph to connect it with the rest of the graph. Each element of the list describes mapping between old output edge of the matched node and new sub-graph node and output edge index. The i-th element of the list corresponds to the i-th output tensor of the matched node. In this case, `SquaredDifference` produces single tensor through output port 0, so the returned list contains single element. In general, each element is a tuple, where the first element is the name of a new node producing required tensor and the second is the output port for that tensor. If the output port is 0, it is possible to use shortcut - just the name of the node instead of a tuple. Line 26 uses this shortcut. The returned value is used to create the new sub-graph output edges (step 4 of the sub-graph replacement).
|
||||
|
||||
Default implementation of the `FrontReplacementOp` class removes matched node and all its input/output edges (step 5 of the sub-graph replacement).
|
||||
|
||||
Another example of such kind of replacement is in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/Sub.py` class where all instances of `Sub` operations are replaced with two operations: `Power` to negate the second argument and the `Eltwise` to perform elementwise add.
|
||||
|
||||
## Replace Sub-graph of Operations with a New Sub-graph of Operations
|
||||
|
||||
The previous example considered situation when one single node of a specific type is replaced. When it is necessary to replace a sub-graph of operations it is necessary to tell Model Optimizer how to identify this sub-graph. There are three ways to achieve that:
|
||||
|
||||
* Use graph isomorphism pattern of the networkx module
|
||||
|
||||
* Use nodes name pattern to identify `scope` (according to TensorFlow terminology) to be replaced
|
||||
|
||||
* Use sets of `start` and `end` node names to match all nodes "between" them
|
||||
|
||||
The next sections explain each option using real examples.
|
||||
|
||||
### Replace Sub-graph of Operations Using Graph Isomorphism Pattern <a name="replace-using-isomorphism-pattern"></a>
|
||||
|
||||
networkx Python\* module provides methods to find graph isomorphic to the given one using nodes and edges match: for example, `networkx.algorithms.isomorphism.categorical_node_match`, `networkx.algorithms.isomorphism.categorical_multiedge_match`. Model Optimizer uses these methods and provides simple API to use that feature.
|
||||
|
||||
For example, the Caffe\* has layer called [Mean-Variance Normalization (MVN)](http://caffe.berkeleyvision.org/tutorial/layers/mvn.html), which is also supported by the Inference Engine. This layer is implemented with low-level operations in TensorFlow: `Mean`, `StopGradient`, `SquaredDifference`, `Squeeze` and `FusedBatchNorm`. Model Optimizer should replace sub-graph with these operations with a single Inference Engine layer of type `MVN`.
|
||||
|
||||
The file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/mvn.py` performs such a replacement. The first part of the file is:
|
||||
```python
|
||||
class MVN(FrontReplacementSubgraph):
|
||||
enabled = True
|
||||
def pattern(self):
|
||||
log.debug('Enabled MVN replacement')
|
||||
return dict(
|
||||
nodes=[
|
||||
('mean', dict(op='Mean')),
|
||||
('stop_grad', dict(op='StopGradient')),
|
||||
('sqdiff', dict(op='SquaredDifference')),
|
||||
('variance', dict(op='Mean')),
|
||||
('squeeze_mean', dict(op='Squeeze')),
|
||||
('squeeze_variance', dict(op='Squeeze')),
|
||||
('fbn', dict(op='FusedBatchNorm')),
|
||||
],
|
||||
edges=[
|
||||
('mean', 'stop_grad', {'in': 0}),
|
||||
('stop_grad', 'sqdiff', {'in': 1}),
|
||||
('sqdiff', 'variance', {'in': 0}),
|
||||
('mean', 'squeeze_mean', {'in': 0}),
|
||||
('variance', 'squeeze_variance', {'in': 0}),
|
||||
('squeeze_mean', 'fbn', {'in': 3}),
|
||||
('squeeze_variance', 'fbn', {'in': 4}),
|
||||
],
|
||||
node_attrs=['op'],
|
||||
edge_attrs=['in'])
|
||||
```
|
||||
**Key lines**:
|
||||
|
||||
* Line 1: Defines class `MVN` inherited from class `FrontReplacementSubgraph` that performs sub-graph replacement using sub-graph isomorphism pattern.
|
||||
|
||||
* Line 3: Sets class variable `enabled` to value True meaning that this replacer is enabled.
|
||||
|
||||
* The function `pattern` defines the sub-graph constraints to be matched. It returns a dictionary with four keys:
|
||||
|
||||
* the `nodes` defines a list of nodes to be matched. Each element in the list is a tuple. The first element is the alias name assigned for the matched node, the second element is a dictionary with desired attributes of the node.
|
||||
|
||||
* the `edges` defines a list of edges to be matched. Each element in the list is a tuple. The first and the second elements are the start and end edge nodes alias names respectively. The third element is a dictionary with desired edge attributes.
|
||||
|
||||
* the `node_attrs` contains the names of nodes attributes to use during sub-graph isomorphism search.
|
||||
|
||||
* the `edge_attrs` contains the names of edges attributes to use during sub-graph isomorphism search.
|
||||
|
||||
The sub-graph is matched if all provided constraints are satisfied. If at least one node with desired attributes is missing or at least one defined edge is absent, the sub-graph is not matched.
|
||||
* Line 9: Adds constraint that sub-graph should contain node with attribute `op` with value `Mean`. The matched node gets an alias name `mean`. The same way the line 10 add constrain for node `StopGradient`, the matched node gets an alias name `stop_grad`.
|
||||
|
||||
* Line 18: Defines edge from node with alias name `mean` to node with alias name `stop_grad` having attribute `in` equal to 0. This means that the output of node `mean` is connected to the node `stop_grad` as a first input (Model Optimizer uses zero-based indexing that is why `in` is 0). Another example of defining the edges constraints is in line 25 where the edge from `squeeze_mean` is connected to the `fbn` node as fourth input.
|
||||
|
||||
* Lines 26 - 27: Specify a list of attributes to be checked. In fact, these lists are just list of all keys in the dictionaries for node and edge attributes.
|
||||
|
||||
Now when the Model Optimizer knows how to find sub-graph (step 1 of the sub-graph replacement), it is necessary to implement function that will perform actual sub-graph replacement (step 2 and 3). The code for this function is:
|
||||
```python
|
||||
def replace_sub_graph(self, graph: nx.MultiDiGraph, match: dict):
|
||||
fbn = match['fbn']
|
||||
input = fbn.in_node(0)
|
||||
log.debug('Found potential MVN pattern after {} with name {}'.format(input.op, input.name))
|
||||
if input.id != match['mean'].in_node(0).id or input.id != match['sqdiff'].in_node(0).id:
|
||||
return
|
||||
log.debug('Confirmed MVN pattern after {} with name {}'.format(input.op, input.name))
|
||||
MVN = Op.get_op_class_by_name('MVN')
|
||||
mvn = MVN(graph, dict(
|
||||
name=fbn.name + '/MVN_',
|
||||
eps=fbn.eps,
|
||||
required_reduction_indices=[1,2] if fbn.data_format == b'NHWC' else [2,3]
|
||||
))
|
||||
mvn.attrs['old_infer'] = mvn.attrs['infer']
|
||||
mvn.attrs['infer'] = __class__.infer
|
||||
mul = Eltwise(graph, dict(operation='mul', name=fbn.name + '/Mul_'))
|
||||
add = Eltwise(graph, dict(operation='sum', name=fbn.name + '/Add_'))
|
||||
input_gamma = fbn.in_node(1)
|
||||
input_beta = fbn.in_node(2)
|
||||
mean_reduction = match['mean'].in_node(1)
|
||||
variance_reduction = match['mean'].in_node(1)
|
||||
new_subgraph = add.create_node([
|
||||
mul.create_node([
|
||||
mvn.create_node([input, mean_reduction, variance_reduction]),
|
||||
input_gamma
|
||||
]),
|
||||
input_beta
|
||||
])
|
||||
replace_node(fbn, new_subgraph)
|
||||
```
|
||||
The function accepts two arguments - the graph and the dictionary `match`. The keys in the dictionary are the alias names of matched nodes (defined in the `nodes` list in the function `pattern`) and the values are the matched node of the graph (the instance of Node object).
|
||||
|
||||
The function generates new sub-graph with node of type `MVN` and two nodes of the type `Eltwise` calculating sum and product. There is nothing interesting in how the graph is generated and mathematics behind that, so attention will be put to two aspects of this function.
|
||||
|
||||
The first one is the call to function `replace_node` in line 36. `FusedBatchNorm` node is replaced with the output node of the generated sub-graph: all input edges of the `FusedBatchNorm` node are re-connected to the `new_subgraph` node, all consumers of the `FusedBatchNorm` node are updated to get inputs from the `new_subgraph` node. This action connects newly generated sub-graph with an existing graph (step 4 of the sub-graph replacement).
|
||||
|
||||
The second one is that the default implementation of the inference function for `MVN` operation is overwritten. In line 16, the default implementation of the inference function for `MVN` is saved to attribute `old_infer`. In line 17, the new inference function is saved to the instance of the `MVN` operation class. The new inference function code looks the following way:
|
||||
```python
|
||||
@staticmethod
|
||||
def infer(node: Node):
|
||||
if not(node.in_node(1).has_valid('value') and node.in_node(2).has_valid('value')):
|
||||
log.warning('Reduction indices for mean and variance for MVN node {} are not constants'.format(node.name))
|
||||
return
|
||||
if not(all(node.in_node(1).value == node.required_reduction_indices) and
|
||||
all(node.in_node(2).value == node.required_reduction_indices)):
|
||||
log.warning('Reduction indices for mean {} and variance {} do not match required ones {}'.format(
|
||||
node.in_node(1).value,
|
||||
node.in_node(2).value,
|
||||
node.required_reduction_indices
|
||||
))
|
||||
return
|
||||
node.graph.remove_edge(node.in_node(1).id, node.id)
|
||||
node.graph.remove_edge(node.in_node(2).id, node.id)
|
||||
node.old_infer(node)
|
||||
```
|
||||
The `infer` function is needed to infer value of the node (if it is possible) and to infer shapes of the output tensors of the node (mandatory). The custom `infer` function performs additional checks that describe limitations of the `MVN` layer implementation in the Inference Engine. For example, reduction indices for mean and variance must be constants (line 10), while in TensorFlow they could be computed during model inference. In addition, the function removes two edges from the graph (lines 17 and 18) because all required information is already stored in the `MVN` node attributes. This is due to different `MVN` layer implementation in Inference Engine and TensorFlow\*: `mean` and `variance` are attributes of the node in Inference Engine while in TensorFlow they are input tensors. Edges are not removed in the `replace_sub_graph` function, because these edges are used in the `infer` function (lines 7-12).
|
||||
|
||||
The last action in the `infer` method (line 19) is to call default infer function for the `MVN`, which is saved in the attribute `old_infer` of the node to infer output tensors shapes.
|
||||
|
||||
On the step 5 of the sub-graph replacement, six matching nodes are automatically removed during the dead code elimination pass that is performed after applying of custom sub-graph replacements defined. Six matching nodes are no more connected to the inputs of the network after replacing node `fbn` with a newly created sub-graph node. Since they are not marked as output nodes (using `--output` command line parameter), they could be removed.
|
||||
|
||||
The replacement works for all sub-graph isomorphism instances found in the network.
|
||||
|
||||
### Replace Sub-graph of Operations Using Nodes Name Pattern
|
||||
|
||||
TensorFlow uses a mechanism of scope to group related operation nodes. It is a good practice to put nodes performing particular task into the scope. This approach divides a graph into logical blocks that are easier to review in TensorBoard\*. The `scope`, in fact, just defines a common prefix for the node names in the scope.
|
||||
|
||||
For example, Inception topologies contain several types of so-called "Inception blocks". Some of them are exactly equal to each other, but located in different places of the network. For example, Inception V4 from `tensorflow.contrib.slim` module has inception blocks `Mixed_5b`, `Mixed_5c` and `Mixed_5d` with exactly the same nodes with the same attributes.
|
||||
|
||||
Now consider situation when someone implemented these Inception blocks extremely efficiently using single Inference Engine custom layer called `InceptionBlock` and would like to replace these blocks with instances of the layer to decrease inference time. Model Optimizer provides mechanism to replace sub-graph of operations defined by the regular expressions for the node names prefixes (scope). In this particular case, some of the patterns are: `.*InceptionV4/Mixed_5b`, `.*InceptionV4/Mixed_5c` and `.*InceptionV4/Mixed_5d`. Each pattern starts with `.*`, because a prefix `InceptionV4` is added to all nodes names during a model freeze.
|
||||
|
||||
The sub-graph replacement using nodes name pattern is a bit trickier than replacements of single operation and networkx isomorphism pattern described above. You should do the following additional steps in comparison with previously described replacements:
|
||||
|
||||
1. Prepare configuration file template defining node names patterns and information about custom layer attributes.
|
||||
|
||||
2. Run Model Optimizer with command line parameter to add information about input and output nodes of the specified sub-graphs.
|
||||
|
||||
Consider the following possible configuration file for the Inception Block replacer:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"custom_attributes": {
|
||||
"attr1_key": "attr1_value",
|
||||
"attr2_key": 123456
|
||||
},
|
||||
"id": "InceptionBlockReplacer",
|
||||
"op": "InceptionBlock",
|
||||
"instances": [
|
||||
".*InceptionV4/Mixed_5b",
|
||||
".*InceptionV4/Mixed_5c",
|
||||
".*InceptionV4/Mixed_5d"
|
||||
],
|
||||
"match_kind": "scope"
|
||||
}
|
||||
]
|
||||
```
|
||||
The `.json` file contains list of dictionaries. Each dictionary defines one replacement. Each replacement is defined with several keys:
|
||||
|
||||
* `id` (mandatory) is a unique identifier of the replacer. It is used in the Python\* code that implements sub-graph replacement to link the class and the replacement description from the configuration file.
|
||||
|
||||
* `match_kind` (mandatory) is a string that specifies what matching algorithm is used. Currently supported `scope` and `points`. In this example, the first one is considered. The `points` match kind is described below.
|
||||
|
||||
* `instances` (mandatory) specifies instances of the sub-graph to be matched. It contains a list of node names prefixes patterns for the match kind `scope`.
|
||||
|
||||
* `custom_attributes` (optional) is a dictionary with static attributes of the layer to be dumped to Inference Engine Intermediate Representation `.xml` file.
|
||||
|
||||
* `op` (optional) is used only if the sub-graph replacement Python code is not needed, because the sub-graph should be replaced with a single node of type `op`. If this attribute is not set, it is necessary to implement Python code with sub-graph generation code. Both options are considered in this example.
|
||||
|
||||
When the configuration file is ready, run the Model Optimizer with regular command line parameters pointing to the file with model and input shapes (if necessary) and additional parameter `--tensorflow_custom_operations_config_update` pointing to the generated configuration file. If the file is correct, Model Optimizer adds two keys to the `InceptionBlockReplacer` dictionary: `inputs` and `outputs` with the following content:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"id": "InceptionBlockReplacer",
|
||||
...
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "Branch_2/Conv2d_0a_1x1/Conv2D$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "Branch_3/AvgPool_0a_3x3/AvgPool$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "Branch_1/Conv2d_0a_1x1/Conv2D$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "Branch_0/Conv2d_0a_1x1/Conv2D$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
],
|
||||
"outputs": [
|
||||
{
|
||||
"node": "concat$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
The value for key `inputs` is a list of lists describing input tensors of the sub-graph. Each element of the top-level list corresponds to one unique input tensor of the sub-graph. Each internal list describes a list of nodes consuming this tensor and port numbers where the tensor is consumed. Model Optimizer generates regular expressions for the input nodes names to uniquely identify them in each instance of the sub-graph defined by the `instances`. Denote these nodes as input nodes of the sub-graph.
|
||||
|
||||
In the InceptionV4 topology, the `InceptionV4/Mixed_5b` block has four input tensors from outside of the sub-graph, but all of them are produced by the node `InceptionV4/Mixed_5a/concat`. Therefore, the top-level list of the `inputs` contains one list corresponding to this tensor. Four input nodes of the sub-graph consume the tensor produced by `InceptionV4/Mixed_5a/concat` node. In this case, all four input nodes consume input tensor into port 0.
|
||||
|
||||
The order of items in the internal list describing nodes does not matter, but the order of elements in the top-level list is important. This order defines the order in which the Model Optimizer attaches input tensors to a new generated node if the sub-graph is replaced with a single node. The i-th input node of the sub-graph is obtained using call `match.single_input_node(i)` in the sub-graph replacer code. More information about API is given below. If you need to change the order of input tensors, you can edit the configuration file in the text-editor.
|
||||
|
||||
The value for the key `outputs` is a list describing nodes of the sub-graph producing tensor that goes outside of the sub-graph or does not have child nodes. Denote these nodes as output nodes of the sub-graph. The order of elements in the list is important. The i-th element of the list describes the i-th output tensor of the sub-graph, which could be obtained using call `match.output_node(i)`. The order of elements can be manually changed in the configuration file. Model Optimizer uses this order to connect output edges if the sub-graph is replaced with a single node.
|
||||
|
||||
Now, when meaning of `inputs` and `outputs` attributes is clean, return back to the replacer implementation. The replacer `InceptionBlockReplacer` contains attribute `op` with the value `InceptionBlock`, which means that the identified sub-graph should be replaced with a single layer of type `InceptionBlock`. This layer is not known for the Model Optimizer, so it is necessary to define it. See [Extending the Model Optimizer with New Primitives](Extending_Model_Optimizer_with_New_Primitives.md). You must create file `extension/ops/InceptionBlock.py` with the following content:
|
||||
```python
|
||||
import numpy as np
|
||||
from mo.graph.graph import Node
|
||||
from mo.ops.op import Op
|
||||
class InceptionBlock(Op):
|
||||
op = "InceptionBlock"
|
||||
enabled = True
|
||||
def __init__(self, graph, attrs):
|
||||
super().__init__(graph, attrs, {
|
||||
'type': __class__.op,
|
||||
'op': __class__.op,
|
||||
})
|
||||
```
|
||||
The shape inference function is not defined. In this case, Model Optimizer uses TensorFlow fallback to calculate shapes of the sub-graph output tensors.
|
||||
|
||||
Run the Model Optimizer with the regular command line parameters, path to the model file and input shape (if necessary), and the parameter `--tensorflow_use_custom_operations_config` and point to the created configuration file. Model Optimizer generates Intermediate Representation `.xml` file with three sequential layers of type `InceptionBlock` like in the following example:
|
||||
```xml
|
||||
<layer id="1658" name="InceptionBlock1877" precision="FP32" type="InceptionBlock">
|
||||
<input>
|
||||
<port id="0">
|
||||
<dim>1</dim>
|
||||
<dim>384</dim>
|
||||
<dim>35</dim>
|
||||
<dim>35</dim>
|
||||
</port>
|
||||
</input>
|
||||
<output>
|
||||
<port id="1">
|
||||
<dim>1</dim>
|
||||
<dim>384</dim>
|
||||
<dim>35</dim>
|
||||
<dim>35</dim>
|
||||
</port>
|
||||
</output>
|
||||
</layer>
|
||||
```
|
||||
The implementation of the sub-graph replacement by scope with a single layer is complete. The next subsection explains
|
||||
how Model Optimizer replaces sub-graph identified by start/end nodes (`points`) with another sub-graph.
|
||||
|
||||
### <a name="sub_graph_replacement_using_points"></a> Replace Sub-graph of Operations Using Points
|
||||
In this scenario, for the matching algorithm user defines the sub-graph via a set of "start" and "end" nodes.
|
||||
Given the set, the Model Optimizer performs the following steps:
|
||||
1. Starts graph traversal from every _start_ nodes following the direction of the graph edges.
|
||||
The search stops in _end_ nodes or in case of nodes without further children. All visited nodes are added to the matched sub-graph.
|
||||
2. Starts another graph traversal from each non-start node of the sub-graph, i.e. every node except nodes from "start" set.
|
||||
In this step the edges are traversed in the opposite edge direction. All newly visited nodes are added to the
|
||||
matched sub-graph. This step is needed to add nodes required for calculation values of internal nodes of the
|
||||
matched sub-graph.
|
||||
3. Checks that all "end" nodes were reached from "input" nodes. If no then exit with error.
|
||||
4. Check that there are no "Placeholder" operations among added nodes. If it is not true then some side branch of
|
||||
the sub-graph (added in step 2) depends on inputs of the network. Such configuration is not correct so exit with error.
|
||||
|
||||
This algorithm finds all nodes "between" start and end nodes. Also nodes needed for calculation of non-input nodes of the
|
||||
matched sub-graph produce _constant_ values because they do not depend on input of the network.
|
||||
**This sub-graph match has a limitation that each start node must have only one input**. Therefore, it is not possible
|
||||
to specify, for example, convolution node as input because it has two inputs: data tensor and tensor with weights.
|
||||
|
||||
For example of replacement with points, please refer to the case-study of the
|
||||
[conversion for the SSD models, created with TensorFlow Object Detection API](TensorFlow_SSD_ObjectDetection_API.md).
|
||||
The document has been deprecated. Refer to the [Model Optimizer Extensibility](Subgraph_Replacement_Model_Optimizer.md)
|
||||
for the up-to-date documentation.
|
||||
|
||||
@@ -1,449 +0,0 @@
|
||||
# Converting Faster R-CNN models, created with TensorFlow Object Detection API {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_Faster_RCNN_ObjectDetection_API}
|
||||
|
||||
This is a deprecated page. Please, consider reading [this](../convert_model/tf_specific/Convert_Object_Detection_API_Models.md) page describing new approach to convert Object Detection API models giving closer to TensorFlow inference results.
|
||||
|
||||
## Converting models created with TensorFlow Object Detection API version equal or higher than 1.6.0
|
||||
This chapter describes how to convert selected Faster R-CNN models from the TensorFlow Object Detection API zoo version equal or higher than 1.6.0. The full list of supported models is provided in the table below. Note that currently batch size 1 is supported only. The only Inference Engine plugin supporting these topologies inference is CPU.
|
||||
|
||||
The Faster R-CNN models contain several building blocks similar to building blocks from SSD models so it is highly recommended to read chapter about [enabling TensorFlow Object Detection API SSD models](TensorFlow_SSD_ObjectDetection_API.md) first. Detailed information about Faster R-CNN topologies is provided [here](https://arxiv.org/abs/1506.01497).
|
||||
|
||||
The TensorFlow network consists of a number of big blocks grouped by scope:
|
||||
|
||||
* `Preprocessor` performs scaling/resizing of the image and converts input data to [0, 1] interval. Has two outputs: the first one is modified input image and the second one is a constant tensor with shape (batch_size, 3) and values (resized_image_height, resized_image_width, 3).
|
||||
|
||||
* `FirstStageFeatureExtractor` is a backbone feature extractor.
|
||||
|
||||
* `FirstStageBoxPredictor` calculates boxes and classes predictions.
|
||||
|
||||
* `GridAnchorGenerator` generates anchors coordinates.
|
||||
|
||||
* `ClipToWindow` crops anchors to the resized image size.
|
||||
|
||||
* `Decode` decodes coordinates of boxes using anchors and data from the `FirstStageBoxPredictor`.
|
||||
|
||||
* `BatchMultiClassNonMaxSuppression` performs non maximum suppression.
|
||||
|
||||
* `map` scales coordinates of boxes to [0, 1] interval by dividing coordinates by (resized_image_height, resized_image_width).
|
||||
|
||||
* `map_1` scales coordinates from [0, 1] interval to resized image sizes.
|
||||
|
||||
* `SecondStageFeatureExtractor` is a feature extractor for predicted Regions of interest (ROIs).
|
||||
|
||||
* `SecondStageBoxPredictor` refines box coordinates according `SecondStageFeatureExtractor`.
|
||||
|
||||
* `SecondStagePostprocessor` is Detection Output layer performing final boxes predictions.
|
||||
|
||||
### Sub-graph replacements
|
||||
There are three sub-graph replacements defined in the `extensions/front/tf/legacy_faster_rcnn_support.json` used to convert these models:
|
||||
|
||||
* the first one replaces the `Preprocessor` block. The implementation of this replacer is in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py`
|
||||
|
||||
* the second one replaces a number of blocks in the the graph including `GridAnchorGenerator`, `ClipToWindow`, `Decode`, `BatchMultiClassNonMaxSuppression`, `Tile`, `Tile_1` and `map` with Proposal and ROIRooling layers and some additional layers to pre-process input data
|
||||
|
||||
* the third one replaces `SecondStagePostprocessor` with a DetectionOutput layer.
|
||||
|
||||
The second replacer is defined using the following configuration that matches sub-graph by points:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_attributes": {
|
||||
"nms_threshold": 0.7,
|
||||
"feat_stride": 16,
|
||||
"max_proposals": 100,
|
||||
"anchor_base_size": 256,
|
||||
"anchor_scales": [0.25, 0.5, 1.0, 2.0],
|
||||
"anchor_aspect_ratios": [0.5, 1.0, 2.0],
|
||||
"roi_spatial_scale": 0.0625
|
||||
},
|
||||
"id": "TFObjectDetectionAPIFasterRCNNProposalAndROIPooling",
|
||||
"include_inputs_to_sub_graph": true,
|
||||
"include_outputs_to_sub_graph": true,
|
||||
"instances": {
|
||||
"end_points": [
|
||||
"CropAndResize",
|
||||
"map_1/TensorArrayStack/TensorArrayGatherV3",
|
||||
"map_1/while/strided_slice/Enter",
|
||||
"BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3"
|
||||
],
|
||||
"start_points": [
|
||||
"FirstStageBoxPredictor/concat",
|
||||
"FirstStageBoxPredictor/concat_1",
|
||||
"GridAnchorGenerator/Identity",
|
||||
"Shape",
|
||||
"CropAndResize"
|
||||
]
|
||||
},
|
||||
"match_kind": "points"
|
||||
}
|
||||
```
|
||||
|
||||
The `start_points` list contains the following nodes:
|
||||
|
||||
* `FirstStageBoxPredictor/concat` node produces box coordinates predictions.
|
||||
|
||||
* `FirstStageBoxPredictor/concat_1` node produces classes predictions which will be used for the ROIs
|
||||
|
||||
* `GridAnchorGenerator/Identity` node produces anchors coordinates.
|
||||
|
||||
* `Shape` and `CropAndResize` nodes are specified as inputs to correctly isolate the required sub-graph. Refer to the [chapter](Subgraph_Replacement_Model_Optimizer.md) for more information about replacements by points.
|
||||
|
||||
The `end_points` list contains the following nodes:
|
||||
|
||||
* `CropAndResize` is the node that performs ROI pooling operation.
|
||||
|
||||
* `map_1/TensorArrayStack/TensorArrayGatherV3`, `map_1/while/strided_slice/Enter` and `BatchMultiClassNonMaxSuppression/map/TensorArrayStack_4/TensorArrayGatherV3` are specified to correctly isolate the sub-graph.
|
||||
|
||||
The `custom_attributes` dictionary contains attributes where most values are taken from the topology-specific configuration file `samples/configs/faster_rcnn_*.config` of the [TensorFlow Object Detection API repository](https://github.com/tensorflow/models/tree/master/research/object_detection):
|
||||
|
||||
* `nms_threshold` is the value of the `first_stage_nms_iou_threshold` parameter.
|
||||
|
||||
* `feat_stride` is the value of the `height_stride` and `width_stride` parameters. Inference Engine supports case when these two values are equal that is why the replacement configuration file contains just one parameter.
|
||||
|
||||
* `max_proposals` is the value of the `max_total_detections` parameter which is a maximum number of proposal boxes from the Proposal layer and detected boxes.
|
||||
|
||||
* `anchor_base_size` is the base size of the generated anchor. The 256 is the default value for this parameter and it is not specified in the configuration file.
|
||||
|
||||
* `anchor_scales" is the value of the `scales` attrbite.
|
||||
|
||||
* `anchor_aspect_ratios` is the value of the `aspect_ratios` attribute.
|
||||
|
||||
* `roi_spatial_scale` is needed for the Inference Engine ROIPooling layer. It is the default value that is not actually used.
|
||||
|
||||
The identifier for this replacer is `TFObjectDetectionAPIFasterRCNNProposalAndROIPooling`. The Python implementation of this replacer is in the file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/FasterRCNNs.py`.
|
||||
|
||||
The first four functions of the replacer class are the following:
|
||||
|
||||
```python
|
||||
class TFObjectDetectionAPIFasterRCNNProposalAndROIPooling(FrontReplacementFromConfigFileSubGraph):
|
||||
"""
|
||||
This class replaces sub-graph of operations with Proposal and ROIPooling layers and additional layers transforming
|
||||
tensors from layout of TensorFlow to layout required by Inference Engine.
|
||||
Refer to comments inside the function for more information about performed actions.
|
||||
"""
|
||||
replacement_id = 'TFObjectDetectionAPIFasterRCNNProposalAndROIPooling'
|
||||
|
||||
def run_after(self):
|
||||
return [PreprocessorReplacement]
|
||||
|
||||
def run_before(self):
|
||||
return [SecondStagePostprocessorReplacement]
|
||||
|
||||
def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
|
||||
return {match.output_node(0)[0].id: new_sub_graph['roi_pooling_node'].id}
|
||||
|
||||
def nodes_to_remove(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
|
||||
new_list = match.matched_nodes_names().copy()
|
||||
# do not remove nodes that produce box predictions and class predictions
|
||||
new_list.remove(match.single_input_node(0)[0].id)
|
||||
new_list.remove(match.single_input_node(1)[0].id)
|
||||
return new_list
|
||||
```
|
||||
|
||||
The function `run_after` returns list of Python classes inherited from one of the replacer classes (`FrontReplacementOp`, `FrontReplacementPattern`, `FrontReplacementFromConfigFileSubGraph` etc) those current sub-graph replacement class must be run after. In this case the replacer must be run after the `Preprocessor` is removed by the `PreprocessorReplacement` replacer. Similar way the `run_before` function is used to tell Model Optimizer to execute `SecondStagePostprocessorReplacement` before this replacer.
|
||||
|
||||
The `output_edges_match` function describes matching between the output nodes of the sub-graph before replacement and after. In this case the only needed output node of the sub-graph is the `CropAndResize` node which is identified with `match.output_node(0)[0]`. The new output node which is created in the `generate_sub_graph` function is identified with `new_sub_graph['roi_pooling_node']`.
|
||||
|
||||
The `nodes_to_remove` function takes the default list of nodes to be removed which contains all matched nodes and remove from them two input nodes which are identified with `match.single_input_node(0)[0]` and `match.single_input_node(1)[0]`. These nodes will be connected as inputs to new nodes being generated in the `generate_sub_graph` function so they should node be removed.
|
||||
|
||||
The code generating new sub-graph is the following:
|
||||
|
||||
```python
|
||||
def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
|
||||
log.debug('TFObjectDetectionAPIFasterRCNNProposal: matched_nodes = {}'.format(match.matched_nodes_names()))
|
||||
|
||||
config_attrs = match.custom_replacement_desc.custom_attributes
|
||||
nms_threshold = config_attrs['nms_threshold']
|
||||
feat_stride = config_attrs['feat_stride']
|
||||
max_proposals = config_attrs['max_proposals']
|
||||
anchor_base_size = config_attrs['anchor_base_size']
|
||||
roi_spatial_scale = config_attrs['roi_spatial_scale']
|
||||
proposal_ratios = config_attrs['anchor_aspect_ratios']
|
||||
proposal_scales = config_attrs['anchor_scales']
|
||||
anchors_count = len(proposal_ratios) * len(proposal_scales)
|
||||
```
|
||||
|
||||
These lines get parameters defined in the sub-graph replacement configuration file and calculate initial anchors count.
|
||||
|
||||
```python
|
||||
# get the ROIPool size from the CropAndResize which performs the same action
|
||||
if 'CropAndResize' not in graph.nodes():
|
||||
raise Error('Failed to find node with name "CropAndResize" in the topology. Probably this is not Faster'
|
||||
' RCNN topology or it is not supported')
|
||||
roi_pool_size = Node(graph, 'CropAndResize').in_node(3).value[0]
|
||||
```
|
||||
|
||||
The code above gets the ROI Pooling spatial output dimension size as a value from the fourth argument of the node with name `CropAndResize`.
|
||||
|
||||
```python
|
||||
# Convolution/matmul node that produces classes predictions
|
||||
# Permute result of the tensor with classes permissions so it will be in a correct layout for Softmax
|
||||
predictions_node = match.single_input_node(1)[0].in_node(0).in_node(0)
|
||||
permute_predictions_op = Permute(graph, {'order': np.array([0, 2, 3, 1])})
|
||||
permute_predictions_node = permute_predictions_op.create_node([], dict(name=predictions_node.name + '/Permute_'))
|
||||
insert_node_after(predictions_node, permute_predictions_node, 0)
|
||||
|
||||
reshape_classes_op = Reshape(graph, {'dim': np.array([0, -1, 2])})
|
||||
reshape_classes_node = reshape_classes_op.create_node([permute_predictions_node],
|
||||
dict(name='Reshape_FirstStageBoxPredictor_Class_'))
|
||||
update_attrs(reshape_classes_node, 'shape_attrs', 'dim')
|
||||
|
||||
softmax_conf_op = Softmax(graph, {'axis': 1})
|
||||
softmax_conf_node = softmax_conf_op.create_node([reshape_classes_node],
|
||||
dict(name='FirstStageBoxPredictor_SoftMax_Class_'))
|
||||
```
|
||||
|
||||
The output with class predictions from the `FirstStageBoxPredictor` is generated with a convolution operation. The convolution output data layout in TensorFlow is NHWC while Inference Engine uses NCHW layout. Model Optimizer by default converts the weights of TensorFlow convolutions to produce output tensor in NCHW layout required by Inference Engine. The issue arises because the class predictions tensor is passed through the Softmax operation to produce class probabilities. The Inference Engine Softmax is performed over the fastest-changing dimension which is 'W' in Inference Engine. Thus, the softmax operation will be performed over a wrong dimension after conversion of the convolution node producing classes predicitions. The solution is to add Permute and Reshape operations to prepare the input data for Softmax. The Reshape operation is required to make the size of the fastest-changing dimension equal to 2, because there are 2 classes being predicted: background and foreground.
|
||||
|
||||
Another issue is that layout of elements in the predicted classes tensor is different between TensorFlow and Inference Engine Proposal layer requirements. In TensorFlow the tensor has the following virtual layout [N, H, W, num_anchors, num_classes] while the Inference Engine Proposal layer requires in the following virtual layout [N, num_classes, num_anchors, H, W]. Thus, it is necessary to reshape, permute and then reshape again output from the Softmax to the required shape for the Proposal layer:
|
||||
|
||||
```python
|
||||
reshape_softmax_op = Reshape(graph, {'dim': np.array([1, anchors_count, 2, -1])})
|
||||
reshape_softmax_node = reshape_softmax_op.create_node([softmax_conf_node], dict(name='Reshape_Softmax_Class_'))
|
||||
update_attrs(reshape_softmax_node, 'shape_attrs', 'dim')
|
||||
|
||||
permute_reshape_softmax_op = Permute(graph, {'order': np.array([0, 1, 3, 2])})
|
||||
permute_reshape_softmax_node = permute_reshape_softmax_op.create_node([reshape_softmax_node],
|
||||
dict(name='Permute_'))
|
||||
|
||||
# implement custom reshape infer function because we need to know the input convolution node output dimension
|
||||
# sizes but we can know it only after partial infer
|
||||
reshape_permute_op = Reshape(graph, {'dim': np.ones([4]), 'anchors_count': anchors_count,
|
||||
'conv_node': predictions_node})
|
||||
reshape_permute_op.attrs['old_infer'] = reshape_permute_op.attrs['infer']
|
||||
reshape_permute_op.attrs['infer'] = __class__.classes_probabilities_reshape_shape_infer
|
||||
reshape_permute_node = reshape_permute_op.create_node([permute_reshape_softmax_node],
|
||||
dict(name='Reshape_Permute_Class_'))
|
||||
update_attrs(reshape_permute_node, 'shape_attrs', 'dim')
|
||||
```
|
||||
|
||||
The Proposal layer has 3 inputs: classes probabilities, boxes predictions and a input shape of the image. The first two tensors are ready so it is necessary to create the Const operation that produces the desired third input tensor.
|
||||
|
||||
```python
|
||||
# create constant input with the image height, width and scale H and scale W (if present) required for Proposal
|
||||
const_value = np.array([[input_height, input_width, 1]], dtype=np.float32)
|
||||
const_op = Const(graph, dict(value=const_value, shape=const_value.shape))
|
||||
const_node = const_op.create_node([], dict(name='Proposal_const_image_size_'))
|
||||
```
|
||||
|
||||
Now add the Proposal layer:
|
||||
|
||||
```python
|
||||
|
||||
proposal_op = ProposalOp(graph, dict(min_size=10, framework='tensorflow', box_coordinate_scale=10,
|
||||
box_size_scale=5, post_nms_topn=max_proposals, feat_stride=feat_stride,
|
||||
ratio=proposal_ratios, scale=proposal_scales, base_size=anchor_base_size,
|
||||
pre_nms_topn=2**31 - 1,
|
||||
nms_thresh=nms_threshold))
|
||||
proposal_node = proposal_op.create_node([reshape_permute_node,
|
||||
match.single_input_node(0)[0].in_node(0).in_node(0),
|
||||
const_node],
|
||||
dict(name=proposal_op.attrs['type'] + '_'))
|
||||
```
|
||||
|
||||
The box coordinates in the TensorFlow are in the following layout "YXYX" while Inference Engine uses "XYXY" layout so it is necessary to swap coordinates produced by Proposal layer. It is implemented with help of a convolution node with a special filter of a size [5, 5]:
|
||||
|
||||
```python
|
||||
proposal_reshape_4d_op = Reshape(graph, {'dim': np.array([max_proposals, 1, 1, 5])})
|
||||
proposal_reshape_4d_node = proposal_reshape_4d_op.create_node([proposal_node], dict(name="reshape_4d_"))
|
||||
update_attrs(proposal_reshape_4d_node, 'shape_attrs', 'dim')
|
||||
|
||||
# create convolution node to swap X and Y coordinates in the proposals
|
||||
conv_filter_const_data = np.array(np.array([[1, 0, 0, 0, 0],
|
||||
[0, 0, 1, 0, 0],
|
||||
[0, 1, 0, 0, 0],
|
||||
[0, 0, 0, 0, 1],
|
||||
[0, 0, 0, 1, 0]],
|
||||
dtype=np.float32).reshape([1, 1, 5, 5]), dtype=np.float32)
|
||||
conv_filter_const_op = Const(graph, dict(value=conv_filter_const_data, spatial_dims=np.array([2, 3])))
|
||||
conv_filter_const_node = conv_filter_const_op.create_node([], dict(name="conv_weights"))
|
||||
|
||||
conv_op = Op(graph, {
|
||||
'op': 'Conv2D',
|
||||
'bias_addable': False,
|
||||
'spatial_dims': np.array([1, 2]),
|
||||
'channel_dims': np.array([3]),
|
||||
'batch_dims': np.array([0]),
|
||||
'pad': None,
|
||||
'pad_spatial_shape': None,
|
||||
'input_feature_channel': 2,
|
||||
'output_feature_channel': 2,
|
||||
'output_shape': [max_proposals, 1, 1, 5],
|
||||
'dilation': np.array([1, 1, 1, 1], dtype=np.int64),
|
||||
'stride': np.array([1, 1, 1, 1]),
|
||||
'type': 'Convolution',
|
||||
'group': None,
|
||||
'layout': 'NHWC',
|
||||
'infer': __class__.fake_conv_shape_infer})
|
||||
predictions_node = conv_op.create_node([proposal_reshape_4d_node, conv_filter_const_node], dict(name="conv_"))
|
||||
update_ie_fields(graph.node[predictions_node.id])
|
||||
|
||||
proposal_reshape_2d_op = Reshape(graph, {'dim': np.array([max_proposals, 5])})
|
||||
proposal_reshape_2d_node = proposal_reshape_2d_op.create_node([predictions_node], dict(name="reshape_2d_"))
|
||||
# set specific name for this Reshape operation so we can use it in the DetectionOutput replacer
|
||||
proposal_reshape_2d_node['name'] = 'swapped_proposals'
|
||||
```
|
||||
|
||||
The ROIPooling layer in TensorFlow is implemented with operation called `CropAndResize` with bi-linear filtration. Inference Engine implementation of the ROIPooling layer with bi-linear filtration requires input boxes coordinates be scaled to [0, 1] interval. Adding elementwise multiplication of box coordinates solves this issue:
|
||||
|
||||
```python
|
||||
# the TF implementation of Proposal with bi-linear filtration need proposals scaled by image size
|
||||
proposal_scale_const = np.array([1.0, 1 / input_height, 1 / input_width, 1 / input_height, 1 / input_width],
|
||||
dtype=np.float32)
|
||||
proposal_scale_const_op = Const(graph, dict(value=proposal_scale_const, shape=proposal_scale_const.shape))
|
||||
proposal_scale_const_node = proposal_scale_const_op.create_node([], dict(name='Proposal_scale_const_'))
|
||||
|
||||
scale_proposals_op = Eltwise(graph, {'operation': 'mul'})
|
||||
scale_proposals_node = scale_proposals_op.create_node([proposal_reshape_2d_node, proposal_scale_const_node],
|
||||
dict(name='scale_proposals_'))
|
||||
```
|
||||
|
||||
The last step is to create the ROIPooling node with 2 inputs: the identified feature maps from the `FirstStageFeatureExtractor` and the scaled output of the Proposal layer:
|
||||
|
||||
```python
|
||||
feature_extractor_output_nodes = scope_output_nodes(graph, 'FirstStageFeatureExtractor')
|
||||
if len(feature_extractor_output_nodes) != 1:
|
||||
raise Error("Failed to determine FirstStageFeatureExtractor output node to connect it to the ROIPooling."
|
||||
"Found the following nodes: {}".format([node.name for node in feature_extractor_output_nodes]))
|
||||
|
||||
roi_pooling_op = ROIPooling(graph, dict(method="bilinear", framework="tensorflow",
|
||||
pooled_h=roi_pool_size, pooled_w=roi_pool_size,
|
||||
spatial_scale=roi_spatial_scale))
|
||||
roi_pooling_node = roi_pooling_op.create_node([feature_extractor_output_nodes[0], scale_proposals_node],
|
||||
dict(name='ROI_Pooling_'))
|
||||
|
||||
return {'roi_pooling_node': roi_pooling_node}
|
||||
```
|
||||
|
||||
The are two additional methods implemented in the replacer class:
|
||||
|
||||
* The `fake_conv_shape_infer` is a silly infer function for the convolution that permutes X and Y coordinates of the Proposal output which avoids setting a lot of internal attributes required for propoper shape inference.
|
||||
|
||||
* The "classes_probabilities_reshape_shape_infer" function is used to update the output dimension of the reshape operation. The output spatial dimensions depends on the convolution output spatial dimensions thus they are not known until the shape inference pass which is performed after this sub-graph replacement class. So this custom infer function is called instead of default Reshape shape inference function, updates the required attribute "dim" of the node with the convolution output spatial dimensions which are known at the time of calling this inference function and then call the default Reshape inference function.
|
||||
|
||||
```python
|
||||
@staticmethod
|
||||
def fake_conv_shape_infer(node: Node):
|
||||
node.out_node(0).shape = node.in_node(0).shape
|
||||
# call functions to update internal attributes required for correct IR generation
|
||||
mark_input_bins(node)
|
||||
assign_dims_to_weights(node.in_node(1), [0, 1], node.input_feature_channel, node.output_feature_channel, 4)
|
||||
|
||||
@staticmethod
|
||||
def classes_probabilities_reshape_shape_infer(node: Node):
|
||||
# now we can determine the reshape dimensions from Convolution node
|
||||
conv_node = node.conv_node
|
||||
conv_output_shape = conv_node.out_node().shape
|
||||
|
||||
# update desired shape of the Reshape node
|
||||
node.dim = np.array([0, conv_output_shape[1], conv_output_shape[2], node.anchors_count * 2])
|
||||
node.old_infer(node)
|
||||
```
|
||||
|
||||
The second replacer defined in the sub-graph replacement configuration file replaces the `SecondStagePostprocessor` block and is defined using scope:
|
||||
|
||||
```json
|
||||
{
|
||||
"custom_attributes": {
|
||||
"code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
|
||||
"confidence_threshold": 0.01,
|
||||
"keep_top_k": 300,
|
||||
"nms_threshold": 0.6,
|
||||
"pad_mode": "caffe.ResizeParameter.CONSTANT",
|
||||
"resize_mode": "caffe.ResizeParameter.WARP",
|
||||
"max_detections_per_class": 100,
|
||||
"num_classes": 90
|
||||
},
|
||||
"id": "SecondStagePostprocessorReplacement",
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "Reshape$",
|
||||
"port": 0
|
||||
}
|
||||
],
|
||||
[
|
||||
{
|
||||
"node": "Reshape_1$",
|
||||
"port": 0
|
||||
}
|
||||
],
|
||||
[
|
||||
{
|
||||
"node": "ExpandDims$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
],
|
||||
"instances": [
|
||||
".*SecondStagePostprocessor/"
|
||||
],
|
||||
"match_kind": "scope",
|
||||
"outputs": [
|
||||
{
|
||||
"node": "BatchMultiClassNonMaxSuppression/map/TensorArrayStack/TensorArrayGatherV3$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
The replacement code is similar to the `SecondStagePostprocessor` replacement for the SSDs topologies. The are two major difference:
|
||||
|
||||
* The tensor with bounding boxes doesn't contain locations for class 0 (background class) but Inference Engine Detection Output layer requires it. The Const node with some dummy values are created and concatenated with the tensor.
|
||||
|
||||
* The priors tensor is not constant like in SSDs so the bounding boxes tensor must be scaled with variances [0.1, 0.1, 0.2, 0.2].
|
||||
|
||||
The described above difference are resolved with the following code:
|
||||
|
||||
```python
|
||||
# TF produces locations tensor without boxes for background.
|
||||
# Inference Engine DetectionOutput layer requires background boxes so we generate them with some values
|
||||
# and concatenate with locations tensor
|
||||
fake_background_locs_blob = np.tile([[[1, 1, 2, 2]]], [max_detections_per_class, 1, 1])
|
||||
fake_background_locs_const_op = Const(graph, dict(value=fake_background_locs_blob,
|
||||
shape=fake_background_locs_blob.shape))
|
||||
fake_background_locs_const_node = fake_background_locs_const_op.create_node([])
|
||||
|
||||
reshape_loc_op = Reshape(graph, {'dim': np.array([max_detections_per_class, num_classes, 4])})
|
||||
reshape_loc_node = reshape_loc_op.create_node([match.single_input_node(0)[0].in_node(0)],
|
||||
dict(name='Reshape_loc_'))
|
||||
|
||||
concat_loc_op = Concat(graph, {'axis': 1})
|
||||
concat_loc_node = concat_loc_op.create_node([fake_background_locs_const_node, reshape_loc_node],
|
||||
dict(name='Concat_fake_loc_'))
|
||||
|
||||
# blob with variances
|
||||
variances_blob = np.array([0.1, 0.1, 0.2, 0.2])
|
||||
variances_const_op = Const(graph, dict(value=variances_blob, shape=variances_blob.shape))
|
||||
variances_const_node = variances_const_op.create_node([])
|
||||
|
||||
# reshape locations tensor to 2D so it could be passed to Eltwise which will be converted to ScaleShift
|
||||
reshape_loc_2d_op = Reshape(graph, {'dim': np.array([-1, 4])})
|
||||
reshape_loc_2d_node = reshape_loc_2d_op.create_node([concat_loc_node], dict(name='reshape_locs_2d_'))
|
||||
|
||||
# element-wise multiply locations with variances
|
||||
eltwise_locs_op = Eltwise(graph, {'operation': 'mul'})
|
||||
eltwise_locs_node = eltwise_locs_op.create_node([reshape_loc_2d_node, variances_const_node],
|
||||
dict(name='scale_locs_'))
|
||||
```
|
||||
|
||||
### Example of Model Optimizer Command-Line for TensorFlow's Faster R-CNNs
|
||||
The final command line to convert Faster R-CNNs from the TensorFlow* Object Detection Zoo is the following:
|
||||
|
||||
```sh
|
||||
./mo.py --input_model=<path_to_frozen.pb> --output=detection_boxes,detection_scores,num_detections --tensorflow_use_custom_operations_config extensions/front/tf/legacy_faster_rcnn_support.json
|
||||
```
|
||||
|
||||
Note that there are minor changes that should be made to the and sub-graph replacement configuration file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/legacy_faster_rcnn_support.json` before converting particular Faster R-CNN topology. Refer to the table below.
|
||||
|
||||
### Sub-Graph Replacement Configuration File Parameters to Convert Different Faster R-CNN Models
|
||||
|Model Name | Configuration File Changes|
|
||||
|:----|:----:|
|
||||
| faster_rcnn_inception_v2_coco | None
|
||||
| faster_rcnn_resnet50_coco | None
|
||||
| faster_rcnn_resnet50_lowproposals_coco | None
|
||||
| faster_rcnn_resnet101_coco | None
|
||||
| faster_rcnn_resnet101_lowproposals_coco | None
|
||||
| faster_rcnn_inception_resnet_v2_atrous_coco | "feat_stride: 8"
|
||||
| faster_rcnn_inception_resnet_v2_atrous_lowproposals_coco| "feat_stride: 8"
|
||||
|
||||
@@ -1,339 +0,0 @@
|
||||
# (Deprecated) Case Study: Converting SSD Models Created with TensorFlow* Object Detection API {#openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_SSD_ObjectDetection_API}
|
||||
|
||||
This is a deprecated page. Please, consider reading [this](../convert_model/tf_specific/Convert_Object_Detection_API_Models.md) page describing new approach to convert Object Detection API models giving closer to TensorFlow inference results.
|
||||
|
||||
## Converting Models Created with TensorFlow Object Detection API Version prior 1.6.0
|
||||
|
||||
As explained in the [Sub-graph Replacement in Model Optimizer](Subgraph_Replacement_Model_Optimizer.md) section, there are multiple
|
||||
ways to setup the sub-graph matching. In this example we are focusing on the defining the sub-graph via a set of
|
||||
"start" and "end" nodes.
|
||||
The result of matching is two buckets of nodes:
|
||||
* Nodes "between" start and end nodes.
|
||||
* Nodes connected to the first list, but just on the constant path (e.g. these nodes are not connected to the inputs of the entire graph).
|
||||
|
||||
Let's look closer to the SSD models from the TensorFlow* detection model
|
||||
<a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">zoo</a>:
|
||||
[SSD MobileNet](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) and
|
||||
[SSD InceptionV2](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz).
|
||||
|
||||
* Nodes "between" start and end nodes
|
||||
* Nodes connected to the first list, but just on the constant path (for example, these nodes are not connected to the inputs of the entire graph). Let's look closer to the SSD models from the TensorFlow\* detection model <a href="https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md">zoo</a> : [SSD MobileNet](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2017_11_17.tar.gz) and [SSD InceptionV2](http://download.tensorflow.org/models/object_detection/ssd_inception_v2_coco_2017_11_17.tar.gz).
|
||||
|
||||
A distinct layer of any SSD topology is the `DetectionOutput` layer. This layer is implemented with a dozens of primitive operations in TensorFlow, while in Inference Engine, it is one [layer](../../../ops/opset.md). Thus, to convert a SSD model from the TensorFlow, the Model Optimizer should replace the entire sub-graph of operations that implement the `DetectionOutput` layer with a single well-known `DetectionOutput` node.
|
||||
|
||||
The Inference Engine `DetectionOutput` layer consumes three tensors in the following order:
|
||||
|
||||
1. Tensor with locations of bounding boxes
|
||||
2. Tensor with confidences for each bounding box
|
||||
3. Tensor with prior boxes (anchors in TensorFlow terminology)
|
||||
|
||||
`DetectionOutput` layer produces one tensor with seven numbers for each actual detection. There are more output tensors in the TensorFlow Object Detection API, but the values in them are consistent with the Inference Engine ones.
|
||||
|
||||
The difference with [other examples](Subgraph_Replacement_Model_Optimizer.md) is that here the `DetectionOutput` sub-graph is replaced with a new sub-graph (not a single layer).
|
||||
|
||||
Look at sub-graph replacement configuration file `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/legacy_ssd_support.json` that is used to enable two models listed above:
|
||||
```json
|
||||
[
|
||||
{
|
||||
"custom_attributes": {
|
||||
"code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
|
||||
"confidence_threshold": 0.01,
|
||||
"keep_top_k": 200,
|
||||
"nms_threshold": 0.45,
|
||||
"pad_mode": "caffe.ResizeParameter.CONSTANT",
|
||||
"resize_mode": "caffe.ResizeParameter.WARP"
|
||||
},
|
||||
"id": "TFObjectDetectionAPIDetectionOutput",
|
||||
"include_inputs_to_sub_graph": true,
|
||||
"include_outputs_to_sub_graph": true,
|
||||
"instances": {
|
||||
"end_points": [
|
||||
"detection_boxes",
|
||||
"detection_scores",
|
||||
"num_detections"
|
||||
],
|
||||
"start_points": [
|
||||
"Postprocessor/Shape",
|
||||
"Postprocessor/Slice",
|
||||
"Postprocessor/ExpandDims",
|
||||
"Postprocessor/Reshape_1"
|
||||
]
|
||||
},
|
||||
"match_kind": "points"
|
||||
},
|
||||
{
|
||||
"custom_attributes": {
|
||||
},
|
||||
"id": "PreprocessorReplacement",
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "map/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
|
||||
"port": 2
|
||||
}
|
||||
]
|
||||
],
|
||||
"instances": [
|
||||
".*Preprocessor/"
|
||||
],
|
||||
"match_kind": "scope",
|
||||
"outputs": [
|
||||
{
|
||||
"node": "sub$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Lines 3-10 define static attributes that will be saved to the Intermediate Representation `.xml` file for `DetectionOutput` layer.
|
||||
|
||||
* Lines 12 and 13 define values for attributes that should be always set to "true" for this release of the Model Optimizer. These two attributes are specific for sub-graph match by points only.
|
||||
|
||||
* Lines 14-26 define one instance of the sub-graph to be match. It is an important difference between sub-graph matching by scope and points. Several instances could be specified for matching by scope, but matching with points allows specifying just one instance. So the full node names (not regular expressions like in case of match with scope) are specified in `instances` dictionary.
|
||||
|
||||
The second sub-graph replacer with identifier `PreprocessorReplacement` is used to remove the `Preprocessing` block from the graph. The replacer removes all nodes from this scope except nodes performing mean value subtraction and scaling (if applicable). Implementation of the replacer is in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py` file.
|
||||
|
||||
Now let's analyze the structure of the topologies generated with the Object Detection API. There are several blocks in the graph performing particular task:
|
||||
|
||||
* `Preprocessor` block resizes, scales and subtracts mean values from the input image.
|
||||
|
||||
* `FeatureExtractor` block is a [MobileNet](https://arxiv.org/abs/1704.04861) or other backbone to extract features.
|
||||
|
||||
* `MultipleGridAnchorGenerator` block creates initial bounding boxes locations (anchors).
|
||||
|
||||
* `Postprocessor` block acts as a `DetectionOutput` layer. So we need to replace `Postprocessor` block with `DetectionOutput` layer. It is necessary to add all input nodes of the `Postprocessor` scope to the list `start_points`. Consider inputs of each of these nodes:
|
||||
|
||||
* `Postprocessor/Shape` consumes tensor with locations.
|
||||
* `Postprocessor/Slice` consumes tensor with confidences.
|
||||
* `Postprocessor/ExpandDims` consumes tensor with prior boxes.
|
||||
* `Postprocessor/Reshape_1` consumes tensor with locations similarly to the `Postprocessor/Shape` node. Despite the fact that the last node `Postprocessor/Reshape_1` gets the same tensor as node `Postprocessor/Shape`, it must be explicitly put to the list.
|
||||
|
||||
Object Detection API `Postprocessor` block generates output nodes: `detection_boxes`, `detection_scores`, `num_detections`, `detection_classes`.
|
||||
|
||||
Now consider the implementation of the sub-graph replacer, available in the `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/SSDs.py`. The file is rather big, so only some code snippets are used:
|
||||
```python
|
||||
class PostprocessorReplacement(FrontReplacementFromConfigFileSubGraph):
|
||||
replacement_id = 'TFObjectDetectionAPIDetectionOutput'
|
||||
```
|
||||
|
||||
These lines define the new `PostprocessorReplacement` class inherited from `FrontReplacementFromConfigFileSubGraph`. `FrontReplacementFromConfigFileSubGraph` is designed to replace sub-graph of operations described in the configuration file. There are methods to override for implementing custom replacement logic that we need:
|
||||
|
||||
* `generate_sub_graph` performs new sub-graph generation and returns dictionary where key is an alias name for the node and value is a Node objects. The dictionary has the same format as parameter `match` in the `replace_sub_graph` method in the example with <a href="Subgraph_Replacement_Model_Optimizer.html#replace-using-isomorphism-pattern">networkx sub-graph isomorphism pattern</a>. This dictionary is passed as argument to the next three methods, so it should contain entries the for nodes that the functions need.
|
||||
|
||||
* `input_edges_match` specifies mapping between input edges to sub-graph before replacement and after replacement. The key of the dictionary is a tuple specifying input tensor of the sub-graph before replacement: sub-graph input node name and input port number for this node. The value for this key is also a tuple specifying the node where this tensor should be attached during replacement: the node name (or alias name of the node) and the input port for this node. If the port number is zero, the parameter could be omitted so the key or value is just a node name (alias). Default implementation of the method returns an empty dictionary, so Model Optimizer does not create new edges.
|
||||
|
||||
* `output_edges_match` returns mapping between old output edges of the matched nodes and new sub-graph node and output edge index. The format is similar to the dictionary returned in the `input_edges_match` method. The only difference is that instead of specifying input port numbers for the nodes it is necessary to specify output port number. Of course, this mapping is needed for the output nodes only. Default implementation of the method returns an empty dictionary, so the Model Optimizer does not create new edges.
|
||||
|
||||
* `nodes_to_remove` specifies list of nodes that Model Optimizer should remove after sub-graph replacement. Default implementation of the method removes all sub-graph nodes.
|
||||
|
||||
Review of the replacer code, considering details of the `DetectionOutput` layer implementation in the Inference Engine. There are several constraints to the input tensors of the `DetectionOutput` layer:
|
||||
|
||||
* The tensor with locations must be of shape `[#‍batch, #‍prior_boxes * 4]` or `[#‍batch, #‍prior_boxes * 5]` depending on shared locations between different batches or not.
|
||||
* The tensor with confidences must be of shape `[#‍batch, #‍prior_boxes * #‍classes]` and confidences values are in range [0, 1], that is passed through `softmax` layer.
|
||||
* The tensor with prior boxes must be of shape `[#‍batch, 2, #‍prior_boxes * 4]`. Inference Engine expects that it contains variance values which TensorFlow Object Detection API does not add.
|
||||
|
||||
To enable these models, add `Reshape` operations for locations and confidences tensors and update the values for the prior boxes to include the variance constants (they are not there in TensorFlow Object Detection API).
|
||||
|
||||
Look at the `generate_sub_graph` method:
|
||||
```python
|
||||
def generate_sub_graph(self, graph: nx.MultiDiGraph, match: SubgraphMatch):
|
||||
log.debug('PostprocessorReplacement.generate_sub_graph')
|
||||
log.debug('matched_nodes = {}'.format(match.matched_nodes_names()))
|
||||
# softmax to be applied to the confidence
|
||||
softmax_conf_op = Softmax(graph, {'axis': 2, 'nchw_layout': True})
|
||||
softmax_conf_node = softmax_conf_op.add_node(dict(name='DetectionOutput_SoftMax_conf_'))
|
||||
# Inference Engine DetectionOutput layer consumes flattened tensors
|
||||
# reshape operation to flatten locations tensor
|
||||
reshape_loc_op = Reshape(graph, {'dim': np.array([0, -1])})
|
||||
reshape_loc_node = reshape_loc_op.add_node(dict(name='DetectionOutput_Reshape_loc_'))
|
||||
# Inference Engine DetectionOutput layer consumes flattened tensors
|
||||
# reshape operation to flatten confidence tensor
|
||||
reshape_conf_op = Reshape(graph, {'dim': np.array([0, -1])})
|
||||
reshape_conf_node = reshape_conf_op.add_node(dict(name='DetectionOutput_Reshape_conf_'))
|
||||
# create Node object from Op class
|
||||
detection_output_op = DetectionOutput(graph, match.custom_replacement_desc.custom_attributes)
|
||||
detection_output_op.attrs['old_infer'] = detection_output_op.attrs['infer']
|
||||
detection_output_op.attrs['infer'] = __class__.do_infer
|
||||
detection_output_node = detection_output_op.add_node(dict(name=detection_output_op.attrs['type'] + '_'))
|
||||
# create internal edges of the sub-graph. In this case we add edges to connect input port 0 and 1 of the
|
||||
# detection output with output of reshape of locations and reshape of confidence
|
||||
create_edge(softmax_conf_node, reshape_conf_node, 0, 0)
|
||||
create_edge(reshape_loc_node, detection_output_node, 0, 0)
|
||||
create_edge(reshape_conf_node, detection_output_node, 0, 1)
|
||||
return {'detection_output_node': detection_output_node, 'reshape_conf_node': softmax_conf_node,
|
||||
'reshape_loc_node': reshape_loc_node}
|
||||
```
|
||||
The method has two inputs: the graph to operate on and the instance of `SubgraphMatch` object, which describes matched sub-graph. The latter class has several useful methods to get particular input/output node of the sub-graph by input/output index or by node name pattern. Examples of these methods usage are given below.
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Lines 6 and 7 create new instance of operation of type `Softmax` and graph Node object corresponding to that operation.
|
||||
|
||||
* Lines 11-12 and 16-17 create new instance of operation of type `Reshape` to reshape locations and confidences tensors correspondingly.
|
||||
|
||||
* Lines 20-23 create new instance of operation `DetectionOutput` and graph Node object corresponding to that operation.
|
||||
|
||||
* Lines 27-29 connect `softmax` node with `reshape` node and connect two reshaped locations and confidences tensors with `DetectionOutput` node.
|
||||
|
||||
* Lines 30-31 define dictionary with aliases for detection output node, reshape locations and confidences nodes. These aliases are used in the `input_edges_match` and `output_edges_match` methods.
|
||||
|
||||
The `input_edges_match` method is the following:
|
||||
```python
|
||||
def input_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
|
||||
locs_consumer_node, locs_consumer_node_port = match.input_nodes(0)[0]
|
||||
conf_consumer_node, conf_consumer_node_port = match.input_nodes(1)[0]
|
||||
priors_consumer_node, priors_consumer_node_port = match.input_nodes(2)[0]
|
||||
# create matching nodes for locations and confidence tensors using simple scheme "old_node_name: new_node_name"
|
||||
# which in fact means "(old_node_name, 0): (new_node_name, 0)", while first '0' means old_port and the second
|
||||
# zero defines 'new_port'.
|
||||
return {locs_consumer_node.id: new_sub_graph['reshape_loc_node'].id,
|
||||
conf_consumer_node.id: new_sub_graph['reshape_conf_node'].id,
|
||||
priors_consumer_node.id: (new_sub_graph['detection_output_node'].id, 2),
|
||||
}
|
||||
```
|
||||
The method has three parameters: input `graph`, `match` object describing matched sub-graph and `new_sub_graph` dictionary with alias names returned from the `generate_sub_graph` method.
|
||||
|
||||
**Key lines**:
|
||||
|
||||
* Lines 2-4 initialize Node objects and input ports for the nodes where the input tensors for the sub-graph are consumed. The method `match.input_nodes(ind)` returns list of tuples where the first element is a Node object and the second is the input port for this node which consumes the ind-th input tensor of the sub-graph. `input_points` list in the configuration file defines the order of input tensors to the sub-graph. For example, the `locs_consumer_node` object of type Node is a node that consumes tensor with locations in the port with number `locs_consumer_node_port`.
|
||||
|
||||
* Lines 8-11 define dictionary with the mapping of tensors as described above. Note that the attribute `id` of the Node object contains the name of the node in the graph.
|
||||
|
||||
The `output_edges_match` method is the following:
|
||||
```python
|
||||
def output_edges_match(self, graph: nx.DiGraph, match: SubgraphMatch, new_sub_graph: dict):
|
||||
# the DetectionOutput in IE produces single tensor, but in TF it produces two tensors, so we need to create only
|
||||
# one output edge match
|
||||
return {match.output_node(0)[0].id: new_sub_graph['detection_output_node'].id}
|
||||
```
|
||||
|
||||
The method has the same three parameters as `input_edges_match` method. The returned dictionary contains mapping just for one tensor initially produces by the first output node of the sub-graph (which is `detection_boxes` according to the configuration file) to a single output tensor of the created `DetectionOutput` node. In fact, it is possible to use any output node of the initial sub-graph in mapping, because the sub-graph output nodes are the output nodes of the whole graph (their output is not consumed by any other nodes).
|
||||
|
||||
Now, the Model Optimizer knows how to replace the sub-graph. The last step to enable the model is to cut-off some parts of the graph not needed during inference.
|
||||
|
||||
It is necessary to remove the `Preprocessor` block where image is resized. Inference Engine does not support dynamic input shapes, so the Model Optimizer must froze the input image size, and thus, resizing of the image is not necessary. This is achieved by replacer `<INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/Preprocessor.py` which is executed automatically.
|
||||
|
||||
There are several `Switch` operations in the `Postprocessor` block without output edges. For example:
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_t
|
||||
```
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/cond/switch_f
|
||||
```
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_t
|
||||
```
|
||||
```sh
|
||||
Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_1/cond/switch_f
|
||||
```
|
||||
|
||||
Model Optimizer marks these nodes as output nodes of the topology. Some parts of the `Posprocessor` blocks are not removed during sub-graph replacement because of that. In order to fix this issue, it is necessary to specify output nodes of the graph manually using the `--output` command line parameter.
|
||||
|
||||
###Example Model Optimizer Command-Line for TensorFlow\* SSD
|
||||
|
||||
The final command line to convert SSDs from the TensorFlow Object Detection API Zoo is:
|
||||
```shell
|
||||
./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/legacy_ssd_support.json --output="detection_boxes,detection_scores,num_detections"
|
||||
```
|
||||
|
||||
## Converting MobileNet V2 model created with TensorFlow Object Detection API <a name="convert_mobilenet_v2"></a>
|
||||
The [MobileNet V2 model](http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v2_coco_2018_03_29.tar.gz) differs from the previous version, so converting the model requires a new sub-graph replacement configuration file and new command line parameters. The major differences are:
|
||||
|
||||
* The `Preprocessor` block has two outputs: the pre-processed image and the pre-processed image size.
|
||||
* The `Postprocessor` block has one more input (in comparison with models created with TensorFlow Object Detection API
|
||||
version 1.6 or lower): the pre-processed image size.
|
||||
* Some node names have been changed in the `Postprocessor` block.
|
||||
|
||||
The updated sub-graph replacement configuration file `extensions/front/tf/ssd_v2_support.json` reflecting these changes
|
||||
is the following:
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"custom_attributes": {
|
||||
"code_type": "caffe.PriorBoxParameter.CENTER_SIZE",
|
||||
"confidence_threshold": 0.01,
|
||||
"keep_top_k": 200,
|
||||
"nms_threshold": 0.6,
|
||||
"pad_mode": "caffe.ResizeParameter.CONSTANT",
|
||||
"resize_mode": "caffe.ResizeParameter.WARP"
|
||||
},
|
||||
"id": "TFObjectDetectionAPIDetectionOutput",
|
||||
"include_inputs_to_sub_graph": true,
|
||||
"include_outputs_to_sub_graph": true,
|
||||
"instances": {
|
||||
"end_points": [
|
||||
"detection_boxes",
|
||||
"detection_scores",
|
||||
"num_detections"
|
||||
],
|
||||
"start_points": [
|
||||
"Postprocessor/Shape",
|
||||
"Postprocessor/scale_logits",
|
||||
"Postprocessor/ExpandDims",
|
||||
"Postprocessor/Reshape_1",
|
||||
"Postprocessor/ToFloat"
|
||||
]
|
||||
},
|
||||
"match_kind": "points"
|
||||
},
|
||||
{
|
||||
"custom_attributes": {
|
||||
},
|
||||
"id": "PreprocessorReplacement",
|
||||
"inputs": [
|
||||
[
|
||||
{
|
||||
"node": "map/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/Shape$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3$",
|
||||
"port": 2
|
||||
}
|
||||
]
|
||||
],
|
||||
"instances": [
|
||||
".*Preprocessor/"
|
||||
],
|
||||
"match_kind": "scope",
|
||||
"outputs": [
|
||||
{
|
||||
"node": "sub$",
|
||||
"port": 0
|
||||
},
|
||||
{
|
||||
"node": "map/TensorArrayStack_1/TensorArrayGatherV3$",
|
||||
"port": 0
|
||||
}
|
||||
]
|
||||
}
|
||||
]
|
||||
```
|
||||
|
||||
### Example of Model Optimizer Command-Line for TensorFlow SSD MobileNet V2
|
||||
The final command line to convert MobileNet SSD V2 from the TensorFlow Object Detection Zoo is the following:
|
||||
|
||||
```sh
|
||||
./mo_tf.py --input_model=<path_to_frozen.pb> --tensorflow_use_custom_operations_config extensions/front/tf/ssd_v2_support.json --output="detection_boxes,detection_scores,num_detections"
|
||||
```
|
||||
@@ -19,133 +19,181 @@ Measuring inference performance involves many variables and is extremely use-cas
|
||||
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-datalabels"></script>
|
||||
<script src="https://cdnjs.cloudflare.com/ajax/libs/chartjs-plugin-annotation/0.5.7/chartjs-plugin-annotation.min.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-barchart-background@1.3.0/build/Plugin.Barchart.Background.min.js"></script>
|
||||
<script src="https://cdn.jsdelivr.net/npm/chartjs-plugin-deferred@1"></script>
|
||||
<!-- download this file and place on your server (or include the styles inline) -->
|
||||
<link rel="stylesheet" href="ovgraphs.css" type="text/css">
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
\htmlonly
|
||||
<script src="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-1-096.js" id="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-1-096"></script>
|
||||
<script src="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-2-170.js" id="bert-large-uncased-whole-word-masking-squad-int8-0001-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="deeplabv3-tf-ov-2021-1-096.js" id="deeplabv3-tf-ov-2021-1-096"></script>
|
||||
<script src="deeplabv3-tf-ov-2021-2-170.js" id="deeplabv3-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="densenet-121-tf-ov-2021-1-096.js" id="densenet-121-tf-ov-2021-1-096"></script>
|
||||
<script src="densenet-121-tf-ov-2021-2-170.js" id="densenet-121-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="faster-rcnn-resnet50-coco-tf-ov-2021-1-096.js" id="faster-rcnn-resnet50-coco-tf-ov-2021-1-096"></script>
|
||||
<script src="faster-rcnn-resnet50-coco-tf-ov-2021-2-170.js" id="faster-rcnn-resnet50-coco-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="googlenet-v1-tf-ov-2021-1-096.js" id="googlenet-v1-tf-ov-2021-1-096"></script>
|
||||
<script src="googlenet-v1-tf-ov-2021-2-170.js" id="googlenet-v1-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="inception-v3-tf-ov-2021-1-096.js" id="inception-v3-tf-ov-2021-1-096"></script>
|
||||
<script src="inception-v3-tf-ov-2021-2-170.js" id="inception-v3-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="mobilenet-ssd-cf-ov-2021-1-096.js" id="mobilenet-ssd-cf-ov-2021-1-096"></script>
|
||||
<script src="mobilenet-ssd-cf-ov-2021-2-170.js" id="mobilenet-ssd-cf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="mobilenet-v1-1-0-224-tf-ov-2021-1-096.js" id="mobilenet-v1-1-0-224-tf-ov-2021-1-096"></script>
|
||||
<script src="mobilenet-v1-1-0-224-tf-ov-2021-2-170.js" id="mobilenet-v1-1-0-224-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="mobilenet-v2-pytorch-ov-2021-1-096.js" id="mobilenet-v2-pytorch-ov-2021-1-096"></script>
|
||||
<script src="mobilenet-v2-pytorch-ov-2021-2-170.js" id="mobilenet-v2-pytorch-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="resnet-18-pytorch-ov-2021-1-096.js" id="resnet-18-pytorch-ov-2021-1-096"></script>
|
||||
<script src="resnet-18-pytorch-ov-2021-2-170.js" id="resnet-18-pytorch-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="resnet-50-tf-ov-2021-1-096.js" id="resnet-50-tf-ov-2021-1-096"></script>
|
||||
<script src="resnet-50-tf-ov-2021-2-170.js" id="resnet-50-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
\htmlonly
|
||||
<script src="se-resnext-50-cf-ov-2021-1-096.js" id="se-resnext-50-cf-ov-2021-1-096"></script>
|
||||
<script src="se-resnext-50-cf-ov-2021-2-170.js" id="se-resnext-50-cf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="squeezenet1-1-cf-ov-2021-1-096.js" id="squeezenet1-1-cf-ov-2021-1-096"></script>
|
||||
<script src="squeezenet1-1-cf-ov-2021-2-170.js" id="squeezenet1-1-cf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
\htmlonly
|
||||
<script src="ssd300-cf-ov-2021-1-096.js" id="ssd300-cf-ov-2021-1-096"></script>
|
||||
<script src="ssd300-cf-ov-2021-2-170.js" id="ssd300-cf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
\htmlonly
|
||||
<script src="yolo-v3-tf-ov-2021-1-096.js" id="yolo-v3-tf-ov-2021-1-096"></script>
|
||||
<script src="yolo-v3-tf-ov-2021-2-170.js" id="yolo-v3-tf-ov-2021-2-170"></script>
|
||||
\endhtmlonly
|
||||
|
||||
|
||||
## Platform Configurations
|
||||
|
||||
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2021.1.
|
||||
Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2021.2.
|
||||
|
||||
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of September 25, 2020 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure.
|
||||
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of December 9, 2020 and may not reflect all publicly available updates. See configuration disclosure for details. No product can be absolutely secure.
|
||||
|
||||
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information, see [Performance Benchmark Test Disclosure](https://www.intel.com/content/www/us/en/benchmarks/benchmark.html).
|
||||
Performance varies by use, configuration and other factors. Learn more at [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex).
|
||||
|
||||
Your costs and results may vary.
|
||||
|
||||
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
|
||||
|
||||
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. [Notice Revision #2010804](https://software.intel.com/articles/optimization-notice).
|
||||
Intel optimizations, for Intel compilers or other products, may not optimize to the same degree for non-Intel products.
|
||||
|
||||
Testing by Intel done on: see test date for each HW platform below.
|
||||
|
||||
**CPU Inference Engines**
|
||||
|
||||
| | Intel® Xeon® E-2124G | Intel® Xeon® Silver 4216R | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 |
|
||||
| ------------------------------- | ----------------------| ---------------------------- | ---------------------------- | ---------------------------- |
|
||||
| Motherboard | ASUS* WS C246 PRO | Intel® Server Board S2600STB | Intel® Server Board S2600STB | Intel® Server Board S2600STB |
|
||||
| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® Silver 4216R CPU @ 2.20GHz | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz |
|
||||
| Hyper Threading | OFF | ON | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz| 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | Intel Corporation | Intel Corporation | Intel Corporation |
|
||||
| BIOS Version | 0904 | SE5C620.86B.02.01.<br>0009.092820190230 | SE5C620.86B.02.01.<br>0009.092820190230 | SE5C620.86B.02.01.<br>0009.092820190230 |
|
||||
| BIOS Release | April 12, 2019 | September 28, 2019 | September 28, 2019 | September 28, 2019 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>change power policy <br>to "performance", <br>save & exit | Select optimized default settings, <br>change power policy to "performance", <br>save & exit | Select optimized default settings, <br>change power policy to "performance", <br>save & exit |
|
||||
| Batch size | 1 | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 4 | 32 | 32 | 52 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 | September 25, 2020 | September 25, 2020 |
|
||||
| Power dissipation, TDP in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) |
|
||||
| | Intel® Xeon® E-2124G | Intel® Xeon® W1290P | Intel® Xeon® Silver 4216R |
|
||||
| ------------------------------- | ---------------------- | --------------------------- | ---------------------------- |
|
||||
| Motherboard | ASUS* WS C246 PRO | ASUS* WS W480-ACE | Intel® Server Board S2600STB |
|
||||
| CPU | Intel® Xeon® E-2124G CPU @ 3.40GHz | Intel® Xeon® W-1290P CPU @ 3.70GHz | Intel® Xeon® Silver 4216R CPU @ 2.20GHz |
|
||||
| Hyper Threading | OFF | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 @ 2666MHz |12 x 32 GB DDR4 2666MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc. | Intel Corporation |
|
||||
| BIOS Version | 0904 | 607 | SE5C620.86B.02.01.<br>0009.092820190230 |
|
||||
| BIOS Release | April 12, 2019 | May 29, 2020 | September 28, 2019 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>change power policy <br>to "performance", <br>save & exit |
|
||||
| Batch size | 1 | 1 | 1
|
||||
| Precision | INT8 | INT8 | INT8
|
||||
| Number of concurrent inference requests | 4 | 5 | 32
|
||||
| Test Date | December 9, 2020 | December 9, 2020 | December 9, 2020
|
||||
| Power dissipation, TDP in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [539](https://ark.intel.com/content/www/us/en/ark/products/199336/intel-xeon-w-1290p-processor-20m-cache-3-70-ghz.html) |[1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) |
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
| | Intel® Core™ i5-8500 | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | 11th Gen Intel® Core™ i5-1145G7E |
|
||||
| -------------------- | ---------------------------------- | ----------------------------------- |--------------------------------------|-----------------------------------|
|
||||
| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | Intel Corporation<br>internal/Reference Validation Platform |
|
||||
| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | 11th Gen Intel® Core™ i5-1145G7E @ 2.60GHz |
|
||||
| Hyper Threading | OFF | ON | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | 2 x 8 GB DDR4 3200MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.0.0-23-generic | 5.0.0-23-generic | 5.8.0-05-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | Intel Corporation |
|
||||
| BIOS Version | 2401 | F11 | 505 | TGLIFUI1.R00.3243.A04.2006302148 |
|
||||
| BIOS Release | July 12, 2019 | March 13, 2019 | December 17, 2019 | June 30, 2020 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>set OS type to "other", <br>save & exit | Default Settings | Default Settings |
|
||||
| Batch size | 1 | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 3 | 4 | 24 | 4 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 | September 25, 2020 | September 25, 2020 |
|
||||
| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1) | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [28](https://ark.intel.com/content/www/us/en/ark/products/208081/intel-core-i5-1145g7e-processor-8m-cache-up-to-4-10-ghz.html) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [309](https://mysamples.intel.com/SAM_U_Product/ProductDetail.aspx?InputMMID=99A3D1&RequestID=0&ProductID=1213750) |
|
||||
| | Intel® Xeon® Gold 5218T | Intel® Xeon® Platinum 8270 |
|
||||
| ------------------------------- | ---------------------------- | ---------------------------- |
|
||||
| Motherboard | Intel® Server Board S2600STB | Intel® Server Board S2600STB |
|
||||
| CPU | Intel® Xeon® Gold 5218T CPU @ 2.10GHz | Intel® Xeon® Platinum 8270 CPU @ 2.70GHz |
|
||||
| Hyper Threading | ON | ON |
|
||||
| Turbo Setting | ON | ON |
|
||||
| Memory | 12 x 32 GB DDR4 2666MHz | 12 x 32 GB DDR4 2933MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | Intel Corporation | Intel Corporation |
|
||||
| BIOS Version | SE5C620.86B.02.01.<br>0009.092820190230 | SE5C620.86B.02.01.<br>0009.092820190230 |
|
||||
| BIOS Release | September 28, 2019 | September 28, 2019 |
|
||||
| BIOS Settings | Select optimized default settings, <br>change power policy to "performance", <br>save & exit | Select optimized default settings, <br>change power policy to "performance", <br>save & exit |
|
||||
| Batch size | 1 | 1 |
|
||||
| Precision | INT8 | INT8 |
|
||||
| Number of concurrent inference requests |32 | 52 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) |
|
||||
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
| | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | Intel® Core™ i9-10900TE<br>(iEi Flex BX210AI)| 11th Gen Intel® Core™ i7-1185G7 |
|
||||
| -------------------- | ----------------------------------- |--------------------------------------| ---------------------------------------------|---------------------------------|
|
||||
| Motherboard | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | iEi / B595 | Intel Corporation<br>internal/Reference<br>Validation Platform |
|
||||
| CPU | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | Intel® Core™ i9-10900TE CPU @ 1.80GHz | 11th Gen Intel® Core™ i7-1185G7 @ 3.00GHz |
|
||||
| Hyper Threading | ON | ON | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON | ON |
|
||||
| Memory | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | 2 x 8 GB DDR4 @ 2400MHz | 2 x 8 GB DDR4 3200MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.8.0-05-generic | 5.8.0-05-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | Intel Corporation |
|
||||
| BIOS Version | F11 | 505 | Z667AR10 | TGLSFWI1.R00.3425.<br>A00.2010162309 |
|
||||
| BIOS Release | March 13, 2019 | December 17, 2019 | July 15, 2020 | October 16, 2020 |
|
||||
| BIOS Settings | Select optimized default settings, <br>set OS type to "other", <br>save & exit | Default Settings | Default Settings | Default Settings |
|
||||
| Batch size | 1 | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests |4 | 24 | 5 | 4 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [35](https://ark.intel.com/content/www/us/en/ark/products/203901/intel-core-i9-10900te-processor-20m-cache-up-to-4-60-ghz.html) | [28](https://ark.intel.com/content/www/us/en/ark/products/208081/intel-core-i5-1145g7e-processor-8m-cache-up-to-4-10-ghz.html) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [444](https://ark.intel.com/content/www/us/en/ark/products/203901/intel-core-i9-10900te-processor-20m-cache-up-to-4-60-ghz.html) | [426](https://mysamples.intel.com/SAM_U_Product/ProductDetail.aspx?InputMMID=99A3D1&RequestID=0&ProductID=1213750) |
|
||||
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
| | Intel® Core™ i5-8500 | Intel® Core™ i5-10500TE | Intel® Core™ i5-10500TE<br>(iEi Flex-BX210AI)|
|
||||
| -------------------- | ---------------------------------- | ----------------------------------- |-------------------------------------- |
|
||||
| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z490 AORUS PRO AX | iEi / B595 |
|
||||
| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i5-10500TE CPU @ 2.30GHz | Intel® Core™ i5-10500TE CPU @ 2.30GHz |
|
||||
| Hyper Threading | OFF | ON | ON |
|
||||
| Turbo Setting | ON | ON | ON |
|
||||
| Memory | 2 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 @ 2666MHz | 1 x 8 GB DDR4 @ 2400MHz |
|
||||
| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS |
|
||||
| Kernel Version | 5.3.0-24-generic | 5.3.0-24-generic | 5.3.0-24-generic |
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* |
|
||||
| BIOS Version | 2401 | F3 | Z667AR10 |
|
||||
| BIOS Release | July 12, 2019 | March 25, 2020 | July 17, 2020 |
|
||||
| BIOS Settings | Select optimized default settings, <br>save & exit | Select optimized default settings, <br>set OS type to "other", <br>save & exit | Default Settings |
|
||||
| Batch size | 1 | 1 | 1 |
|
||||
| Precision | INT8 | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 3 | 4 | 4 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1)| [35](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | [35](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) |
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [195](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) | [195](https://ark.intel.com/content/www/us/en/ark/products/203891/intel-core-i5-10500te-processor-12m-cache-up-to-3-70-ghz.html) |
|
||||
|
||||
|
||||
**CPU Inference Engines (continue)**
|
||||
|
||||
@@ -165,7 +213,7 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
| Batch size | 1 | 1 |
|
||||
| Precision | INT8 | INT8 |
|
||||
| Number of concurrent inference requests | 4 | 4 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 |
|
||||
| Power dissipation, TDP in Watt | [9.5](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [65](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html#tab-blade-1-0-1)|
|
||||
| CPU Price on September 29, 2020, USD<br>Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) |
|
||||
|
||||
@@ -173,7 +221,7 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
|
||||
**Accelerator Inference Engines**
|
||||
|
||||
| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design<br>with Intel® Movidius™ VPUs (Uzel* UI-AR8) |
|
||||
| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design<br>with Intel® Movidius™ VPUs (Mustang-V100-MX8) |
|
||||
| --------------------------------------- | ------------------------------------- | ------------------------------------- |
|
||||
| VPU | 1 X Intel® Movidius™ Myriad™ X MA2485 | 8 X Intel® Movidius™ Myriad™ X MA2485 |
|
||||
| Connection | USB 2.0/3.0 | PCIe X4 |
|
||||
@@ -181,7 +229,7 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
| Precision | FP16 | FP16 |
|
||||
| Number of concurrent inference requests | 4 | 32 |
|
||||
| Power dissipation, TDP in Watt | 2.5 | [30](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) |
|
||||
| CPU Price, USD<br>Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from September 29, 2020) | [768](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) (from May 15, 2020) |
|
||||
| CPU Price, USD<br>Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from December 9, 2020) | [214](https://www.arrow.com/en/products/mustang-v100-mx8-r10/iei-technology?gclid=Cj0KCQiA5bz-BRD-ARIsABjT4ng1v1apmxz3BVCPA-tdIsOwbEjTtqnmp_rQJGMfJ6Q2xTq6ADtf9OYaAhMUEALw_wcB) (from December 9, 2020) |
|
||||
| Host Computer | Intel® Core™ i7 | Intel® Core™ i5 |
|
||||
| Motherboard | ASUS* Z370-A II | Uzelinfo* / US-E1300 |
|
||||
| CPU | Intel® Core™ i7-8700 CPU @ 3.20GHz | Intel® Core™ i5-6600 CPU @ 3.30GHz |
|
||||
@@ -193,9 +241,9 @@ Testing by Intel done on: see test date for each HW platform below.
|
||||
| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* |
|
||||
| BIOS Version | 411 | 5.12 |
|
||||
| BIOS Release | September 21, 2018 | September 21, 2018 |
|
||||
| Test Date | September 25, 2020 | September 25, 2020 |
|
||||
| Test Date | December 9, 2020 | December 9, 2020 |
|
||||
|
||||
Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvinotoolkit.org/resources/benchmark_files/system_configurations_2021.1.html)
|
||||
Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvinotoolkit.org/resources/benchmark_files/system_configurations_2021.2.html)
|
||||
|
||||
\htmlonly
|
||||
<style>
|
||||
@@ -206,7 +254,7 @@ Please follow this link for more detailed configuration descriptions: [Configura
|
||||
<div class="opt-notice-wrapper">
|
||||
<p class="opt-notice">
|
||||
\endhtmlonly
|
||||
For more complete information about performance and benchmark results, visit: [www.intel.com/benchmarks](https://www.intel.com/benchmarks) and [Optimization Notice](https://software.intel.com/articles/optimization-notice). [Legal Information](../Legal_Information.md).
|
||||
Results may vary. For workloads and configurations visit: [www.intel.com/PerformanceIndex](https://www.intel.com/PerformanceIndex) and [Legal Information](../Legal_Information.md).
|
||||
\htmlonly
|
||||
</p>
|
||||
</div>
|
||||
|
||||
@@ -49,7 +49,7 @@ Intel partners with various vendors all over the world. Visit the [Intel® AI: I
|
||||
We published a set of guidelines and recommendations to optimize your models available in an [introductory](../IE_DG/Intro_to_Performance.md) guide and an [advanced](../optimization_guide/dldt_optimization_guide.md) guide. For further support, please join the conversation in the [Community Forum](https://software.intel.com/en-us/forums/intel-distribution-of-openvino-toolkit).
|
||||
|
||||
#### 9. Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support?
|
||||
The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPU to process the data faster and thus offers better throughput on any converted model agnostic of the intrinsically supported low-precision optimizations within Intel® hardware. Please refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](./performance_int8_vs_fp32.html) for comparison on boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.
|
||||
The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. The reduced bit width of INT8 compared to FP32 allows Intel® CPU to process the data faster and thus offers better throughput on any converted model agnostic of the intrinsically supported low-precision optimizations within Intel® hardware. Please refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](performance_int8_vs_fp32.md) for comparison on boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.
|
||||
|
||||
#### 10. Previous releases included benchmarks on googlenet-v1-CF (Caffe). Why is there no longer benchmarks on this neural network model?
|
||||
We replaced googlenet-v1-CF to resnet-18-pytorch due to changes in developer usage. The public model resnet-18 is used by many developers as an Image Classification model. This pre-optimized model was also trained on the ImageNet database, similar to googlenet-v1-CF. Both googlenet-v1-CF and resnet-18 will remain part of the Open Model Zoo. Developers are encouraged to utilize resnet-18-pytorch for Image Classification use cases.
|
||||
|
||||
@@ -9,40 +9,36 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<th>Intel® Core™ <br>i7-8700T</th>
|
||||
<th>Intel® Xeon® <br>Gold <br>5218T</th>
|
||||
<th>Intel® Xeon® <br>Platinum <br>8270</th>
|
||||
<th>Intel® Core™ <br>i7-1065G7</th>
|
||||
<th>Intel® Core™ <br>i5-1145G7E</th>
|
||||
<th>Intel® Core™ <br>i7-1185G7</th>
|
||||
</tr>
|
||||
<tr align="left">
|
||||
<th>OpenVINO <br>benchmark <br>model name</th>
|
||||
<th>Dataset</th>
|
||||
<th colspan="4" align="center">Throughput speed-up FP16-INT8 vs FP32</th>
|
||||
<th colspan="3" align="center">Throughput speed-up FP16-INT8 vs FP32</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>bert-large-<br>uncased-whole-word-<br>masking-squad-0001</td>
|
||||
<td>SQuAD</td>
|
||||
<td>1.6</td>
|
||||
<td>2.5</td>
|
||||
<td>2.7</td>
|
||||
<td>2.0</td>
|
||||
<td>N/A</td>
|
||||
<td>2.8</td>
|
||||
<td>2.6</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>brain-tumor-<br>segmentation-<br>0001-MXNET</td>
|
||||
<td>BraTS</td>
|
||||
<td>1.5</td>
|
||||
<td>1.7</td>
|
||||
<td>1.6</td>
|
||||
<td>1.9</td>
|
||||
<td>1.7</td>
|
||||
<td>1.8</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>deeplabv3-TF</td>
|
||||
<td>VOC 2012<br>Segmentation</td>
|
||||
<td>1.4</td>
|
||||
<td>1.5</td>
|
||||
<td>2.4</td>
|
||||
<td>2.6</td>
|
||||
<td>2.8</td>
|
||||
<td>2.9</td>
|
||||
<td>3.1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>densenet-121-TF</td>
|
||||
@@ -50,7 +46,6 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>1.6</td>
|
||||
<td>3.2</td>
|
||||
<td>3.2</td>
|
||||
<td>3.0</td>
|
||||
<td>3.2</td>
|
||||
</tr>
|
||||
<tr>
|
||||
@@ -59,17 +54,15 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>2.0</td>
|
||||
<td>3.6</td>
|
||||
<td>3.5</td>
|
||||
<td>3.2</td>
|
||||
<td>3.5</td>
|
||||
<td>3.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>faster_rcnn_<br>resnet50_coco-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>1.7</td>
|
||||
<td>3.5</td>
|
||||
<td>3.4</td>
|
||||
<td>3.6</td>
|
||||
<td>3.6</td>
|
||||
<td>3.4</td>
|
||||
<td>3.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>googlenet-v1-TF</td>
|
||||
@@ -78,7 +71,6 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>3.6</td>
|
||||
<td>3.7</td>
|
||||
<td>3.5</td>
|
||||
<td>3.6</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>inception-v3-TF</td>
|
||||
@@ -86,43 +78,38 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>1.8</td>
|
||||
<td>3.8</td>
|
||||
<td>4.0</td>
|
||||
<td>3.7</td>
|
||||
<td>3.7</td>
|
||||
<td>3.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-<br>ssd-CF</td>
|
||||
<td>VOC2012</td>
|
||||
<td>1.5</td>
|
||||
<td>3.0</td>
|
||||
<td>3.3</td>
|
||||
<td>3.1</td>
|
||||
<td>3.3</td>
|
||||
<td>3.6</td>
|
||||
<td>3.1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v1-1.0-<br>224-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.5</td>
|
||||
<td>3.2</td>
|
||||
<td>3.9</td>
|
||||
<td>2.9</td>
|
||||
<td>3.2</td>
|
||||
<td>4.1</td>
|
||||
<td>3.1</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-1.0-<br>224-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.3</td>
|
||||
<td>2.7</td>
|
||||
<td>3.8</td>
|
||||
<td>2.2</td>
|
||||
<td>4.3</td>
|
||||
<td>2.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-<br>pytorch</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.4</td>
|
||||
<td>2.6</td>
|
||||
<td>3.6</td>
|
||||
<td>2.3</td>
|
||||
<td>2.8</td>
|
||||
<td>4.6</td>
|
||||
<td>2.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
@@ -132,61 +119,54 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>3.7</td>
|
||||
<td>3.8</td>
|
||||
<td>3.6</td>
|
||||
<td>3.6</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>pytorch</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.8</td>
|
||||
<td>3.6</td>
|
||||
<td>3.8</td>
|
||||
<td>3.5</td>
|
||||
<td>3.6</td>
|
||||
<td>3.9</td>
|
||||
<td>3.4</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.8</td>
|
||||
<td>3.5</td>
|
||||
<td>3.8</td>
|
||||
<td>3.6</td>
|
||||
<td>3.9</td>
|
||||
<td>3.4</td>
|
||||
<td>4.0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>squeezenet1.1-<br>CF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>1.6</td>
|
||||
<td>2.9</td>
|
||||
<td>3.2</td>
|
||||
<td>3.0</td>
|
||||
<td>3.4</td>
|
||||
<td>3.2</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd_mobilenet_<br>v1_coco-tf</td>
|
||||
<td>VOC2012</td>
|
||||
<td>1.6</td>
|
||||
<td>3.0</td>
|
||||
<td>3.4</td>
|
||||
<td>3.1</td>
|
||||
<td>3.3</td>
|
||||
<td>3.7</td>
|
||||
<td>3.0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd300-CF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>1.8</td>
|
||||
<td>3.7</td>
|
||||
<td>3.6</td>
|
||||
<td>3.7</td>
|
||||
<td>3.8</td>
|
||||
<td>4.0</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssdlite_<br>mobilenet_<br>v2-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>1.4</td>
|
||||
<td>2.3</td>
|
||||
<td>3.1</td>
|
||||
<td>2.4</td>
|
||||
<td>2.6</td>
|
||||
<td>3.9</td>
|
||||
<td>2.5</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>yolo_v3-TF</td>
|
||||
@@ -194,8 +174,7 @@ The table below illustrates the speed-up factor for the performance gain by swit
|
||||
<td>1.8</td>
|
||||
<td>3.8</td>
|
||||
<td>3.9</td>
|
||||
<td>3.7</td>
|
||||
<td>3.8</td>
|
||||
<td>3.6</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
@@ -208,21 +187,14 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<th></th>
|
||||
<th>Intel® Core™ <br>i9-10920X CPU<br>@ 3.50GHZ (VNNI)</th>
|
||||
<th>Intel® Core™ <br>i9-9820X CPU<br>@ 3.30GHz (AVX512)</th>
|
||||
<th>Intel® Core™ <br>i7-8700 CPU<br>@ 3.20GHz (AVX2)</th>
|
||||
<th>Intel® Core™ <br>i7-6700 CPU<br>@ 4.0GHz (AVX2)</th>
|
||||
<th>Intel® Core™ <br>i7-1185G7 CPU<br>@ 4.0GHz (TGL VNNI)</th>
|
||||
</tr>
|
||||
<tr align="left">
|
||||
<th>OpenVINO Benchmark <br>Model Name</th>
|
||||
<th>Dataset</th>
|
||||
<th>Metric Name</th>
|
||||
<th colspan="3" align="center">Absolute Accuracy Drop, %</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>bert-large-<br>uncased-whole-word-<br>masking-squad-0001</td>
|
||||
<td>SQuAD</td>
|
||||
<td>F1</td>
|
||||
<td>0.65</td>
|
||||
<td>0.57</td>
|
||||
<td>0.83</td>
|
||||
<th colspan="4" align="center">Absolute Accuracy Drop, %</th>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>brain-tumor-<br>segmentation-<br>0001-MXNET</td>
|
||||
@@ -230,23 +202,26 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>Dice-index@ <br>Mean@ <br>Overall Tumor</td>
|
||||
<td>0.08</td>
|
||||
<td>0.08</td>
|
||||
<td>0.9</td>
|
||||
<td>0.08</td>
|
||||
<td>0.08</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>deeplabv3-TF</td>
|
||||
<td>VOC 2012<br>Segmentation</td>
|
||||
<td>mean_iou</td>
|
||||
<td>0.73</td>
|
||||
<td>1.10</td>
|
||||
<td>1.10</td>
|
||||
<td>0.73</td>
|
||||
<td>1.11</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>densenet-121-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.74</td>
|
||||
<td>0.74</td>
|
||||
<td>0.76</td>
|
||||
<td>0.73</td>
|
||||
<td>0.72</td>
|
||||
<td>0.72</td>
|
||||
<td>0.73</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>facenet-<br>20180408-<br>102900-TF</td>
|
||||
@@ -255,22 +230,25 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.02</td>
|
||||
<td>0.02</td>
|
||||
<td>0.02</td>
|
||||
<td>0.47</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>faster_rcnn_<br>resnet50_coco-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>coco_<br>precision</td>
|
||||
<td>0.21</td>
|
||||
<td>0.21</td>
|
||||
<td>0.20</td>
|
||||
<td>0.20</td>
|
||||
<td>0.21</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>googlenet-v1-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.03</td>
|
||||
<td>0.03</td>
|
||||
<td>0.01</td>
|
||||
<td>0.01</td>
|
||||
<td>0.03</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>inception-v3-TF</td>
|
||||
@@ -279,6 +257,7 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.03</td>
|
||||
<td>0.01</td>
|
||||
<td>0.01</td>
|
||||
<td>0.03</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-<br>ssd-CF</td>
|
||||
@@ -287,6 +266,7 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.35</td>
|
||||
<td>0.34</td>
|
||||
<td>0.34</td>
|
||||
<td>0.35</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v1-1.0-<br>224-TF</td>
|
||||
@@ -295,22 +275,25 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.27</td>
|
||||
<td>0.20</td>
|
||||
<td>0.20</td>
|
||||
<td>0.27</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-1.0-<br>224-TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.45</td>
|
||||
<td>0.94</td>
|
||||
<td>0.94</td>
|
||||
<td>0.44</td>
|
||||
<td>0.92</td>
|
||||
<td>0.92</td>
|
||||
<td>0.44</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>mobilenet-v2-<br>PYTORCH</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.35</td>
|
||||
<td>0.63</td>
|
||||
<td>0.63</td>
|
||||
<td>0.25</td>
|
||||
<td>7.42</td>
|
||||
<td>7.42</td>
|
||||
<td>0.25</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-18-<br>pytorch</td>
|
||||
@@ -319,6 +302,7 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.26</td>
|
||||
<td>0.25</td>
|
||||
<td>0.25</td>
|
||||
<td>0.26</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>PYTORCH</td>
|
||||
@@ -327,58 +311,65 @@ The following table shows the absolute accuracy drop that is calculated as the d
|
||||
<td>0.18</td>
|
||||
<td>0.19</td>
|
||||
<td>0.19</td>
|
||||
<td>0.18</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>resnet-50-<br>TF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.15</td>
|
||||
<td>0.11</td>
|
||||
<td>0.11</td>
|
||||
<td>0.15</td>
|
||||
<td>0.10</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>squeezenet1.1-<br>CF</td>
|
||||
<td>ImageNet</td>
|
||||
<td>acc@top-1</td>
|
||||
<td>0.66</td>
|
||||
<td>0.66</td>
|
||||
<td>0.64</td>
|
||||
<td>0.64</td>
|
||||
<td>0.66</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd_mobilenet_<br>v1_coco-tf</td>
|
||||
<td>VOC2012</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.24</td>
|
||||
<td>0.24</td>
|
||||
<td>3.07</td>
|
||||
<td>3.07</td>
|
||||
<td>0.24</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssd300-CF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.06</td>
|
||||
<td>0.06</td>
|
||||
<td>0.05</td>
|
||||
<td>0.05</td>
|
||||
<td>0.06</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>ssdlite_<br>mobilenet_<br>v2-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.14</td>
|
||||
<td>0.43</td>
|
||||
<td>0.43</td>
|
||||
<td>0.14</td>
|
||||
<td>0.47</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td>yolo_v3-TF</td>
|
||||
<td>MS COCO</td>
|
||||
<td>COCO mAp</td>
|
||||
<td>0.20</td>
|
||||
<td>0.20</td>
|
||||
<td>0.36</td>
|
||||
<td>0.12</td>
|
||||
<td>0.35</td>
|
||||
<td>0.35</td>
|
||||
<td>0.12</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||

|
||||

|
||||
|
||||
\htmlonly
|
||||
<style>
|
||||
|
||||
@@ -2,16 +2,17 @@
|
||||
|
||||
EXCLUDE_SYMBOLS = INFERENCE_ENGINE_C_API_EXTERN \
|
||||
INFERENCE_ENGINE_C_API \
|
||||
INFERENCE_ENGINE_C_API_CALLBACK \
|
||||
IE_NODISCARD
|
||||
|
||||
PREDEFINED = "__attribute__(x)=" \
|
||||
"__VA_ARGS__=" \
|
||||
"INFERENCE_ENGINE_C_API_EXTERN=" \
|
||||
"INFERENCE_ENGINE_C_API_CALLBACK=" \
|
||||
"INFERENCE_ENGINE_C_API=" \
|
||||
"IE_NODISCARD=" \
|
||||
"__cdecl=" \
|
||||
"__declspec(x)=" \
|
||||
"__GNUC__=" \
|
||||
"_WIN32"
|
||||
|
||||
FILE_PATTERNS = *.h
|
||||
|
||||
@@ -903,8 +903,8 @@ EXCLUDE_PATTERNS = */temp/* \
|
||||
# exclude all test directories use the pattern */test/*
|
||||
|
||||
EXCLUDE_SYMBOLS = InferenceEngine::details \
|
||||
InferenceEngine::gpu::details \
|
||||
PRECISION_NAME \
|
||||
TBLOB_TOP_RESULT \
|
||||
CASE \
|
||||
CASE2 \
|
||||
_CONFIG_KEY \
|
||||
@@ -929,24 +929,26 @@ EXCLUDE_SYMBOLS = InferenceEngine::details \
|
||||
INFERENCE_ENGINE_API_CPP \
|
||||
INFERENCE_ENGINE_API_CLASS \
|
||||
INFERENCE_ENGINE_DEPRECATED \
|
||||
INFERENCE_ENGINE_NN_BUILDER_API_CLASS \
|
||||
INFERENCE_ENGINE_NN_BUILDER_DEPRECATED \
|
||||
IE_SUPPRESS_DEPRECATED_START \
|
||||
IE_SUPPRESS_DEPRECATED_END \
|
||||
IE_SUPPRESS_DEPRECATED_START_WIN \
|
||||
IE_SUPPRESS_DEPRECATED_END_WIN \
|
||||
IE_SUPPRESS_DEPRECATED_END_WIN \
|
||||
INFERENCE_ENGINE_INTERNAL \
|
||||
INFERENCE_ENGINE_INTERNAL_CNNLAYER_CLASS \
|
||||
IE_DO_PRAGMA \
|
||||
REG_VALIDATOR_FOR
|
||||
parallel_* \
|
||||
for_* \
|
||||
splitter \
|
||||
InferenceEngine::parallel_* \
|
||||
NOMINMAX \
|
||||
TBB_PREVIEW_NUMA_SUPPORT \
|
||||
IE_THREAD_*
|
||||
|
||||
# The EXAMPLE_PATH tag can be used to specify one or more files or directories
|
||||
# that contain example code fragments that are included (see the \include
|
||||
# command).
|
||||
|
||||
EXAMPLE_PATH = template_extension \
|
||||
../inference-engine/samples
|
||||
EXAMPLE_PATH = "@CMAKE_CURRENT_SOURCE_DIR@"
|
||||
|
||||
# If the value of the EXAMPLE_PATH tag contains directories, you can use the
|
||||
# EXAMPLE_PATTERNS tag to specify one or more wildcard pattern (like *.cpp and
|
||||
|
||||
@@ -37,17 +37,15 @@
|
||||
</tab>
|
||||
<tab type="user" title="Model Optimizations Techniques" url="@ref openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques"/>
|
||||
<tab type="user" title="Cutting off Parts of a Model" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_Cutting_Model"/>
|
||||
<tab type="usergroup" title="Sub-graph Replacement in Model Optimizer" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer">
|
||||
<tab type="user" title="[DEPRECATED] Case-Study: Converting SSD models created with the TensorFlow* Object Detection API" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_SSD_ObjectDetection_API"/>
|
||||
<tab type="user" title="[DEPRECATED] Case-Study: Converting Faster R-CNN models created with the TensorFlow* Object Detection API" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_TensorFlow_Faster_RCNN_ObjectDetection_API"/>
|
||||
</tab>
|
||||
<tab type="user" title="Sub-graph Replacement in Model Optimizer" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Subgraph_Replacement_Model_Optimizer"/>
|
||||
<tab type="user" title="Supported Framework Layers" url="@ref openvino_docs_MO_DG_prepare_model_Supported_Frameworks_Layers"/>
|
||||
<tab type="user" title="[DEPRECATED] IR Notation Reference" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_Legacy_IR_Layers_Catalog_Spec"/>
|
||||
<tab type="user" title="IR suitable for INT8 inference" url="@ref openvino_docs_MO_DG_prepare_model_convert_model_IR_suitable_for_INT8_inference"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="Custom Layers in Model Optimizer" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer">
|
||||
<tab type="usergroup" title="Model Optimizer Extensibility" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Customize_Model_Optimizer">
|
||||
<tab type="user" title="Extending Model Optimizer with New Primitives" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_with_New_Primitives"/>
|
||||
<tab type="user" title="Extending MXNet Model Optimizer with New Primitives" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives"/>
|
||||
<tab type="user" title="Extending Model Optimizer with Caffe* Python Layers" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_Model_Optimizer_With_Caffe_Python_Layers"/>
|
||||
<tab type="user" title="Extending Model Optimizer for Custom MXNet* Operations" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Extending_MXNet_Model_Optimizer_with_New_Primitives"/>
|
||||
<tab type="user" title="Legacy Mode for Caffe* Custom Layers" url="@ref openvino_docs_MO_DG_prepare_model_customize_model_optimizer_Legacy_Mode_for_Caffe_Custom_Layers"/>
|
||||
<tab type="user" title="[DEPRECATED] Offloading Sub-Graph Inference" url="https://docs.openvinotoolkit.org/2020.1/_docs_MO_DG_prepare_model_customize_model_optimizer_Offloading_Sub_Graph_Inference.html"/>
|
||||
</tab>
|
||||
@@ -57,10 +55,10 @@
|
||||
</tab>
|
||||
<!-- Model Downloader -->
|
||||
<tab id="model_downloader" type="user" title="Model Downloader" url="@ref omz_tools_downloader_README"/>
|
||||
<!-- Custom Layers Guide -->
|
||||
<tab type="usergroup" title="Custom Layers Guide" url="@ref openvino_docs_HOWTO_Custom_Layers_Guide"></tab>
|
||||
</tab>
|
||||
|
||||
<!-- Custom Operations Guide -->
|
||||
<tab type="usergroup" title="Custom Operations Guide" url="@ref openvino_docs_HOWTO_Custom_Layers_Guide"></tab>
|
||||
</tab>
|
||||
|
||||
<!-- Intermediate Representation and Operations Sets -->
|
||||
<tab id="intermediate_representaton_and_operations_sets" type="usergroup" title="Intermediate Representation and Operations Sets" url="@ref openvino_docs_MO_DG_IR_and_opsets">
|
||||
<tab type="usergroup" title="Available Operations Sets" url="@ref openvino_docs_ops_opset">
|
||||
@@ -81,6 +79,7 @@
|
||||
<tab type="user" title="Atan-1" url="@ref openvino_docs_ops_arithmetic_Atan_1"/>
|
||||
<tab type="user" title="Atanh-3" url="@ref openvino_docs_ops_arithmetic_Atanh_3"/>
|
||||
<tab type="user" title="AvgPool-1" url="@ref openvino_docs_ops_pooling_AvgPool_1"/>
|
||||
<tab type="user" title="BatchNormInference-1" url="@ref openvino_docs_ops_normalization_BatchNormInference_1"/>
|
||||
<tab type="user" title="BatchNormInference-5" url="@ref openvino_docs_ops_normalization_BatchNormInference_5"/>
|
||||
<tab type="user" title="BatchToSpace-2" url="@ref openvino_docs_ops_movement_BatchToSpace_2"/>
|
||||
<tab type="user" title="BinaryConvolution-1" url="@ref openvino_docs_ops_convolution_BinaryConvolution_1"/>
|
||||
@@ -261,6 +260,7 @@
|
||||
<tab type="usergroup" title="Utilities to Validate Your Converted Model" url="@ref openvino_inference_engine_tools_cross_check_tool_README">
|
||||
<tab type="user" title="Using Cross Check Tool for Per-Layer Comparison Between Plugins" url="@ref openvino_inference_engine_tools_cross_check_tool_README"/>
|
||||
</tab>
|
||||
<tab type="user" title="Introduction to OpenVINO state API" url="@ref openvino_docs_IE_DG_network_state_intro"/>
|
||||
<tab type="usergroup" title="Supported Devices" url="@ref openvino_docs_IE_DG_supported_plugins_Supported_Devices">
|
||||
<tab type="usergroup" title="GPU Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_CL_DNN">
|
||||
<tab type="user" title="RemoteBlob API of GPU Plugin" url="@ref openvino_docs_IE_DG_supported_plugins_GPU_RemoteBlob_API"/>
|
||||
@@ -296,7 +296,7 @@
|
||||
|
||||
<!-- Compile Tool -->
|
||||
<tab type="user" title="Compile Tool" url="@ref openvino_inference_engine_tools_compile_tool_README"/>
|
||||
|
||||
|
||||
<!-- API References -->
|
||||
<tab id="api_references" type="usergroup" title="API References">
|
||||
<!-- IE C -->
|
||||
|
||||
@@ -9,7 +9,12 @@ GENERATE_TAGFILE = "@DOCS_BINARY_DIR@/ie_plugin_api.tag"
|
||||
EXTRACT_LOCAL_CLASSES = NO
|
||||
|
||||
INPUT = "@DOCS_BINARY_DIR@/docs/IE_PLUGIN_DG" \
|
||||
"@IE_SOURCE_DIR@/src/plugin_api"
|
||||
"@IE_SOURCE_DIR@/src/plugin_api" \
|
||||
"@IE_SOURCE_DIR@/src/transformations/include" \
|
||||
"@OpenVINO_MAIN_SOURCE_DIR@/openvino/itt/include/openvino"
|
||||
|
||||
|
||||
RECURSIVE = YES
|
||||
|
||||
FILE_PATTERNS = *.c \
|
||||
*.cpp \
|
||||
@@ -18,21 +23,20 @@ FILE_PATTERNS = *.c \
|
||||
*.hpp \
|
||||
*.md
|
||||
|
||||
EXCLUDE_PATTERNS = cnn_network_ngraph_impl.hpp \
|
||||
ie_imemory_state_internal.hpp \
|
||||
ie_memory_state_internal.hpp \
|
||||
ie_memory_state_base.hpp \
|
||||
convert_function_to_cnn_network.hpp \
|
||||
generic_ie.hpp
|
||||
EXCLUDE_PATTERNS = generic_ie.hpp
|
||||
|
||||
EXCLUDE_SYMBOLS =
|
||||
EXCLUDE_SYMBOLS = InferenceEngine::details
|
||||
|
||||
TAGFILES = @DOCS_BINARY_DIR@/ie_api.tag=.."
|
||||
|
||||
EXAMPLE_PATH = "@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/src" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/include" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/src/CMakeLists.txt" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/"
|
||||
CMakeLists.txt \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/examples"
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/CMakeLists.txt" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/transformations" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/template_plugin/tests/functional/shared_tests_instances/" \
|
||||
"@CMAKE_CURRENT_SOURCE_DIR@/snippets"
|
||||
"@IE_SOURCE_DIR@/tests/functional/plugin/shared/include" \
|
||||
|
||||
EXAMPLE_PATTERNS = *.cpp \
|
||||
*.hpp
|
||||
@@ -41,12 +45,17 @@ ENUM_VALUES_PER_LINE = 1
|
||||
|
||||
EXPAND_ONLY_PREDEF = YES
|
||||
|
||||
PREDEFINED = INFERENCE_ENGINE_API \
|
||||
INFERENCE_ENGINE_API_CPP \
|
||||
INFERENCE_ENGINE_API_CLASS \
|
||||
INFERENCE_ENGINE_DEPRECATED \
|
||||
IE_SUPPRESS_DEPRECATED_START \
|
||||
IE_SUPPRESS_DEPRECATED_END \
|
||||
IE_SUPPRESS_DEPRECATED_START_WIN \
|
||||
IE_SUPPRESS_DEPRECATED_END_WIN \
|
||||
IE_THREAD=IE_THREAD_TBB
|
||||
PREDEFINED = "INFERENCE_ENGINE_API=" \
|
||||
"INFERENCE_ENGINE_API_CPP=" \
|
||||
"INFERENCE_ENGINE_API_CLASS=" \
|
||||
"INFERENCE_ENGINE_DEPRECATED=" \
|
||||
"inference_engine_transformations_EXPORTS" \
|
||||
"TRANSFORMATIONS_API=" \
|
||||
"NGRAPH_HELPER_DLL_EXPORT=" \
|
||||
"NGRAPH_HELPER_DLL_IMPORT=" \
|
||||
"IE_SUPPRESS_DEPRECATED_START=" \
|
||||
"IE_SUPPRESS_DEPRECATED_END=" \
|
||||
"IE_SUPPRESS_DEPRECATED_START_WIN=" \
|
||||
"IE_SUPPRESS_DEPRECATED_END_WIN=" \
|
||||
"IE_THREAD=IE_THREAD_TBB" \
|
||||
"NGRAPH_RTTI_DECLARATION="
|
||||
|
||||
@@ -16,8 +16,10 @@
|
||||
</tab>
|
||||
<!-- API References -->
|
||||
<tab type="usergroup" title="API REFERENCE">
|
||||
<!-- IE Developer Package -->
|
||||
<tab type="modules" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Plugin API Reference -->
|
||||
<tab type="user" url="group__ie__dev__api.html" visible="yes" title="Inference Engine Plugin API Reference"/>
|
||||
<!-- IE Transformations API Reference -->
|
||||
<tab type="user" url="group__ie__transformation__api.html" visible="yes" title="Inference Engine Transformations API Reference"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="MAIN OPENVINO™ DOCS" url="../index.html"/>
|
||||
</navindex>
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
<!-- GET STARTED category -->
|
||||
<tab type="usergroup" title="GET STARTED" url="index.html">
|
||||
<!-- Install Directly -->
|
||||
<tab type="usergroup" title="Install Directly" url=""><!--automatically generated-->
|
||||
<tab type="usergroup" title="Installation Guides" url=""><!--automatically generated-->
|
||||
<tab type="usergroup" title="Linux" url="@ref openvino_docs_install_guides_installing_openvino_linux">
|
||||
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Linux* OS" url="@ref openvino_docs_install_guides_installing_openvino_linux"/>
|
||||
<tab type="user" title="[DEPRECATED] Install Intel® Distribution of OpenVINO™ toolkit for Linux with FPGA Support" url="@ref openvino_docs_install_guides_installing_openvino_linux_fpga"/>
|
||||
@@ -19,17 +19,18 @@
|
||||
<tab type="user" title="Raspbian OS" url="@ref openvino_docs_install_guides_installing_openvino_raspbian"/>
|
||||
</tab>
|
||||
<!-- Install From Images and Repositories -->
|
||||
<tab type="usergroup" title="Install From Images and Repositories" url=""><!--automatically generated-->
|
||||
<tab type="usergroup" title="Install From Images and Repositories" url="@ref openvino_docs_install_guides_installing_openvino_images">
|
||||
<tab type="usergroup" title="Docker" url="@ref openvino_docs_install_guides_installing_openvino_docker_linux">
|
||||
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Linux* from a Docker* Image" url="@ref openvino_docs_install_guides_installing_openvino_docker_linux"/>
|
||||
<tab type="user" title="Install Intel® Distribution of OpenVINO™ toolkit for Windows* from a Docker* Image" url="@ref openvino_docs_install_guides_installing_openvino_docker_windows"/>
|
||||
</tab>
|
||||
<tab type="user" title="Docker with DL Workbench" url="./workbench_docs_Workbench_DG_Install_from_Docker_Hub.html"/><!-- Link to the original Workbench topic -->
|
||||
<tab type="user" title="APT" url="@ref openvino_docs_install_guides_installing_openvino_apt"/>
|
||||
<tab type="user" title="YUM" url="@ref openvino_docs_install_guides_installing_openvino_yum"/>
|
||||
<tab type="user" title="Anaconda Cloud" url="@ref openvino_docs_install_guides_installing_openvino_conda"/>
|
||||
<tab type="user" title="Yocto" url="@ref openvino_docs_install_guides_installing_openvino_yocto"/>
|
||||
<tab type="user" title="PyPI" url="@ref openvino_docs_install_guides_installing_openvino_pip"/>
|
||||
<tab type="user" title="Build from Source" url="https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md"/>
|
||||
<tab type="user" title="Build from Source" url="https://github.com/openvinotoolkit/openvino/wiki/BuildingCode"/>
|
||||
</tab>
|
||||
<!-- Get Started Guides-->
|
||||
<tab type="usergroup" title="Get Started Guides" url=""><!--automatically generated-->
|
||||
@@ -37,7 +38,10 @@
|
||||
<tab type="user" title="Linux" url="@ref openvino_docs_get_started_get_started_linux"/>
|
||||
<tab type="user" title="Windows" url="@ref openvino_docs_get_started_get_started_windows"/>
|
||||
<tab type="user" title="macOS" url="@ref openvino_docs_get_started_get_started_macos"/>
|
||||
<tab type="user" title="Raspbian" url="@ref openvino_docs_get_started_get_started_raspbian"/>
|
||||
<tab type="user" title="Get Started with OpenVINO via DL Workbench" url="@ref openvino_docs_get_started_get_started_dl_workbench"/>
|
||||
<tab type="user" title="Legal Information" url="@ref openvino_docs_Legal_Information"/>
|
||||
<tab type="user" title="Introduction to DL Workbench" url="./openvino_docs_get_started_get_started_dl_workbench.html"/><!-- Link to the original Workbench topic -->
|
||||
</tab>
|
||||
<!-- Configuration for Hardware -->
|
||||
<tab type="usergroup" title="Configuration for Hardware" url=""><!--automatically generated-->
|
||||
@@ -56,6 +60,7 @@
|
||||
<tab type="user" title="Introduction" url="@ref openvino_docs_security_guide_introduction"/>
|
||||
<tab type="user" title="Using DL Workbench Securely" url="@ref openvino_docs_security_guide_workbench"/>
|
||||
<tab type="user" title="Using Encrypted Models" url="@ref openvino_docs_IE_DG_protecting_model_guide"/>
|
||||
<tab type="user" title="Security Add-on" url="@ref ovsa_get_started"/>
|
||||
</tab>
|
||||
</tab>
|
||||
|
||||
@@ -72,7 +77,7 @@
|
||||
<!-- Performance Benchmarks -->
|
||||
<tab type="usergroup" title="Performance Measures" url="@ref openvino_docs_performance_benchmarks">
|
||||
<tab type="user" title="Performance Information Frequently Asked Questions" url="@ref openvino_docs_performance_benchmarks_faq"/>
|
||||
<tab type="user" title="Download Performance Data Spreadsheet in MS Excel* Format" url="https://docs.openvinotoolkit.org/downloads/benchmark_files/OV-2021.1-Download-Excel.xlsx"/>
|
||||
<tab type="user" title="Download Performance Data Spreadsheet in MS Excel* Format" url="https://docs.openvinotoolkit.org/downloads/benchmark_files/OV-2021.2-Download-Excel.xlsx"/>
|
||||
<tab type="user" title="INT8 vs. FP32 Comparison on Select Networks and Platforms" url="@ref openvino_docs_performance_int8_vs_fp32"/>
|
||||
</tab>
|
||||
<tab type="user" title="Performance Optimization Guide" url="@ref openvino_docs_optimization_guide_dldt_optimization_guide"/>
|
||||
@@ -120,14 +125,15 @@
|
||||
<tab type="user" title="Hello Query Device C++ Sample" url="@ref openvino_inference_engine_samples_hello_query_device_README"/>
|
||||
<tab type="user" title="Hello Query Device Python* Sample" url="@ref openvino_inference_engine_ie_bridges_python_sample_hello_query_device_README"/>
|
||||
<tab type="user" title="nGraph Function C++ Sample" url="@ref openvino_inference_engine_samples_ngraph_function_creation_sample_README"/>
|
||||
<tab type="user" title="nGraph Function Python Sample" url="@ref openvino_inference_engine_ie_bridges_python_samples_ngraph_function_creation_sample_README"/>
|
||||
<tab type="user" title="Object Detection C++ Sample SSD" url="@ref openvino_inference_engine_samples_object_detection_sample_ssd_README"/>
|
||||
<tab type="user" title="Object Detection Python* Sample SSD" url="@ref openvino_inference_engine_ie_bridges_python_sample_object_detection_sample_ssd_README"/>
|
||||
<tab type="user" title="Object Detection C Sample SSD" url="@ref openvino_inference_engine_ie_bridges_c_samples_object_detection_sample_ssd_README"/>
|
||||
<tab type="user" title="Automatic Speech Recognition C++ Sample" url="@ref openvino_inference_engine_samples_speech_sample_README"/>
|
||||
<tab type="user" title="Neural Style Transfer C++ Sample" url="@ref openvino_inference_engine_samples_style_transfer_sample_README"/>
|
||||
<tab type="user" title="Neural Style Transfer Python* Sample" url="@ref openvino_inference_engine_ie_bridges_python_sample_style_transfer_sample_README"/>
|
||||
<tab type="user" title="Benchmark C++ App" url="@ref openvino_inference_engine_samples_benchmark_app_README"/>
|
||||
<tab type="user" title="Benchmark Python* App" url="@ref openvino_inference_engine_tools_benchmark_tool_README"/>
|
||||
<tab type="user" title="Benchmark C++ Tool" url="@ref openvino_inference_engine_samples_benchmark_app_README"/>
|
||||
<tab type="user" title="Benchmark Python* Tool" url="@ref openvino_inference_engine_tools_benchmark_tool_README"/>
|
||||
</tab>
|
||||
|
||||
<!-- DL Streamer Examples -->
|
||||
@@ -146,7 +152,8 @@
|
||||
<tab type="user" title="Benchmark Sample" url="@ref gst_samples_benchmark_README"/>
|
||||
</tab>
|
||||
<tab type="usergroup" title="Add-Ons" url="">
|
||||
<tab type="user" title="Model Server" url="@ref openvino_docs_ovms"/>
|
||||
<tab type="user" title="Model Server" url="@ref openvino_docs_ovms"/>
|
||||
<tab type="user" title="Security Add-on" url="./ovsa_get_started.html"/>
|
||||
</tab>
|
||||
</tab>
|
||||
|
||||
|
||||
3
docs/get_started/dl_workbench_img/DL_Workbench.jpg
Normal file
3
docs/get_started/dl_workbench_img/DL_Workbench.jpg
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:6675f4b68df7eaa3d6188ecc8b5d53be572cf9c92f53abac3bc6416e6b428d0c
|
||||
size 196146
|
||||
3
docs/get_started/dl_workbench_img/Get_Started_Page-b.png
Normal file
3
docs/get_started/dl_workbench_img/Get_Started_Page-b.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:539deb67a7d1c0e8b0c037f8e7488445be0895e8e717bed5cfec64131936870c
|
||||
size 198207
|
||||
3
docs/get_started/dl_workbench_img/convert_model.png
Normal file
3
docs/get_started/dl_workbench_img/convert_model.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:2925e58a71d684e23776e6ed55cc85d9085b3ba5e484720528aeac5fa59f9e3a
|
||||
size 55404
|
||||
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f4a52661c05977d878c614c4f8510935982ce8a0e120e05690307d7c95e4ab31
|
||||
size 73999
|
||||
3
docs/get_started/dl_workbench_img/dataset_loading.png
Normal file
3
docs/get_started/dl_workbench_img/dataset_loading.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:ddb0550f3f04c177ec116d6c41e6d3a2ac1fedea7121e10ad3836f84c86a5c78
|
||||
size 35278
|
||||
3
docs/get_started/dl_workbench_img/generate_dataset.png
Normal file
3
docs/get_started/dl_workbench_img/generate_dataset.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:f1e329304ff3d586bb2b8e2442333ede085593f40b1567bd5250508d33d3b9f9
|
||||
size 32668
|
||||
3
docs/get_started/dl_workbench_img/import_model_01.png
Normal file
3
docs/get_started/dl_workbench_img/import_model_01.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:605515f25a746579d3622b7a274c7dece95e4fbfc6c1817f99431c1abf116070
|
||||
size 55409
|
||||
3
docs/get_started/dl_workbench_img/import_model_02.png
Normal file
3
docs/get_started/dl_workbench_img/import_model_02.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:0ca48900ca8f6733c4a8ebc957517fbed80f3c080f53d251eeebb01f082c8f83
|
||||
size 55646
|
||||
3
docs/get_started/dl_workbench_img/inference_banner.png
Normal file
3
docs/get_started/dl_workbench_img/inference_banner.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:ba94c2c0e0cb98b9e43c876d060d8a7965182461b0d505167eb71134d4975b8f
|
||||
size 58204
|
||||
3
docs/get_started/dl_workbench_img/inference_complete.png
Normal file
3
docs/get_started/dl_workbench_img/inference_complete.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:75628b7d02f1fe5c25a233fa16ae1c6c3d5060bf3d15bc7b1e5b9ea71ce50b73
|
||||
size 50227
|
||||
3
docs/get_started/dl_workbench_img/model_loading.png
Normal file
3
docs/get_started/dl_workbench_img/model_loading.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:72ab36115cecfee4b215e1b21911ebac3706e513b72eea7bb829932f7bdb3a19
|
||||
size 70515
|
||||
3
docs/get_started/dl_workbench_img/selected.png
Normal file
3
docs/get_started/dl_workbench_img/selected.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:70aee6f0fd30c8e2139950c6bc831dc11b2616ea8f04b991efc9b3f5b7b11ce6
|
||||
size 88891
|
||||
3
docs/get_started/dl_workbench_img/validation_dataset.png
Normal file
3
docs/get_started/dl_workbench_img/validation_dataset.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:c1e297da7f7dfd2af7a0ba47ba1e5c14376f21b15dfcde1fe6f5ad3412ad8feb
|
||||
size 21296
|
||||
140
docs/get_started/get_started_dl_workbench.md
Normal file
140
docs/get_started/get_started_dl_workbench.md
Normal file
@@ -0,0 +1,140 @@
|
||||
# Get Started with OpenVINO™ Toolkit via Deep Learning Workbench {#openvino_docs_get_started_get_started_dl_workbench}
|
||||
|
||||
The OpenVINO™ toolkit optimizes and runs Deep Learning Neural Network models on Intel® hardware. This guide helps you get started with the OpenVINO™ toolkit via the Deep Learning Workbench (DL Workbench) on Linux\*, Windows\*, or macOS\*.
|
||||
|
||||
In this guide, you will:
|
||||
* Learn the OpenVINO™ inference workflow.
|
||||
* Start DL Workbench on Linux. Links to instructions for other operating systems are provided as well.
|
||||
* Create a project and run a baseline inference.
|
||||
|
||||
[DL Workbench](@ref workbench_docs_Workbench_DG_Introduction) is a web-based graphical environment that enables you to easily use various sophisticated
|
||||
OpenVINO™ toolkit components:
|
||||
* [Model Downloader](@ref omz_tools_downloader_README) to download models from the [Intel® Open Model Zoo](@ref omz_models_intel_index)
|
||||
with pretrained models for a range of different tasks
|
||||
* [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) to transform models into
|
||||
the Intermediate Representation (IR) format
|
||||
* [Post-Training Optimization toolkit](@ref pot_README) to calibrate a model and then execute it in the
|
||||
INT8 precision
|
||||
* [Accuracy Checker](@ref omz_tools_accuracy_checker_README) to determine the accuracy of a model
|
||||
* [Benchmark Tool](@ref openvino_inference_engine_samples_benchmark_app_README) to estimate inference performance on supported devices
|
||||
|
||||

|
||||
|
||||
DL Workbench supports the following scenarios:
|
||||
1. [Calibrate the model in INT8 precision](@ref workbench_docs_Workbench_DG_Int_8_Quantization)
|
||||
2. [Find the best combination](@ref workbench_docs_Workbench_DG_View_Inference_Results) of inference parameters: [number of streams and batches](../optimization_guide/dldt_optimization_guide.md)
|
||||
3. [Analyze inference results](@ref workbench_docs_Workbench_DG_Visualize_Model) and [compare them across different configurations](@ref workbench_docs_Workbench_DG_Compare_Performance_between_Two_Versions_of_Models)
|
||||
4. [Implement an optimal configuration into your application](@ref workbench_docs_Workbench_DG_Deploy_and_Integrate_Performance_Criteria_into_Application)
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Prerequisite | Linux* | Windows* | macOS*
|
||||
:----- | :----- |:----- |:-----
|
||||
Operating system|Ubuntu\* 18.04. Other Linux distributions, such as Ubuntu\* 16.04 and CentOS\* 7, are not validated.|Windows\* 10 | macOS\* 10.15 Catalina
|
||||
CPU | Intel® Core™ i5| Intel® Core™ i5 | Intel® Core™ i5
|
||||
GPU| Intel® Pentium® processor N4200/5 with Intel® HD Graphics | Not supported| Not supported
|
||||
HDDL, Myriad| Intel® Neural Compute Stick 2 <br> Intel® Vision Accelerator Design with Intel® Movidius™ VPUs| Not supported | Not supported
|
||||
Available RAM space| 4 GB| 4 GB| 4 GB
|
||||
Available storage space | 8 GB + space for imported artifacts| 8 GB + space for imported artifacts| 8 GB + space for imported artifacts
|
||||
Docker\*| Docker CE 18.06.1 | Docker Desktop 2.1.0.1|Docker CE 18.06.1
|
||||
Web browser| Google Chrome\* 76 <br> Browsers like Mozilla Firefox\* 71 or Apple Safari\* 12 are not validated. <br> Microsoft Internet Explorer\* is not supported.| Google Chrome\* 76 <br> Browsers like Mozilla Firefox\* 71 or Apple Safari\* 12 are not validated. <br> Microsoft Internet Explorer\* is not supported.| Google Chrome\* 76 <br>Browsers like Mozilla Firefox\* 71 or Apple Safari\* 12 are not validated. <br> Microsoft Internet Explorer\* is not supported.
|
||||
Resolution| 1440 x 890|1440 x 890|1440 x 890
|
||||
Internet|Optional|Optional|Optional
|
||||
Installation method| From Docker Hub <br> From OpenVINO™ toolkit package|From Docker Hub|From Docker Hub
|
||||
|
||||
## Start DL Workbench
|
||||
|
||||
This section provides instructions to run the DL Workbench on Linux from Docker Hub.
|
||||
|
||||
Use the command below to pull the latest Docker image with the application and run it:
|
||||
|
||||
```bash
|
||||
wget https://raw.githubusercontent.com/openvinotoolkit/workbench_aux/master/start_workbench.sh && bash start_workbench.sh
|
||||
```
|
||||
DL Workbench uses [authentication tokens](@ref workbench_docs_Workbench_DG_Authentication) to access the application. A token
|
||||
is generated automatically and displayed in the console output when you run the container for the first time. Once the command is executed, follow the link with the token. The **Get Started** page opens:
|
||||

|
||||
|
||||
For details and more installation options, visit the links below:
|
||||
* [Install DL Workbench from Docker Hub* on Linux* OS](@ref workbench_docs_Workbench_DG_Install_from_DockerHub_Linux)
|
||||
* [Install DL Workbench from Docker Hub on Windows*](@ref workbench_docs_Workbench_DG_Install_from_Docker_Hub_Win)
|
||||
* [Install DL Workbench from Docker Hub on macOS*](@ref workbench_docs_Workbench_DG_Install_from_Docker_Hub_mac)
|
||||
* [Install DL Workbench from the OpenVINO toolkit package on Linux](@ref workbench_docs_Workbench_DG_Install_from_Package)
|
||||
|
||||
## <a name="workflow-overview"></a>OpenVINO™ DL Workbench Workflow Overview
|
||||
|
||||
The simplified OpenVINO™ DL Workbench workflow is:
|
||||
1. **Get a trained model** for your inference task. Example inference tasks: pedestrian detection, face detection, vehicle detection, license plate recognition, head pose.
|
||||
2. **Run the trained model through the Model Optimizer** to convert the model to an Intermediate Representation, which consists of a pair of `.xml` and `.bin` files that are used as the input for Inference Engine.
|
||||
3. **Run inference against the Intermediate Representation** (optimized model) and output inference results.
|
||||
|
||||
## Run Baseline Inference
|
||||
|
||||
This section illustrates a sample use case of how to infer a pretrained model from the [Intel® Open Model Zoo](@ref omz_models_intel_index) with an autogenerated noise dataset on a CPU device.
|
||||
|
||||
<iframe width="560" height="315" src="https://www.youtube.com/embed/9TRJwEmY0K4" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
|
||||
|
||||
|
||||
Once you log in to the DL Workbench, create a project, which is a combination of a model, a dataset, and a target device. Follow the steps below:
|
||||
|
||||
### Step 1. Open a New Project
|
||||
|
||||
On the the **Active Projects** page, click **Create** to open the **Create Project** page:
|
||||

|
||||
|
||||
### Step 2. Choose a Pretrained Model
|
||||
|
||||
Click **Import** next to the **Model** table on the **Create Project** page. The **Import Model** page opens. Select the squeezenet1.1 model from the Open Model Zoo and click **Import**.
|
||||

|
||||
|
||||
### Step 3. Convert the Model into Intermediate Representation
|
||||
|
||||
The **Convert Model to IR** tab opens. Keep the FP16 precision and click **Convert**.
|
||||

|
||||
|
||||
You are directed back to the **Create Project** page where you can see the status of the chosen model.
|
||||

|
||||
|
||||
### Step 4. Generate a Noise Dataset
|
||||
|
||||
Scroll down to the **Validation Dataset** table. Click **Generate** next to the table heading.
|
||||

|
||||
|
||||
The **Autogenerate Dataset** page opens. Click **Generate**.
|
||||

|
||||
|
||||
You are directed back to the **Create Project** page where you can see the status of the dataset.
|
||||

|
||||
|
||||
### Step 5. Create the Project and Run a Baseline Inference
|
||||
|
||||
On the **Create Project** page, select the imported model, CPU target, and the generated dataset. Click **Create**.
|
||||

|
||||
|
||||
The inference starts and you cannot proceed until it is done.
|
||||

|
||||
|
||||
Once the inference is complete, the **Projects** page opens automatically. Find your inference job in the **Projects Settings** table indicating all jobs.
|
||||

|
||||
|
||||
Congratulations, you have performed your first inference in the OpenVINO DL Workbench. Now you can proceed to:
|
||||
* [Select the inference](@ref workbench_docs_Workbench_DG_Run_Single_Inference)
|
||||
* [Visualize statistics](@ref workbench_docs_Workbench_DG_Visualize_Model)
|
||||
* [Experiment with model optimization](@ref workbench_docs_Workbench_DG_Int_8_Quantization)
|
||||
and inference options to profile the configuration
|
||||
|
||||
For detailed instructions to create a new project, visit the links below:
|
||||
* [Select a model](@ref workbench_docs_Workbench_DG_Select_Model)
|
||||
* [Select a dataset](@ref workbench_docs_Workbench_DG_Select_Datasets)
|
||||
* [Select a target and an environment](@ref workbench_docs_Workbench_DG_Select_Environment). This can be your local workstation or a remote target. If you use a remote target, [register the remote machine](@ref workbench_docs_Workbench_DG_Add_Remote_Target) first.
|
||||
|
||||
## Additional Resources
|
||||
|
||||
* [OpenVINO™ Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes)
|
||||
* [OpenVINO™ Toolkit Overview](../index.md)
|
||||
* [DL Workbench Installation Guide](@ref workbench_docs_Workbench_DG_Install_Workbench)
|
||||
* [Inference Engine Developer Guide](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md)
|
||||
* [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
|
||||
* [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md)
|
||||
* [Overview of OpenVINO™ Toolkit Pre-Trained Models](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models)
|
||||
* [OpenVINO™ Hello World Face Detection Exercise](https://github.com/intel-iot-devkit/inference-tutorials-generic)
|
||||
@@ -195,7 +195,7 @@ You will perform the following steps:
|
||||
|
||||
Each demo and code sample is a separate application, but they use the same behavior and components. The code samples and demo applications are:
|
||||
|
||||
* [Code Samples](../IE_DG/Samples_Overview.html) - Small console applications that show how to utilize specific OpenVINO capabilities within an application and execute specific tasks such as loading a model, running inference, querying specific device capabilities, and more.
|
||||
* [Code Samples](../IE_DG/Samples_Overview.md) - Small console applications that show how to utilize specific OpenVINO capabilities within an application and execute specific tasks such as loading a model, running inference, querying specific device capabilities, and more.
|
||||
|
||||
* [Demo Applications](@ref omz_demos_README) - Console applications that provide robust application templates to support developers in implementing specific deep learning scenarios. They may also involve more complex processing pipelines that gather analysis from several models that run inference simultaneously. For example concurrently detecting a person in a video stream and detecting attributes such as age, gender and/or emotions.
|
||||
|
||||
@@ -370,7 +370,7 @@ As an alternative, the Intel® Distribution of OpenVINO™ toolkit includes two
|
||||
|
||||
### <a name="run-image-classification"></a>Step 4: Run the Image Classification Code Sample
|
||||
|
||||
> **NOTE**: The Image Classification code sample is automatically compiled when you ran the Image Classification demo script. If you want to compile it manually, see the [Inference Engine Code Samples Overview](../IE_DG/Samples_Overview.html#build_samples_linux) section.
|
||||
> **NOTE**: The Image Classification code sample is automatically compiled when you ran the Image Classification demo script. If you want to compile it manually, see the *Build the Sample Applications on Linux* section in the [Inference Engine Code Samples Overview](../IE_DG/Samples_Overview.md).
|
||||
|
||||
To run the **Image Classification** code sample with an input image on the IR:
|
||||
|
||||
|
||||
109
docs/get_started/get_started_raspbian.md
Normal file
109
docs/get_started/get_started_raspbian.md
Normal file
@@ -0,0 +1,109 @@
|
||||
# Get Started with OpenVINO™ Toolkit on Raspbian* OS {#openvino_docs_get_started_get_started_raspbian}
|
||||
|
||||
The OpenVINO™ toolkit optimizes and runs Deep Learning Neural Network models on Intel® hardware. This guide helps you get started with the OpenVINO™ toolkit you installed on Raspbian* OS.
|
||||
|
||||
In this guide, you will:
|
||||
* Learn the OpenVINO™ inference workflow.
|
||||
* Build and run sample code using detailed instructions.
|
||||
|
||||
## <a name="openvino-components"></a>OpenVINO™ Toolkit Components
|
||||
On Raspbian* OS, the OpenVINO™ toolkit consists of the following components:
|
||||
* **Inference Engine:** The software libraries that run inference against the Intermediate Representation (optimized model) to produce inference results.
|
||||
* **MYRIAD Plugin:** The plugin developed for inference of neural networks on Intel® Neural Compute Stick 2.
|
||||
|
||||
> **NOTE**:
|
||||
> * The OpenVINO™ package for Raspberry* does not include the [Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md). To convert models to Intermediate Representation (IR), you need to install it separately to your host machine.
|
||||
> * The package does not include the Open Model Zoo demo applications. You can download them separately from the [Open Models Zoo repository](https://github.com/opencv/open_model_zoo).
|
||||
|
||||
In addition, [code samples](../IE_DG/Samples_Overview.md) are provided to help you get up and running with the toolkit.
|
||||
|
||||
## <a name="openvino-installation"></a>Intel® Distribution of OpenVINO™ Toolkit Directory Structure
|
||||
This guide assumes you completed all Intel® Distribution of OpenVINO™ toolkit installation and configuration steps. If you have not yet installed and configured the toolkit, see [Install Intel® Distribution of OpenVINO™ toolkit for Raspbian*](../install_guides/installing-openvino-raspbian.md).
|
||||
|
||||
The OpenVINO toolkit for Raspbian* OS is distributed without installer. This document refers to the directory to which you unpacked the toolkit package as `<INSTALL_DIR>`.
|
||||
|
||||
The primary tools for deploying your models and applications are installed to the `<INSTALL_DIR>/deployment_tools` directory.
|
||||
<details>
|
||||
<summary><strong>Click for the <code>deployment_tools</code> directory structure</strong></summary>
|
||||
|
||||
|
||||
| Directory | Description |
|
||||
|:----------------------------------------|:--------------------------------------------------------------------------------------|
|
||||
| `inference_engine/` | Inference Engine directory. Contains Inference Engine API binaries and source files, samples and extensions source files, and resources like hardware drivers.|
|
||||
| `external/` | Third-party dependencies and drivers.|
|
||||
| `include/` | Inference Engine header files. For API documentation, see the [Inference Engine API Reference](./annotated.html). |
|
||||
| `lib/` | Inference Engine libraries.|
|
||||
| `samples/` | Inference Engine samples. Contains source code for C++ and Python* samples and build scripts. See the [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md). |
|
||||
| `share/` | CMake configuration files for linking with Inference Engine.|
|
||||
|
||||
</details>
|
||||
|
||||
## <a name="workflow-overview"></a>OpenVINO™ Workflow Overview
|
||||
|
||||
The OpenVINO™ workflow on Raspbian* OS is as follows:
|
||||
1. **Get a pre-trained model** for your inference task. If you want to use your model for inference, the model must be converted to the `.bin` and `.xml` Intermediate Representation (IR) files, which are used as input by Inference Engine. On Raspberry PI, OpenVINO™ toolkit includes only the Inference Engine module. The Model Optimizer is not supported on this platform. To get the optimized models you can use one of the following options:
|
||||
|
||||
* Download public and Intel's pre-trained models from the [Open Model Zoo](https://github.com/opencv/open_model_zoo) using [Model Downloader tool](@ref omz_tools_downloader_README).
|
||||
<br> For more information on pre-trained models, see [Pre-Trained Models Documentation](@ref omz_models_intel_index)
|
||||
|
||||
* Convert a model using the Model Optimizer from a full installation of Intel® Distribution of OpenVINO™ toolkit on one of the supported platforms. Installation instructions are available:
|
||||
* [Installation Guide for macOS*](../install_guides/installing-openvino-macos.md)
|
||||
* [Installation Guide for Windows*](../install_guides/installing-openvino-windows.md)
|
||||
* [Installation Guide for Linux*](../install_guides/installing-openvino-linux.md)
|
||||
2. **Use the Inference Engine API in the application** to run inference against the Intermediate Representation (optimized model) and output inference results. The application can be an OpenVINO™ sample or your own application.
|
||||
|
||||
## <a name="using-sample"></a>Build and Run Code Samples
|
||||
|
||||
Follow the steps below to run pre-trained Face Detection network using Inference Engine samples from the OpenVINO toolkit.
|
||||
|
||||
1. Create a samples build directory. This example uses a directory named `build`:
|
||||
```sh
|
||||
mkdir build && cd build
|
||||
```
|
||||
2. Build the Object Detection Sample with the following command:
|
||||
```sh
|
||||
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS="-march=armv7-a" /opt/intel/openvino/deployment_tools/inference_engine/samples/cpp
|
||||
```
|
||||
```sh
|
||||
make -j2 object_detection_sample_ssd
|
||||
```
|
||||
3. Download the pre-trained Face Detection model with the Model Downloader:
|
||||
|
||||
```sh
|
||||
git clone --depth 1 https://github.com/openvinotoolkit/open_model_zoo
|
||||
cd open_model_zoo/tools/downloader
|
||||
python3 -m pip install -r requirements.in
|
||||
python3 downloader.py --name face-detection-adas-0001
|
||||
```
|
||||
|
||||
4. Run the sample, specifying the model and path to the input image:
|
||||
```sh
|
||||
./armv7l/Release/object_detection_sample_ssd -m face-detection-adas-0001.xml -d MYRIAD -i <path_to_image>
|
||||
```
|
||||
The application outputs an image (`out_0.bmp`) with detected faced enclosed in rectangles.
|
||||
|
||||
## <a name="basic-guidelines-sample-application"></a>Basic Guidelines for Using Code Samples
|
||||
|
||||
Following are some basic guidelines for executing the OpenVINO™ workflow using the code samples:
|
||||
|
||||
1. Before using the OpenVINO™ samples, always set up the environment:
|
||||
```sh
|
||||
source <INSTALL_DIR>/bin/setupvars.sh
|
||||
```
|
||||
2. Have the directory path for the following:
|
||||
- Code Sample binaries
|
||||
- Media: Video or image. Many sources are available from which you can download video media to use the code samples and demo applications, like https://videos.pexels.com and https://images.google.com.
|
||||
- Model in the IR format (.bin and .xml files).
|
||||
|
||||
|
||||
## Additional Resources
|
||||
|
||||
Use these resources to learn more about the OpenVINO™ toolkit:
|
||||
|
||||
* [OpenVINO™ Release Notes](https://software.intel.com/en-us/articles/OpenVINO-RelNotes)
|
||||
* [OpenVINO™ Toolkit Overview](../index.md)
|
||||
* [Inference Engine Developer Guide](../IE_DG/Deep_Learning_Inference_Engine_DevGuide.md)
|
||||
* [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
|
||||
* [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md)
|
||||
* [Overview of OpenVINO™ Toolkit Pre-Trained Models](https://software.intel.com/en-us/openvino-toolkit/documentation/pretrained-models)
|
||||
* [OpenVINO™ Hello World Face Detection Exercise](https://github.com/intel-iot-devkit/inference-tutorials-generic)
|
||||
@@ -12,8 +12,8 @@
|
||||
* [Introducing int8 quantization for fast CPU inference using OpenVINO](https://www.intel.ai/introducing-int8-quantization-for-fast-cpu-inference-using-openvino/)
|
||||
* [Accelerate Vision-based AI with Intel® Distribution of OpenVINO™ Toolkit](https://www.intel.ai/accelerate-vision-based-ai-with-intel-distribution-of-openvino-toolkit/)
|
||||
|
||||
## Custom Layers Guide
|
||||
To learn about what is *custom layers* and how to work with them in the Deep Learning Deployment Toolkit, see the [Custom Layers Guide](../HOWTO/Custom_Layers_Guide.md).
|
||||
## Custom Operations Guide
|
||||
To learn about what is *custom operation* and how to work with them in the Deep Learning Deployment Toolkit, see the [Custom Operations Guide](../HOWTO/Custom_Layers_Guide.md).
|
||||
|
||||
## Introducing OpenVINO™ and Computer Vision | IoT Developer Show Season 2 | Intel Software
|
||||
|
||||
@@ -65,4 +65,4 @@ To learn about what is *custom layers* and how to work with them in the Deep Lea
|
||||
[performance-boost-dl]: ../img/performance-boost-DL-algorithm.jpg
|
||||
[digital-security-surveillance]: ../img/digital-security-surveillance.jpg
|
||||
[robotics-with-AI]: ../img/robotics-with-AI.jpg
|
||||
[people-counter-syestem]: ../img/people-counter-syestem.jpg
|
||||
[people-counter-syestem]: ../img/people-counter-syestem.jpg
|
||||
|
||||
3
docs/img/MO_connection_example_1.png
Normal file
3
docs/img/MO_connection_example_1.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:1e432a8beb8290adec0d498f5c4aff63eed36c7fdf4bade3db0e0b0bdc5ff70f
|
||||
size 26737
|
||||
3
docs/img/MO_conversion_pipeline.png
Normal file
3
docs/img/MO_conversion_pipeline.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:bb7beec6a00edbde13eca331d513f72b35b442f41f3a0c0e84f5185a1dcbf9ec
|
||||
size 50220
|
||||
3
docs/img/MO_graph_after_extractors.png
Normal file
3
docs/img/MO_graph_after_extractors.png
Normal file
@@ -0,0 +1,3 @@
|
||||
version https://git-lfs.github.com/spec/v1
|
||||
oid sha256:bae3a4992d6d4f157674496b6230ae68f0cc33a58e4f305eeb935c09b5278409
|
||||
size 28518
|
||||
Some files were not shown because too many files have changed in this diff Show More
Reference in New Issue
Block a user