diff --git a/build-instruction.md b/build-instruction.md index 9d16305d98b..b17d51748d0 100644 --- a/build-instruction.md +++ b/build-instruction.md @@ -46,9 +46,6 @@ The open source version of Inference Engine includes the following plugins: | MYRIAD plugin | Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2, Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X | | Heterogeneous plugin | Heterogeneous plugin enables computing for inference on one network on several Intel® devices. | -Inference Engine plugin for Intel® FPGA is distributed only in a binary form, -as a part of [Intel® Distribution of OpenVINO™]. - ## Build on Linux\* Systems The software was validated on: diff --git a/docs/HOWTO/Custom_Layers_Guide.md b/docs/HOWTO/Custom_Layers_Guide.md index 40700917808..2ded4bf5669 100644 --- a/docs/HOWTO/Custom_Layers_Guide.md +++ b/docs/HOWTO/Custom_Layers_Guide.md @@ -195,7 +195,7 @@ For a step-by-step walk-through creating and executing a custom layer, see [Cust - Intel® Distribution of OpenVINO™ toolkit home page: [https://software.intel.com/en-us/openvino-toolkit](https://software.intel.com/en-us/openvino-toolkit) - OpenVINO™ toolkit online documentation: [https://docs.openvinotoolkit.org](https://docs.openvinotoolkit.org) - [Model Optimizer Developer Guide](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) -- [Kernel Extensivility in the Inference Engine Developer Guide](../IE_DG/Integrate_your_kernels_into_IE.md) +- [Inference Engine Extensibility Mechanism](../IE_DG/Extensibility_DG/Intro.md) - [Inference Engine Samples Overview](../IE_DG/Samples_Overview.md) - [Overview of OpenVINO™ Toolkit Pre-Trained Models](@ref omz_models_intel_index) - [Inference Engine Tutorials](https://github.com/intel-iot-devkit/inference-tutorials-generic) diff --git a/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md b/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md index 1144f4c9d70..20f415a7317 100644 --- 
a/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md +++ b/docs/IE_DG/Deep_Learning_Inference_Engine_DevGuide.md @@ -42,8 +42,6 @@ inference of a pre-trained and optimized deep learning model and a set of sample ## Table of Contents -* [Introduction to Intel® Deep Learning Deployment Toolkit](Introduction.md) - * [Inference Engine API Changes History](API_Changes.md) * [Introduction to Inference Engine](inference_engine_intro.md) @@ -76,7 +74,6 @@ inference of a pre-trained and optimized deep learning model and a set of sample * [Supported Devices](supported_plugins/Supported_Devices.md) * [GPU](supported_plugins/CL_DNN.md) * [CPU](supported_plugins/CPU.md) - * [FPGA](supported_plugins/FPGA.md) * [VPU](supported_plugins/VPU.md) * [MYRIAD](supported_plugins/MYRIAD.md) * [HDDL](supported_plugins/HDDL.md) @@ -88,4 +85,4 @@ inference of a pre-trained and optimized deep learning model and a set of sample * [Known Issues](Known_Issues_Limitations.md) -**Typical Next Step:** [Introduction to Intel® Deep Learning Deployment Toolkit](Introduction.md) +**Typical Next Step:** [Introduction to Inference Engine](inference_engine_intro.md) diff --git a/docs/IE_DG/Glossary.md b/docs/IE_DG/Glossary.md index 780a3d5fcab..047d4484a66 100644 --- a/docs/IE_DG/Glossary.md +++ b/docs/IE_DG/Glossary.md @@ -64,7 +64,7 @@ Glossary of terms used in the Inference Engine | :--- | :--- | | Batch | Number of images to analyze during one call of infer. Maximum batch size is a property of the network and it is set before loading of the network to the plugin. In NHWC, NCHW and NCDHW image data layout representation, the N refers to the number of images in the batch | | Blob | Memory container used for storing inputs, outputs of the network, weights and biases of the layers | -| Device (Affinitity) | A preferred Intel(R) hardware device to run the inference (CPU, GPU, FPGA, etc.) | +| Device (Affinity) | A preferred Intel(R) hardware device to run the inference (CPU, GPU, etc.)
| | Extensibility mechanism, Custom layers | The mechanism that provides you with capabilities to extend the Inference Engine and Model Optimizer so that they can work with topologies containing layers that are not yet supported | | ICNNNetwork | An Interface of the Convolutional Neural Network that Inference Engine reads from IR. Consists of topology, weights and biases | | IExecutableNetwork | An instance of the loaded network which allows the Inference Engine to request (several) infer requests and perform inference synchronously or asynchronously | diff --git a/docs/IE_DG/Graph_debug_capabilities.md b/docs/IE_DG/Graph_debug_capabilities.md deleted file mode 100644 index c3a76be27ae..00000000000 --- a/docs/IE_DG/Graph_debug_capabilities.md +++ /dev/null @@ -1,24 +0,0 @@ -# Graph Debug Capabilities {#openvino_docs_IE_DG_Graph_debug_capabilities} - -Inference Engine supports two different objects for a graph representation: the nGraph function and -CNNNetwork. Both representations provide an API to get detailed information about the graph structure. - -## nGraph Function - -To receive additional messages about applied graph modifications, rebuild the nGraph library with -the `-DNGRAPH_DEBUG_ENABLE=ON` option. - -To visualize the nGraph function to the xDot format or to an image file, use the -`ngraph::pass::VisualizeTree` graph transformation pass: - -@snippet openvino/docs/snippets/Graph_debug_capabilities0.cpp part0 - -## CNNNetwork - -To serialize the CNNNetwork to the Inference Engine Intermediate Representation (IR) format, use the -`CNNNetwork::serialize(...)` method: - -@snippet openvino/docs/snippets/Graph_debug_capabilities1.cpp part1 - -> **NOTE**: CNNNetwork created from the nGraph function might differ from the original nGraph -> function because the Inference Engine applies some graph transformation. 
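The glossary above notes that an `IExecutableNetwork` lets the Inference Engine serve several infer requests and run them synchronously or asynchronously. As a language-neutral illustration of why multiple in-flight requests pay off (plain Python with `concurrent.futures` standing in for infer requests — this is a conceptual sketch, not the Inference Engine API, and `infer` below is a hypothetical workload):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def infer(frame):
    # Stand-in for a device-side inference call; the sleep models
    # per-request latency (data transfer plus compute).
    time.sleep(0.01)
    return f"result-{frame}"

frames = list(range(8))

# Serial: each frame waits for the previous inference to finish.
start = time.perf_counter()
serial = [infer(f) for f in frames]
serial_time = time.perf_counter() - start

# Pipelined: several "infer requests" are in flight at once, so the
# latency of different frames overlaps instead of adding up.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    pipelined = list(pool.map(infer, frames))
pipelined_time = time.perf_counter() - start

assert serial == pipelined            # same results either way
assert pipelined_time < serial_time   # overlap amortizes per-frame latency
```

The same results come back in a fraction of the wall-clock time, which is the effect the asynchronous API exploits on real accelerators.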
diff --git a/docs/IE_DG/Intro_to_Performance.md b/docs/IE_DG/Intro_to_Performance.md index 2987a3628ba..12913c5811c 100644 --- a/docs/IE_DG/Intro_to_Performance.md +++ b/docs/IE_DG/Intro_to_Performance.md @@ -27,7 +27,7 @@ latency penalty. So, for more real-time oriented usages, lower batch sizes (as l Refer to the [Benchmark App](../../inference-engine/samples/benchmark_app/README.md) sample, which allows latency vs. throughput measuring. ## Using Async API -To gain better performance on accelerators, such as VPU or FPGA, the Inference Engine uses the asynchronous approach (see +To gain better performance on accelerators, such as VPU, the Inference Engine uses the asynchronous approach (see [Integrating Inference Engine in Your Application (current API)](Integrate_with_customer_application_new_API.md)). The point is amortizing the costs of data transfers, by pipe-lining, see [Async API explained](@ref omz_demos_object_detection_demo_ssd_async_README). Since the pipe-lining relies on the availability of the parallel slack, running multiple inference requests in parallel is essential. diff --git a/docs/IE_DG/PythonPackage_Overview.md b/docs/IE_DG/PythonPackage_Overview.md index 411f082609f..3a5704a75f5 100644 --- a/docs/IE_DG/PythonPackage_Overview.md +++ b/docs/IE_DG/PythonPackage_Overview.md @@ -12,4 +12,4 @@ The OpenVINO™ Python\* package includes the following sub-packages: - `openvino.tools.benchmark` - Measure latency and throughput. 
## See Also -* [Introduction to Intel's Deep Learning Inference Engine](Introduction.md) +* [Introduction to Inference Engine](inference_engine_intro.md) diff --git a/docs/IE_DG/Samples_Overview.md b/docs/IE_DG/Samples_Overview.md index 6f4411d47ba..64a9462ef31 100644 --- a/docs/IE_DG/Samples_Overview.md +++ b/docs/IE_DG/Samples_Overview.md @@ -49,9 +49,11 @@ You can download the [pre-trained models](@ref omz_models_intel_index) using the The officially supported Linux* build environment is the following: -* Ubuntu* 16.04 LTS 64-bit or CentOS* 7.4 64-bit -* GCC* 5.4.0 (for Ubuntu* 16.04) or GCC* 4.8.5 (for CentOS* 7.4) -* CMake* version 2.8.12 or higher +* Ubuntu* 18.04 LTS 64-bit or CentOS* 7.6 64-bit +* GCC* 7.5.0 (for Ubuntu* 18.04) or GCC* 4.8.5 (for CentOS* 7.6) +* CMake* version 3.10 or higher + +> **NOTE**: For building samples from the open-source version of OpenVINO™ toolkit, see the [build instructions on GitHub](https://github.com/openvinotoolkit/openvino/blob/master/build-instruction.md). To build the C or C++ sample applications for Linux, go to the `/inference_engine/samples/c` or `/inference_engine/samples/cpp` directory, respectively, and run the `build_samples.sh` script: ```sh @@ -99,7 +101,7 @@ for the debug configuration — in `/intel64/Debug/`. The recommended Windows* build environment is the following: * Microsoft Windows* 10 * Microsoft Visual Studio* 2017, or 2019 -* CMake* version 2.8.12 or higher +* CMake* version 3.10 or higher > **NOTE**: If you want to use Microsoft Visual Studio 2019, you are required to install CMake 3.14. @@ -181,4 +183,4 @@ sample, read the sample documentation by clicking the sample name in the samples list above. 
## See Also -* [Introduction to Intel's Deep Learning Inference Engine](Introduction.md) +* [Introduction to Inference Engine](inference_engine_intro.md) diff --git a/docs/IE_DG/Tools_Overview.md b/docs/IE_DG/Tools_Overview.md index 6c543c810d0..6600554785b 100644 --- a/docs/IE_DG/Tools_Overview.md +++ b/docs/IE_DG/Tools_Overview.md @@ -14,4 +14,4 @@ The OpenVINO™ toolkit installation includes the following tools: ## See Also -* [Introduction to Deep Learning Inference Engine](Introduction.md) +* [Introduction to Inference Engine](inference_engine_intro.md) diff --git a/docs/IE_DG/inference_engine_intro.md b/docs/IE_DG/inference_engine_intro.md index cb3b43fcab7..c69166a3411 100644 --- a/docs/IE_DG/inference_engine_intro.md +++ b/docs/IE_DG/inference_engine_intro.md @@ -3,7 +3,7 @@ Introduction to Inference Engine {#openvino_docs_IE_DG_inference_engine_intro} After you have used the Model Optimizer to create an Intermediate Representation (IR), use the Inference Engine to infer the result for a given input data. -Inference Engine is a set of C++ libraries providing a common API to deliver inference solutions on the platform of your choice: CPU, GPU, VPU, or FPGA. Use the Inference Engine API to read the Intermediate Representation, set the input and output formats, and execute the model on devices. While the C++ libraries is the primary implementation, C libraries and Python bindings are also available. +Inference Engine is a set of C++ libraries providing a common API to deliver inference solutions on the platform of your choice: CPU, GPU, or VPU. Use the Inference Engine API to read the Intermediate Representation, set the input and output formats, and execute the model on devices. While the C++ libraries are the primary implementation, C libraries and Python bindings are also available. For Intel® Distribution of OpenVINO™ toolkit, Inference Engine binaries are delivered within release packages.
@@ -13,7 +13,7 @@ To learn about how to use the Inference Engine API for your application, see the For complete API Reference, see the [API Reference](usergroup29.html) section. -Inference Engine uses a plugin architecture. Inference Engine plugin is a software component that contains complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, FPGA, etc. Each plugin implements the unified API and provides additional hardware-specific APIs. +Inference Engine uses a plugin architecture. An Inference Engine plugin is a software component that contains a complete implementation for inference on a certain Intel® hardware device: CPU, GPU, VPU, etc. Each plugin implements the unified API and provides additional hardware-specific APIs. Modules in the Inference Engine component --------------------------------------- @@ -53,7 +53,6 @@ For each supported target device, Inference Engine provides a plugin — a DLL/s | ------------- | ------------- | |CPU| Intel® Xeon® with Intel® AVX2 and AVX512, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® SSE | |GPU| Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics -|FPGA| Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA, Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2) | |MYRIAD| Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X| |GNA| Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4
Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor |HETERO|Automatic splitting of a network inference between several devices (for example if a device doesn't support certain layers| @@ -65,7 +64,6 @@ The table below shows the plugin libraries and additional dependencies for Linux |--------|------------------------|-------------------------------------------------|--------------------------|--------------------------------------------------------------------------------------------------------| | CPU | `libMKLDNNPlugin.so` | `libinference_engine_lp_transformations.so` | `MKLDNNPlugin.dll` | `inference_engine_lp_transformations.dll` | | GPU | `libclDNNPlugin.so` | `libinference_engine_lp_transformations.so`, `libOpenCL.so` | `clDNNPlugin.dll` | `OpenCL.dll`, `inference_engine_lp_transformations.dll` | -| FPGA | `libdliaPlugin.so` | `libdla_compiler_core.so`, `libdla_runtime_core.so`, `libcrypto.so`, `libalteracl.so`, `liblpsolve5525.so`, `libprotobuf.so`, `libacl_emulator_kernel_rt.so` | `dliaPlugin.dll` | `dla_compiler_core.dll`, `dla_runtime_core.dll`, `crypto.dll`, `alteracl.dll`, `lpsolve5525.dll`, `protobuf.dll`, `acl_emulator_kernel_rt.dll` | MYRIAD | `libmyriadPlugin.so` | `libusb.so`, `libinference_engine_lp_transformations.so` | `myriadPlugin.dll` | `usb.dll`, `inference_engine_lp_transformations.dll` | | HDDL | `libHDDLPlugin.so` | `libbsl.so`, `libhddlapi.so`, `libmvnc-hddl.so`, `libinference_engine_lp_transformations.so`| `HDDLPlugin.dll` | `bsl.dll`, `hddlapi.dll`, `json-c.dll`, `libcrypto-1_1-x64.dll`, `libssl-1_1-x64.dll`, `mvnc-hddl.dll`, `inference_engine_lp_transformations.dll` | | GNA | `libGNAPlugin.so` | `libgna.so`, `libinference_engine_lp_transformations.so` | `GNAPlugin.dll` | `gna.dll`, `inference_engine_lp_transformations.dll` | diff --git a/docs/IE_DG/nGraph_Flow.md b/docs/IE_DG/nGraph_Flow.md deleted file mode 100644 index 2ebb2ae2d56..00000000000 --- a/docs/IE_DG/nGraph_Flow.md 
+++ /dev/null @@ -1,134 +0,0 @@ -# Introduction to nGraph Flow in Inference Engine {#openvino_docs_IE_DG_nGraph_Flow} - -## New Run-Time Intermediate Representation (IR): nGraph - -Starting from the OpenVINO™ release 2020.1, the Inference Engine integrates the -nGraph Core. -That implies that the Inference Engine uses a new way to represent a model in run time underneath of -the conventional `CNNNetwork` API, which is an instance of `ngraph::Function`. - -Besides the representation update, nGraph integration resulted in the following changes and new features: - -1. New operations sets. When operations from the nGraph Core were combined with conventional layers -from `CNNNetwork`, there were created a [new sets of operations called `opset1`, `opset2` and etc.](../ops/opset.md), -which covered both interfaces except several not very important cases. -Operations from `opset3` are generated by the Model Optimizer and are accepted in the Inference Engine. - -2. New version approach that attaches a version to each operation rather than to the entire IR file format. -IR is still versioned but has a different meaning. For details, see [Deep Learning Network Intermediate Representation and Operation Sets in OpenVINO™](../MO_DG/IR_and_opsets.md). - -3. Creating models in run-time without loading IR from an xml/binary file. You can enable it by creating -`ngraph::Function` passing it to `CNNNetwork`. - -4. Run-time reshape capability and constant folding are implemented through the nGraph code for more operations compared to previous releases. -As a result, more models can be reshaped. For details, see the [dedicated guide about the reshape capability](ShapeInference.md). - -5. Loading model from ONNX format without converting it to the Inference Engine IR. - -The conventional flow that is not based on nGraph is still available. -The complete picture of co-existence of legacy and new flows is presented below. 
-The rest of the document describes the coexistence of legacy and new flows showed in the picture below: - -![](img/TopLevelNGraphFlow.png) - - -## Read the Intermediate Representation to `CNNNetwork` - -As the new operation set is introduced, the Model Optimizer generates the IR version 10 using the new operations by default. -Each layer generated in the IR has a semantics matching to the corresponding operation from the nGraph namespaces `opset1`, `opset2` etc. -The IR version 10 automatically triggers the nGraph flow inside the Inference Engine. -When such IR is read in an application, the Inference Engine IR reader produces `CNNNetwork` that encapsulates the `ngraph::Function` instance underneath. -Thus the OpenVINO IR becomes a new serialization format for the nGraph IR, and it can be deserialized reading the `CNNNetwork`. - -> **IMPORTANT**: Conventional interfaces are used (`CNNNetwork`, the reader), so no changes required in most applications. - -> **NOTE**: While you still can use old APIs, there is an independent process of continuous improvements in the Inference Engine API. -> These changes are independent of nGraph integration and do not enable or disable new features. - -Interpretation of the IR version 10 differs from the old IR version. -Besides having a different operations set, the IR version 10 ignores the shapes and data types assigned to the ports in an XML file. -Both shapes and types are reinferred while loading to the Inference Engine using the nGraph shape and type propagation function that is a part of each nGraph operation. - -### Legacy IR Versions - -Starting from the OpenVINO™ release 2021.1 you cannot read IR version 7 and lower in the Inference Engine. - -## Build a Model in the Application - -Alternative method to feed the Inference Engine with a model is to create the model in the run time. -It is achieved by creation of the `ngraph::Function` construction using nGraph operation classes and optionally user-defined operations. 
-For details, see [Add Custom nGraph Operations](Extensibility_DG/AddingNGraphOps.md) and [examples](nGraphTutorial.md). -At this stage, the code is completely independent of the rest of the Inference Engine code and can be built separately. -After you construct an instance of `ngraph::Function`, you can use it to create `CNNNetwork` by passing it to the new constructor for this class. - -Initializing `CNNNetwork` from the nGraph Function means encapsulating the object and not converting it to a conventional representation. -Going to low-level details, technically it is achieved by using another class for the `CNNNetwork` internals. -The old representation that is used for former versions of IR before version 10 uses `CNNNetworkImpl`. -The new representation that is built around nGraph uses `CNNNetworkNGraphImpl`. - -![](img/NewAndOldCNNNetworkImpl.png) - -## Automatic Conversion to the Old Representation - -The old representation is still required in the cases listed below. -When old representation is required, the conversion from the `ngraph::Function` to the old representation is called automatically. -The following methods lead to the automatic conversion: - -1. Using the old API, which is expected to produce an old representation. Guaranteed to be read-only. Once you call such a method, the original nGraph representation is preserved and continues to be used in the successive calls. - - 1.1. `CNNNetwork::serialize`. Dumps the old representation after automatically called conversion. Cannot be used to dump IR V10. For details, see [Graph Debug Capabilities](Graph_debug_capabilities.md). - -2. Calling `CNNNetwork` methods that modify the model. After that nGraph representation is lost and cannot be used afterwards. - - 1.1. `CNNNetwork::addLayer` - - 1.2. CNNNetwork::setBatchSize. Still implemented through old logic for backward compatibility without using nGraph capabilities. - For details, see [Using Shape Inference](ShapeInference.md). - -3. 
Using methods that return objects inside an old representation. -Using these methods does not mean modification of the model, but you are not limited by the API to make read-only changes. -These methods should be used in the read-only mode with respect to a model representation. -If the model is changed, for example attribute of some layer is changed or layers are reconnected, the modification is lost whenever any method that uses nGraph is called, including methods inside plugins like CNNNetwork::reshape. -It is hard to predict whether the nGraph function is used in a plugin or other methods of CNNNetworks, so modifying a network using the following methods is *strongly not recommended*. -This is an important limitation that is introduced for the old API calls listed below: - - 1.1. `Data::getInputTo` - - 1.2. `Data::getCreatorLayer` - - 1.3. `CNNNetwork::getLayerByName` - - 1.4. Iterating over `CNNLayer` objects in `CNNNetwork`: `CNNNetwork::begin`, `details::CNNNetworkIterator` class. - -4. Using a conventional plugin that accepts the old representation only. - -Though the conversion is always a one-way process, which means there is no method to convert back, there are important caveats. - -In the cases [1] and [3], both representations are held underneath and you should use the old representation in the read-only mode only from the caller side. -It is hard to track from the Inference Engine side whether the API is used in the read-only mode or for modification of the model. - -That is why when using potentially modifying methods listed in section [3] above, you should not modify the model via those methods. -Use a direct manipulation of the nGraph function instead. - -## Conversion Function - -Inference Engine implements the conversion function that is used when the nGraph function is transformed to the old `CNNNetworkImpl` representation. -This conversion function is hidden and you cannot call it directly from the application. 
-Nevertheless, it is an important component of the model transformation pipeline in the Inference Engine. -Some issues of models may be caught during the conversion process in this function. -Exceptions are thrown in this function, and you should know what this function does to find a root cause. - -The conversion function performs the following steps: - -1. Convert and decompose some operations as the first step of the nGraph function preparation for optimization. -Reduce operation set to easily optimize it at the next stages. -For example, decomposing of BatchNormInference happens at this stage. - -2. Optimizing transformations that usually happen in the Model Optimizer are called here, because the nGraph function is not always read from an already optimized IR. - -3. Changing operation set from `opsetX` to legacy layer semantics described in the [Legacy Layers Catalog](../MO_DG/prepare_model/convert_model/Legacy_IR_Layers_Catalog_Spec.md). -The model is still represented as the nGraph function at this stage, but the operation set is completely different. - -4. One-to-one conversion of nGraph representation to the corresponding `CNNNetworkImpl` without changing its semantics. -You can see the result of the conversion by calling the `CNNNetwork::serialize` method, which produces legacy IR semantics, which is not nGraph-based even if it is applied to `CNNNetwork` constructed from the nGraph Function. -It may help in debugging, see [Graph Debug Capabilities](Graph_debug_capabilities.md) to view all options for dumping new and old IR representations. 
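The removed conversion-function text describes a fixed sequence of graph passes: decompose operations, run optimizing transformations, switch to the legacy operation set, then convert one-to-one to the old representation. The structure of such a pass pipeline can be sketched in a few lines of plain Python (all pass names here are illustrative placeholders, not Inference Engine internals):

```python
# A toy "conversion function": an ordered list of passes applied to a graph.
# The graph is modeled as a list of stage labels so each pass's effect is
# visible; real passes rewrite an in-memory graph structure instead.

def decompose(graph):
    # Step 1: rewrite composite ops into simpler ones (e.g. BatchNormInference).
    return graph + ["decomposed"]

def optimize(graph):
    # Step 2: run optimizing transformations on the function.
    return graph + ["optimized"]

def to_legacy_opset(graph):
    # Step 3: switch from opsetX to legacy layer semantics.
    return graph + ["legacy-opset"]

def to_old_representation(graph):
    # Step 4: one-to-one conversion to the old internal representation.
    return graph + ["old-representation"]

PIPELINE = [decompose, optimize, to_legacy_opset, to_old_representation]

def convert(graph):
    for graph_pass in PIPELINE:
        graph = graph_pass(graph)   # each pass returns a rewritten graph
    return graph

print(convert(["function"]))
# -> ['function', 'decomposed', 'optimized', 'legacy-opset', 'old-representation']
```

Because the passes run in a fixed order and each consumes the previous pass's output, an exception raised partway through identifies which conversion step rejected the model, which is why the original text stresses knowing what this function does when debugging.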
diff --git a/docs/IE_DG/supported_plugins/CPU.md b/docs/IE_DG/supported_plugins/CPU.md index df9693a8f37..3d41c6030f1 100644 --- a/docs/IE_DG/supported_plugins/CPU.md +++ b/docs/IE_DG/supported_plugins/CPU.md @@ -14,8 +14,8 @@ OpenVINO™ toolkit is officially supported and validated on the following platf | Host | OS (64-bit) | | :--- | :--- | -| Development | Ubuntu* 16.04/CentOS* 7.4/MS Windows* 10 | -| Target | Ubuntu* 16.04/CentOS* 7.4/MS Windows* 10 | +| Development | Ubuntu* 18.04, CentOS* 7.5, MS Windows* 10 | +| Target | Ubuntu* 18.04, CentOS* 7.5, MS Windows* 10 | The CPU Plugin supports inference on Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ Processors with Intel® AVX2, Intel Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE). diff --git a/docs/IE_DG/supported_plugins/FPGA.md b/docs/IE_DG/supported_plugins/FPGA.md index ee76253db04..63ae6e62ed7 100644 --- a/docs/IE_DG/supported_plugins/FPGA.md +++ b/docs/IE_DG/supported_plugins/FPGA.md @@ -19,294 +19,4 @@ Intel will be transitioning to the next-generation programmable deep-learning so Intel® Distribution of OpenVINO™ toolkit 2020.3.X LTS release will continue to support Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA and the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. For questions about next-generation programmable deep-learning solutions based on FPGAs, please talk to your sales representative or contact us to get the latest FPGA updates. -## Introducing FPGA Plugin - -The FPGA plugin provides an opportunity for high performance scoring of neural networks on Intel® FPGA devices. - -> **NOTE**: Before using the FPGA plugin, ensure that you have installed and configured either the Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) or the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. 
For installation and configuration details, see [FPGA installation](Supported_Devices.md). - -## Heterogeneous Execution - -When your topology contains layers that are not supported by the Intel® FPGA plugin, use [Heterogeneous plugin](HETERO.md) with dedicated fallback device. - -If a network has layers that are not supported in the Intel® FPGA plugin or in a fallback plugin, you can implement a custom layer on the CPU/GPU and use the [Extensibility mechanism](../Extensibility_DG/Intro.md). -In addition to adding custom kernels, you must still point to the CPU plugin or the GPU plugin as fallback devices for heterogeneous plugin. - -## Supported Networks - -The following network topologies are supported in heterogeneous mode, running on FPGA with fallback to CPU or GPU devices. - -> **IMPORTANT**: Use only bitstreams from the current version of the OpenVINO toolkit. Bitstreams from older versions of the OpenVINO toolkit are incompatible with later versions of the OpenVINO toolkit. For example, you cannot use the `1-0-1_A10DK_FP16_Generic` bitstream, when the OpenVINO toolkit supports the `2019R2_PL2_FP16_InceptionV1_SqueezeNet_VGG_YoloV3.aocx` bitstream. 
- - -| Network | Bitstreams (Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2)) | Bitstreams (Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA) | -|:-------------------------------------|:-------------------------------------------------------------------|:---------------------------------------------------------------------------------------------| -| AlexNet | 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic, 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | 2020-4_RC_FP16_AlexNet_GoogleNet_Generic, 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | -| GoogleNet v1 | 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic, 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | 2020-4_RC_FP16_AlexNet_GoogleNet_Generic, 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | -| VGG-16 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| VGG-19 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| SqueezeNet v 1.0 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | -| SqueezeNet v 1.1 | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | -| ResNet-18 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| ResNet-50 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| ResNet-101 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| 
ResNet-152 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| MobileNet (Caffe) | 2020-4_PL2_FP16_MobileNet_Clamp, 2020-4_PL2_FP11_MobileNet_Clamp | 2020-4_RC_FP16_MobileNet_Clamp, 2020-4_RC_FP11_MobileNet_Clamp | -| MobileNet (TensorFlow) | 2020-4_PL2_FP16_MobileNet_Clamp, 2020-4_PL2_FP11_MobileNet_Clamp | 2020-4_RC_FP16_MobileNet_Clamp, 2020-4_RC_FP11_MobileNet_Clamp| -| SqueezeNet-based variant of the SSD* | 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG, 2020-4_PL2_FP11_SqueezeNet | 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | -| ResNet-based variant of SSD | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_ResNet_TinyYolo_VGG | -| RMNet | 2020-4_PL2_FP16_RMNet, 2020-4_PL2_FP11_RMNet | 2020-4_RC_FP16_RMNet, 2020-4_RC_FP11_RMNet | -| Yolo v3 | 2020-4_PL2_FP16_ResNet_YoloV3, 2020-4_PL2_FP11_YoloV3_ELU | 2020-4_RC_FP16_ResNet_YoloV3, 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | - - -In addition to the list above, arbitrary topologies having big continues subgraphs consisting of layers supported by FPGA plugin are recommended to be executed on FPGA plugin. - -## Bitstreams that are Optimal to Use with the Intel's Pre-Trained Models - -The table below provides you with a list of Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) bitstreams that are optimal to use for the Intel's pre-trained models. - -
- Click to expand/collapse the table - -| Model Name | FP11 Bitstreams | FP16 Bitstreams | -| :--- | :--- | :--- | -| action-recognition-0001-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| action-recognition-0001-encoder | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| age-gender-recognition-retail-0013 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| asl-recognition-0004 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| driver-action-recognition-adas-0002-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| driver-action-recognition-adas-0002-encoder | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| emotions-recognition-retail-0003 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| face-detection-0100 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| face-detection-0102 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| face-detection-0104 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| face-detection-0105 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| face-detection-0106 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| face-detection-adas-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| face-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| face-detection-retail-0004 | 2020-3_PL2_FP11_TinyYolo_SSD300.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| face-detection-retail-0005 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 
2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| face-reidentification-retail-0095 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| facial-landmarks-35-adas-0002 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| faster-rcnn-resnet101-coco-sparse-60-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| gaze-estimation-adas-0002 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| handwritten-japanese-recognition-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| handwritten-score-recognition-0003 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| head-pose-estimation-adas-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| human-pose-estimation-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| icnet-camvid-ava-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| icnet-camvid-ava-sparse-30-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| icnet-camvid-ava-sparse-60-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| image-retrieval-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| instance-segmentation-security-0010 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| instance-segmentation-security-0050 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| instance-segmentation-security-0083 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| instance-segmentation-security-1025 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 
2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| landmarks-regression-retail-0009 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| license-plate-recognition-barrier-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| pedestrian-and-vehicle-detector-adas-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| pedestrian-detection-adas-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| pedestrian-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| person-attributes-recognition-crossroad-0230 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-detection-action-recognition-0005 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-detection-action-recognition-0006 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-detection-action-recognition-teacher-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-detection-asl-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| person-detection-raisinghand-recognition-0001 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-detection-retail-0002 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-detection-retail-0013 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-reidentification-retail-0031 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_ELU.aocx | -| person-reidentification-retail-0248 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-reidentification-retail-0249 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 
2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| person-reidentification-retail-0300 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| person-vehicle-bike-detection-crossroad-0078 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_ELU.aocx | -| person-vehicle-bike-detection-crossroad-1016 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| product-detection-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| resnet18-xnor-binary-onnx-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_RMNet.aocx | -| resnet50-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| road-segmentation-adas-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| semantic-segmentation-adas-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| single-image-super-resolution-1032 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | -| single-image-super-resolution-1033 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | -| text-detection-0003 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| text-detection-0004 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| text-image-super-resolution-0001 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_RMNet.aocx | -| text-recognition-0012 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| text-spotting-0002-detector | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| text-spotting-0002-recognizer-decoder | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| text-spotting-0002-recognizer-encoder | 
2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| unet-camvid-onnx-0001 | 2020-3_PL2_FP11_InceptionV1_ResNet_VGG.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| vehicle-attributes-recognition-barrier-0039 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| vehicle-detection-adas-0002 | 2020-3_PL2_FP11_YoloV3_ELU.aocx | 2020-3_PL2_FP16_SwishExcitation.aocx | -| vehicle-detection-adas-binary-0001 | 2020-3_PL2_FP11_AlexNet_GoogleNet_Generic.aocx | 2020-3_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic.aocx | -| vehicle-license-plate-detection-barrier-0106 | 2020-3_PL2_FP11_MobileNet_Clamp.aocx | 2020-3_PL2_FP16_MobileNet_Clamp.aocx | -| yolo-v2-ava-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| yolo-v2-ava-sparse-35-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| yolo-v2-ava-sparse-70-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_SqueezeNet_TinyYolo_VGG.aocx | -| yolo-v2-tiny-ava-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| yolo-v2-tiny-ava-sparse-30-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | -| yolo-v2-tiny-ava-sparse-60-0001 | 2020-3_PL2_FP11_SqueezeNet.aocx | 2020-3_PL2_FP16_ResNet_YoloV3.aocx | - -
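If bitstream selection is scripted rather than read off the table, the model-to-bitstream mapping above can be captured as a plain lookup. The sketch below is illustrative only: the helper name and structure are assumptions, and only a few entries from the table above are copied in.

```python
# Illustrative model-name -> optimal-bitstream lookup. The entries are taken
# from the table above; this helper is NOT part of any OpenVINO API.
OPTIMAL_BITSTREAMS = {
    # model name: (FP11 bitstream, FP16 bitstream)
    "face-detection-0100": ("2020-3_PL2_FP11_MobileNet_Clamp.aocx",
                            "2020-3_PL2_FP16_MobileNet_Clamp.aocx"),
    "human-pose-estimation-0001": ("2020-3_PL2_FP11_YoloV3_ELU.aocx",
                                   "2020-3_PL2_FP16_SwishExcitation.aocx"),
    "text-detection-0003": ("2020-3_PL2_FP11_MobileNet_Clamp.aocx",
                            "2020-3_PL2_FP16_MobileNet_Clamp.aocx"),
}

def optimal_bitstream(model: str, precision: str = "FP16") -> str:
    """Return the optimal bitstream file name for a model at FP11 or FP16."""
    fp11, fp16 = OPTIMAL_BITSTREAMS[model]
    return fp11 if precision == "FP11" else fp16

print(optimal_bitstream("human-pose-estimation-0001", "FP11"))
```

Extending the dictionary to the full table is mechanical; keeping the FP11/FP16 pair together mirrors the two-column layout of the table.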
- -## Translate from Architecture to FPGA Bitstream Files - -Various FPGA bitstreams that support CNN are available in the OpenVINO™ toolkit package for FPGA. - -To select the correct bitstream (`.aocx`) file for an architecture, select a network (for example, Resnet-18) from the table above for either the Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 1), Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2) or the Intel® Programmable Acceleration Card (PAC) with Intel® Arria® 10 GX FPGA and note the corresponding architecture. - -The following table describes several parameters that might help you to select the proper bitstream for your needs: - -| Name | Board | Precision | LRN Support | Leaky ReLU Support | PReLU Support | Clamp Support | ELU Support | -|:------------------------------------------|:--------------------------------------------------------------------------------|:----------|:------------|:-------------------|:--------------|:--------------|:------------| -| 2020-4_PL2_FP11_AlexNet_GoogleNet_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | true | false | false | -| 2020-4_PL2_FP11_SqueezeNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | false | -| 2020-4_PL2_FP11_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | true | false | -| 2020-4_PL2_FP11_InceptionV1_ResNet_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | -| 2020-4_PL2_FP11_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true | -| 2020-4_PL2_FP11_TinyYolo_SSD300 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | true | true | 
true | false | false | -| 2020-4_PL2_FP11_YoloV3_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | true | true | false | true | -| 2020-4_PL2_FP11_Streaming_InternalUseOnly | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | -| 2020-4_PL2_FP11_Streaming_Slicing_InternalUseOnly | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | -| 2020-4_PL2_FP11_SwishExcitation | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP11 | false | false | false | false | false | -| 2020-4_PL2_FP16_AlexNet_GoogleNet_SSD300_Generic | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | true | true | true | false | false | -| 2020-4_PL2_FP16_ELU | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true | -| 2020-4_PL2_FP16_MobileNet_Clamp | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | true | false | -| 2020-4_PL2_FP16_ResNet_YoloV3 | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false | -| 2020-4_PL2_FP16_RMNet | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | true | -| 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | true | true | false | false | -| 2020-4_PL2_FP16_SqueezeNet_TinyYolo_VGG | Intel® Vision Accelerator Design with an Intel® Arria® 10 FPGA (Speed Grade 2) | FP16 | false | false | false | false | false | -| 2020-4_RC_FP11_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | true | true | 
false | false | -| 2020-4_RC_FP11_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true | -| 2020-4_RC_FP11_Streaming_InternalUseOnly | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false | -| 2020-4_RC_FP11_Streaming_Slicing_InternalUseOnly | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | true | false | false | false | false | -| 2020-4_RC_FP11_ELU | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | true | -| 2020-4_RC_FP11_SwishExcitation | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | false | false | false | false | -| 2020-4_RC_FP11_InceptionV1_ResNet_SqueezeNet_TinyYolo_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | false | false | -| 2020-4_RC_FP11_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP11 | false | true | true | true | false | -| 2020-4_RC_FP16_AlexNet_GoogleNet_Generic | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | true | true | true | false | false | -| 2020-4_RC_FP16_InceptionV1_SqueezeNet_TinyYolo_VGG | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | -| 2020-4_RC_FP16_RMNet | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | true | -| 2020-4_RC_FP16_SwishExcitation | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | false | false | false | false | -| 2020-4_RC_FP16_MobileNet_Clamp | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | true | false | -| 2020-4_RC_FP16_ResNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | 
false | true | true | false | false | -| 2020-4_RC_FP16_InceptionV1_SqueezeNet_YoloV3 | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | FP16 | false | true | true | false | false | - -## Set Environment for Running the FPGA Plugin - -To make the FPGA plugin run directly or through the heterogeneous plugin, set up the environment: -1. Set up the environment to access the Intel® FPGA RTE for OpenCL: -``` -source /opt/altera/aocl-pro-rte/aclrte-linux64/init_opencl.sh -``` -2. Set the following environment variable and program the board with a DLA bitstream. Programming the board is not supported at runtime and must be done before running an application. - - | Variable | Setting | - | :----------------------------------| :---------------------------------------------------------------------------| - | ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to a value of 1 to force FPGA reprogramming using JTAG | - -## Analyzing Heterogeneous Execution - -Besides the generation of `.dot` files, you can use the error listening mechanism: - -```cpp -class FPGA_ErrorListener : public InferenceEngine::IErrorListener -{ -public: - virtual void onError(const char *msg) noexcept override { - std::cout << msg; - } -}; -... -FPGA_ErrorListener err_listener; -core.SetLogCallback(err_listener); // will be used for FPGA device as well -``` -If, during network loading, some layers are assigned to a fallback plugin, a message like the following is printed: - -```cpp -Layer (Name: detection_out, Type: DetectionOutput) is not supported: - custom or unknown. - Has (3) sets of inputs, must be 1, or 2. 
- Input dimensions (2) should be 4. -``` - -## Multiple FPGA Devices Support - -The Inference Engine FPGA plugin provides the ability to load different networks on multiple FPGA devices. For example, to load two networks, AlexNet and MobileNet v2, on two different FPGA devices, follow the steps below: - -1. Program each FPGA device with a corresponding bitstream: -```bash -aocl program acl0 2019R3_PV_PL1_FP16_AlexNet_GoogleNet_InceptionV1_SSD300_Generic.aocx -``` -```bash -aocl program acl1 2019R3_PV_PL1_FP16_MobileNet_Clamp.aocx -``` -For more information about bitstream programming instructions, refer to [Installation Guide for Linux* with Support for FPGA](Supported_Devices.md). -2. All FPGA devices are enumerated with a unique ID starting from `0`. By default, all networks are loaded to the default -device with ID `0`. If you want to load a network on a particular non-default device, specify the `KEY_DEVICE_ID` -parameter for C++ or the `DEVICE_ID` parameter for Python\*. -The following code snippets demonstrate how to load the AlexNet network on the FPGA device with ID `0` and the -MobileNet v2 network on the device with ID `1`: - * With C++: -```cpp -InferenceEngine::Core core; - -// Load AlexNet network on the first FPGA device programmed with a bitstream supporting AlexNet -auto alexnetNetwork = core.ReadNetwork("alexnet.xml"); -auto exeNetwork1 = core.LoadNetwork(alexnetNetwork, "FPGA.0"); - -// Load MobileNet network on the second FPGA device programmed with the MobileNet bitstream -auto mobilenetNetwork = core.ReadNetwork("mobilenet_v2.xml"); -auto exeNetwork2 = core.LoadNetwork(mobilenetNetwork, "FPGA", { { KEY_DEVICE_ID, "1" } }); -``` - * With Python: -```python -# Load AlexNet network on the first FPGA device programmed with a bitstream supporting AlexNet -net1 = IENetwork(model="alexnet.xml", weights="alexnet.bin") -plugin.load(network=net1, config={"DEVICE_ID": "0"}) - -# Load MobileNet network on the second FPGA device programmed with the MobileNet bitstream -net2 = 
IENetwork(model="mobilenet_v2.xml", weights="mobilenet_v2.bin") -plugin.load(network=net2, config={"DEVICE_ID": "1"}) -``` -Note that you have to use asynchronous infer requests to utilize several FPGA devices; otherwise, execution on the devices is performed sequentially. - -## Import and Export Network Flow - -Since the 2019 R4 release, the FPGA and HETERO plugins support the export and import flow, which allows you to export a compiled network from a plugin to a binary blob by running the command below: - -```bash -$ ./compile_tool -m resnet.xml -DLA_ARCH_NAME 4x2x16x32_fp16_sb9408_fcd1024_actk4_poolk4_normk1_owk2_image300x300x8192_mbfr -d HETERO:FPGA,CPU -Inference Engine: - API version ............ 2.1 - Build .................. 6db44e09a795cb277a63275ea1395bfcb88e46ac - Description ....... API -Done -``` - -Once the command is executed, a binary blob named `resnet.blob` is created in the working directory. Refer to the [Compile tool](../../../inference-engine/tools/compile_tool/README.md) documentation for more details. - -A compiled binary blob can later be imported via `InferenceEngine::Core::Import`: - -```cpp -InferenceEngine::Core core; -std::ifstream strm("resnet.blob"); -auto execNetwork = core.Import(strm); -``` - -## How to Interpret Performance Counters - -By collecting performance counters with `InferenceEngine::InferRequest::GetPerformanceCounts`, you can obtain performance data about execution on the FPGA, pre- and post-processing, and data transfers to and from the FPGA card. - -If the network is split so that some parts are executed on the CPU, you can also find performance data about Intel® MKL-DNN kernels, their types, and other useful information. - -## Limitations of the FPGA Support for CNN - -The Inference Engine FPGA plugin has limitations on network topologies, kernel parameters, and batch size. - -* Depending on the bitstream loaded on the target device, the FPGA performs calculations with precision rates ranging from FP11 to FP16. 
This might have accuracy implications. Use the [Accuracy Checker](@ref omz_tools_accuracy_checker_README) to verify the network accuracy on the validation data set. -* Networks with many CNN layers that are not supported on FPGA interleaved between supported layers might cause the graph to be divided into many subgraphs, which might lead to a `CL_OUT_OF_HOST_MEMORY` error. These topologies are not FPGA-friendly for this release. -* When you use the heterogeneous plugin, the affinity and distribution of nodes across devices depend on the FPGA bitstream that you use. Some layers might not be supported by a bitstream, or some of their parameters might not be supported by it. - -## See Also -* [Supported Devices](Supported_Devices.md) +For documentation on the FPGA plugin available in previous releases of the Intel® Distribution of OpenVINO™ toolkit with FPGA support, see the documentation for the [2020.4 version](https://docs.openvinotoolkit.org/2020.4/openvino_docs_IE_DG_supported_plugins_FPGA.html) and earlier. \ No newline at end of file diff --git a/docs/IE_DG/supported_plugins/MYRIAD.md b/docs/IE_DG/supported_plugins/MYRIAD.md index 5fbee431ee1..3b1c3ec018c 100644 --- a/docs/IE_DG/supported_plugins/MYRIAD.md +++ b/docs/IE_DG/supported_plugins/MYRIAD.md @@ -6,11 +6,12 @@ The Inference Engine MYRIAD plugin is developed for inference of neural networks ## Installation on Linux* OS -For installation instructions, refer to the [Installation Guide for Linux*](../../../inference-engine/samples/benchmark_app/README.md). +For installation instructions, refer to the [Installation Guide for Linux*](../../install_guides/installing-openvino-linux.md). + ## Installation on Windows* OS -For installation instructions, refer to the [Installation Guide for Windows*](../../../inference-engine/samples/benchmark_app/README.md). +For installation instructions, refer to the [Installation Guide for Windows*](../../install_guides/installing-openvino-windows.md). 
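Device strings of the form `HETERO:FPGA,CPU`, as used in the `compile_tool` command earlier, encode an ordered fallback list: the first listed device is preferred, and later ones take the layers it cannot run. The sketch below shows one way such a string could be parsed and filtered against the devices actually present on a machine; the helper is hypothetical and not part of the Inference Engine API.

```python
def hetero_fallback(device_string: str, available: list) -> list:
    """Parse a HETERO device string (e.g. 'HETERO:FPGA,CPU') and keep only
    the devices that are actually available, preserving priority order.
    Illustrative helper only, not an Inference Engine API."""
    prefix = "HETERO:"
    if not device_string.startswith(prefix):
        raise ValueError("not a HETERO device string")
    priorities = device_string[len(prefix):].split(",")
    return [d for d in priorities if d in available]

# With the FPGA plugin removed, only CPU remains from 'HETERO:FPGA,CPU':
print(hetero_fallback("HETERO:FPGA,CPU", ["CPU", "GPU"]))  # ['CPU']
```

In a real application the `available` list would come from querying the runtime for installed plugins rather than being hard-coded.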
## Supported networks diff --git a/docs/IE_DG/supported_plugins/Supported_Devices.md b/docs/IE_DG/supported_plugins/Supported_Devices.md index cc0ba79491b..27a78279ba4 100644 --- a/docs/IE_DG/supported_plugins/Supported_Devices.md +++ b/docs/IE_DG/supported_plugins/Supported_Devices.md @@ -11,7 +11,6 @@ The Inference Engine provides unique capabilities to infer deep learning models |------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------| |[GPU plugin](CL_DNN.md) |Intel® Processor Graphics, including Intel® HD Graphics and Intel® Iris® Graphics | |[CPU plugin](CPU.md) |Intel® Xeon® with Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and AVX512_BF16, Intel® Core™ Processors with Intel® AVX2, Intel® Atom® Processors with Intel® Streaming SIMD Extensions (Intel® SSE) | -|[FPGA plugin](FPGA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Vision Accelerator Design with an Intel® Arria 10 FPGA (Speed Grade 2), Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA | |[VPU plugins](VPU.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X, Intel® Vision Accelerator Design with Intel® Movidius™ VPUs | |[GNA plugin](GNA.md) (available in the Intel® Distribution of OpenVINO™ toolkit) |Intel® Speech Enabling Developer Kit, Amazon Alexa* Premium Far-Field Developer Kit, Intel® Pentium® Silver J5005 Processor, Intel® Pentium® Silver N5000 Processor, Intel® Celeron® J4005 Processor, Intel® Celeron® J4105 Processor, Intel® Celeron® Processor N4100, Intel® Celeron® Processor N4000, Intel® Core™ i3-8121U Processor, Intel® Core™ i7-1065G7 Processor, Intel® Core™ i7-1060G7 Processor, Intel® Core™ i5-1035G4 Processor, Intel® Core™ i5-1035G7 Processor, Intel® Core™ 
i5-1035G1 Processor, Intel® Core™ i5-1030G7 Processor, Intel® Core™ i5-1030G4 Processor, Intel® Core™ i3-1005G1 Processor, Intel® Core™ i3-1000G1 Processor, Intel® Core™ i3-1000G4 Processor| |[Multi-Device plugin](MULTI.md) |Multi-Device plugin enables simultaneous inference of the same network on several Intel® devices in parallel | @@ -53,7 +52,6 @@ For example, the CHW value at index (c,h,w) is physically located at index (c\*H |:-------------|:----------------------:|:----------------------:|:----------------------:| |CPU plugin |Supported and preferred |Supported |Supported | |GPU plugin |Supported |Supported and preferred |Supported\* | -|FPGA plugin |Supported |Supported |Not supported | |VPU plugins |Not supported |Supported |Not supported | |GNA plugin |Supported |Supported |Not supported |
\* - currently, only a limited set of topologies might benefit from enabling I8 models on GPU
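The layout description earlier in this file states that the CHW value at index (c,h,w) is physically located at index (c\*H+h)\*W+w. A few lines of Python can check that this offset formula enumerates every element of a dense C×H×W buffer exactly once; the helper is illustrative, not an OpenVINO API.

```python
def chw_offset(c: int, h: int, w: int, H: int, W: int) -> int:
    """Flat offset of element (c, h, w) in a dense CHW buffer,
    per the formula (c*H + h)*W + w quoted in the layout section."""
    return (c * H + h) * W + w

# Check that the formula covers a 2x3x4 buffer without gaps or overlaps.
C, H, W = 2, 3, 4
offsets = sorted(chw_offset(c, h, w, H, W)
                 for c in range(C) for h in range(H) for w in range(W))
assert offsets == list(range(C * H * W))
print(chw_offset(1, 2, 3, H, W))  # 23
```

The same pattern extends to NCHW by adding an outer batch term, n\*C\*H\*W, to the offset.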
@@ -66,7 +64,6 @@ the supported models formats depends on the actual underlying devices. _Generall |:-------------|:--------:|:-------------:|:-------------:|:-------------:|:------------:|:-------------:| |CPU plugin |Supported |Not supported |Supported |Supported |Not supported |Supported | |GPU plugin |Supported |Supported\* |Supported\* |Supported\* |Not supported |Supported\* | -|FPGA plugin |Supported |Supported\* |Supported |Supported |Not supported |Supported | |VPU plugins |Supported |Supported |Supported |Not supported |Not supported |Not supported | |GNA plugin |Supported |Not supported |Supported |Not supported |Supported |Supported | @@ -80,7 +77,6 @@ the supported input precision depends on the actual underlying devices. _Genera |:-------------|:--------:|:------------:| |CPU plugin |Supported |Not supported | |GPU plugin |Supported |Supported | -|FPGA plugin |Supported |Supported | |VPU plugins |Supported |Supported | |GNA plugin |Supported |Not supported | For [Multi-Device](MULTI.md) and [Heterogeneous](HETERO.md) execution @@ -92,7 +88,6 @@ the supported output precision depends on the actual underlying devices. 
_Gener |:-------------|:------------:|:------------:|:------------:|:------------:| |CPU plugin |Supported |Supported |Supported |Supported | |GPU plugin |Supported |Supported |Supported |Supported | -|FPGA plugin |Not supported |Supported |Supported |Not supported | |VPU plugins |Not supported |Supported |Supported |Supported | |GNA plugin |Not supported |Supported |Supported |Supported | @@ -109,152 +104,152 @@ For setting relevant configuration, refer to the ### Supported Layers The following layers are supported by the plugins and by [Shape Inference feature](../ShapeInference.md): -| Layers | GPU | CPU | VPU | GNA | FPGA | ShapeInfer | -|:-------------------------------|:-------------:|:-------------:|:-------------:|:-------------:|:---------------:|:-------------:| -| Abs | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Acos | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Acosh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Activation-Clamp | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Activation-ELU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | Supported | -| Activation-Exp | Supported |Supported\*\*\*| Not Supported | Supported | Not Supported | Supported | -| Activation-Leaky ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Activation-Not | Supported |Supported\*\*\*| Not Supported | Not Supported | Not Supported | Supported | -| Activation-PReLU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | Supported | -| Activation-ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Activation-ReLU6 | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Activation-Sigmoid/Logistic | Supported |Supported\*\*\*| Supported | Supported | Not Supported | 
Supported | -| Activation-TanH | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | -| ArgMax | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Asin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Asinh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Atan | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Atanh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| BatchNormalization | Supported | Supported | Supported | Not Supported | Supported\* | Supported | -| BinaryConvolution | Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | -| Broadcast | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Ceil | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Concat | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Const | Supported | Supported | Supported | Supported | Not Supported | Not Supported | -| Convolution-Dilated | Supported | Supported | Supported | Not Supported | Supported | Supported | -| Convolution-Dilated 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| Convolution-Grouped | Supported | Supported | Supported | Not Supported | Supported | Supported | -| Convolution-Grouped 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| Convolution-Ordinary | Supported | Supported | Supported | Supported\* | Supported | Supported | -| Convolution-Ordinary 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| Cos | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Cosh | Supported | Supported\*\* | Not Supported | Not 
Supported | Not Supported | Supported | -| Crop | Supported | Supported | Supported | Supported | Not Supported | Supported | -| CTCGreedyDecoder | Supported\*\* | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | -| Deconvolution | Supported | Supported | Supported | Not Supported | Supported\* | Supported | -| Deconvolution 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| DeformableConvolution | Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | -| DepthToSpace | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| DetectionOutput | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | -| Eltwise-And | Supported |Supported\*\*\*| Not Supported | Not Supported | Not Supported | Supported | -| Eltwise-Add | Supported |Supported\*\*\*| Not Supported | Not Supported | Supported | Supported | -| Eltwise-Div | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Equal | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-FloorMod | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Greater | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-GreaterEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Less | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-LessEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-LogicalAnd | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-LogicalOr | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-LogicalXor | Supported |Supported\*\*\*| Supported | Not 
Supported | Not Supported | Supported | -| Eltwise-Max | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Min | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Mul | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | -| Eltwise-NotEqual | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Pow | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Prod | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported | -| Eltwise-SquaredDiff | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Eltwise-Sub | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Eltwise-Sum | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Erf | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Exp | Supported | Supported | Not Supported | Supported | Not Supported | Supported | -| FakeQuantize | Not Supported | Supported | Not Supported | Not Supported | Not Supported | Supported | -| Fill | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Flatten | Supported | Supported | Supported | Not Supported | Not Supported | Supported | -| Floor | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| FullyConnected (Inner Product) | Supported |Supported\*\*\*| Supported | Supported | Supported | Supported | -| Gather | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| GatherTree | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Gemm | Supported | Supported | Supported | Not Supported | Not Supported | Supported | -| GRN | Supported\*\* | Supported\*\* | Supported | Not Supported | 
Not Supported | Supported | -| HardSigmoid | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Interp | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported\* | -| Log | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | -| LRN (Norm) | Supported | Supported | Supported | Not Supported | Supported | Supported | -| LSTMCell | Supported | Supported | Supported | Supported | Not Supported | Not Supported | -| GRUCell | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| RNNCell | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| LSTMSequence | Supported | Supported | Supported | Not Supported | Not Supported | Not Supported | -| GRUSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| RNNSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| LogSoftmax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Not Supported | -| Memory | Not Supported | Supported | Not Supported | Supported | Not Supported | Supported | -| MVN | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | -| Neg | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| NonMaxSuppression | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Normalize | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | -| OneHot | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Pad | Supported | Supported\*\* | Supported\* | Not Supported | Not Supported | Supported | -| Permute | Supported | Supported | Supported | Supported\* | Not Supported | Supported | -| Pooling(AVG,MAX) | Supported | Supported | Supported | Supported | 
Supported | Supported | -| Pooling(AVG,MAX) 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| Power | Supported | Supported\*\* | Supported | Supported\* | Supported\* | Supported | -| PowerFile | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Not Supported | -| PriorBox | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| PriorBoxClustered | Supported\*\* | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Proposal | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| PSROIPooling | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Range | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Reciprocal | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceAnd | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceL1 | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceL2 | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceLogSum | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceLogSumExp | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceMax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceMean | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceMin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceOr | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceProd | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| 
ReduceSum | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ReduceSumSquare | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| RegionYolo | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| ReorgYolo | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Resample | Supported | Supported\*\* | Supported | Not Supported | Supported\* | Supported | -| Reshape | Supported |Supported\*\*\*| Supported | Supported | Not Supported | Supported\* | -| ReverseSequence | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| RNN | Not Supported | Supported | Supported | Not Supported | Not Supported | Not Supported | -| ROIPooling | Supported\* | Supported | Supported | Not Supported | Not Supported | Supported | -| ScaleShift | Supported |Supported\*\*\*| Supported\* | Supported | Supported | Supported | -| ScatterUpdate | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Select | Supported | Supported | Supported | Not Supported | Not Supported | Supported | -| Selu | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| ShuffleChannels | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Sign | Supported | Supported\*\* | Supported | Not Supported | Not Supported | Supported | -| Sin | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Sinh | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| SimplerNMS | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Slice | Supported |Supported\*\*\*| Supported | Supported | Supported\* | Supported | -| SoftMax | Supported |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| Softplus | 
Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Softsign | Supported | Supported\*\* | Not Supported | Supported | Not Supported | Supported | -| SpaceToDepth | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| SpatialTransformer | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Split | Supported |Supported\*\*\*| Supported | Supported | Supported\* | Supported | -| Squeeze | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | -| StridedSlice | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Tan | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| TensorIterator | Not Supported | Supported | Supported | Supported | Not Supported | Not Supported | -| Tile | Supported\*\* |Supported\*\*\*| Supported | Not Supported | Not Supported | Supported | -| TopK | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | Supported | -| Unpooling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported | -| Unsqueeze | Supported | Supported\*\* | Supported | Supported | Not Supported | Supported | -| Upsampling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Layers | GPU | CPU | VPU | GNA | ShapeInfer | +|:-------------------------------|:-------------:|:-------------:|:-------------:|:-------------:|:-------------:| +| Abs | Supported | Supported\*\* | Supported | Not Supported | Supported | +| Acos | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Acosh | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Activation-Clamp | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Activation-ELU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| 
Activation-Exp | Supported |Supported\*\*\*| Not Supported | Supported | Supported | +| Activation-Leaky ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Activation-Not | Supported |Supported\*\*\*| Not Supported | Not Supported | Supported | +| Activation-PReLU | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Activation-ReLU | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Activation-ReLU6 | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Activation-Sigmoid/Logistic | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Activation-TanH | Supported |Supported\*\*\*| Supported | Supported | Supported | +| ArgMax | Supported | Supported\*\* | Supported | Not Supported | Supported | +| Asin | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Asinh | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Atan | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Atanh | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| BatchNormalization | Supported | Supported | Supported | Not Supported | Supported | +| BinaryConvolution | Supported | Supported | Not Supported | Not Supported | Supported | +| Broadcast | Supported | Supported\*\* | Supported | Not Supported | Supported | +| Ceil | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Concat | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Const | Supported | Supported | Supported | Supported | Not Supported | +| Convolution-Dilated | Supported | Supported | Supported | Not Supported | Supported | +| Convolution-Dilated 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | +| Convolution-Grouped | Supported | Supported | Supported | Not Supported | Supported | +| Convolution-Grouped 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | 
+| Convolution-Ordinary | Supported | Supported | Supported | Supported\* | Supported | +| Convolution-Ordinary 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | +| Cos | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Cosh | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Crop | Supported | Supported | Supported | Supported | Supported | +| CTCGreedyDecoder | Supported\*\* | Supported\*\* | Supported\* | Not Supported | Supported | +| Deconvolution | Supported | Supported | Supported | Not Supported | Supported | +| Deconvolution 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | +| DeformableConvolution | Supported | Supported | Not Supported | Not Supported | Supported | +| DepthToSpace | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| DetectionOutput | Supported | Supported\*\* | Supported\* | Not Supported | Supported | +| Eltwise-And | Supported |Supported\*\*\*| Not Supported | Not Supported | Supported | +| Eltwise-Add | Supported |Supported\*\*\*| Not Supported | Not Supported | Supported | +| Eltwise-Div | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Equal | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-FloorMod | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Greater | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-GreaterEqual | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Less | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-LessEqual | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-LogicalAnd | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-LogicalOr | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-LogicalXor | Supported 
|Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Max | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Min | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Mul | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Eltwise-NotEqual | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Pow | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Prod | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Eltwise-SquaredDiff | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Eltwise-Sub | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Eltwise-Sum | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Erf | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Exp | Supported | Supported | Not Supported | Supported | Supported | +| FakeQuantize | Not Supported | Supported | Not Supported | Not Supported | Supported | +| Fill | Not Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Flatten | Supported | Supported | Supported | Not Supported | Supported | +| Floor | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| FullyConnected (Inner Product) | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Gather | Supported | Supported\*\* | Supported | Not Supported | Supported | +| GatherTree | Not Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Gemm | Supported | Supported | Supported | Not Supported | Supported | +| GRN | Supported\*\* | Supported\*\* | Supported | Not Supported | Supported | +| HardSigmoid | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Interp | Supported\*\* | Supported\*\* | Supported | Not Supported | Supported\* | +| Log | Supported | Supported\*\* | Supported | Supported | Supported | +| LRN (Norm) 
| Supported | Supported | Supported | Not Supported | Supported | +| LSTMCell | Supported | Supported | Supported | Supported | Not Supported | +| GRUCell | Supported | Supported | Not Supported | Not Supported | Not Supported | +| RNNCell | Supported | Supported | Not Supported | Not Supported | Not Supported | +| LSTMSequence | Supported | Supported | Supported | Not Supported | Not Supported | +| GRUSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | +| RNNSequence | Supported | Supported | Not Supported | Not Supported | Not Supported | +| LogSoftmax | Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | +| Memory | Not Supported | Supported | Not Supported | Supported | Supported | +| MVN | Supported | Supported\*\* | Supported\* | Not Supported | Supported | +| Neg | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| NonMaxSuppression | Not Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Normalize | Supported | Supported\*\* | Supported\* | Not Supported | Supported | +| OneHot | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Pad | Supported | Supported\*\* | Supported\* | Not Supported | Supported | +| Permute | Supported | Supported | Supported | Supported\* | Supported | +| Pooling(AVG,MAX) | Supported | Supported | Supported | Supported | Supported | +| Pooling(AVG,MAX) 3D | Supported | Supported | Not Supported | Not Supported | Not Supported | +| Power | Supported | Supported\*\* | Supported | Supported\* | Supported | +| PowerFile | Not Supported | Supported\*\* | Not Supported | Not Supported | Not Supported | +| PriorBox | Supported | Supported\*\* | Supported | Not Supported | Supported | +| PriorBoxClustered | Supported\*\* | Supported\*\* | Supported | Not Supported | Supported | +| Proposal | Supported | Supported\*\* | Supported | Not Supported | Supported | +| PSROIPooling | Supported | Supported\*\* | Supported | 
Not Supported | Supported | +| Range | Not Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Reciprocal | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceAnd | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceL1 | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceL2 | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceLogSum | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceLogSumExp | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceMax | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceMean | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceMin | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceOr | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceProd | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceSum | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ReduceSumSquare | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| RegionYolo | Supported | Supported\*\* | Supported | Not Supported | Supported | +| ReorgYolo | Supported | Supported\*\* | Supported | Not Supported | Supported | +| Resample | Supported | Supported\*\* | Supported | Not Supported | Supported | +| Reshape | Supported |Supported\*\*\*| Supported | Supported | Supported\* | +| ReverseSequence | Supported | Supported\*\* | Supported | Not Supported | Supported | +| RNN | Not Supported | Supported | Supported | Not Supported | Not Supported | +| ROIPooling | Supported\* | Supported | Supported | Not Supported | Supported | +| ScaleShift | Supported |Supported\*\*\*| Supported\* | Supported | Supported | +| ScatterUpdate | Not Supported | Supported\*\* | Not Supported | Not 
Supported | Supported | +| Select | Supported | Supported | Supported | Not Supported | Supported | +| Selu | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| ShuffleChannels | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Sign | Supported | Supported\*\* | Supported | Not Supported | Supported | +| Sin | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Sinh | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| SimplerNMS | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Slice | Supported |Supported\*\*\*| Supported | Supported | Supported | +| SoftMax | Supported |Supported\*\*\*| Supported | Not Supported | Supported | +| Softplus | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Softsign | Supported | Supported\*\* | Not Supported | Supported | Supported | +| SpaceToDepth | Not Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| SpatialTransformer | Not Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Split | Supported |Supported\*\*\*| Supported | Supported | Supported | +| Squeeze | Supported | Supported\*\* | Supported | Supported | Supported | +| StridedSlice | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Tan | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| TensorIterator | Not Supported | Supported | Supported | Supported | Not Supported | +| Tile | Supported\*\* |Supported\*\*\*| Supported | Not Supported | Supported | +| TopK | Supported | Supported\*\* | Not Supported | Not Supported | Supported | +| Unpooling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | +| Unsqueeze | Supported | Supported\*\* | Supported | Supported | Supported | +| Upsampling | Supported | Not Supported | Not Supported | Not Supported | Not Supported | \*- support is limited to the specific 
parameters. Refer to "Known Layers Limitation" section for the device [from the list of supported](Supported_Devices.md). diff --git a/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md b/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md index b121254303e..1df9b7a97e5 100644 --- a/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md +++ b/docs/Inference_Engine_Development_Procedure/CONTRIBUTING.md @@ -1,7 +1,7 @@ # Inference Engine development configuration document {#openvino_docs_Inference_Engine_Development_Procedure_CONTRIBUTING} To create MakeFiles use following process or run build-after-clone.sh script located in the root -folder if you use Ubuntu 16.04. +folder if you use Ubuntu 18.04. To create Visual Studio project run create_vs_proj_x64.cmd from scripts folder. ## Setting up the environment for development diff --git a/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md b/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md index 2be7f8dcb47..f9638ee4cd9 100644 --- a/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md +++ b/docs/Inference_Engine_Development_Procedure/IE_Dev_Procedure.md @@ -30,7 +30,6 @@ * [IE TESTS] * [IE DOCS] * [IE MKLDNN] - * [IE FPGA] * [IE GNA] * [IE CLDNN] * [IE MYRIAD] diff --git a/docs/Legal_Information.md b/docs/Legal_Information.md index 4bcb046a890..00c6cd96835 100644 --- a/docs/Legal_Information.md +++ b/docs/Legal_Information.md @@ -15,3 +15,10 @@ Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation. © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. \*Other names and brands may be claimed as the property of others. + +## OpenVINO™ Logo +To build equity around the project, the OpenVINO logo was created for both Intel and community usage. 
The logo may only be used to represent the OpenVINO toolkit and offerings built using the OpenVINO toolkit. + +## Logo Usage Guidelines +The OpenVINO logo must be used in connection with truthful, non-misleading references to the OpenVINO toolkit, and for no other purpose. +Modification of the logo or use of any separate element(s) of the logo alone is not allowed. diff --git a/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md b/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md index 144e8dfc917..0cdd936f189 100644 --- a/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md +++ b/docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md @@ -15,31 +15,54 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi ## What's New in the Model Optimizer in this Release? * Common changes: - * Implemented generation of a compressed OpenVINO IR suitable for INT8 inference, which takes up to 4 times less disk space than an expanded one. Use the `--disable_weights_compression` Model Optimizer command-line parameter to get an expanded version. - * Implemented an optimization transformation to replace a sub-graph with the `Erf` operation into the `GeLU` operation. - * Implemented an optimization transformation to replace an upsamping pattern that is represented as a sequence of `Split` and `Concat` operations to a single `Interpolate` operation. - * Fixed a number of Model Optimizer bugs to generate reshape-able IRs of many models with the command line parameter `--keep_shape_ops`. - * Fixed a number of Model Optimizer transformations to set operations name in an IR equal to the original framework model operation name. - * The following operations are no longer generated with `version="opset1"`: `MVN`, `ROIPooling`, `ReorgYolo`. They became a part of new `opset2` operation set and generated with `version="opset2"`. 
Before this fix, the operations were generated with `version="opset1"` by mistake, they were not a part of `opset1` nGraph namespace; `opset1` specification was fixed accordingly. - + * Implemented several optimization transformations to replace sub-graphs of operations with HSwish, Mish, Swish and SoftPlus operations. + * Model Optimizer generates IR keeping shape-calculating sub-graphs **by default**. Previously, this behavior was triggered if the "--keep_shape_ops" command line parameter was provided. The flag is ignored in this release and will be deleted in the next release. To trigger the legacy behavior to generate an IR for a fixed input shape (folding ShapeOf operations and shape-calculating sub-graphs to Constant), use the "--static_shape" command line parameter. Changing model input shape using the Inference Engine API at runtime may fail for such an IR. + * Fixed Model Optimizer conversion issues that resulted in non-reshapeable IRs when using the Inference Engine reshape API. + * Enabled transformations to fix non-reshapeable patterns in the original networks: + * Hardcoded Reshape + * In Reshape(2D)->MatMul pattern + * Reshape->Transpose->Reshape when the pattern can be fused to the ShuffleChannels or DepthToSpace operation + * Hardcoded Interpolate + * In Interpolate->Concat pattern + * Added a dedicated requirements file for TensorFlow 2.X as well as dedicated install prerequisites scripts. + * Replaced the SparseToDense operation with ScatterNDUpdate-4. * ONNX*: - * Added support for the following operations: `MeanVarianceNormalization` if normalization is performed over spatial dimensions. - + * Enabled the ability to specify the model output **tensor** name using the "--output" command line parameter.
+ * Added support for the following operations: + * Acosh + * Asinh + * Atanh + * DepthToSpace-11, 13 + * DequantizeLinear-10 (zero_point must be constant) + * HardSigmoid-1,6 + * QuantizeLinear-10 (zero_point must be constant) + * ReduceL1-11, 13 + * ReduceL2-11, 13 + * Resize-11, 13 (except mode="nearest" with 5D+ input, mode="tf_crop_and_resize", and attributes exclude_outside and extrapolation_value with non-zero values) + * ScatterND-11, 13 + * SpaceToDepth-11, 13 * TensorFlow*: - * Added support for the TensorFlow Object Detection models version 1.15.X. - * Added support for the following operations: `BatchToSpaceND`, `SpaceToBatchND`, `Floor`. - + * Added support for the following operations: + * Acosh + * Asinh + * Atanh + * CTCLoss + * EuclideanNorm + * ExtractImagePatches + * FloorDiv * MXNet*: * Added support for the following operations: - * `Reshape` with input shape values equal to -2, -3, and -4. + * Acosh + * Asinh + * Atanh +* Kaldi*: + * Fixed a bug with ParallelComponent support; it is now fully supported with no restrictions. > **NOTE:** > [Intel® System Studio](https://software.intel.com/en-us/system-studio) is an all-in-one, cross-platform tool suite, purpose-built to simplify system bring-up and improve system and IoT device application performance on Intel® platforms. If you are using the Intel® Distribution of OpenVINO™ with Intel® System Studio, go to [Get Started with Intel® System Studio](https://software.intel.com/en-us/articles/get-started-with-openvino-and-intel-system-studio-2019).
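The shape-handling flags described in the "Common changes" list above can be illustrated with a small, hedged sketch. Everything below is hypothetical: the `mo.py` location, the model file name, and the tensor name are placeholders, not paths from this repository; only the `--input_model`, `--static_shape`, and `--output` flags come from the changelog itself.

```python
import shlex

def mo_command(model, static_shape=False, output=None):
    """Compose a hypothetical Model Optimizer command line.

    The script path and model file are placeholders for illustration.
    """
    args = ["python3", "mo.py", "--input_model", model]
    if static_shape:
        # Legacy behavior: fold ShapeOf and shape-calculating
        # sub-graphs to Constants, fixing the input shape.
        args.append("--static_shape")
    if output is not None:
        # New in this release: an output *tensor* name is accepted
        # for ONNX models via --output.
        args += ["--output", output]
    return " ".join(shlex.quote(a) for a in args)

# New default: omit --static_shape to keep shape-calculating
# sub-graphs, so the IR stays reshapeable at runtime.
print(mo_command("frozen_model.pb"))
print(mo_command("frozen_model.pb", static_shape=True))
```

The point of the sketch is the contrast between the two invocations: the first produces a reshapeable IR (the new default), the second reproduces the old fixed-shape behavior.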
## Table of Content -* [Introduction to OpenVINO™ Deep Learning Deployment Toolkit](../IE_DG/Introduction.md) - * [Preparing and Optimizing your Trained Model with Model Optimizer](prepare_model/Prepare_Trained_Model.md) * [Configuring Model Optimizer](prepare_model/Config_Model_Optimizer.md) * [Converting a Model to Intermediate Representation (IR)](prepare_model/convert_model/Converting_Model.md) @@ -82,4 +105,4 @@ Model Optimizer produces an Intermediate Representation (IR) of the network, whi * [Known Issues](Known_Issues_Limitations.md) -**Typical Next Step:** [Introduction to Intel® Deep Learning Deployment Toolkit](../IE_DG/Introduction.md) +**Typical Next Step:** [Preparing and Optimizing your Trained Model with Model Optimizer](prepare_model/Prepare_Trained_Model.md) diff --git a/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md b/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md index 4dc93936126..f04d413bd1a 100644 --- a/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md +++ b/docs/MO_DG/prepare_model/Model_Optimizer_FAQ.md @@ -615,3 +615,16 @@ You need to specify values for each input of the model. For more information, re #### 102. What does the message "Operation _contrib_box_nms is not supported ..." mean? It means that you trying to convert the topology which contains '_contrib_box_nms' operation which is not supported directly. However the sub-graph of operations including the '_contrib_box_nms' could be replaced with DetectionOutput layer if your topology is one of the gluoncv topologies. Specify '--enable_ssd_gluoncv' command line parameter for the Model Optimizer to enable this transformation. 
+ +\htmlonly + + + +\endhtmlonly \ No newline at end of file diff --git a/docs/benchmarks/performance_benchmarks.md b/docs/benchmarks/performance_benchmarks.md index 11b6dede2c5..9f172d82d99 100644 --- a/docs/benchmarks/performance_benchmarks.md +++ b/docs/benchmarks/performance_benchmarks.md @@ -2,117 +2,96 @@ ## Increase Performance for Deep Learning Inference -The [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit) helps accelerate deep learning inference across a variety of Intel® processors and accelerators. Rather than a one-size-fits-all solution, Intel offers a powerful portfolio of scalable hardware and software solutions, powered by the Intel® Distribution of OpenVINO™ toolkit, to meet the various performance, power, and price requirements of any use case. The benchmarks below demonstrate high performance gains on several public neural networks for a streamlined, quick deployment on **Intel® CPU, VPU and FPGA** platforms. Use this data to help you decide which hardware is best for your applications and solutions, or to plan your AI workload on the Intel computing already included in your solutions. +The [Intel® Distribution of OpenVINO™ toolkit](https://software.intel.com/en-us/openvino-toolkit) helps accelerate deep learning inference across a variety of Intel® processors and accelerators. Rather than a one-size-fits-all solution, Intel offers a powerful portfolio of scalable hardware and software solutions, powered by the Intel® Distribution of OpenVINO™ toolkit, to meet the various performance, power, and price requirements of any use case. The benchmarks below demonstrate high performance gains on several public neural networks for a streamlined, quick deployment on **Intel® CPU and VPU** platforms. Use this data to help you decide which hardware is best for your applications and solutions, or to plan your AI workload on the Intel computing already included in your solutions. 
Measuring inference performance involves many variables and is extremely use-case and application dependent. We use the below four parameters for measurements, which are key elements to consider for a successful deep learning inference application: -1. **Throughput** - Measures the number of inferences delivered within a latency threshold. (for example, number of frames per second). When deploying a system with deep learning inference, select the throughput that delivers the best trade-off between latency and power for the price and performance that meets your requirements. +1. **Throughput** - Measures the number of inferences delivered within a latency threshold (for example, the number of frames per second, or FPS). When deploying a system with deep learning inference, select the throughput that delivers the best trade-off between latency and power for the price and performance that meets your requirements. 2. **Value** - While throughput is important, what is more critical in edge AI deployments is the performance efficiency or performance-per-cost. Application performance in throughput per dollar of system cost is the best measure of value. 3. **Efficiency** - System power is a key consideration from the edge to the data center. When selecting deep learning solutions, power efficiency (throughput/watt) is a critical factor to consider. Intel designs provide excellent power efficiency for running deep learning workloads. -4. **Total Benefit** (Most applicable for Intel® VPU Platforms) - Combining the factors of value and efficiency can be a good way to compare which hardware yields the best performance per watt and per dollar for your particular use case. +4. **Latency** - This measures the synchronous execution of inference requests and is reported in milliseconds. Each inference request (for example: preprocess, infer, postprocess) is allowed to complete before the next is started.
This performance metric is relevant in usage scenarios where a single image input needs to be acted upon as soon as possible. An example would be the healthcare sector where medical personnel only request analysis of a single ultra sound scanning image or in real-time or near real-time applications for example an industrial robot's response to actions in its environment or obstacle avoidance for autonomous vehicles. ---- +\htmlonly + + + + + + + + + +\endhtmlonly -## Intel® Xeon® E-2124G -![](img/throughput_xeon_e212g.png) -![](img/value_xeon_e212g.png) -![](img/eff_xeon_e212g.png) +\htmlonly + +\endhtmlonly ---- +\htmlonly + +\endhtmlonly -## Intel® Xeon® Silver 4216R +\htmlonly + +\endhtmlonly -![](img/throughput_xeon_silver.png) -![](img/value_xeon_silver.png) -![](img/eff_xeon_silver.png) +\htmlonly + +\endhtmlonly ---- +\htmlonly + +\endhtmlonly -## Intel® Xeon® Gold 5218T +\htmlonly + +\endhtmlonly -![](img/throughput_xeon_gold.png) -![](img/value_xeon_gold.png) -![](img/eff_xeon_gold.png) +\htmlonly + +\endhtmlonly ---- +\htmlonly + +\endhtmlonly -## Intel® Xeon® Platinum 8270 +\htmlonly + +\endhtmlonly -![](img/throughput_xeon_platinum.png) -![](img/value_xeon_platinum.png) -![](img/eff_xeon_platinum.png) +\htmlonly + +\endhtmlonly ---- +\htmlonly + +\endhtmlonly -## Intel® Atom™ x5-E3940 -![](img/throughput_atom.png) -![](img/value_atom.png) -![](img/eff_atom.png) +\htmlonly + +\endhtmlonly ---- +\htmlonly + +\endhtmlonly -## Intel® Core™ i3-8100 -![](img/throughput_i3.png) -![](img/value_i3.png) -![](img/eff_i3.png) +\htmlonly + +\endhtmlonly ---- +\htmlonly + +\endhtmlonly -## Intel® Core™ i5-8500 - -![](img/throughput_i5.png) -![](img/value_i5.png) -![](img/eff_i5.png) - ---- - -## Intel® Core™ i7-8700T - -![](img/throughput_i7.png) -![](img/value_i7.png) -![](img/eff_i7.png) - ---- - -## Intel® Core™ i9-10920X - -![](img/throughput_i9.png) -![](img/value_i9.png) -![](img/eff_i9.png) - ---- - -## Intel® Neural Compute Stick 2 - 
-![](img/throughput_ncs2.png) -![](img/value_ncs2.png) -![](img/eff_ncs2.png) -![](img/benefit_ncs2.png) - ---- - -## Intel® Vision Accelerator Design with Intel® Movidius™ VPUs (Uzel* UI-AR8) - -![](img/throughput_hddlr.png) -![](img/value_hddlr.png) -![](img/eff_hddlr.png) - ---- - -## Intel® Vision Accelerator Design with Intel® Arria® 10 FPGA - -![](img/throughput_ivad_fpga.png) -![](img/value_ivad_fpga.png) -![](img/eff_ivad_fpga.png) ## Platform Configurations -Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2020.4. +Intel® Distribution of OpenVINO™ toolkit performance benchmark numbers are based on release 2021.1. -Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of July 8, 2020 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. +Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. Performance results are based on testing as of September 25, 2020 and may not reflect all publicly available security updates. See configuration disclosure for details. No product can be absolutely secure. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. 
For more complete information, see [Performance Benchmark Test Disclosure](https://www.intel.com/content/www/us/en/benchmarks/benchmark.html). @@ -142,31 +121,31 @@ Testing by Intel done on: see test date for each HW platform below. | Batch size | 1 | 1 | 1 | 1 | | Precision | INT8 | INT8 | INT8 | INT8 | | Number of concurrent inference requests | 4 | 32 | 32 | 52 | -| Test Date | July 8, 2020 | July 8, 2020 | July 8, 2020 | July 8, 2020 | +| Test Date | September 25, 2020 | September 25, 2020 | September 25, 2020 | September 25, 2020 | | Power dissipation, TDP in Watt | [71](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html#tab-blade-1-0-1) | [125](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [105](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html#tab-blade-1-0-1) | [205](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html#tab-blade-1-0-1) | -| CPU Price on July 8, 2020, USD
Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) | +| CPU Price on September 29, 2020, USD
Prices may vary | [213](https://ark.intel.com/content/www/us/en/ark/products/134854/intel-xeon-e-2124g-processor-8m-cache-up-to-4-50-ghz.html) | [1,002](https://ark.intel.com/content/www/us/en/ark/products/193394/intel-xeon-silver-4216-processor-22m-cache-2-10-ghz.html) | [1,349](https://ark.intel.com/content/www/us/en/ark/products/193953/intel-xeon-gold-5218t-processor-22m-cache-2-10-ghz.html) | [7,405](https://ark.intel.com/content/www/us/en/ark/products/192482/intel-xeon-platinum-8270-processor-35-75m-cache-2-70-ghz.html) | **CPU Inference Engines (continue)** -| | Intel® Core™ i5-8500 | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | -| -------------------- | ---------------------------------- | ----------------------------------- |--------------------------------------| -| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | -| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | -| Hyper Threading | OFF | ON | ON | -| Turbo Setting | ON | ON | ON | -| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | -| Kernel Version | 5.3.0-24-generic | 5.0.0-23-generic | 5.0.0-23-generic | -| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | -| BIOS Version | 2401 | F11 | 505 | -| BIOS Release | July 12, 2019 | March 13, 2019 | December 17, 2019 | -| BIOS Settings | Select optimized default settings,
save & exit | Select optimized default settings,
set OS type to "other",
save & exit | Default Settings | -| Batch size | 1 | 1 | 1 | -| Precision | INT8 | INT8 | INT8 | -| Number of concurrent inference requests | 3 | 4 | 24 | -| Test Date | July 8, 2020 | July 8, 2020 | July 8, 2020 | -| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1) | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | -| CPU Price on July 8, 2020, USD
Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) +| | Intel® Core™ i5-8500 | Intel® Core™ i7-8700T | Intel® Core™ i9-10920X | 11th Gen Intel® Core™ i5-1145G7E | +| -------------------- | ---------------------------------- | ----------------------------------- |--------------------------------------|-----------------------------------| +| Motherboard | ASUS* PRIME Z370-A | GIGABYTE* Z370M DS3H-CF | ASUS* PRIME X299-A II | Intel Corporation
internal/Reference Validation Platform | +| CPU | Intel® Core™ i5-8500 CPU @ 3.00GHz | Intel® Core™ i7-8700T CPU @ 2.40GHz | Intel® Core™ i9-10920X CPU @ 3.50GHz | 11th Gen Intel® Core™ i5-1145G7E @ 2.60GHz | +| Hyper Threading | OFF | ON | ON | ON | +| Turbo Setting | ON | ON | ON | ON | +| Memory | 2 x 16 GB DDR4 2666MHz | 4 x 16 GB DDR4 2400MHz | 4 x 16 GB DDR4 2666MHz | 2 x 8 GB DDR4 3200MHz | +| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | +| Kernel Version | 5.3.0-24-generic | 5.0.0-23-generic | 5.0.0-23-generic | 5.8.0-05-generic | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | Intel Corporation | +| BIOS Version | 2401 | F11 | 505 | TGLIFUI1.R00.3243.A04.2006302148 | +| BIOS Release | July 12, 2019 | March 13, 2019 | December 17, 2019 | June 30, 2020 | +| BIOS Settings | Select optimized default settings,
save & exit | Select optimized default settings,
set OS type to "other",
save & exit | Default Settings | Default Settings | +| Batch size | 1 | 1 | 1 | 1 | +| Precision | INT8 | INT8 | INT8 | INT8 | +| Number of concurrent inference requests | 3 | 4 | 24 | 4 | +| Test Date | September 25, 2020 | September 25, 2020 | September 25, 2020 | September 25, 2020 | +| Power dissipation, TDP in Watt | [65](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html#tab-blade-1-0-1) | [35](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html#tab-blade-1-0-1) | [165](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [28](https://ark.intel.com/content/www/us/en/ark/products/208081/intel-core-i5-1145g7e-processor-8m-cache-up-to-4-10-ghz.html) | +| CPU Price on September 29, 2020, USD
Prices may vary | [192](https://ark.intel.com/content/www/us/en/ark/products/129939/intel-core-i5-8500-processor-9m-cache-up-to-4-10-ghz.html) | [303](https://ark.intel.com/content/www/us/en/ark/products/129948/intel-core-i7-8700t-processor-12m-cache-up-to-4-00-ghz.html) | [700](https://ark.intel.com/content/www/us/en/ark/products/198012/intel-core-i9-10920x-x-series-processor-19-25m-cache-3-50-ghz.html) | [309](https://mysamples.intel.com/SAM_U_Product/ProductDetail.aspx?InputMMID=99A3D1&RequestID=0&ProductID=1213750) | **CPU Inference Engines (continue)** @@ -186,35 +165,35 @@ Testing by Intel done on: see test date for each HW platform below. | Batch size | 1 | 1 | | Precision | INT8 | INT8 | | Number of concurrent inference requests | 4 | 4 | -| Test Date | July 8, 2020 | July 8, 2020 | +| Test Date | September 25, 2020 | September 25, 2020 | | Power dissipation, TDP in Watt | [9.5](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [65](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html#tab-blade-1-0-1)| -| CPU Price on July 8, 2020, USD
Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) | +| CPU Price on September 29, 2020, USD
Prices may vary | [34](https://ark.intel.com/content/www/us/en/ark/products/96485/intel-atom-x5-e3940-processor-2m-cache-up-to-1-80-ghz.html) | [117](https://ark.intel.com/content/www/us/en/ark/products/126688/intel-core-i3-8100-processor-6m-cache-3-60-ghz.html) | **Accelerator Inference Engines** -| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design
with Intel® Movidius™ VPUs (Uzel* UI-AR8) | Intel® Vision Accelerator Design
with Intel® Arria® 10 FPGA - IEI/SAF3*| -| -------------------- | ------------------------------------- | ------------------------------------- | ------------------------- | -| VPU | 1 X Intel® Movidius™ Myriad™ X MA2485 | 8 X Intel® Movidius™ Myriad™ X MA2485 | 1 X Intel® Arria® 10 FPGA | -| Connection | USB 2.0/3.0 | PCIe X4 | PCIe X8 | -| Batch size | 1 | 1 | 1 | -| Precision | FP16 | FP16 | FP11 | -| Number of concurrent inference requests | 4 | 32 | 5 | -| Power dissipation, TDP in Watt | 2.5 | [30](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) | [60](https://www.mouser.com/ProductDetail/IEI/MUSTANG-F100-A10-R10?qs=sGAEpiMZZMtNlGR3Dbecs5Qs0RmP5oxxCbTJPjyRuMXthliRUwiVGw%3D%3D) | -| CPU Price, USD
Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from July 8, 2020) | [768](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) (from May 15, 2020) | [1,650](https://www.bhphotovideo.com/c/product/1477989-REG/qnap_mustang_f100_a10_r10_pcie_fpga_highest_performance.html/?ap=y&ap=y&smp=y&msclkid=371b373256dd1a52beb969ecf5981bf8) (from July 8, 2020) | -| Host Computer | Intel® Core™ i7 | Intel® Core™ i5 | Intel® Xeon® E3 | -| Motherboard | ASUS* Z370-A II | Uzelinfo* / US-E1300 | IEI/SAF3* | -| CPU | Intel® Core™ i7-8700 CPU @ 3.20GHz | Intel® Core™ i5-6600 CPU @ 3.30GHz | Intel® Xeon® CPU E3-1268L v5 @ 2.40GHz | -| Hyper Threading | ON | OFF | OFF | -| Turbo Setting | ON | ON | ON | -| Memory | 4 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 2400MHz | 2 x 16 GB DDR4 2666MHz | -| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | Ubuntu* 16.04 LTS | -| Kernel Version | 5.0.0-23-generic | 5.0.0-23-generic | 4.13.0-45-generic | -| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | American Megatrends Inc.* | -| BIOS Version | 411 | 5.12 | V2RMAR15 | -| BIOS Release | September 21, 2018 | September 21, 2018 | December 03, 2019 | -| Test Date | July 8, 2020 | July 8, 2020 | July 8, 2020 | +| | Intel® Neural Compute Stick 2 | Intel® Vision Accelerator Design
with Intel® Movidius™ VPUs (Uzel* UI-AR8) | +| --------------------------------------- | ------------------------------------- | ------------------------------------- | +| VPU | 1 X Intel® Movidius™ Myriad™ X MA2485 | 8 X Intel® Movidius™ Myriad™ X MA2485 | +| Connection | USB 2.0/3.0 | PCIe X4 | +| Batch size | 1 | 1 | +| Precision | FP16 | FP16 | +| Number of concurrent inference requests | 4 | 32 | +| Power dissipation, TDP in Watt | 2.5 | [30](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) | +| CPU Price, USD
Prices may vary | [69](https://ark.intel.com/content/www/us/en/ark/products/140109/intel-neural-compute-stick-2.html) (from September 29, 2020) | [768](https://www.mouser.com/ProductDetail/IEI/MUSTANG-V100-MX8-R10?qs=u16ybLDytRaZtiUUvsd36w%3D%3D) (from May 15, 2020) | +| Host Computer | Intel® Core™ i7 | Intel® Core™ i5 | +| Motherboard | ASUS* Z370-A II | Uzelinfo* / US-E1300 | +| CPU | Intel® Core™ i7-8700 CPU @ 3.20GHz | Intel® Core™ i5-6600 CPU @ 3.30GHz | +| Hyper Threading | ON | OFF | +| Turbo Setting | ON | ON | +| Memory | 4 x 16 GB DDR4 2666MHz | 2 x 16 GB DDR4 2400MHz | +| Operating System | Ubuntu* 18.04 LTS | Ubuntu* 18.04 LTS | +| Kernel Version | 5.0.0-23-generic | 5.0.0-23-generic | +| BIOS Vendor | American Megatrends Inc.* | American Megatrends Inc.* | +| BIOS Version | 411 | 5.12 | +| BIOS Release | September 21, 2018 | September 21, 2018 | +| Test Date | September 25, 2020 | September 25, 2020 | Please follow this link for more detailed configuration descriptions: [Configuration Details](https://docs.openvinotoolkit.org/resources/benchmark_files/system_configurations_2021.1.html) diff --git a/docs/benchmarks/performance_benchmarks_faq.md b/docs/benchmarks/performance_benchmarks_faq.md index 28c15d5954d..67aa1bb2972 100644 --- a/docs/benchmarks/performance_benchmarks_faq.md +++ b/docs/benchmarks/performance_benchmarks_faq.md @@ -21,17 +21,26 @@ All of the performance benchmarks were generated using the open-sourced tool wit The image size used in the inference depends on the network being benchmarked. The following table shows the list of input sizes for each network model. 
| **Model** | **Public Network** | **Task** | **Input Size** (Height x Width) | |------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|-----------------------------|-----------------------------------| -| [faster_rcnn_resnet50_coco-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco) | Faster RCNN Tf | object detection | 600x1024 | -| [googlenet-v1-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/googlenet-v1) | GoogLeNet_ILSVRC-2012_Caffe | classification | 224x224 | -| [googlenet-v3-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/googlenet-v3) | Inception v3 Tf | classification | 299x299 | -| [mobilenet-ssd-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-ssd) | SSD (MobileNet)_COCO-2017_Caffe | object detection | 300x300 | +| [bert-large-uncased-whole-word-masking-squad](https://github.com/opencv/open_model_zoo/tree/develop/models/intel/bert-large-uncased-whole-word-masking-squad-int8-0001) | BERT-large |question / answer |384| +| [deeplabv3-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/deeplabv3) | DeepLab v3 Tf |semantic segmentation | 513x513 | +| [densenet-121-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/densenet-121-tf) | Densenet-121 Tf |classification | 224x224 | +| [facenet-20180408-102900-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/facenet-20180408-102900) | FaceNet TF | face recognition | 160x160 | +| [faster_rcnn_resnet50_coco-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/faster_rcnn_resnet50_coco) | Faster RCNN Tf | object detection | 600x1024 | +| [googlenet-v1-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/googlenet-v1-tf) | GoogLeNet_ILSVRC-2012 | classification | 
224x224 | +| [inception-v3-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/googlenet-v3) | Inception v3 Tf | classification | 299x299 | +| [mobilenet-ssd-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-ssd) | SSD (MobileNet)_COCO-2017_Caffe | object detection | 300x300 | +| [mobilenet-v1-1.0-224-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v1-1.0-224-tf) | MobileNet v1 Tf | classification | 224x224 | | [mobilenet-v2-1.0-224-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-v2-1.0-224) | MobileNet v2 Tf | classification | 224x224 | -| [mobilenet-v2-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/mobilenet-v2) | Mobilenet V2 Caffe | classification | 224x224 | -| [resnet-101-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/resnet-101) | ResNet-101_ILSVRC-2012_Caffe | classification | 224x224 | -| [resnet-50-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/resnet-50) | ResNet-50_v1_ILSVRC-2012_Caffe | classification | 224x224 | +| [mobilenet-v2-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/mobilenet-v2-pytorch ) | Mobilenet V2 PyTorch | classification | 224x224 | +| [resnet-18-pytorch](https://github.com/opencv/open_model_zoo/tree/master/models/public/resnet-18-pytorch) | ResNet-18 PyTorch | classification | 224x224 | +| [resnet-50-pytorch](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/resnet-50-pytorch) | ResNet-50 v1 PyTorch | classification | 224x224 | +| [resnet-50-TF](https://github.com/opencv/open_model_zoo/tree/master/models/public/resnet-50-tf) | ResNet-50_v1_ILSVRC-2012 | classification | 224x224 | | [se-resnext-50-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/se-resnext-50) | Se-ResNext-50_ILSVRC-2012_Caffe | classification | 224x224 | | 
[squeezenet1.1-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/squeezenet1.1) | SqueezeNet_v1.1_ILSVRC-2012_Caffe | classification | 227x227 | | [ssd300-CF](https://github.com/opencv/open_model_zoo/tree/master/models/public/ssd300) | SSD (VGG-16)_VOC-2007_Caffe | object detection | 300x300 | +| [yolo_v3-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/yolo-v3-tf) | TF Keras YOLO v3 Modelset | object detection | 300x300 | +| [ssd_mobilenet_v1_coco-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssd_mobilenet_v1_coco) | ssd_mobilenet_v1_coco | object detection | 300x300 | +| [ssdlite_mobilenet_v2-TF](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssdlite_mobilenet_v2) | ssd_mobilenet_v2 | object detection | 300x300 | #### 7. Where can I purchase the specific hardware used in the benchmarking? Intel partners with various vendors all over the world. Visit the [Intel® AI: In Production Partners & Solutions Catalog](https://www.intel.com/content/www/us/en/internet-of-things/ai-in-production/partners-solutions-catalog.html) for a list of Equipment Makers and the [Supported Devices](../IE_DG/supported_plugins/Supported_Devices.md) documentation. You can also remotely test and run models before purchasing any hardware by using [Intel® DevCloud for the Edge](http://devcloud.intel.com/edge/). @@ -42,9 +51,17 @@ We published a set of guidelines and recommendations to optimize your models ava #### 9. Why are INT8 optimized models used for benchmarking on CPUs with no VNNI support? The benefit of low-precision optimization using the OpenVINO™ toolkit model optimizer extends beyond processors supporting VNNI through Intel® DL Boost. 
The reduced bit width of INT8 compared to FP32 allows Intel® CPUs to process data faster, and thus offers better throughput on any converted model, regardless of the low-precision optimizations intrinsically supported by Intel® hardware. Please refer to [INT8 vs. FP32 Comparison on Select Networks and Platforms](./performance_int8_vs_fp32.html) for a comparison of boost factors for different network models and a selection of Intel® CPU architectures, including AVX-2 with Intel® Core™ i7-8700T, and AVX-512 (VNNI) with Intel® Xeon® 5218T and Intel® Xeon® 8270.

-#### 10. Previous releases included benchmarks on googlenet-v1. Why is there no longer benchmarks on this neural network model?
-We replaced googlenet-v1 to [resnet-18-pytorch](https://github.com/opencv/open_model_zoo/blob/master/models/public/resnet-18-pytorch/resnet-18-pytorch.md) due to changes in developer usage. The public model resnet-18 is used by many developers as an Image Classification model. This pre-optimized model was also trained on the ImageNet database, similar to googlenet-v1. Both googlenet-v1 and resnet-18 will remain part of the Open Model Zoo. Developers are encouraged to utilize resnet-18-pytorch for Image Classification use cases.
+#### 10. Previous releases included benchmarks on googlenet-v1-CF (Caffe). Why are there no longer benchmarks on this neural network model?
+We replaced googlenet-v1-CF with resnet-18-pytorch due to changes in developer usage. The public model resnet-18 is used by many developers as an Image Classification model. This pre-optimized model was also trained on the ImageNet database, similar to googlenet-v1-CF. Both googlenet-v1-CF and resnet-18 will remain part of the Open Model Zoo. Developers are encouraged to use resnet-18-pytorch for Image Classification use cases.

+#### 11. Why have resnet-50-CF, mobilenet-v1-1.0-224-CF, mobilenet-v2-CF and resnet-101-CF been removed?
+The Caffe versions of resnet-50, mobilenet-v1-1.0-224 and mobilenet-v2 have been replaced with their TensorFlow and PyTorch counterparts: resnet-50-CF is replaced by resnet-50-TF, mobilenet-v1-1.0-224-CF by mobilenet-v1-1.0-224-TF, and mobilenet-v2-CF by mobilenet-v2-pytorch. Resnet-50-CF and resnet-101-CF are no longer maintained at their public source repositories.
+
+#### 12. Where can I search for OpenVINO™ performance results based on HW platforms?
+The website format has changed to support the more common search approach of looking up the performance of a given neural network model on different HW platforms, as opposed to reviewing a given HW platform's performance on different neural network models.
+
+#### 13. How is Latency measured?
+Latency is measured by running the OpenVINO™ Inference Engine in synchronous mode. In synchronous mode, each frame or image is processed through the entire set of stages (pre-processing, inference, post-processing) before the next frame or image is processed. This KPI is relevant for applications where inference on a single image is required, for example the analysis of an ultrasound image in a medical application or of a seismic image in the oil and gas industry. Other use cases include real-time or near-real-time applications, such as an industrial robot's response to changes in its environment or obstacle avoidance for autonomous vehicles, where a quick response to the result of the inference is required.

\htmlonly