Compare commits: 2 commits, 2022.2.0.d...releases/2 (890bbb0bdb ... 4c2c520958)

INT8_WORKFLOW.md (Normal file, 83 lines)
@@ -0,0 +1,83 @@
OpenVINO Int8 Workflow In a Nutshell
-----------------------------------

To operate with int8, all the data (weights, inputs, activations, etc.) should be carefully quantized. The quantization process is driven by:

* the normalization (or scaling) factor, determined by the range of the data;
* the quantization level, which depends on whether the data is signed or unsigned, and on the destination precision.
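As an illustration (a plain-NumPy sketch, not OpenVINO code), here is how the scaling factor and the quantization level interact in a symmetric, signed int8 scheme:

```python
import numpy as np

def quantize_symmetric(x, num_bits=8):
    """Symmetric signed quantization: the scale is derived from the data range."""
    qmax = 2 ** (num_bits - 1) - 1              # quantization level: 127 for signed int8
    scale = np.max(np.abs(x)) / qmax            # normalization (scaling) factor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

x = np.array([-0.5, 0.0, 0.25, 1.0], dtype=np.float32)
q, scale = quantize_symmetric(x)
# the round trip stays within one quantization step of the original values
assert np.all(np.abs(dequantize(q, scale) - x) <= scale)
```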
OpenVINO supports two main sources of this information, and thus two main sources of int8 models:

* Conversion of framework-quantized models. This approach relies on training for low precision and subsequent conversion of the resulting model with the [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) tool. It usually gives optimal accuracy and performance, but requires careful model re-training/fine-tuning. For inference, both normalization and quantization factors are then deduced fully from the model data (e.g. FakeQuantize layers) and no additional steps are required.

* Post-training quantization of floating point models with the [Calibration tool](https://docs.openvinotoolkit.org/latest_docs_IE_DG_Int8Inference.html#low_precision_8_bit_integer_inference_workflow). Like the approach described earlier, calibration is a fully offline, additional step that equips a model with (optional) int8 information. This approach is somewhat more universal, requiring just a floating point model and no retraining to leverage int8. Calibration is an iterative process of gathering _activations_ statistics such as histograms (for determining scaling parameters), applying the quantization parameters, and evaluating the resulting model accuracy to keep it as close to the original as possible. For _weights_, in contrast, the maximum absolute value m per output channel is found; the per-channel range is then [-m, m]. This calibration process trades off performance against accuracy and results in a mixed-precision model, a combination of fp32 (high accuracy) and int8 (high performance) layers.

Notice that OpenVINO assumes symmetrically quantized models (with respect to weights) and either symmetric (signed) or fully unsigned activations.
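The per-channel weight scheme described above can be sketched in NumPy as follows (an illustrative convention, assuming output channels along axis 0, which is not a statement about OpenVINO's internal layout):

```python
import numpy as np

def quantize_weights_per_channel(w):
    """Per-output-channel symmetric quantization of a 2D weight matrix."""
    m = np.max(np.abs(w), axis=1)        # max-abs value m per output channel
    scales = m / 127.0                   # range [-m, m] mapped onto [-127, 127]
    q = np.round(w / scales[:, np.newaxis]).astype(np.int8)
    return q, scales

w = np.array([[1.0, -2.0],
              [0.5, 0.25]], dtype=np.float32)
q, scales = quantize_weights_per_channel(w)
```

Each row (output channel) gets its own scale, so a channel with small weights does not waste quantization levels on the range of a channel with large weights.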
Quantized Model Example
-----------------------------------
For the MLPerf 0.5 submission, the only directly converted quantized model is ssd-mobilenet from Habana ("ssd-mobilenet 300x300 symmetrically quantized finetuned"), referenced at https://github.com/mlperf/inference/tree/master/v0.5/classification_and_detection.

To convert the model, just call the [Model Optimizer](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html). There are certain specifics for [converting TensorFlow Object Detection API models](https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_convert_model_tf_specific_Convert_Object_Detection_API_Models.html). For example, the original pipeline.config is needed. For the symmetrically quantized model it is actually the same as for another Habana model ("ssd-mobilenet 300x300 quantized finetuned").

Conversion command-line is as follows:
```
$ python3 <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/mo.py \
    --input_model <path_to_model>/ssd_mobilenet_v1_quant_ft_no_zero_point_frozen_inference_graph.pb \
    --input_shape [1,300,300,3] \
    --reverse_input_channels \
    --tensorflow_use_custom_operations_config <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/extensions/front/tf/ssd_v2_support.json \
    --tensorflow_object_detection_api_pipeline_config <path_to_model>/pipeline.config
```

Model Calibration Example
-----------------------------------
To give an example of the [calibration workflow](https://docs.openvinotoolkit.org/latest/_inference_engine_tools_calibration_tool_README.html), let's consider the ResNet-50 (v1.5) example ("resnet50-v1.5 tensorflow fp32 NHWC").

* First, the model is converted from the original framework format using the Model Optimizer tool. Since this is a classification (and not a detection) model, the command-line is really simple:

```
$ python3 <OPENVINO_INSTALL_DIR>/deployment_tools/model_optimizer/mo.py --input_model ./resnet50_v1.pb --input_shape [1,224,224,3] --reverse_input_channels
```

This outputs the model in the Intermediate Representation (IR) format (*.xml and *.bin files). FP32 is the default precision (use `--data_type FP16` to get an fp16 model instead, which is more GPU-friendly).
* Secondly, perform model calibration using the [Calibration tool](https://docs.openvinotoolkit.org/latest_docs_IE_DG_Int8Inference.html#low_precision_8_bit_integer_inference_workflow). The tool is framework-agnostic and accepts the model in the IR format. Model calibration requires a validation dataset (to keep track of the accuracy during calibration). Currently, the calibration tool comes with example support of classification and object detection models on the ImageNet and VOC2007/COCO data sets respectively, and the associated accuracy metrics. It is relatively straightforward to add other datasets and metrics.

The accuracy validation in turn comes via the [Accuracy Checker](https://github.com/opencv/open_model_zoo/tree/develop/tools/accuracy_checker/accuracy_checker/) tool.
For that, the dataset-specific annotations [are converted into a common format](https://github.com/opencv/open_model_zoo/tree/develop/tools/accuracy_checker/accuracy_checker/annotation_converters).
Specifically, for the ImageNet dataset required for ResNet, the command-line is as follows:
```
$ convert_annotation imagenet --annotation_file <PATH_TO_IMAGES>/ILSVRC2012_val.txt --labels_file <PATH_TO_IMAGES>/synset_words.txt --has_background True
```
This outputs *.pickle and *.json files used in calibration via
[configuration files in YML](https://docs.openvinotoolkit.org/latest/_inference_engine_tools_calibration_tool_README.html).
Alternatively, you can specify the annotation conversion parameters in the config file and let the calibration tool call the 'convert_annotation' tool.
Similarly, the calibration tool can either accept the converted model as an IR, or the original model directly and perform the conversion on the fly.
Both ways are governed by the 'launchers' section of the config file.
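The pickle/json pair plays two roles: the pickle holds the per-image records, the json holds the dataset metadata. A hypothetical miniature version (the record fields here are illustrative, not the Accuracy Checker's exact schema):

```python
import json
import os
import pickle
import tempfile

# hypothetical miniature annotation: per-image records plus dataset metadata
records = [{"identifier": "ILSVRC2012_val_00000001.JPEG", "label": 65}]
meta = {"label_map": {"65": "sea snake"}, "has_background": True}

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "imagenet.pickle"), "wb") as f:
    pickle.dump(records, f)
with open(os.path.join(tmp, "imagenet.json"), "w") as f:
    json.dump(meta, f)

# the calibration tool later reads both files back via the YML config paths
with open(os.path.join(tmp, "imagenet.pickle"), "rb") as f:
    loaded = pickle.load(f)
```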

Care must be taken with the configuration in general, as there are many items like pre-processing
(mean and scale values, RGB vs BGR), resizing (with and without crop, etc.), and so on, that can severely
affect the resulting accuracy. Notice that the pre-processing applied during calibration should match the pre-processing that is later used for inference.
Also, the pre-processing parameters (like mean/scale, or RGB-BGR conversion) can be either part of the Model Optimizer command line
(the 'mo_params' section of the config file), which bakes the input transformations directly _into the resulting model_,
or part of the 'preprocessing' section of the 'dataset'. The latter does not include the pre-processing in the model,
but applies it to _every loaded dataset image_ instead (before using it within the calibration).
The choice depends on your inference pipeline: if the pre-processing is explicitly performed in the code,
the model shouldn't include it, to avoid double pre-processing.
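To see why double pre-processing matters numerically, consider a mean/scale of 127.5 applied twice (hypothetical values, not tied to a particular model):

```python
import numpy as np

mean, scale = 127.5, 127.5
pixel = np.array([255.0])

once = (pixel - mean) / scale    # what the model was calibrated for
twice = (once - mean) / scale    # pre-processing accidentally applied a second time

# the twice-normalized input collapses toward -1 regardless of the pixel value,
# so the model sees inputs far outside its calibrated range
```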

See example YML files for the MLPerf models in the 'example_calibration_files' folder.
The files define the original models, govern the conversion to the IR, the dataset annotations conversion,
and finally the calibration itself. You only have to patch the paths for your local machine.
*Notice that the pre-processing is not included in the models
(and thus is assumed to be applied to an input image before inferencing), see earlier in this section*.

Finally, the calibration command-line is as simple as:
```
$ python3 calibrate.py \
    -c <PATH_TO_CONFIG>/resnet_v1.5_50.yml \
    -M <PATH_TO_MODEL_OPTIMIZER> \
    -C <PATH_TO_OUTPUT_FP_IR> \
    --output_dir <PATH_TO_OUTPUT_I8_IR>
```
The resulting IR contains the original floating point data (which all OpenVINO device plugins should support) and (optional) int8 statistics that some devices might ignore (if int8 is not supported on the device), falling back to the original model.

@@ -2,9 +2,10 @@
[](https://github.com/opencv/dldt/releases/tag/2019_R3)
[](LICENSE)

This toolkit allows developers to deploy pre-trained deep learning models through a high-level C++ Inference Engine API integrated with application logic.

This open source version includes two components, namely Model Optimizer and Inference Engine, as well as CPU, GPU and heterogeneous plugins to accelerate deep learning inferencing on Intel(R) CPUs and Intel(R) Processor Graphics. It supports pre-trained models from the [Open Model Zoo](https://github.com/opencv/open_model_zoo/) along with 100+ open source and public models in popular formats such as Caffe*, TensorFlow*, MXNet* and ONNX*.
For an int8 workflow primer, please see INT8_WORKFLOW.md.

## Repository components:
* [Inference Engine](https://software.intel.com/en-us/articles/OpenVINO-InferEngine)

@@ -35,7 +36,7 @@ Deep Learning Deployment Toolkit is licensed under Apache License, Version 2.0.
## Support
Please report questions, issues and suggestions using:
* [\#openvino](https://stackoverflow.com/search?q=%23openvino) tag on StackOverflow*
* [GitHub* Issues](https://github.com/opencv/dldt/issues)
* [Forum](https://software.intel.com/en-us/forums/computer-vision)

---

example_calibration_files/mobilenet_v1_1.0_224.yml (Normal file, 38 lines)
@@ -0,0 +1,38 @@
models:
  - name: MobileNet_v1_1.0_224
    launchers:
      - framework: dlsdk
        device: CPU
        tf_model: <PATH_TO_MODEL>/mobilenet_v1_1.0_224.pb
        adapter: classification
        mo_params:
          data_type: FP16
          input_shape: (1, 224, 224, 3)
        cpu_extensions: AUTO
    datasets:
      - name: ImageNet2012_bkgr
        data_source:
        annotation: <PATH_TO_AC_ANNOTATIONS>/imagenet.pickle
        dataset_meta: <PATH_TO_AC_ANNOTATIONS>/imagenet.json
        annotation_conversion:
          converter: imagenet
          annotation_file: <PATH_TO_IMAGENET_IMAGES>/ILSVRC2012_val.txt
          labels_file: <PATH_TO_IMAGENET_IMAGES>/synset_words.txt
          has_background: True
        subsample_size: 2000
        preprocessing:
          - type: bgr_to_rgb
          - type: resize
            size: 256
          - type: crop
            size: 224
          - type: normalization
            mean: (127.5, 127.5, 127.5)
            std: 127.5
        metrics:
          - name: accuracy @ top1
            type: accuracy
            top_k: 1
          - name: accuracy @ top5
            type: accuracy
            top_k: 5
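The normalization entry above (mean 127.5, std 127.5) maps 8-bit pixel values into the [-1, 1] range this model family typically expects; a quick numerical check:

```python
import numpy as np

# the 'normalization' step from the config above: (x - 127.5) / 127.5
x = np.array([0.0, 127.5, 255.0])
normalized = (x - 127.5) / 127.5
# the full [0, 255] pixel range lands exactly on [-1, 1]
```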

example_calibration_files/resnet_v1.5_50.yml (Normal file, 39 lines)
@@ -0,0 +1,39 @@
models:
  - name: ResNet_v1.5_50
    launchers:
      - framework: dlsdk
        device: CPU
        tf_model: <PATH_TO_MODEL>/resnet_v1.5_50.pb
        adapter: classification
        mo_params:
          data_type: FP16
          input_shape: (1, 224, 224, 3)
          output: softmax_tensor
        cpu_extensions: AUTO
    datasets:
      - name: ImageNet2012_bkgr
        data_source: <PATH_TO_IMAGENET_IMAGES>
        annotation: <PATH_TO_AC_ANNOTATIONS>/imagenet.pickle
        dataset_meta: <PATH_TO_AC_ANNOTATIONS>/imagenet.json
        annotation_conversion:
          converter: imagenet
          annotation_file: <PATH_TO_IMAGENET_IMAGES>/ILSVRC2012_val.txt
          labels_file: <PATH_TO_IMAGENET_IMAGES>/synset_words.txt
          has_background: True
        subsample_size: 2000
        preprocessing:
          - type: bgr_to_rgb
          - type: resize
            size: 256
            aspect_ratio_scale: greater
          - type: crop
            size: 224
          - type: normalization
            mean: 123, 117, 104
        metrics:
          - name: accuracy @ top1
            type: accuracy
            top_k: 1
          - name: accuracy @ top5
            type: accuracy
            top_k: 5
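The accuracy @ top-k metrics in these configs count a prediction as correct when the true label is among the k highest-scoring classes. An illustrative implementation (not the Accuracy Checker source):

```python
import numpy as np

def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # indices of the k best classes per sample
    hits = [label in row for label, row in zip(labels, top_k)]
    return float(np.mean(hits))

scores = np.array([[0.1, 0.7, 0.2],    # sample 0: predicted class 1
                   [0.5, 0.3, 0.2]])   # sample 1: predicted class 0
labels = [1, 1]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Sample 1 is missed at top-1 (class 0 scores highest) but recovered at top-2, which is why top-5 accuracy is always at least as high as top-1.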

example_calibration_files/ssd_resnet34.yml (Normal file, 44 lines)
@@ -0,0 +1,44 @@
models:
  - name: SSD_ResNet34
    launchers:
      - framework: dlsdk
        device: CPU
        onnx_model: <PATH_TO_MODEL>/resnet34-ssd1200.onnx
        adapter:
          type: ssd_onnx
          scores_out: '.*scores.*'
          labels_out: '.*labels.*'
          bboxes_out: '.*bboxes.*'
        cpu_extensions: AUTO
        mo_params:
          data_type: FP16

    datasets:
      - name: COCO2017_80cl_bkgr
        reader: pillow_imread
        data_source: <PATH_TO_DATASET>/COCO/2017/val2017
        annotation: <PATH_TO_AC_ANNOTATIONS>/mscoco_detection.pickle
        dataset_meta: <PATH_TO_AC_ANNOTATIONS>/mscoco_detection.json
        annotation_conversion:
          converter: mscoco_detection
          annotation_file: <PATH_TO_DATASET>/COCO/2017/annotations/instances_val2017.json
          has_background: True
          use_full_label_map: False
        subsample_size: 300
        preprocessing:
          - type: resize
            size: 1200
            use_pillow: true
            interpolation: BILINEAR
          - type: normalization
            mean: (123.675, 116.28, 103.53)
            std: (58.395, 57.12, 57.375)
        postprocessing:
          - type: resize_prediction_boxes
        metrics:
          - type: map
            integral: 11point
            ignore_difficult: true
            presenter: print_scalar
          - type: coco_precision
          - type: coco_orig_precision
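The scores_out/labels_out/bboxes_out values in the ssd_onnx adapter above are regular expressions matched against the model's output names. A sketch with hypothetical output names (the real names depend on the exported ONNX graph):

```python
import re

# hypothetical output names of an exported detection model
output_names = ["Concat_471/scores", "Concat_471/labels", "Concat_471/bboxes"]

patterns = {"scores": r".*scores.*",
            "labels": r".*labels.*",
            "bboxes": r".*bboxes.*"}

# pick the first output name matching each pattern
matched = {role: next(n for n in output_names if re.match(p, n))
           for role, p in patterns.items()}
```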

@@ -7,24 +7,7 @@
- [Software Requirements](#software-requirements)
- [Build Steps](#build-steps)
- [Additional Build Options](#additional-build-options)
- [Build for Raspbian* Stretch OS](#build-for-raspbian-stretch-os)
- [Hardware Requirements](#hardware-requirements)
- [Native Compilation](#native-compilation)
- [Cross Compilation Using Docker*](#cross-compilation-using-docker)
- [Additional Build Options](#additional-build-options-1)
- [Build on Windows* Systems](#build-on-windows-systems)
- [Software Requirements](#software-requirements-1)
- [Build Steps](#build-steps-1)
- [Additional Build Options](#additional-build-options-2)
- [Building Inference Engine with Ninja* Build System](#building-inference-engine-with-ninja-build-system)
- [Build on macOS* Systems](#build-on-macos-systems)
- [Software Requirements](#software-requirements-2)
- [Build Steps](#build-steps-2)
- [Additional Build Options](#additional-build-options-3)
- [Use Custom OpenCV Builds for Inference Engine](#use-custom-opencv-builds-for-inference-engine)
- [(Optional) Additional Installation Steps for the Intel® Movidius™ Neural Compute Stick and Neural Compute Stick 2](#optional-additional-installation-steps-for-the-intel-movidius-neural-compute-stick-and-neural-compute-stick-2)
- [For Linux, Raspbian Stretch* OS](#for-linux-raspbian-stretch-os)
- [For Windows](#for-windows-1)
- [(Optional) Use Custom OpenCV Builds for Inference Engine](#use-custom-opencv-builds-for-inference-engine)
- [Next Steps](#next-steps)
- [Additional Resources](#additional-resources)

@@ -41,13 +24,12 @@ The open source version of Inference Engine includes the following plugins:
| MYRIAD plugin | Intel® Movidius™ Neural Compute Stick powered by the Intel® Movidius™ Myriad™ 2, Intel® Neural Compute Stick 2 powered by the Intel® Movidius™ Myriad™ X |
| Heterogeneous plugin | Heterogeneous plugin enables computing for inference on one network on several Intel® devices. |

Inference Engine plugin for Intel® FPGA is distributed only in a binary form as a part of [Intel® Distribution of OpenVINO™](https://software.intel.com/en-us/openvino-toolkit).
Please see the additional document on the low-precision (int8) flow in the root directory.

## Build on Linux* Systems

The software was validated on:
- Ubuntu\* 16.04, 18.04 (64-bit) with default GCC
- CentOS\* 7.4 (64-bit) with default GCC\* 4.8.5

### Software Requirements
- [CMake\*](https://cmake.org/download/) 3.5 or higher

@@ -78,15 +60,7 @@ The software was validated on:

You can use the following additional build options:

- Internal JIT GEMM implementation is used by default.

- To switch to the OpenBLAS\* implementation, use the `GEMM=OPENBLAS` option and the `BLAS_INCLUDE_DIRS` and `BLAS_LIBRARIES` CMake options to specify the path to the OpenBLAS headers and library. For example, use the following options on CentOS\*: `-DGEMM=OPENBLAS -DBLAS_INCLUDE_DIRS=/usr/include/openblas -DBLAS_LIBRARIES=/usr/lib64/libopenblas.so.0`.

- To switch to the optimized MKL-ML\* GEMM implementation, use the `-DGEMM=MKL` and `-DMKLROOT=<path_to_MKL>` CMake options to specify a path to unpacked MKL-ML with the `include` and `lib` folders. The MKL-ML\* package can be downloaded from the [MKL-DNN repository](https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_lnx_2019.0.5.20190502.tgz).

- Threading Building Blocks (TBB) is used by default. To build the Inference Engine with OpenMP* threading, set the `-DTHREADING=OMP` option.

- Required versions of TBB/OMP and OpenCV packages are downloaded automatically by the CMake-based script. If you want to use the automatically downloaded packages but you already have installed TBB or OpenCV packages configured in your environment, you may need to clean the `TBBROOT` and `OpenCV_DIR` environment variables before running the `cmake` command; otherwise they won't be downloaded and the build may fail if incompatible versions were installed.

- If the CMake-based build script can not find and download the OpenCV package that is supported on your platform, or if you want to use a custom build of the OpenCV library, refer to the [Use Custom OpenCV Builds](#use-custom-opencv-builds-for-inference-engine) section for details.

@@ -96,8 +70,7 @@ You can use the following additional build options:
  -DPYTHON_LIBRARY=/usr/lib/x86_64-linux-gnu/libpython3.7m.so \
  -DPYTHON_INCLUDE_DIR=/usr/include/python3.7
  ```

- To switch off/on the CPU and GPU plugins, use the `cmake` options `-DENABLE_MKL_DNN=ON/OFF` and `-DENABLE_CLDNN=ON/OFF` respectively.

5. Adding to your project

@@ -117,352 +90,6 @@ You can use the following additional build options:
  target_link_libraries(${PROJECT_NAME} ${InferenceEngine_LIBRARIES} dl)
  ```

## Build for Raspbian Stretch* OS

> **NOTE**: Only the MYRIAD plugin is supported.

### Hardware Requirements
* Raspberry Pi\* 2 or 3 with Raspbian\* Stretch OS (32-bit). Check that its CPU supports the ARMv7 instruction set (the `uname -m` command returns `armv7l`).

> **NOTE**: Although the Raspberry Pi\* CPU is ARMv8, the 32-bit OS detects the ARMv7 CPU instruction set. The default `gcc` compiler applies the ARMv6 architecture flag for compatibility with lower versions of boards. For more information, run the `gcc -Q --help=target` command and refer to the description of the `-march=` option.

You can compile the Inference Engine for Raspberry Pi\* in one of the two ways:
* [Native Compilation](#native-compilation), which is the simplest way, but time-consuming
* [Cross Compilation Using Docker*](#cross-compilation-using-docker), which is the recommended way

### Native Compilation
Native compilation of the Inference Engine is the most straightforward solution. However, it might take at least one hour to complete on Raspberry Pi\* 3.

1. Install dependencies:
  ```bash
  sudo apt-get update
  sudo apt-get install -y git cmake libusb-1.0-0-dev
  ```

2. Go to the `inference-engine` directory of the cloned `dldt` repository:

  ```bash
  cd dldt/inference-engine
  ```

3. Initialize submodules:

  ```bash
  git submodule init
  git submodule update --recursive
  ```

4. Create a build folder:

  ```bash
  mkdir build && cd build
  ```

5. Build the Inference Engine:

  ```bash
  cmake -DCMAKE_BUILD_TYPE=Release \
        -DENABLE_SSE42=OFF \
        -DTHREADING=SEQ \
        -DENABLE_GNA=OFF .. && make
  ```

### Cross Compilation Using Docker*

This compilation was tested on the following configuration:

* Host: Ubuntu\* 16.04 (64-bit, Intel® Core™ i7-6700K CPU @ 4.00GHz × 8)
* Target: Raspbian\* Stretch (32-bit, ARMv7, Raspberry Pi\* 3)

1. Install Docker\*:

  ```bash
  sudo apt-get install -y docker.io
  ```

2. Add the current user to the `docker` group:

  ```bash
  sudo usermod -a -G docker $USER
  ```

  Log out and log in for this to take effect.

3. Create a directory named `ie_cross_armhf` and add a text file named `Dockerfile` with the following content:
  ```docker
  FROM debian:stretch

  USER root

  RUN dpkg --add-architecture armhf && \
      apt-get update && \
      apt-get install -y --no-install-recommends \
      build-essential \
      crossbuild-essential-armhf \
      git \
      wget \
      libusb-1.0-0-dev:armhf \
      libgtk-3-dev:armhf \
      libavcodec-dev:armhf \
      libavformat-dev:armhf \
      libswscale-dev:armhf \
      libgstreamer1.0-dev:armhf \
      libgstreamer-plugins-base1.0-dev:armhf \
      libpython3-dev:armhf \
      python3-pip

  RUN wget https://www.cmake.org/files/v3.14/cmake-3.14.3.tar.gz && \
      tar xf cmake-3.14.3.tar.gz && \
      (cd cmake-3.14.3 && ./bootstrap --parallel=$(nproc --all) && make --jobs=$(nproc --all) && make install) && \
      rm -rf cmake-3.14.3 cmake-3.14.3.tar.gz
  ```

  It uses the Debian\* Stretch (Debian 9) OS for compilation because it is the base of Raspbian\* Stretch.

4. Build a Docker\* image:

  ```bash
  docker image build -t ie_cross_armhf ie_cross_armhf
  ```

5. Run the Docker\* container with the source code folder mounted from the host:

  ```bash
  docker run -it -v /absolute/path/to/dldt:/dldt ie_cross_armhf /bin/bash
  ```

6. While in the container:

   1. Go to the `inference-engine` directory of the cloned `dldt` repository:

      ```bash
      cd dldt/inference-engine
      ```

   2. Create a build folder:

      ```bash
      mkdir build && cd build
      ```

   3. Build the Inference Engine:

      ```bash
      cmake -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_TOOLCHAIN_FILE="../cmake/arm.toolchain.cmake" \
            -DTHREADS_PTHREAD_ARG="-pthread" \
            -DENABLE_SSE42=OFF \
            -DTHREADING=SEQ \
            -DENABLE_GNA=OFF .. && make --jobs=$(nproc --all)
      ```

7. Press "Ctrl"+"D" to exit from Docker\*. You can find the resulting binaries in the `dldt/inference-engine/bin/armv7l/` directory and the OpenCV* installation in `dldt/inference-engine/temp`.

> **NOTE**: Native applications that link to the cross-compiled Inference Engine library require an extra compilation flag `-march=armv7-a`.

### Additional Build Options

You can use the following additional build options:

- Required versions of OpenCV packages are downloaded automatically by the CMake-based script. If you want to use the automatically downloaded packages but you already have installed OpenCV packages configured in your environment, you may need to clean the `OpenCV_DIR` environment variable before running the `cmake` command; otherwise they won't be downloaded and the build may fail if incompatible versions were installed.

- If the CMake-based build script can not find and download the OpenCV package that is supported on your platform, or if you want to use a custom build of the OpenCV library, refer to the [Use Custom OpenCV Builds](#use-custom-opencv-builds-for-inference-engine) section for details.

- To build the Python API wrapper, install the `libpython3-dev:armhf` and `python3-pip` packages using `apt-get`, then install the `numpy` and `cython` python modules using the `pip3` command, and add the following cmake options:
  ```sh
  -DENABLE_PYTHON=ON \
  -DPYTHON_EXECUTABLE=/usr/bin/python3.5 \
  -DPYTHON_LIBRARY=/usr/lib/arm-linux-gnueabihf/libpython3.5m.so \
  -DPYTHON_INCLUDE_DIR=/usr/include/python3.5
  ```

## Build on Windows* Systems

The software was validated on:
- Microsoft\* Windows\* 10 (64-bit) with Visual Studio 2017 and Intel® C++ Compiler 2018 Update 3

### Software Requirements
- [CMake\*](https://cmake.org/download/) 3.5 or higher
- [OpenBLAS\*](https://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int64.zip/download) and [mingw64\* runtime dependencies](https://sourceforge.net/projects/openblas/files/v0.2.14/mingw64_dll.zip/download)
- [Intel® C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) 18.0 to build the Inference Engine on Windows
- (Optional) [Intel® Graphics Driver for Windows* [25.20] driver package](https://downloadcenter.intel.com/download/28646/Intel-Graphics-Windows-10-DCH-Drivers?product=80939)
- Python 3.4 or higher for the Inference Engine Python API wrapper

### Build Steps
1. Clone submodules:
    ```sh
    git submodule init
    git submodule update --recursive
    ```
2. Download and install the [Intel® C++ Compiler](https://software.intel.com/en-us/intel-parallel-studio-xe) 18.0.
3. Install OpenBLAS:
    1. Download [OpenBLAS\*](https://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int64.zip/download)
    2. Unzip the downloaded package to a directory on your machine. In this document, this directory is referred to as `<OPENBLAS_DIR>`.
4. By default, the build enables the Inference Engine GPU plugin to infer models on your Intel® Processor Graphics. This requires you to [download and install the Intel® Graphics Driver for Windows* [25.20] driver package](https://downloadcenter.intel.com/download/28646/Intel-Graphics-Windows-10-DCH-Drivers?product=80939) before running the build. If you don't want to use the GPU plugin, use the `-DENABLE_CLDNN=OFF` CMake build option and skip the installation of the Intel® Graphics Driver.
5. Create a build directory:
    ```sh
    mkdir build
    ```
6. In the `build` directory, run `cmake` to fetch project dependencies and generate a Visual Studio solution:
    ```sh
    cd build
    cmake -G "Visual Studio 15 2017 Win64" -T "Intel C++ Compiler 18.0" ^
        -DCMAKE_BUILD_TYPE=Release ^
        -DICCLIB="C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\compiler\lib" ..
    ```

7. Build the generated solution in Visual Studio 2017 or run `cmake --build . --config Release` to build from the command line.

8. Before running the samples, add the paths to the TBB and OpenCV binaries used for the build to the `%PATH%` environment variable. By default, TBB binaries are downloaded by the CMake-based script to the `<dldt_repo>/inference-engine/temp/tbb/lib` folder, and OpenCV binaries to the `<dldt_repo>/inference-engine/temp/opencv_4.1.0/bin` folder.
### Additional Build Options
|
||||
|
||||
- Internal JIT GEMM implementation is used by default.
- To switch to the OpenBLAS\* GEMM implementation, use the `-DGEMM=OPENBLAS` CMake option and specify the path to OpenBLAS using the `-DBLAS_INCLUDE_DIRS=<OPENBLAS_DIR>\include` and `-DBLAS_LIBRARIES=<OPENBLAS_DIR>\lib\libopenblas.dll.a` options. A prebuilt OpenBLAS\* package can be downloaded [here](https://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int64.zip/download), and the mingw64\* runtime dependencies [here](https://sourceforge.net/projects/openblas/files/v0.2.14/mingw64_dll.zip/download).
- To switch to the optimized MKL-ML\* GEMM implementation, use the `-DGEMM=MKL` and `-DMKLROOT=<path_to_MKL>` CMake options to specify a path to unpacked MKL-ML with the `include` and `lib` folders. The MKL-ML\* package can be downloaded from the [MKL-DNN repository](https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_win_2019.0.5.20190502.zip).
- Threading Building Blocks (TBB) is used by default. To build the Inference Engine with OpenMP* threading, set the `-DTHREADING=OMP` option.
- Required versions of the TBB and OpenCV packages are downloaded automatically by the CMake-based script. If you want to use the automatically downloaded packages but already have TBB or OpenCV configured in your environment, you may need to clear the `TBBROOT` and `OpenCV_DIR` environment variables before running the `cmake` command. Otherwise, the packages are not downloaded, and the build may fail if the installed versions are incompatible.
- If the CMake-based build script cannot find and download the OpenCV package that is supported on your platform, or if you want to use a custom build of the OpenCV library, refer to the [Use Custom OpenCV Builds](#use-custom-opencv-builds-for-inference-engine) section for details.
- To switch the CPU and GPU plugins on or off, use the `-DENABLE_MKL_DNN=ON/OFF` and `-DENABLE_CLDNN=ON/OFF` CMake options respectively.
- To build the Python API wrapper, use the `-DENABLE_PYTHON=ON` option. To specify an exact Python version, use the following options:
```sh
-DPYTHON_EXECUTABLE="C:\Program Files\Python37\python.exe" ^
-DPYTHON_LIBRARY="C:\Program Files\Python37\libs\python37.lib" ^
-DPYTHON_INCLUDE_DIR="C:\Program Files\Python37\include"
```
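The options above can be combined into a single configure command. The sketch below is only an illustration: it assumes the MKL-ML GEMM backend, OpenMP threading, and the Python wrapper, and the generator name and every path are placeholders for your own environment, not values required by the build system.

```sh
:: Hypothetical configure command combining several of the options above
:: (the MKL-ML and Python paths are examples -- substitute your own)
cmake -G "Visual Studio 15 2017 Win64" ^
    -DGEMM=MKL -DMKLROOT="C:\mklml_win_2019.0.5.20190502" ^
    -DTHREADING=OMP ^
    -DENABLE_PYTHON=ON ^
    -DPYTHON_EXECUTABLE="C:\Program Files\Python37\python.exe" ..
```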
### Building Inference Engine with Ninja* Build System

```sh
call "C:\Program Files (x86)\IntelSWTools\compilers_and_libraries_2018\windows\bin\ipsxe-comp-vars.bat" intel64 vs2017
set CXX=icl
set CC=icl
:: clean TBBROOT value set by ipsxe-comp-vars.bat; the required TBB package will be downloaded by the dldt cmake script
set TBBROOT=
cmake -G Ninja -Wno-dev -DCMAKE_BUILD_TYPE=Release ..
cmake --build . --config Release
```

## Build on macOS* Systems

> **NOTE**: The current version of the OpenVINO™ toolkit for macOS* supports inference on Intel CPUs only.

The software was validated on:
- macOS\* 10.14, 64-bit

### Software Requirements
- [CMake\*](https://cmake.org/download/) 3.5 or higher
- Clang\* compiler from Xcode\* 10.1
- Python\* 3.4 or higher for the Inference Engine Python API wrapper

### Build Steps
1. Clone submodules:
```sh
cd dldt/inference-engine
git submodule init
git submodule update --recursive
```
2. Install build dependencies using the `install_dependencies.sh` script in the project root folder.
3. Create a build folder:
```sh
mkdir build
```
4. Inference Engine uses a CMake-based build system. In the created `build` directory, run `cmake` to fetch project dependencies and create Unix makefiles, then run `make` to build the project:
```sh
cmake -DCMAKE_BUILD_TYPE=Release ..
make --jobs=$(nproc --all)
```

### Additional Build Options

You can use the following additional build options:

- Internal JIT GEMM implementation is used by default.

- To switch to the optimized MKL-ML\* GEMM implementation, use the `-DGEMM=MKL` and `-DMKLROOT=<path_to_MKL>` CMake options to specify a path to unpacked MKL-ML with the `include` and `lib` folders. The MKL-ML\* package can be downloaded [here](https://github.com/intel/mkl-dnn/releases/download/v0.19/mklml_mac_2019.0.5.20190502.tgz).

- Threading Building Blocks (TBB) is used by default. To build the Inference Engine with OpenMP* threading, set the `-DTHREADING=OMP` option.

- Required versions of the TBB and OpenCV packages are downloaded automatically by the CMake-based script. If you want to use the automatically downloaded packages but already have TBB or OpenCV configured in your environment, you may need to clear the `TBBROOT` and `OpenCV_DIR` environment variables before running the `cmake` command. Otherwise, the packages are not downloaded, and the build may fail if the installed versions are incompatible.

- If the CMake-based build script cannot find and download the OpenCV package that is supported on your platform, or if you want to use a custom build of the OpenCV library, refer to the [Use Custom OpenCV Builds](#use-custom-opencv-builds-for-inference-engine) section for details.

- To build the Python API wrapper, use the `-DENABLE_PYTHON=ON` option. To specify an exact Python version, use the following options:
```sh
-DPYTHON_EXECUTABLE=/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 \
-DPYTHON_LIBRARY=/Library/Frameworks/Python.framework/Versions/3.7/lib/libpython3.7m.dylib \
-DPYTHON_INCLUDE_DIR=/Library/Frameworks/Python.framework/Versions/3.7/include/python3.7m
```
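As on Windows, these options can be combined into one configure-and-build sequence. The sketch below is an assumption-laden example: the MKL-ML path is a placeholder, and `sysctl -n hw.ncpu` is used because `nproc` is not part of stock macOS.

```sh
# Hypothetical macOS configure-and-build sequence combining the options above
# (the MKL-ML and Python 3.7 paths are examples -- substitute your own)
cmake -DCMAKE_BUILD_TYPE=Release \
    -DGEMM=MKL -DMKLROOT="$HOME/mklml_mac_2019.0.5.20190502" \
    -DENABLE_PYTHON=ON \
    -DPYTHON_EXECUTABLE=/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 ..
# nproc is not available on stock macOS; sysctl reports the core count instead
make --jobs=$(sysctl -n hw.ncpu)
```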

## Use Custom OpenCV Builds for Inference Engine

> **NOTE**: The recommended and tested version of OpenCV is 4.1. The minimum supported version is 3.4.0.

Required versions of the OpenCV packages are downloaded automatically while the Inference Engine library is built. If the build script cannot find and download the OpenCV package that is supported on your platform, you can use one of the following options:

* Download the most suitable version from the list of available pre-built packages at [https://download.01.org/opencv/2019/openvinotoolkit](https://download.01.org/opencv/2019/openvinotoolkit) in the `<release_version>/inference_engine` directory.

* Use a system-provided OpenCV package (e.g., by running the `apt install libopencv-dev` command). The following modules must be enabled: `imgcodecs`, `videoio`, `highgui`.

* Get the OpenCV package using a package manager: pip, conda, conan, etc. The package must include the development components (header files and CMake scripts).

* Build OpenCV from source using the [build instructions](https://docs.opencv.org/master/df/d65/tutorial_table_of_content_introduction.html) on the OpenCV site.

After you get the OpenCV library built, perform the following preparation steps before running the Inference Engine build:

1. Set the `OpenCV_DIR` environment variable to the directory where the `OpenCVConfig.cmake` file of your custom OpenCV build is located.
2. Disable automatic package downloading by passing the `-DENABLE_OPENCV=OFF` option to the CMake-based build script for the Inference Engine.
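As a sketch, the two preparation steps might look as follows for a custom build installed under `/opt/custom-opencv` (a hypothetical prefix chosen only for illustration; the directory containing `OpenCVConfig.cmake` varies between OpenCV versions and install layouts):

```sh
# 1. Point CMake at the custom OpenCV build (example prefix)
export OpenCV_DIR=/opt/custom-opencv/lib/cmake/opencv4
# 2. Configure the Inference Engine with automatic OpenCV download disabled
cmake -DENABLE_OPENCV=OFF -DCMAKE_BUILD_TYPE=Release ..
```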
## (Optional) Additional Installation Steps for the Intel® Movidius™ Neural Compute Stick and Neural Compute Stick 2

> **NOTE**: These steps are required only if you want to perform inference on the Intel® Movidius™ Neural Compute Stick or the Intel® Neural Compute Stick 2 using the Inference Engine MYRIAD Plugin. See also [Intel® Neural Compute Stick 2 Get Started](https://software.intel.com/en-us/neural-compute-stick/get-started).

### For Linux, Raspbian\* Stretch OS

1. Add the current Linux user to the `users` group:
```sh
sudo usermod -a -G users "$(whoami)"
```
Log out and log in for the change to take effect.
2. To perform inference on Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2, install the USB rules as follows:
```sh
cat <<EOF > 97-myriad-usbboot.rules
SUBSYSTEM=="usb", ATTRS{idProduct}=="2150", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="2485", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
SUBSYSTEM=="usb", ATTRS{idProduct}=="f63b", ATTRS{idVendor}=="03e7", GROUP="users", MODE="0666", ENV{ID_MM_DEVICE_IGNORE}="1"
EOF
sudo cp 97-myriad-usbboot.rules /etc/udev/rules.d/
sudo udevadm control --reload-rules
sudo udevadm trigger
sudo ldconfig
rm 97-myriad-usbboot.rules
```

### For Windows

For the Intel® Movidius™ Neural Compute Stick and Intel® Neural Compute Stick 2, install the Movidius™ VSC driver:

1. Go to the `<DLDT_ROOT_DIR>/inference-engine/thirdparty/movidius/MovidiusDriver` directory, where `DLDT_ROOT_DIR` is the directory to which the DLDT repository was cloned.
2. Right-click the `Movidius_VSC_Device.inf` file and choose **Install** from the pop-up menu.

You have installed the driver for your Intel® Movidius™ Neural Compute Stick or Intel® Neural Compute Stick 2.

## Next Steps

Congratulations, you have built the Inference Engine. To get started with the OpenVINO™ DLDT, proceed to the Get Started guides:
32  inference-engine/cmake/InitRHDecoder.cmake.in  Normal file
@@ -0,0 +1,32 @@
# Copyright (C) 2018-2019 Intel Corporation
#
# SPDX-License-Identifier: Apache-2.0
#

# module to locate RHDecoder libraries
function (init_rh_decoder)
    if (NOT IE_MAIN_SOURCE_DIR)
        set(RH_Decoder ${IE_EXTERNAL_DIR}/@rh_decoder_version@)
    endif()

    set(RH_LIB_DIR libs/x64)
    set(RH_LIB decoder_library)

    if (WIN32)
        set(RH_PLATFORM_DIR windows)
    elseif (UNIX)
        set(RH_PLATFORM_DIR linux)
    else ()
        message(FATAL_ERROR "RH Decoder not supported on this platform, only linux, and windows")
    endif ()

    find_library(RH_DECODER_LIBRARY
            ${RH_LIB}
            PATH ${RH_Decoder}/${RH_PLATFORM_DIR}/${RH_LIB_DIR}
            NO_DEFAULT_PATH)

    get_filename_component(cmake_style_path_to_rh "${RH_Decoder}/${RH_PLATFORM_DIR}/include" ABSOLUTE)

    set(libRH_Decoder_INCLUDE_DIRS ${cmake_style_path_to_rh} PARENT_SCOPE)
    set(libRH_Decoder_LIBRARIES ${RH_DECODER_LIBRARY} PARENT_SCOPE)
endfunction(init_rh_decoder)
@@ -23,6 +23,7 @@ endif()
 #apple specific
 if (APPLE)
     set(ENABLE_GNA OFF)
+    set(ENABLE_ROCKHOPER OFF)
     set(ENABLE_CLDNN OFF)
 endif()

@@ -83,7 +83,7 @@ if (THREADING STREQUAL "TBB" OR THREADING STREQUAL "TBB_AUTO")
             ENVIRONMENT "TBBROOT")
 else(APPLE)
     RESOLVE_DEPENDENCY(TBB
-            ARCHIVE_MAC "tbb2019_20190414_v1_mac.tgz"
+            ARCHIVE_MAC "tbb2019_20190414_mac.tgz"
             TARGET_PATH "${TEMP}/tbb"
             ENVIRONMENT "TBBROOT"
             VERSION_REGEX ".*_([a-z]*_([a-z0-9]+\\.)*[0-9]+).*")

@@ -156,6 +156,26 @@ if (ENABLE_GNA)
     debug_message(STATUS "gna=" ${GNA})
 endif()

+if (ENABLE_ROCKHOPER)
+    set(rh_decoder_version "Rockhopper_1.0.0.682")
+
+    set(INCLUDE_RH_DECODER "include(\"\$\{IE_ROOT_DIR\}/share/ie_rh_decoder.cmake\")")
+
+    RESOLVE_DEPENDENCY(RH_Decoder
+            ARCHIVE_UNIFIED "${rh_decoder_version}.zip"
+            TARGET_PATH "${TEMP}/${rh_decoder_version}"
+            VERSION_REGEX ".*_([0-9]+.[0-9]+.[0-9]+.[0-9]+).*")
+
+    configure_file(
+            "${PROJECT_SOURCE_DIR}/cmake/InitRHDecoder.cmake.in"
+            "${CMAKE_BINARY_DIR}/share/ie_rh_decoder.cmake"
+            @ONLY)
+
+    list(APPEND CMAKE_MODULE_PATH ${CMAKE_BINARY_DIR}/share)
+    # for inference engine in tree build - lets include this finder
+    include(ie_rh_decoder)
+endif()
+
 configure_file(
         "${PROJECT_SOURCE_DIR}/cmake/share/InferenceEngineConfig.cmake.in"
         "${CMAKE_BINARY_DIR}/share/InferenceEngineConfig.cmake"
@@ -7,12 +7,12 @@ include (options)
 #these options are aimed to optimize build time on development system

 #backed targets
-ie_option (ENABLE_GNA "GNA support for inference engine" ON)
-ie_option (ENABLE_ROCKHOPER "use Rockhopper decoder for converting / output scores" ON)
+ie_option (ENABLE_GNA "GNA support for inference engine" OFF)
+ie_option (ENABLE_ROCKHOPER "use Rockhopper decoder for converting / output scores" OFF)

 ie_option (ENABLE_MKL_DNN "MKL-DNN plugin for inference engine" ON)

-ie_option (ENABLE_CLDNN "clDnn based plugin for inference engine" ON)
+ie_option (ENABLE_CLDNN "clDnn based plugin for inference engine" OFF)

 ie_option (ENABLE_CLDNN_TESTS "Enable clDNN unit tests" OFF)

@@ -37,15 +37,15 @@ if (NOT THREADING STREQUAL "TBB"
         AND NOT THREADING STREQUAL "TBB_AUTO"
         AND NOT THREADING STREQUAL "OMP"
         AND NOT THREADING STREQUAL "SEQ")
-    set (THREADING "TBB")
+    set (THREADING "OMP")
     message(STATUS "THREADING should be set to TBB, TBB_AUTO, OMP or SEQ. Default option is " ${THREADING})
 endif()
 set(THREADING "${THREADING}" CACHE STRING "Threading" FORCE)
 list (APPEND IE_OPTIONS THREADING)

-ie_option (ENABLE_VPU "vpu targeted plugins for inference engine" ON)
+ie_option (ENABLE_VPU "vpu targeted plugins for inference engine" OFF)

-ie_option (ENABLE_MYRIAD "myriad targeted plugin for inference engine" ON)
+ie_option (ENABLE_MYRIAD "myriad targeted plugin for inference engine" OFF)

 ie_option (ENABLE_MYRIAD_NO_BOOT "myriad plugin will skip device boot" OFF)

@@ -55,8 +55,6 @@ ie_option (ENABLE_GAPI_TESTS "tests for GAPI kernels" OFF)

 ie_option (GAPI_TEST_PERF "if GAPI unit tests should examine performance" OFF)

-ie_option (ENABLE_MYRIAD_MVNC_TESTS "functional and behavior tests for mvnc api" OFF)
-
 ie_option (ENABLE_SAMPLES "console samples are part of inference engine package" ON)

 ie_option (ENABLE_SAMPLES_CORE "console samples core library" ON)

@@ -67,8 +65,6 @@ ie_option (ENABLE_FUZZING "instrument build for fuzzing" OFF)

 ie_option (COVERAGE "enable code coverage" OFF)

-ie_option (ENABLE_STRESS_UNIT_TESTS "stress unit tests" OFF)
-
 ie_option (VERBOSE_BUILD "shows extra information about build" OFF)

 ie_option (ENABLE_UNSAFE_LOCATIONS "skip check for MD5 for dependency" OFF)

@@ -79,7 +75,9 @@ ie_option (ENABLE_SEGMENTATION_TESTS "segmentation tests" ON)

 ie_option (ENABLE_OBJECT_DETECTION_TESTS "object detection tests" ON)

-ie_option (ENABLE_OPENCV "enables OpenCV" ON)
+ie_option (ENABLE_DUMP "enables mode for dumping per layer information" OFF)
+
+ie_option (ENABLE_OPENCV "enables OpenCV" OFF)

 ie_option (OS_FOLDER "create OS dedicated folder in output" OFF)

@@ -119,7 +117,7 @@ else()
 endif()

 if (UNIX AND NOT APPLE AND CMAKE_VERSION VERSION_GREATER_EQUAL 3.10)
-    ie_option(ENABLE_CPPCHECK "Enable cppcheck during the build" ON)
+    ie_option(ENABLE_CPPCHECK "Enable cppcheck during the build" OFF)
 else()
     set(ENABLE_CPPCHECK OFF)
 endif()

@@ -148,6 +148,7 @@ else()

     set(IE_EXTERNAL_DIR "${IE_ROOT_DIR}/external")
     include("${IE_ROOT_DIR}/share/ie_parallel.cmake")
+    @INCLUDE_RH_DECODER@

     add_subdirectory(${IE_SRC_DIR}/extension EXCLUDE_FROM_ALL ie_cpu_extension)
     add_library(IE::ie_cpu_extension ALIAS ie_cpu_extension)
@@ -9,21 +9,21 @@ set(VPU_SUPPORTED_SOC ma2450 ma2x8x mv0262)
 #

 RESOLVE_DEPENDENCY(VPU_FIRMWARE_MA2450
-        ARCHIVE_UNIFIED firmware_ma2450_759W.zip
+        ARCHIVE_UNIFIED firmware_ma2450_784.zip
         TARGET_PATH "${TEMP}/vpu/firmware/ma2450"
         ENVIRONMENT "VPU_FIRMWARE_MA2450"
         FOLDER)
 debug_message(STATUS "ma2450=" ${VPU_FIRMWARE_MA2450})

 RESOLVE_DEPENDENCY(VPU_FIRMWARE_MV0262
-        ARCHIVE_UNIFIED firmware_mv0262_mdk_R9.8.zip
+        ARCHIVE_UNIFIED firmware_mv0262_784.zip
         TARGET_PATH "${TEMP}/vpu/firmware/mv0262"
         ENVIRONMENT "VPU_FIRMWARE_MV0262"
         FOLDER)
 debug_message(STATUS "mv0262=" ${VPU_FIRMWARE_MV0262})

 RESOLVE_DEPENDENCY(VPU_FIRMWARE_MA2X8X
-        ARCHIVE_UNIFIED firmware_ma2x8x_mdk_R9.8.zip
+        ARCHIVE_UNIFIED firmware_ma2x8x_784.zip
         TARGET_PATH "${TEMP}/vpu/firmware/ma2x8x"
         ENVIRONMENT "VPU_FIRMWARE_MA2X8X"
         FOLDER)
@@ -3,7 +3,7 @@ from .ie_api_impl_defs cimport Blob, TensorDesc

 from libcpp.string cimport string
 from libcpp.vector cimport vector
-from libcpp.memory cimport unique_ptr
+from libcpp.memory cimport unique_ptr, shared_ptr

 cdef class BlobBuffer:
     cdef Blob.Ptr ptr

@@ -62,3 +62,6 @@ cdef class LayersStatsMap(dict):
 cdef class IECore:
     cdef C.IECore impl
     cpdef ExecutableNetwork load_network(self, IENetwork network, str device_name, config = ?, int num_requests = ?)
+
+cdef class DataPtr:
+    cdef shared_ptr[C.Data] _ptr

@@ -6,7 +6,7 @@ from libcpp.string cimport string
 from libcpp.vector cimport vector
 from libcpp.pair cimport pair
 from libcpp.map cimport map
-from libcpp.memory cimport unique_ptr
+from libcpp.memory cimport unique_ptr, shared_ptr
 from libc.stdlib cimport malloc, free
 from libc.stdint cimport int64_t, uint8_t
 from libc.string cimport memcpy, strcpy

@@ -43,7 +43,9 @@ cdef c_map_to_dict(map[string, string] c_map):

 supported_precisions = ["FP32", "FP16", "Q78", "I32", "I16", "I8", "U32", "U16", "U8"]

-supported_layouts = ["NCHW", "NHWC", "OIHW", "C", "CHW", "HW", "NC", "CN", "BLOCKED", "NCDHW"]
+supported_layouts = {0: "ANY", 1: "NCHW", 2: "NHWC", 3: "NCDHW", 4: "NDHWC", 64: "OIHW", 95: "SCALAR", 96: "C",
+                     128: "CHW", 192: "HW", 193: "NC", 194: "CN", 200: "BLOCKED"}

 known_plugins = ['CPU', 'GPU', 'FPGA', 'MYRIAD', 'HETERO', 'HDDL', 'MULTI']

 ctypedef enum StatusCode:

@@ -64,7 +66,7 @@ ctypedef enum StatusCode:
 def get_version():
     return C.get_version().decode()

 cdef class IECore:
     def __cinit__(self, xml_config_file: str = ""):
         self.impl = C.IECore(xml_config_file.encode())

@@ -123,7 +125,6 @@ cdef class IECore:
     def get_config(self, device_name: str, config_name: str):
         return self.impl.getConfig(device_name.encode(), config_name.encode())
-
     @property
     def available_devices(self):
         cdef vector[string] c_devices = self.impl.getAvailableDevices()

@@ -132,6 +133,31 @@ cdef class IECore:
     # TODO: Add import network functionality
     # TODO: Extend API for query config and attributes when it will be merged in C++ API

+cdef class DataPtr:
+    @property
+    def name(self):
+        return deref(self._ptr).getName().decode()
+    @property
+    def precision(self):
+        return deref(self._ptr).getPrecision().name().decode()
+    @precision.setter
+    def precision(self, precision):
+        if precision not in supported_precisions:
+            raise ValueError("Unsupported precision {}! List of supported precisions: {}".format(precision,
+                                                                                                 supported_precisions))
+        deref(self._ptr).setPrecision(C.Precision.FromStr(precision.encode()))
+
+    @property
+    def dims(self):
+        return deref(self._ptr).getDims()
+    @property
+    def layout(self):
+        return supported_layouts[deref(self._ptr).getLayout()]
+
+    @property
+    def initialized(self):
+        return deref(self._ptr).isInitialized()
+
 cdef class IENetLayer:
     @property
     def name(self):

@@ -141,6 +167,10 @@ cdef class IENetLayer:
         return self.impl.type.decode()
     @property
     def precision(self):
+        warnings.filterwarnings("always", category=DeprecationWarning)
+        warnings.warn("precision property of IENetLayer is deprecated. "
+                      "Please use precision property of DataPtr instead",
+                      DeprecationWarning)
         return self.impl.precision.decode()
     @property
     def affinity(self):

@@ -176,6 +206,10 @@ cdef class IENetLayer:
         return [int(i) for i in string_shape.split(' ')]
     @property
     def layout(self):
+        warnings.filterwarnings("always", category=DeprecationWarning)
+        warnings.warn("layout property of IENetLayer is deprecated. "
+                      "Please use layout property of DataPtr instead",
+                      DeprecationWarning)
         return self.impl.layout.decode()
     @affinity.setter
     def affinity(self, target_affinity):

@@ -188,6 +222,16 @@ cdef class IENetLayer:
     def precision(self, precision: str):
         self.impl.setPrecision(precision.upper().encode())

+    @property
+    def out_data(self):
+        cdef vector[shared_ptr[C.Data]] out_data = self.impl.getOutData()
+        data = []
+        cdef DataPtr data_ptr = DataPtr()
+        for d in out_data:
+            data_ptr._ptr = d
+            data.append(data_ptr)
+        return data
+
 cdef class InputInfo:
     @property
     def precision(self):

@@ -207,9 +251,9 @@ cdef class InputInfo:
         self.impl.setPrecision(precision.encode())
     @layout.setter
     def layout(self, layout):
-        if layout.upper() not in supported_layouts:
+        if layout.upper() not in supported_layouts.values():
             raise AttributeError(
-                "Unsupported layout {}! List of supported layouts: {}".format(layout, supported_layouts))
+                "Unsupported layout {}! List of supported layouts: {}".format(layout, supported_layouts.values()))
         self.impl.setLayout(layout.encode())

 cdef class OutputInfo:

@@ -296,7 +340,7 @@ cdef class InferRequest:
         self._py_callback = py_callback
         self._py_data = py_data
         self._py_callback_used = True
-        deref(self.impl).setCyCallback(<cb_type>self.user_callback, <void *>self)
+        deref(self.impl).setCyCallback(<cb_type> self.user_callback, <void *> self)

     cpdef BlobBuffer _get_blob_buffer(self, const string & blob_name):
         cdef BlobBuffer buffer = BlobBuffer()

@@ -393,15 +437,15 @@ cdef class LayersStatsMap(dict):
         self.net_impl.setStats(c_stats_map)

 cdef class IENetwork:
-    def __cinit__(self, model: [str, bytes] ="", weights: [str, bytes] ="", init_from_buffer: bool=False,
+    def __cinit__(self, model: [str, bytes] = "", weights: [str, bytes] = "", init_from_buffer: bool = False,
                   ngraph_compatibility: bool = False):
-        cdef char* xml_buffer = <char*>malloc(len(model))
-        cdef uint8_t* bin_buffer = <uint8_t *>malloc(len(weights))
+        cdef char*xml_buffer = <char*> malloc(len(model))
+        cdef uint8_t*bin_buffer = <uint8_t *> malloc(len(weights))
         cdef string model_
         cdef string weights_
         if init_from_buffer:
             strcpy(xml_buffer, model)
-            memcpy(bin_buffer, <uint8_t *>weights, len(weights))
+            memcpy(bin_buffer, <uint8_t *> weights, len(weights))
             self.impl = C.IENetwork()
             self.impl.load_from_buffer(xml_buffer, len(model), bin_buffer, len(weights))
         else:

@@ -482,7 +526,7 @@ cdef class IENetwork:

     @classmethod
     def from_ir(cls, model: str, weights: str):
-        warnings.filterwarnings("always",category=DeprecationWarning)
+        warnings.filterwarnings("always", category=DeprecationWarning)
         warnings.warn("from_ir() method of IENetwork is deprecated. "
                       "Please use IENetwork class constructor to create valid IENetwork instance",
                       DeprecationWarning)

@@ -549,7 +593,7 @@ cdef class IEPlugin:
         cdef map[string, string] c_config
         if num_requests < 0:
             raise ValueError("Incorrect number of requests specified: {}. Expected positive integer number "
-                            "or zero for auto detection".format(num_requests))
+                             "or zero for auto detection".format(num_requests))
         if config:
             for k, v in config.items():
                 c_config[to_std_string(k)] = to_std_string(v)
@@ -396,6 +396,9 @@ void InferenceEnginePython::IENetLayer::setParams(const std::map<std::string, st
     layer_ptr->params = params_map;
 }

+std::vector<InferenceEngine::DataPtr> InferenceEnginePython::IENetLayer::getOutData() {
+    return layer_ptr->outData;
+}
 std::map<std::string, InferenceEngine::Blob::Ptr> InferenceEnginePython::IENetLayer::getWeights() {
     auto w_layer = std::dynamic_pointer_cast<InferenceEngine::WeightableLayer>(layer_ptr);
     // IF current layer is weightable gather weights and biases from casted WeightableLayer and all other blobs

@@ -46,6 +46,7 @@ struct IENetLayer {
     std::map<std::string, InferenceEngine::Blob::Ptr> getWeights();

     void setPrecision(std::string precision);
+    std::vector<InferenceEngine::DataPtr> getOutData();
 };

 struct InputInfo {

@@ -16,6 +16,14 @@ cdef extern from "<inference_engine.hpp>" namespace "InferenceEngine":
         SizeVector& getDims()
         const Precision& getPrecision() const

+    cdef cppclass Data:
+        const Precision getPrecision() const
+        void setPrecision(const Precision& precision) const
+        const SizeVector getDims()
+        const string& getName() const
+        const Layout getLayout() const
+        const bool isInitialized() const
+
     cdef cppclass Blob:
         ctypedef shared_ptr[Blob] Ptr
         const TensorDesc& getTensorDesc() const

@@ -23,6 +31,8 @@ cdef extern from "<inference_engine.hpp>" namespace "InferenceEngine":

     cdef cppclass Precision:
         const char*name() const
+        @staticmethod
+        const Precision FromStr(const string& str)

     cdef struct apiVersion:
         int minor

@@ -33,7 +43,11 @@ cdef extern from "<inference_engine.hpp>" namespace "InferenceEngine":
         const char *description
         apiVersion apiVersion

+    cdef enum Layout:
+        pass
+
 cdef extern from "ie_api_impl.hpp" namespace "InferenceEnginePython":

     cdef cppclass IENetLayer:
         string name
         string type

@@ -48,6 +62,7 @@ cdef extern from "ie_api_impl.hpp" namespace "InferenceEnginePython":
         void setParams(const map[string, string] & params_map) except +
         map[string, Blob.Ptr] getWeights() except +
         void setPrecision(string precision) except +
+        vector[shared_ptr[Data]] getOutData() except +

     cdef cppclass InputInfo:
         vector[size_t] dims

@@ -76,11 +76,3 @@
 #define IE_SUPPRESS_DEPRECATED_START
 #define IE_SUPPRESS_DEPRECATED_END
 #endif
-
-#ifndef ENABLE_UNICODE_PATH_SUPPORT
-#if defined(_WIN32)
-#define ENABLE_UNICODE_PATH_SUPPORT
-#elif defined(__GNUC__) && (__GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 2))
-#define ENABLE_UNICODE_PATH_SUPPORT
-#endif
-#endif
@@ -139,16 +139,6 @@ public:
             return res;
         }
     }
-    /**
-     * @brief serialize float with c_locale formating
-     * used for default values serializing
-     */
-    static std::string ie_serialize_float(float value) {
-        std::stringstream val_stream;
-        val_stream.imbue(std::locale("C"));
-        val_stream << value;
-        return val_stream.str();
-    }

     /**
      * @brief Gets float value for the given parameter

@@ -157,7 +147,7 @@ public:
      * @return float value
      */
     float GetParamAsFloat(const char* param, float def) const {
-        std::string val = GetParamAsString(param, ie_serialize_float(def).c_str());
+        std::string val = GetParamAsString(param, std::to_string(def).c_str());
         try {
             return ie_parse_float(val);
         } catch (...) {

@@ -35,9 +35,7 @@ public:
      * @brief Move constructor
      * @param parameter Parameter object
      */
-    Parameter(Parameter &&parameter) noexcept {
-        std::swap(ptr, parameter.ptr);
-    }
+    Parameter(Parameter &&parameter) noexcept: ptr(std::move(parameter.ptr)) {}

     /**
      * @brief Copy constructor

@@ -152,6 +152,9 @@ public:
     operator Precision::ePrecision () const noexcept {
         return precisionInfo.value;
     }
+    constexpr uint8_t getPrecVal() const noexcept {
+        return precisionInfo.value;
+    }

     /** @brief Getter of precision name */
     const char *name() const noexcept {

@@ -8,5 +8,4 @@ file (GLOB HDR ${CMAKE_CURRENT_SOURCE_DIR}/*.hpp)
 ie_add_sample(NAME benchmark_app
               SOURCES ${SRC}
               HEADERS ${HDR}
-              DEPENDENCIES format_reader
-              OPENCV_DEPENDENCIES imgcodecs)
+              DEPENDENCIES format_reader)
@@ -1,18 +1,21 @@
|
||||
# Benchmark C++ Tool
|
||||
# Benchmark C++ Application
|
||||
|
||||
This topic demonstrates how to use the Benchmark C++ Tool to estimate deep learning inference performance on supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).
|
||||
This topic demonstrates how to use the Benchmark Application to estimate deep learning inference performance on
|
||||
supported devices. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).
|
||||
|
||||
> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Tool. For the Python* implementation, refer to [Benchmark Python* Tool](./inference-engine/tools/benchmark_tool/README.md).
|
||||
> **NOTE:** This topic describes usage of C++ implementation of the Benchmark Application. For the Python* implementation, refer to [Benchmark Application (Python*)](./inference-engine/tools/benchmark_tool/README.md).
|
||||
|
||||
|
||||
## How It Works
|
||||
|
||||
Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend on the mode defined with the `-api` command-line parameter.
|
||||
Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine
|
||||
plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend
|
||||
on the mode defined with the `-api` command-line parameter.
|
||||
|
||||
> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).
|
||||
> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).
|
||||
|
||||
If you run the application in the synchronous mode, it creates one infer request and executes the `Infer` method.
|
||||
If you run the application in the asynchronous mode, it creates as many infer requests as specified in the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the application will use the default value for specified device.
|
||||
If you run the application in the asynchronous mode, it creates as many infer requests as specified in the `-nireq` command-line parameter and executes the `StartAsync` method for each of them. If `-nireq` is not set, the demo will use the default value for specified device.
|
||||
|
||||
A number of execution steps is defined by one of the following parameters:
|
||||
* Number of iterations specified with the `-niter` command-line argument
|
||||
@@ -42,19 +45,17 @@ The application also saves executable graph information serialized to a XML file
`-exec_graph_path` parameter.

## Run the Tool
## Running

Notice that the benchmark_app usually produces optimal performance for any device out of the box.

**So in most cases you don't need to play with the app options explicitly and the plain device name is enough**, for example, for CPU:
```sh
./benchmark_app -m <model> -i <input> -d CPU
```
**So in most cases you don't need to play with the app options explicitly and the plain device name is enough**, e.g.:
```
$benchmark_app -m <model> -i <input> -d CPU
```

But it still may be non-optimal in some cases, especially for very small networks. More details can be found in [Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md).

As explained in the [Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md) section, for all devices, including the new [MULTI device](./docs/IE_DG/supported_plugins/MULTI.md), it is preferable to use the FP16 IR for the model.
Also, if latency of the CPU inference on multi-socket machines is of concern, please refer to the same
[Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md) document.

Running the application with the `-h` option yields the following usage message:
```
@@ -108,74 +109,48 @@ If a model has only image input(s), please provide a folder with images or a path
If a model has some specific input(s) (not images), please prepare a binary file(s), which is filled with data of appropriate precision and provide a path to them as input.
If a model has mixed input types, input folder should contain all required files. Image inputs are filled with image files one by one. Binary inputs are filled with binary inputs one by one.

To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO [Model Downloader](./tools/downloader/README.md) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).
To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).

> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).

## Examples of Running the Tool

For example, to perform inference on CPU in the synchronous mode and get estimated performance metrics for AlexNet model,
run the following command:

This section provides step-by-step instructions on how to run the Benchmark Tool with the `googlenet-v1` public model on CPU or FPGA devices. As an input, the `car.png` file from the `<INSTALL_DIR>/deployment_tools/demo/` directory is used.
```sh
./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api sync
```

> **NOTE:** Internet access is required to execute the following steps successfully. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment.
For the asynchronous mode:
```sh
./benchmark_app -i <path_to_image>/inputImage.bmp -m <path_to_model>/alexnet_fp32.xml -d CPU -api async
```

1. Download the model. Go to the Model Downloader directory and run the `downloader.py` script, specifying the model name and the directory to download the model to:
```sh
cd <INSTALL_DIR>/deployment_tools/open_model_zoo/tools/downloader
```
```sh
python3 downloader.py --name googlenet-v1 -o <models_dir>
```
2. Convert the model to the Inference Engine IR format. Go to the Model Optimizer directory and run the `mo.py` script, specifying the path to the model, the model format (which must be FP32 for CPU and FPGA) and the output directory to generate the IR files:
```sh
cd <INSTALL_DIR>/deployment_tools/model_optimizer
```
```sh
python3 mo.py --input_model <models_dir>/public/googlenet-v1/googlenet-v1.caffemodel --data_type FP32 --output_dir <ir_dir>
```
3. Run the tool, specifying the `<INSTALL_DIR>/deployment_tools/demo/car.png` file as an input image, the IR of the `googlenet-v1` model and a device to perform inference on. The following commands demonstrate running the Benchmark Tool in the asynchronous mode on CPU and FPGA devices:

   * On CPU:
   ```sh
   ./benchmark_app -m <ir_dir>/googlenet-v1.xml -d CPU -api async -i <INSTALL_DIR>/deployment_tools/demo/car.png --progress true
   ```
   * On FPGA:
   ```sh
   ./benchmark_app -m <ir_dir>/googlenet-v1.xml -d HETERO:FPGA,CPU -api async -i <INSTALL_DIR>/deployment_tools/demo/car.png --progress true
   ```

## Demo Output

The application outputs the number of executed iterations, total duration of execution, latency and throughput.
Additionally, if you set the `-report_type` parameter, the application outputs a statistics report. If you set the `-pc` parameter, the application outputs performance counters. If you set `-exec_graph_path`, the application reports serialized executable graph information. All measurements including per-layer PM counters are reported in milliseconds.
Additionally, if you set the `-report_type` parameter, the application outputs a statistics report.
If you set the `-pc` parameter, the application outputs performance counters.
If you set `-exec_graph_path`, the application reports serialized executable graph information.

Below are fragments of sample output for CPU and FPGA devices:
* For CPU:
```
[Step 8/9] Measuring performance (Start inference asynchronously, 60000 ms duration, 4 inference requests in parallel using 4 streams)
Progress: [....................] 100.00% done

[Step 9/9] Dumping statistics report
[ INFO ] Statistics collecting was not requested. No reports are dumped.
Progress: [....................] 100.00% done

Count: 4612 iterations
Duration: 60110.04 ms
Latency: 50.99 ms
Throughput: 76.73 FPS
```
* For FPGA:
```
[Step 10/11] Measuring performance (Start inference asynchronously, 5 inference requests using 4 streams for CPU, limits: 120000 ms duration)
Progress: [....................] 100% done

[Step 11/11] Dumping statistics report
Count: 102515 iterations
Duration: 120007.38 ms
Latency: 5.84 ms
Throughput: 854.24 FPS
```

All measurements including per-layer PM counters are reported in milliseconds.

## See Also
* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md)
* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
* [Model Downloader](./tools/downloader/README.md)
* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader)

4 inference-engine/samples/thirdparty/gflags/.gitmodules vendored Normal file
@@ -0,0 +1,4 @@
[submodule "doc"]
	path = doc
	url = https://github.com/gflags/gflags.git
	branch = gh-pages
@@ -639,17 +639,15 @@ void CLDNNGraph::GetPerformanceCounts(std::map<std::string, InferenceEngine::Inf
                impl.copy(extPerfEntry.exec_type, impl.length());
            }

            pi.type_id.copy(extPerfEntry.layer_type, 256);
            strncpy(extPerfEntry.layer_type, pi.type_id.c_str(), 256);
            extPerfEntry.execution_index = i++;
            extPerfEntry.status = InferenceEngineProfileInfo::LayerStatus::EXECUTED;
            extPerfEntry.cpu_uSec = cpuTime;
            extPerfEntry.realTime_uSec = deviceTime;

            if (pi.type_id == "input_layout") {
                const std::string input_string = "Input";
                const std::string undef_string = "undef";
                input_string.copy(extPerfEntry.layer_type, 256);
                undef_string.copy(extPerfEntry.exec_type, 256);
                strncpy(extPerfEntry.layer_type, "Input", 256);
                strncpy(extPerfEntry.exec_type, "undef", 256);
            }
        }
}

@@ -1491,6 +1491,7 @@ void Program::CreateBatchNormalizationPrimitive(cldnn::topology& topology, Infer
    auto scalePrim = cldnn::scale(bnLayerName, inputPrimitives[0], weightID, biasID);

    topology.add(scalePrim);
    return;
#else
    cldnn::tensor blobTensor(0);
    const auto bnDims = bnLayer->outData[0]->getTensorDesc().getDims();

30 inference-engine/src/dumper/CMakeLists.txt Normal file
@@ -0,0 +1,30 @@
# Copyright (C) 2018-2019 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
#

set(TARGET_NAME "ieDumper")
if(ENABLE_DUMP)
    add_definitions(-DDEBUG_DUMP)
endif()

file(GLOB SOURCES
    ${CMAKE_CURRENT_SOURCE_DIR}/*.cpp
)

file(GLOB HEADERS
    ${CMAKE_CURRENT_SOURCE_DIR}/*.h
)

addVersionDefines(dump_plugin.cpp CI_BUILD_NUMBER)

add_definitions(-DIMPLEMENT_INFERENCE_ENGINE_PLUGIN)

include_directories(
    ${IE_MAIN_SOURCE_DIR}/include
    ${IE_MAIN_SOURCE_DIR}/src/inference_engine
    ${CMAKE_CURRENT_SOURCE_DIR}
)

add_library(${TARGET_NAME} SHARED ${SOURCES} ${HEADERS})
target_link_libraries(${TARGET_NAME} inference_engine boost_system boost_filesystem)
set_target_properties(${TARGET_NAME} PROPERTIES COMPILE_PDB_NAME ${TARGET_NAME})
100 inference-engine/src/dumper/dump_plugin.cpp Normal file
@@ -0,0 +1,100 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "dump_plugin.h"
#include "ie_plugin.hpp"
#include <description_buffer.hpp>

using namespace DumpPluginNs;
using namespace InferenceEngine;

template <typename T>
void DumpPlugin::dumpBlobTmpl(const InferenceEngine::Blob::Ptr blob, std::ofstream& file) {
    if (file.is_open()) {
        auto tblob = std::dynamic_pointer_cast<const InferenceEngine::TBlob<T>>(blob);
        if (tblob != nullptr) {
            // TODO rewrite it
            SizeVector v = tblob->dims();
            file << "dims: " << "(";
            for (auto i = v.begin(); i != v.end(); ++i)
                file << *i;
            file << ")" << std::endl;
            // file << "dims: " << "(" << tblob->dims() << ")" << std::endl;
            for (auto it = tblob->begin(); it != tblob->end(); ++it) {
                file << std::fixed << std::setprecision(3) << *it << std::endl;
            }
        }
    }
}

void DumpPlugin::dumpBlob(const InferenceEngine::Blob::Ptr blob, std::ofstream& file) noexcept {
    switch (blob->precision()) {
    case (InferenceEngine::Precision::FP32):
        dumpBlobTmpl<float>(blob, file);
        break;
    case (InferenceEngine::Precision::FP16):
    case (InferenceEngine::Precision::Q78):
    case (InferenceEngine::Precision::I16):
        dumpBlobTmpl<short>(blob, file);
        break;
    case (InferenceEngine::Precision::U8):
        dumpBlobTmpl<char>(blob, file);
        break;
    }
}

static Version dumpPluginDescription = {
    {2, 1},  // plugin API version
    CI_BUILD_NUMBER,
    "ieDumpPlugin"  // plugin description message
};

void DumpPlugin::GetVersion(const InferenceEngine::Version *& versionInfo) noexcept {
    versionInfo = &dumpPluginDescription;
}

std::string DumpPlugin::GetDumpDir(std::string netname) noexcept {
    static std::string dumpDir;

    if (!dumpDir.empty()) {
        return dumpDir;
    }

    const char * dump_only = getenv("DUMP_ONLY");
    if (dump_only && netname.find(dump_only) == std::string::npos) {
        dumpDir = "";
    } else {
        dumpDir = std::string(DEBUG_DUMP_PATH) + netname;
        boost::filesystem::path dir(dumpDir);

        if (!(boost::filesystem::exists(dir))) {
            boost::filesystem::create_directories(dir);
            dumpDir += "/";
        } else {
            int x = 1;
            std::string dumpDirx;

            do {
                dumpDirx = dumpDir + "_" + std::to_string(x);
                boost::filesystem::path dir2(dumpDirx);
                x++;
                if (!boost::filesystem::exists(dir2)) {
                    boost::filesystem::create_directories(dir2);
                    break;
                }
            } while (true);
            dumpDir = dumpDirx + "/";
        }
    }
    return dumpDir;
}

INFERENCE_PLUGIN_API(StatusCode) CreateDumpPlugin(IDumpPlugin*& plugin, ResponseDesc *resp) noexcept {
    try {
        plugin = new DumpPlugin();
        return OK;
    } catch (std::exception& ex) {
        return DescriptionBuffer(GENERAL_ERROR, resp) << ex.what();
    }
}
40 inference-engine/src/dumper/dump_plugin.h Normal file
@@ -0,0 +1,40 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "ie_dump_plugin.hpp"
#include <boost/filesystem.hpp>
#include <iomanip>
#include <fstream>
#include <string>

#define DEBUG_DUMP_PATH "dump/"

namespace DumpPluginNs {

using namespace boost::filesystem;

class DumpPlugin : public IDumpPlugin {
public:
    DumpPlugin() {}

    virtual ~DumpPlugin() {}

    void GetVersion(const InferenceEngine::Version *& versionInfo) noexcept override;

    void Release() noexcept override {
        delete this;
    }

    void dumpBlob(const InferenceEngine::Blob::Ptr blob, std::ofstream& file) noexcept override;

    std::string GetDumpDir(std::string networkName) noexcept override;

private:
    template <typename T>
    void dumpBlobTmpl(const InferenceEngine::Blob::Ptr blob, std::ofstream& file);
};

}  // namespace DumpPluginNs
33 inference-engine/src/dumper/ie_dump_plugin.hpp Normal file
@@ -0,0 +1,33 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#pragma once

#include "details/ie_irelease.hpp"
#include "ie_version.hpp"
#include "ie_common.h"
#include "ie_blob.h"
#include <string>

class IDumpPlugin : public InferenceEngine::details::IRelease {
public:
    /**
     * @brief return plugin's version information
     * @param versionInfo pointer to version info, will be set by plugin
     */
    virtual void GetVersion(const InferenceEngine::Version *& versionInfo) noexcept = 0;

    virtual void dumpBlob(const InferenceEngine::Blob::Ptr blob, std::ofstream& file) noexcept = 0;

    virtual std::string GetDumpDir(std::string networkName) noexcept = 0;
};

#ifdef DEBUG_DUMP
#include "ie_dump_plugin_ptr.hpp"
static DumpPluginPtr dumper("libieDumper.so");
#define DUMP_BLOB(blob, file) \
    dumper->dumpBlob(blob, file);
#else
#define DUMP_BLOB(blob, file)
#endif  // DEBUG_DUMP
36 inference-engine/src/dumper/ie_dump_plugin_ptr.hpp Normal file
@@ -0,0 +1,36 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

/**
 * @brief Convenience wrapper class for handling plugin instantiation and releasing resources.
 * @file ie_dump_plugin_ptr.hpp
 */
#pragma once

#include "details/ie_so_pointer.hpp"
#include "ie_dump_plugin.hpp"
#include <string>

namespace InferenceEngine {
namespace details {

template<>
class SOCreatorTrait<IDumpPlugin> {
public:
    static constexpr auto name = "CreateDumpPlugin";
};

}  // namespace details

}  // namespace InferenceEngine

/**
 * @typedef DumpPluginPtr
 * @brief C++ helper to work with plugin's created objects, implements different interface
 */
typedef InferenceEngine::details::SOPointer<IDumpPlugin> DumpPluginPtr;

89 inference-engine/src/extension/ext_convert.cpp Normal file
@@ -0,0 +1,89 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "ext_list.hpp"
#include "ext_base.hpp"

#include <cmath>
#include <string>
#include <vector>
#include "ie_parallel.hpp"
#include "ie_precision.hpp"

namespace InferenceEngine {
namespace Extensions {
namespace Cpu {

class ConvertImpl: public ExtLayerBase {
    template<typename src_d, typename dst_d>
    void exec_cast(const Blob::CPtr& inputs, Blob::Ptr& outputs) {
        const src_d *src_data = inputs->cbuffer().as<src_d *>() +
            inputs->getTensorDesc().getBlockingDesc().getOffsetPadding();
        dst_d* dst_data = outputs->buffer().as<dst_d *>() +
            outputs->getTensorDesc().getBlockingDesc().getOffsetPadding();
        if (inputs->size() != outputs->size())
            THROW_IE_EXCEPTION << "Input and output buffers have different sizes!";
        parallel_for(inputs->size(), [&](size_t i) {
            dst_data[i] = static_cast<dst_d>(src_data[i]);
        });
    }

public:
    explicit ConvertImpl(const CNNLayer* layer) {
        try {
            if (layer->insData.size() != 1 || layer->outData.empty())
                THROW_IE_EXCEPTION << "Incorrect number of input/output edges!";

            precision = layer->GetParamAsString("precision");

            addConfig(layer, {{ConfLayout::PLN, false, 0}}, {{ConfLayout::PLN, false, 0}});
        } catch (InferenceEngine::details::InferenceEngineException &ex) {
            errorMsg = ex.what();
        }
    }

    StatusCode execute(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs,
                       ResponseDesc *resp) noexcept override {
        try {
            auto compare = getPrecisionMask(inputs[0]->getTensorDesc().getPrecision(), outputs[0]->getTensorDesc().getPrecision());
            switch (compare) {
                case getPrecisionMask(Precision::I32, Precision::I32):
                    exec_cast<PrecisionTrait<Precision::I32>::value_type, PrecisionTrait<Precision::I32>::value_type>(inputs[0], outputs[0]);
                    break;
                case getPrecisionMask(Precision::I64, Precision::I64):
                    exec_cast<PrecisionTrait<Precision::I64>::value_type, PrecisionTrait<Precision::I64>::value_type>(inputs[0], outputs[0]);
                    break;
                case getPrecisionMask(Precision::FP32, Precision::FP32):
                    exec_cast<PrecisionTrait<Precision::FP32>::value_type, PrecisionTrait<Precision::FP32>::value_type>(inputs[0], outputs[0]);
                    break;
                case getPrecisionMask(Precision::I32, Precision::I64):
                    exec_cast<PrecisionTrait<Precision::I32>::value_type, PrecisionTrait<Precision::I64>::value_type>(inputs[0], outputs[0]);
                    break;
                case getPrecisionMask(Precision::I32, Precision::FP32):
                    exec_cast<PrecisionTrait<Precision::I32>::value_type, PrecisionTrait<Precision::FP32>::value_type>(inputs[0], outputs[0]);
                    break;
                case getPrecisionMask(Precision::FP32, Precision::I32):
                    exec_cast<PrecisionTrait<Precision::FP32>::value_type, PrecisionTrait<Precision::I32>::value_type>(inputs[0], outputs[0]);
                    break;
                case getPrecisionMask(Precision::FP32, Precision::I64):
                    exec_cast<PrecisionTrait<Precision::FP32>::value_type, PrecisionTrait<Precision::I64>::value_type>(inputs[0], outputs[0]);
                    break;
                default:
                    THROW_IE_EXCEPTION << "Unsupported precisions!";
            }
        } catch(...) {
            return GENERAL_ERROR;
        }
        return OK;
    }

private:
    std::string precision;
};

REG_FACTORY_FOR(ImplFactory<ConvertImpl>, Convert);

}  // namespace Cpu
}  // namespace Extensions
}  // namespace InferenceEngine

@@ -97,10 +97,10 @@ void refine_boxes(const float* boxes, const float* deltas, const float* weights,
    float y1_new = pred_ctr_y + 0.5f * pred_h - coordinates_offset;

    // adjust new corner locations to be within the image region,
    x0_new = std::max<float>(0.0f, std::min<float>(x0_new, img_W - coordinates_offset));
    y0_new = std::max<float>(0.0f, std::min<float>(y0_new, img_H - coordinates_offset));
    x1_new = std::max<float>(0.0f, std::min<float>(x1_new, img_W - coordinates_offset));
    y1_new = std::max<float>(0.0f, std::min<float>(y1_new, img_H - coordinates_offset));
    x0_new = std::max<float>(0.0f, x0_new);
    y0_new = std::max<float>(0.0f, y0_new);
    x1_new = std::max<float>(0.0f, x1_new);
    y1_new = std::max<float>(0.0f, y1_new);

    // recompute new width & height
    const float box_w = x1_new - x0_new + coordinates_offset;
@@ -268,7 +268,7 @@ public:

    auto* output_boxes = outputs[OUTPUT_BOXES]->buffer().as<float *>();
    auto* output_scores = outputs[OUTPUT_SCORES]->buffer().as<float *>();
    auto* output_classes = outputs[OUTPUT_CLASSES]->buffer().as<float *>();
    auto* output_classes = outputs[OUTPUT_CLASSES]->buffer().as<int32_t *>();

    const float img_H = im_info[0];
    const float img_W = im_info[1];
@@ -334,9 +334,9 @@ public:
    }

    // Fill outputs.
    memset(output_boxes, 0, max_detections_per_image_ * 4 * sizeof(float));
    memset(output_scores, 0, max_detections_per_image_ * sizeof(float));
    memset(output_classes, 0, max_detections_per_image_ * sizeof(float));
    memset(output_boxes, 0, max_detections_per_image_ * 4 * sizeof(output_boxes[0]));
    memset(output_scores, 0, max_detections_per_image_ * sizeof(output_scores[0]));
    memset(output_classes, 0, max_detections_per_image_ * sizeof(output_classes[0]));

    int i = 0;
    for (const auto & detection : conf_index_class_map) {
@@ -348,7 +348,7 @@ public:
        output_boxes[4 * i + 2] = refined_boxes[refined_box_idx({cls, idx, 2})];
        output_boxes[4 * i + 3] = refined_boxes[refined_box_idx({cls, idx, 3})];
        output_scores[i] = score;
        output_classes[i] = static_cast<float>(cls);
        output_classes[i] = cls;
        ++i;
    }

@@ -31,8 +31,9 @@ public:
    if (idx_dims.size() > 1)
        THROW_IE_EXCEPTION << layer->name << " Index vector should be 1 dimension";

    if (layer->insData[REDUCE_DATA].lock()->getTensorDesc().getPrecision() != Precision::FP32)
        THROW_IE_EXCEPTION << layer->name << " Incorrect input data tensor precision. Only FP32 is supported!";
    if (layer->insData[REDUCE_DATA].lock()->getTensorDesc().getPrecision() != Precision::FP32 &&
        layer->insData[REDUCE_DATA].lock()->getTensorDesc().getPrecision() != Precision::I32)
        THROW_IE_EXCEPTION << layer->name << " Incorrect input data tensor precision. Only FP32 or I32 are supported!";

    if (layer->insData[REDUCE_INDEXES].lock()->getTensorDesc().getPrecision() != Precision::I32)
        THROW_IE_EXCEPTION << layer->name << " Incorrect 'axes_to_reduction' input precision. Only I32 is supported!";
@@ -125,109 +126,38 @@ public:
        }
    }

    const float *src_data = inputs[REDUCE_DATA]->cbuffer().as<float *>() +
        inputs[REDUCE_DATA]->getTensorDesc().getBlockingDesc().getOffsetPadding();
    float* dst_data = outputs[0]->cbuffer().as<float *>() +
        outputs[0]->getTensorDesc().getBlockingDesc().getOffsetPadding();

    size_t work_amount_dst;
    if (!dst_dims.size())
        work_amount_dst = 1;
    else
        work_amount_dst = outputs[0]->getTensorDesc().getBlockingDesc().getStrides()[0] * dst_dims[0];

    switch (reduceMode) {
    case Reduce::And:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 1.0f,
               [](float x, float y)->float { return x && y; },
               [](float x, float y)->float { return x && y; });
        break;
    case Reduce::L1:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float old, float y)->float { return old + (std::abs)(y); },
               [](float x, float y)->float { return x + y; });
        break;
    case Reduce::L2:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float old, float y)->float { return old + y * y; },
               [](float x, float y)->float { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] = sqrt(dst_data[i]);
        });
        break;
    case Reduce::LogSum:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float x, float y)->float { return x + y; },
               [](float x, float y)->float { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] = logf(dst_data[i]);
        });
        break;
    case Reduce::LogSumExp:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float old, float y)->float { return old + expf(y); },
               [](float x, float y)->float { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] = logf(dst_data[i]);
        });
        break;
    case Reduce::Max:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, FLT_MIN,
               [](float x, float y)->float { return x > y ? x : y; },
               [](float x, float y)->float { return x > y ? x : y; });
        break;
    case Reduce::Mean:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float x, float y)->float { return x + y; },
               [](float x, float y)->float { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] /= static_cast<float>(reduced_dims_work_amount);
        });
        break;
    case Reduce::Min:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, FLT_MAX,
               [](float x, float y)->float { return x < y ? x : y; },
               [](float x, float y)->float { return x < y ? x : y; });
        break;
    case Reduce::Or:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float x, float y)->float { return x || y; },
               [](float x, float y)->float { return x || y; });
        break;
    case Reduce::Prod:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 1.0f,
               [](float x, float y)->float { return x * y; },
               [](float x, float y)->float { return x * y; });
        break;
    case Reduce::Sum:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float x, float y)->float { return x + y; },
               [](float x, float y)->float { return x + y; });
        break;
    case Reduce::SumSquare:
        reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, 0.0f,
               [](float old, float y)->float { return old + y * y; },
               [](float x, float y)->float { return x + y; });
        break;
    default:
        if (resp) {
            std::string errorMsg = "Incorrect Reduce layer type";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
        }
        return GENERAL_ERROR;
    auto compare = getPrecisionMask(inputs[REDUCE_DATA]->getTensorDesc().getPrecision(), outputs[0]->getTensorDesc().getPrecision());
    switch (compare) {
    case getPrecisionMask(Precision::FP32, Precision::FP32):
        return reduce_type<float, float>(inputs, outputs, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims);
    case getPrecisionMask(Precision::I32, Precision::I64):
        return reduce_type<int32_t, int64_t>(inputs, outputs, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims);
    case getPrecisionMask(Precision::I32, Precision::FP32):
        return reduce_type<int32_t, float>(inputs, outputs, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims);
    case getPrecisionMask(Precision::I32, Precision::I32):
        return reduce_type<int32_t, int32_t>(inputs, outputs, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims);
    default:
        if (resp) {
            std::string errorMsg = "Incorrect Reduce layer type";
            errorMsg.copy(resp->msg, sizeof(resp->msg) - 1);
        }
        return GENERAL_ERROR;
    }
    return OK;
}

private:
    template <typename F1, typename F2>
    void reduce(const float *src_data, float* dst_data, size_t work_amount_dst, size_t reduced_dims_work_amount,
                SizeVector axes_for_reduction, SizeVector dst_dims, float init_value, F1 func1, F2 func2);
    template <typename src_d, typename dst_t, typename F1, typename F2>
    void reduce(const src_d *src_data, dst_t* dst_data, size_t work_amount_dst, size_t reduced_dims_work_amount,
                SizeVector axes_for_reduction, SizeVector dst_dims, dst_t init_value, F1 func1, F2 func2);
    template <typename src_d, typename dst_t>
    StatusCode reduce_type(std::vector<Blob::Ptr>& inputs, std::vector<Blob::Ptr>& outputs, size_t work_amount_dst, size_t reduced_dims_work_amount,
                           SizeVector axes_for_reduction, SizeVector dst_dims);
    enum class Reduce { And, L1, L2, LogSum, LogSumExp, Max, Mean, Min, Or, Prod, Sum, SumSquare };

    const size_t REDUCE_DATA = 0;
@@ -240,15 +170,114 @@ private:
    SizeVector srcStrides;
};

template <typename F1, typename F2>
template <typename src_d, typename dst_t>
StatusCode ReduceImpl::reduce_type(
    std::vector<Blob::Ptr>& inputs,
    std::vector<Blob::Ptr>& outputs,
    size_t work_amount_dst,
    size_t reduced_dims_work_amount,
    SizeVector axes_for_reduction,
    SizeVector our_dims
) {
    const src_d *src_data = inputs[REDUCE_DATA]->cbuffer().as<src_d *>() +
        inputs[REDUCE_DATA]->getTensorDesc().getBlockingDesc().getOffsetPadding();
    dst_t* dst_data = outputs[0]->cbuffer().as<dst_t *>() +
        outputs[0]->getTensorDesc().getBlockingDesc().getOffsetPadding();

    switch (reduceMode) {
    case Reduce::And:
        reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(1),
                             [](dst_t x, src_d y)->dst_t { return x && y; },
                             [](dst_t x, src_d y)->dst_t { return x && y; });
        break;
    case Reduce::L1:
        reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
                             [](dst_t old, src_d y)->dst_t { return old + (std::abs)(y); },
                             [](dst_t x, src_d y)->dst_t { return x + y; });
        break;
    case Reduce::L2:
        reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
                             [](dst_t old, src_d y)->dst_t { return old + y * y; },
                             [](dst_t x, src_d y)->dst_t { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] = sqrt(dst_data[i]);
        });
        break;
    case Reduce::LogSum:
        reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
                             [](dst_t x, src_d y)->dst_t { return x + y; },
                             [](dst_t x, src_d y)->dst_t { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] = logf(dst_data[i]);
        });
        break;
    case Reduce::LogSumExp:
        reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
                             [](dst_t old, src_d y)->dst_t { return old + expf(y); },
                             [](dst_t x, src_d y)->dst_t { return x + y; });

        parallel_for(work_amount_dst, [&](size_t i) {
            dst_data[i] = logf(dst_data[i]);
        });
break;
|
||||
case Reduce::Max:
|
||||
reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims,
|
||||
(std::numeric_limits<dst_t>::min)(),
|
||||
[](dst_t x, src_d y)->dst_t { return x > y ? x : y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x > y ? x : y; });
|
||||
break;
|
||||
case Reduce::Mean:
|
||||
reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
|
||||
[](dst_t x, src_d y)->dst_t { return x + y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x + y; });
|
||||
|
||||
parallel_for(work_amount_dst, [&](size_t i) {
|
||||
dst_data[i] /= static_cast<dst_t>(reduced_dims_work_amount);
|
||||
});
|
||||
break;
|
||||
case Reduce::Min:
|
||||
reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims,
|
||||
(std::numeric_limits<dst_t>::max)(),
|
||||
[](dst_t x, src_d y)->dst_t { return x < y ? x : y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x < y ? x : y; });
|
||||
break;
|
||||
case Reduce::Or:
|
||||
reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
|
||||
[](dst_t x, src_d y)->dst_t { return x || y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x || y; });
|
||||
break;
|
||||
case Reduce::Prod:
|
||||
reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(1),
|
||||
[](dst_t x, src_d y)->dst_t { return x * y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x * y; });
|
||||
break;
|
||||
case Reduce::Sum:
|
||||
reduce(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
|
||||
[](dst_t x, src_d y)->dst_t { return x + y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x + y; });
|
||||
break;
|
||||
case Reduce::SumSquare:
|
||||
reduce<src_d, dst_t>(src_data, dst_data, work_amount_dst, reduced_dims_work_amount, axes_for_reduction, our_dims, static_cast<dst_t>(0),
|
||||
[](dst_t old, src_d y)->dst_t { return old + y * y; },
|
||||
[](dst_t x, src_d y)->dst_t { return x + y; });
|
||||
break;
|
||||
default:
|
||||
return GENERAL_ERROR;
|
||||
}
|
||||
return OK;
|
||||
}
template <typename src_d, typename dst_t, typename F1, typename F2>
void ReduceImpl::reduce(
    const float *src_data,
    float *dst_data,
    const src_d *src_data,
    dst_t *dst_data,
    size_t work_amount_dst,
    size_t reduced_dims_work_amount,
    SizeVector axes_for_reduction,
    SizeVector dst_dims,
    float init_value,
    dst_t init_value,
    F1 func1,
    F2 func2
) {
@@ -264,7 +293,7 @@ void ReduceImpl::reduce(
            i /= dst_dims[j];
        }
        for (size_t src_idx, dst_idx = start; dst_idx < end; ++dst_idx) {
            float reduce_prod = init_value;
            dst_t reduce_prod = init_value;
            bool update_idx = true;
            SizeVector src_counters = dst_counters;
            for (i = 0; i < reduced_dims_work_amount; ++i) {
@@ -297,7 +326,7 @@ void ReduceImpl::reduce(
            }
        });
    } else {
        std::vector<float> reduce_prod((nthr * work_amount_dst), init_value);
        std::vector<dst_t> reduce_prod((nthr * work_amount_dst), init_value);
        if (work_amount_dst == 1) {
            parallel_nt(nthr, [&](const int ithr, const int nthr) {
                size_t i, start = 0, end = 0;

@@ -897,6 +897,7 @@ void SubstituteScaleShiftBroadCastPass::run() {
}

void UnrollLSTMCellPass::run() {
    // TODO: iefode: refactor this code
    InferenceEngine::NetPass::UnrollRNN_if(*getPassManager()->getNetwork(), [] (const RNNCellBase& rnn) -> bool {
        if (rnn.clip != 0.0f)
            return true;

@@ -111,7 +111,8 @@ endif()

# Properties->C/C++->General->Additional Include Directories
target_include_directories(${TARGET_NAME} PUBLIC ${PUBLIC_HEADERS_DIR}
        PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}")
        PRIVATE "${CMAKE_CURRENT_SOURCE_DIR}"
                "${IE_MAIN_SOURCE_DIR}/src/dumper")

target_include_directories(${TARGET_NAME} SYSTEM PRIVATE "${IE_MAIN_SOURCE_DIR}/thirdparty/pugixml/src")
target_include_directories(${TARGET_NAME} SYSTEM PRIVATE "${IE_MAIN_SOURCE_DIR}/thirdparty/ngraph/src")

@@ -224,6 +224,24 @@ CNNLayer::Ptr CNNStatisticHelper::getLatestInFuse(CNNLayer::Ptr layer) const {
void CNNStatisticHelper::NormalizeStatistic() {
    StatsMap newMap;

    // When statistics fall into the negative range while the min clamped value is 0,
    // we change the statistics here to be non-negative. This is not fully correct behaviour,
    // since it can extend the range and affect accuracy, but the approach works quite well.
    std::vector<CNNLayerPtr> sortedLayersRC = CNNNetSortTopologically(network_);
    for (auto l : sortedLayersRC) {
        if (CNNNetworkInt8Normalizer::isReLULikeClamp(l)) {
            if (l->outData.size() == 1) {
                size_t outputChannels = l->outData[0]->getTensorDesc().getDims()[1];
                auto oldStat = internalNodesStats_.find(l->name);
                if ((oldStat != internalNodesStats_.end()) && outputChannels > 1) {
                    for (size_t q = 0; q < oldStat->second->_minOutputs.size(); q++) {
                        oldStat->second->_minOutputs[q] = 0.f;
                    }
                }
            }
        }
    }

    float dummy = 0.0f;

    std::vector<CNNLayerPtr> sortedLayers = CNNNetSortTopologically(network_);
@@ -380,7 +398,8 @@ void CNNStatisticHelper::NormalizeStatistic() {

            if (l->outData.size() == 1) {
                size_t outputChannels = l->outData[0]->getTensorDesc().getDims()[1];
                size_t ch_indx = l->outData[0]->getTensorDesc().getDims().size() > 1 ? 1 : 0;
                size_t outputChannels = l->outData[0]->getTensorDesc().getDims()[ch_indx];
                auto oldStat = internalNodesStats_.find(l->name);
                if ((oldStat != internalNodesStats_.end()) && outputChannels > 1 && oldStat->second->_minOutputs.size() == 1) {
                    auto min = oldStat->second->_minOutputs[0];
@@ -926,14 +945,80 @@ void CNNNetworkInt8Normalizer::QuantizeConvolutionOrFullyConnected(CNNLayer::Ptr
    size_t outChannelSize = weights->getTensorDesc().getDims().back() / W_CO / group;

    // Calculating weights normalization scale factor (w-scale)
    float *weight_convolution;
    size_t co;
    for (co = 0, weight_convolution = &newWeights[0]; co < outputChannels; co++, weight_convolution += outChannelSize) {
        float max = FLT_MIN;
        DataStats::GetDataAbsMax(weight_convolution, outChannelSize, max);

        float scaler = static_cast<float>(statHelper.getMaxSignValue()) / max;
        weightScalers.push_back(scaler);
    std::set<double> individualsG;
    size_t co;
    float *weight_convolution;
    bool bwquantized = false;
    double symQuant = 0.f;

    for (co = 0, weight_convolution = &newWeights[0]; co < outputChannels; co++, weight_convolution += outChannelSize) {
        for (size_t i = 0; i < outChannelSize; i++) {
            individualsG.insert(static_cast<double>(weight_convolution[i]));
        }
    }
    // If there are fewer than 256 distinct values across all filters of the convolution,
    // the weights may already be int8-quantized. We can support symmetric quantization.
    // The conditions below verify whether the weights are symmetrically quantized around 0
    // and what the min/max borders are. These parameters are required to reproduce exactly
    // the same quantum the model was trained with.
    // The algorithm for restoring the min/max parameters makes a couple of assumptions which
    // might not hold in 100% of cases. We state them explicitly. We assume that:
    // 1. All convolutions have the 1st quantum either on the positive or the negative side. See how symQuant is calculated.
    // 2. If quantization is not symmetric, there should be a quantum on one of the sides which demonstrates this.
    if (individualsG.size() < 256) {
        // go over the weights and verify that they stay on quant positions
        std::set<double> intervals;
        double prev = 0.f;
        for (auto it = individualsG.begin(); it != individualsG.end(); it++) {
            if (prev) {
                intervals.insert(*it - prev);
            }
            prev = *it;
        }
        symQuant = *(intervals.begin());
        std::set<double> divs;
        prev = 0.f;
        for (auto it = individualsG.begin(); it != individualsG.end(); it++) {
            if (prev) {
                divs.insert((*it - prev) / symQuant);
            }
            prev = *it;
        }

        bwquantized = true;
        for (auto it3 = divs.begin(); it3 != divs.end(); it3++) {
            if (fabs(round(*it3) - *it3) > 0.001) {
                bwquantized = false;
            }
        }

        // we want to make sure that quantization is symmetric; this way we look for the
        // value in the weights matching the quantum (positive or negative)
        if (bwquantized) {
            // take the minimal and maximal values for the calculated symQuant and compare with the data from individualsG
            double minCalc = symQuant * -128.0f;
            double maxCalc = symQuant * 128.0f;
            for (auto it = individualsG.begin(); it != individualsG.end(); it++) {
                if (*it < minCalc || *it > maxCalc) {
                    bwquantized = false;
                }
            }
        }
    }
    if (bwquantized && symQuant != 0.0f) {
        float max = symQuant * 127.0f;
        for (co = 0, weight_convolution = &newWeights[0]; co < outputChannels; co++, weight_convolution += outChannelSize) {
            float scaler = static_cast<float>(statHelper.getMaxSignValue()) / max;
            weightScalers.push_back(scaler);
        }
    } else {
        for (co = 0, weight_convolution = &newWeights[0]; co < outputChannels; co++, weight_convolution += outChannelSize) {
            float max = FLT_MIN;
            DataStats::GetDataAbsMax(weight_convolution, outChannelSize, max);

            float scaler = static_cast<float>(statHelper.getMaxSignValue()) / max;
            weightScalers.push_back(scaler);
        }
    }

    std::shared_ptr<Data> wScaleData = std::shared_ptr<Data>(new Data("w-scale", { outputChannels }, Precision::FP32, Layout::C));

@@ -10,6 +10,7 @@
#include "ie_blob_proxy.hpp"
#include <fstream>
#include <sstream>
#include "ie_dump_plugin.hpp"
#include "ie_icnn_network_stats.hpp"

using namespace InferenceEngine;
@@ -485,6 +486,16 @@ Blob::Ptr FormatParser::GetBlobFromSegment(const TBlob<uint8_t>::Ptr& weights, c
}

void FormatParser::SetWeights(const TBlob<uint8_t>::Ptr& weights) {
#ifdef DEBUG_DUMP
    char netname[1024] = {};
    _network->getName(netname, sizeof(netname));
    std::string netname_s = netname;
    std::string dumpDir = dumper->GetDumpDir(netname_s);

    std::ofstream weights_data_file(dumpDir + "blob0.txt"/*weights*/);
    std::ofstream biases_data_file(dumpDir + "blob1.txt"/*biases*/);
#endif

    for (auto& kvp : _network->allLayers()) {
        auto fit = layersParseInfo.find(kvp.second->name);
        // todo: may check that earlier - while parsing...
@@ -508,10 +519,12 @@ void FormatParser::SetWeights(const TBlob<uint8_t>::Ptr& weights) {
                pWL->_weights = GetBlobFromSegment(weights, lprms.blobs["weights"]);
            }
            pWL->blobs["weights"] = pWL->_weights;
            DUMP_BLOB(pWL->_weights, weights_data_file);
        }
        if (lprms.blobs.find("biases") != lprms.blobs.end()) {
            pWL->_biases = GetBlobFromSegment(weights, lprms.blobs["biases"]);
            pWL->blobs["biases"] = pWL->_biases;
            DUMP_BLOB(pWL->_biases, biases_data_file);
        }
    }
    auto pGL = kvp.second.get();

@@ -635,9 +635,9 @@ static CNNLayerPtr _pwr(std::string name, Precision prc, SizeVector dims, float
    res->power = 1.0;
    res->scale = scale;
    res->offset = shift;
    res->params["power"] = CNNLayer::ie_serialize_float(res->power);
    res->params["scale"] = CNNLayer::ie_serialize_float(res->scale);
    res->params["shift"] = CNNLayer::ie_serialize_float(res->offset);
    res->params["power"] = std::to_string(res->power);
    res->params["scale"] = std::to_string(res->scale);
    res->params["shift"] = std::to_string(res->offset);

    res->insData.resize(1);
    res->outData.resize(1);
@@ -747,8 +747,8 @@ static void _link_with_clip(CNNLayerPtr src, CNNLayerPtr dst, const float clip_v
    auto clip_prc = dst->precision;
    auto clip_shape = src->outData[src_port]->getTensorDesc().getDims();
    auto clip = _act(clip_name, clip_prc, clip_shape, "clamp");
    clip->params["min"] = CNNLayer::ie_serialize_float(-clip_val);
    clip->params["max"] = CNNLayer::ie_serialize_float(clip_val);
    clip->params["min"] = std::to_string(-clip_val);
    clip->params["max"] = std::to_string(clip_val);

    _link(src, clip, src_port, 0);
    _link(clip, dst, 0, dst_port);

@@ -16,56 +16,57 @@
namespace InferenceEngine {
namespace ShapeInfer {
class BroadcastOffset {
    SizeVector dims;
    SizeVector offset_v;
    SizeVector dims;
    SizeVector offset_v;

    SizeVector getDims(const SizeVector& originDims, const SizeVector& outputDims) {
        SizeVector d(outputDims.size(), 1);
        for (int i = 0; i < originDims.size(); i++) {
            d[d.size() - 1 - i] = originDims[originDims.size() - 1 - i];
        }
        return d;
    SizeVector getDims(const SizeVector &originDims, const SizeVector &outputDims) {
        SizeVector d(outputDims.size(), 1);
        for (int i = 0; i < originDims.size(); i++) {
            d[d.size() - 1 - i] = originDims[originDims.size() - 1 - i];
        }
        return d;
    }

    SizeVector getOffset(const SizeVector& originDims, const SizeVector& outDims) {
        SizeVector o(originDims.size());
        if (originDims.size() != outDims.size())
            THROW_IE_EXCEPTION << "Cannot calculate offsets! Incorrect parameters for eltwise broadcast!";
        int k = 1;
        for (int i = originDims.size() - 1; i >= 0; i--) {
            o[i] = (originDims[i] == outDims[i]) ? k : 0;
            k *= originDims[i];
        }
        return o;
    SizeVector getOffset(const SizeVector &originDims, const SizeVector &outDims) {
        SizeVector o(originDims.size());
        if (originDims.size() != outDims.size())
            THROW_IE_EXCEPTION << "Cannot calculate offsets! Incorrect parameters for eltwise broadcast!";
        int k = 1;
        for (int i = originDims.size() - 1; i >= 0; i--) {
            o[i] = (originDims[i] == outDims[i]) ? k : 0;
            k *= originDims[i];
        }
        return o;
    }

public:
    BroadcastOffset(const SizeVector& originDims, const SizeVector& outputDims) {
        dims = getDims(originDims, outputDims);
        offset_v = getOffset(dims, outputDims);
    }
public:
    BroadcastOffset(const SizeVector &originDims, const SizeVector &outputDims) {
        dims = getDims(originDims, outputDims);
        offset_v = getOffset(dims, outputDims);
    }

    size_t offset(const SizeVector& v) const {
        size_t off = 0;
        if (v.size() != offset_v.size())
            THROW_IE_EXCEPTION << "Cannot calculate offsets! Incorrect parameters for eltwise broadcast!";
        for (size_t i = 0; i < v.size(); i++) {
            off += v[i] * offset_v[i];
        }
        return off;
    size_t offset(const SizeVector &v) const {
        size_t off = 0;
        if (v.size() != offset_v.size())
            THROW_IE_EXCEPTION << "Cannot calculate offsets! Incorrect parameters for eltwise broadcast!";
        for (size_t i = 0; i < v.size(); i++) {
            off += v[i] * offset_v[i];
        }
        return off;
    }

    SizeVector offset_dims(size_t l) const {
        size_t n_dims = dims.size();
        SizeVector pos(n_dims);
        for (int rd = 1; rd <= n_dims; ++rd) {
            const size_t d = n_dims - rd;
            const size_t cur_dim = dims[d];
            pos[d] = l % cur_dim;
            l /= cur_dim;
        }
        return pos;
    SizeVector offset_dims(size_t l) const {
        size_t n_dims = dims.size();
        SizeVector pos(n_dims);
        for (int rd = 1; rd <= n_dims; ++rd) {
            const size_t d = n_dims - rd;
            const size_t cur_dim = dims[d];
            pos[d] = l % cur_dim;
            l /= cur_dim;
        }
        return pos;
    }
};
}  // namespace ShapeInfer
}  // namespace InferenceEngine
}  // namespace InferenceEngine

@@ -29,7 +29,7 @@ public:
    MKLDNNExecNetwork(const InferenceEngine::ICNNNetwork &network, const Config &cfg,
                      const MKLDNNExtensionManager::Ptr& extMgr);

    virtual ~MKLDNNExecNetwork() {
    ~MKLDNNExecNetwork() {
        graphs.clear();
        extensionManager.reset();
    }

@@ -541,7 +541,7 @@ private:
    Type type;
    int execIndex = -1;
    int socket;
    bool weight_caching = false;
    bool weight_caching;

    std::string typeToStr(Type type);

@@ -57,9 +57,6 @@ void MKLDNNSplitNode::initSupportedPrimitiveDescriptors() {
    config.inConfs[0].desc = MKLDNNMemoryDesc(srcDims, inputDataType, memory::format::any);
    config.outConfs.resize(outDims.size());

    if (srcDims.ndims() < 2)
        THROW_IE_EXCEPTION << "Split " << getName() << " doesn't support 1D blobs";

    std::vector<memory::format> outFormats;

    auto axis_size = 0;

@@ -125,6 +125,9 @@ public:
    void parseFloor(const Model::Ptr& model, const ie::CNNLayerPtr& layer, const DataVector& inputs, const DataVector& outputs);
    void parseTopK(const Model::Ptr& model, const ie::CNNLayerPtr& layer, const DataVector& inputs, const DataVector& outputs);
    void parseSelect(const Model::Ptr& model, const ie::CNNLayerPtr& layer, const DataVector& inputs, const DataVector& outputs);
    void parseExpDetectionOutput(const Model::Ptr& model, const ie::CNNLayerPtr& layer, const DataVector& inputs, const DataVector& outputs);
    void parseNonMaxSuppression(const Model::Ptr& model, const ie::CNNLayerPtr& layer, const DataVector& inputs, const DataVector& outputs);
    void parseROIFeatureExtractor(const Model::Ptr& model, const ie::CNNLayerPtr& layer, const DataVector& inputs, const DataVector& outputs);

    //
    // Special layers

@@ -141,6 +141,9 @@ VPU_DECLARE_ENUM(StageType,
    Floor = 102,
    TopK = 104,
    ReduceMin = 105,
    ExpDetectionOutput = 106,  // ExperimentalDetectronDetectionOutput
    NonMaxSuppression = 107,
    ROIFeatureExtractor = 108,
)

//

@@ -25,67 +25,71 @@ typedef void (FrontEnd::*parser_t)(
        const DataVector& outputs);

ie::details::caseless_map<std::string, parser_t> g_parsers = {
    {"Convolution", &FrontEnd::parseConvolution},
    {"Pooling", &FrontEnd::parsePooling},
    {"ReLU", &FrontEnd::parseReLU},
    {"Clamp", &FrontEnd::parseClamp},
    {"FullyConnected", &FrontEnd::parseFullyConnected},
    {"SoftMax", &FrontEnd::parseSoftMax},
    {"GRN", &FrontEnd::parseGRN},
    {"MVN", &FrontEnd::parseMVN},
    {"Norm", &FrontEnd::parseNorm},
    {"Concat", &FrontEnd::parseConcat},
    {"Eltwise", &FrontEnd::parseEltwise},
    {"Split", &FrontEnd::parseSplit},
    {"Sigmoid", &FrontEnd::parseSigmoid},
    {"TanH", &FrontEnd::parseTanH},
    {"PReLU", &FrontEnd::parsePReLU},
    {"Bias", &FrontEnd::parseBias},
    // Caffe Slice is transformed to Split by IE
    {"Slice", &FrontEnd::parseSplit},
    {"BatchNormalization", &FrontEnd::parseBatchNorm},
    {"ScaleShift", &FrontEnd::parseScale},
    {"Deconvolution", &FrontEnd::parseDeconvolution},
    {"Power", &FrontEnd::parsePower},
    {"Copy", &FrontEnd::parseCopy},
    {"Reshape", &FrontEnd::parseReshape},
    {"ELU", &FrontEnd::parseELU},
    {"Convolution", &FrontEnd::parseConvolution},
    {"Pooling", &FrontEnd::parsePooling},
    {"ReLU", &FrontEnd::parseReLU},
    {"Clamp", &FrontEnd::parseClamp},
    {"FullyConnected", &FrontEnd::parseFullyConnected},
    {"SoftMax", &FrontEnd::parseSoftMax},
    {"GRN", &FrontEnd::parseGRN},
    {"MVN", &FrontEnd::parseMVN},
    {"Norm", &FrontEnd::parseNorm},
    {"Concat", &FrontEnd::parseConcat},
    {"Eltwise", &FrontEnd::parseEltwise},
    {"Split", &FrontEnd::parseSplit},
    {"Sigmoid", &FrontEnd::parseSigmoid},
    {"TanH", &FrontEnd::parseTanH},
    {"PReLU", &FrontEnd::parsePReLU},
    {"Bias", &FrontEnd::parseBias},
    {"Slice", &FrontEnd::parseSplit},  // Caffe Slice is transformed to Split by IE
    {"BatchNormalization", &FrontEnd::parseBatchNorm},
    {"ScaleShift", &FrontEnd::parseScale},
    {"Deconvolution", &FrontEnd::parseDeconvolution},
    {"Power", &FrontEnd::parsePower},
    {"Copy", &FrontEnd::parseCopy},
    {"ELU", &FrontEnd::parseELU},

    // Flatten, Squeeze and Unsqueeze are represented as Reshape in VPU model
    {"Flatten", &FrontEnd::parseReshape},
    {"Squeeze", &FrontEnd::parseReshape},
    {"Unsqueeze", &FrontEnd::parseReshape},
    {"Crop", &FrontEnd::parseCrop},
    {"Tile", &FrontEnd::parseTile},
    {"Normalize", &FrontEnd::parseNormalize},
    {"PriorBox", &FrontEnd::parsePriorBox},
    {"PriorBoxClustered", &FrontEnd::parsePriorBoxClustered},
    {"Permute", &FrontEnd::parsePermute},
    {"DetectionOutput", &FrontEnd::parseDetectionOutput},
    {"RegionYolo", &FrontEnd::parseRegionYolo},
    {"ReorgYolo", &FrontEnd::parseReorgYolo},
    {"CTCGreedyDecoder", &FrontEnd::parseCTCDecoder},
    {"Proposal", &FrontEnd::parseProposal},
    {"ROIPooling", &FrontEnd::parseROIPooling},
    {"PSROIPooling", &FrontEnd::parsePSROIPooling},
    {"Interp", &FrontEnd::parseInterp},
    {"Custom", &FrontEnd::parseCustom},
    {"MTCNN", &FrontEnd::parseMTCNN},
    {"LSTMCell", &FrontEnd::parseLSTMCell},
    {"Pad", &FrontEnd::parsePad},
    {"Resample", &FrontEnd::parseResample},
    {"ArgMax", &FrontEnd::parseArgMax},
    {"LSTMSequence", &FrontEnd::parseRNN},
    {"GEMM", &FrontEnd::parseGEMM},
    {"Log", &FrontEnd::parseLog},
    {"Exp", &FrontEnd::parseExp},
    {"ReverseSequence", &FrontEnd::parseReverseSequence},
    {"Gather", &FrontEnd::parseGather},
    {"ReduceAnd", &FrontEnd::parseReduce},
    {"Floor", &FrontEnd::parseFloor},
    {"TopK", &FrontEnd::parseTopK},
    {"ReduceMin", &FrontEnd::parseReduce},
    {"StridedSlice", &FrontEnd::parseStridedSlice},
    {"Select", &FrontEnd::parseSelect},
    {"Reshape", &FrontEnd::parseReshape},
    {"Flatten", &FrontEnd::parseReshape},
    {"Squeeze", &FrontEnd::parseReshape},
    {"Unsqueeze", &FrontEnd::parseReshape},

    {"Crop", &FrontEnd::parseCrop},
    {"Tile", &FrontEnd::parseTile},
    {"Normalize", &FrontEnd::parseNormalize},
    {"PriorBox", &FrontEnd::parsePriorBox},
    {"PriorBoxClustered", &FrontEnd::parsePriorBoxClustered},
    {"Permute", &FrontEnd::parsePermute},
    {"DetectionOutput", &FrontEnd::parseDetectionOutput},
    {"RegionYolo", &FrontEnd::parseRegionYolo},
    {"ReorgYolo", &FrontEnd::parseReorgYolo},
    {"CTCGreedyDecoder", &FrontEnd::parseCTCDecoder},
    {"Proposal", &FrontEnd::parseProposal},
    {"ROIPooling", &FrontEnd::parseROIPooling},
    {"PSROIPooling", &FrontEnd::parsePSROIPooling},
    {"Interp", &FrontEnd::parseInterp},
    {"Custom", &FrontEnd::parseCustom},
    {"MTCNN", &FrontEnd::parseMTCNN},
    {"LSTMCell", &FrontEnd::parseLSTMCell},
    {"Pad", &FrontEnd::parsePad},
    {"Resample", &FrontEnd::parseResample},
    {"ArgMax", &FrontEnd::parseArgMax},
    {"LSTMSequence", &FrontEnd::parseRNN},
    {"GEMM", &FrontEnd::parseGEMM},
    {"Log", &FrontEnd::parseLog},
    {"Exp", &FrontEnd::parseExp},
    {"ReverseSequence", &FrontEnd::parseReverseSequence},
    {"Gather", &FrontEnd::parseGather},
    {"ReduceAnd", &FrontEnd::parseReduce},
    {"Floor", &FrontEnd::parseFloor},
    {"TopK", &FrontEnd::parseTopK},
    {"ReduceMin", &FrontEnd::parseReduce},
    {"StridedSlice", &FrontEnd::parseStridedSlice},
    {"Select", &FrontEnd::parseSelect},
    {"ExperimentalDetectronDetectionOutput", &FrontEnd::parseExpDetectionOutput},
    {"NonMaxSuppression", &FrontEnd::parseNonMaxSuppression},
    {"ExperimentalDetectronROIFeatureExtractor", &FrontEnd::parseROIFeatureExtractor},
};

std::atomic<int> g_counter(0);

@@ -52,6 +52,16 @@ void PassImpl::run(const Model::Ptr& model) {
            continue;
        }

        const bool allInputsAreFP16 = std::all_of(eltwiseStage->inputs().begin(), eltwiseStage->inputs().end(),
            [](const Data& data) { return data->desc().type() == DataType::FP16; });

        const bool allOutputsAreFP16 = std::all_of(eltwiseStage->outputs().begin(), eltwiseStage->outputs().end(),
            [](const Data& data) { return data->desc().type() == DataType::FP16; });

        if (!allInputsAreFP16 || !allOutputsAreFP16) {
            continue;
        }

        if (auto reluStage = getNextStage(eltwiseStage, {StageType::Relu, StageType::LeakyRelu, StageType::Clamp})) {
            auto reluInput = reluStage->input(0);
            auto reluOutput = reluStage->output(0);

@@ -66,7 +66,8 @@ protected:
    }

    void initialCheckImpl() const override {
        assertInputsOutputsTypes(this, {{DataType::FP16}}, {{DataType::FP16}});
        const auto& type = input(0)->desc().type();
        assertInputsOutputsTypes(this, {{type}}, {{type}});
    }

    void serializeParamsImpl(BlobSerializer&) const override {

@@ -173,23 +173,43 @@ private:
    }

    void initialCheckImpl() const override {
        assertInputsOutputsTypes(this, {{DataType::FP16}, {DataType::FP16}, {DataType::FP16}}, {{DataType::FP16}});
        const auto& operation = type();
        const auto& dataType = input(0)->desc().type();

        auto supportedDataTypes = EnumSet<DataType>{DataType::FP16};
        if (operation == StageType::Sum) {
            supportedDataTypes.insert(DataType::S32);
        }
        IE_ASSERT(supportedDataTypes.find(dataType) != supportedDataTypes.end());

        assertInputsOutputsTypes(this, {{dataType}, {dataType}, {dataType}}, {{dataType}});
    }

    void serializeParamsImpl(BlobSerializer& serializer) const override {
        auto coeff1 = attrs().getOrDefault<float>("coeff1", 1.0f);
        auto coeff2 = attrs().getOrDefault<float>("coeff2", 1.0f);
        auto postOperation = attrs().getOrDefault<StageType>("postOperation", StageType::Empty);
        auto negativeSlope = attrs().getOrDefault<float>("negativeSlope", 0.0f);
        auto min_value = attrs().getOrDefault<float>("min_value", 0.0f);
        auto max_value = attrs().getOrDefault<float>("max_value", 1.0f);
        const auto& type = input(0)->desc().type();

        serializer.append(static_cast<float>(coeff1));
        serializer.append(static_cast<float>(coeff2));
        if (type == DataType::FP16) {
            serializer.append(attrs().getOrDefault<float>("coeff1", 1.0f));
            serializer.append(attrs().getOrDefault<float>("coeff2", 1.0f));
        } else if (type == DataType::S32) {
            serializer.append(attrs().getOrDefault<std::int32_t>("coeff1", 1));
            serializer.append(attrs().getOrDefault<std::int32_t>("coeff2", 1));
        } else {
            THROW_IE_EXCEPTION << type << " isn't supported";
        }

        auto postOperation = attrs().getOrDefault<StageType>("postOperation", StageType::Empty);
        serializer.append(static_cast<int>(postOperation));
        serializer.append(static_cast<float>(negativeSlope));
        serializer.append(static_cast<float>(min_value));
        serializer.append(static_cast<float>(max_value));

        if (type == DataType::FP16) {
            serializer.append(attrs().getOrDefault<float>("negativeSlope", 0.0f));
            serializer.append(attrs().getOrDefault<float>("min_value", 0.0f));
            serializer.append(attrs().getOrDefault<float>("max_value", 1.0f));
        } else {
            serializer.append(attrs().getOrDefault<std::int32_t>("negativeSlope", 0));
            serializer.append(attrs().getOrDefault<std::int32_t>("min_value", 0));
            serializer.append(attrs().getOrDefault<std::int32_t>("max_value", 1));
        }
    }
|
||||
|
||||
void serializeDataImpl(BlobSerializer& serializer) const override {
|
||||
@@ -218,14 +238,14 @@ void FrontEnd::parseEltwise(
|
||||
IE_ASSERT(outputs.size() == 1);
|
||||
|
||||
auto stageType = StageType::None;
|
||||
auto subCoefficient = 1.0f;
|
||||
auto subCoefficient = 1;
|
||||
|
||||
if (layer->_operation == ie::EltwiseLayer::eOperation::Sub) {
|
||||
if (inputs.size() != 2) {
|
||||
VPU_THROW_EXCEPTION << "Eltwise operation: " << layer->_operation << " with multiple inputs is not supported";
|
||||
}
|
||||
stageType = StageType::Sum;
|
||||
subCoefficient = -1.f;
|
||||
subCoefficient = -1;
|
||||
} else if (layer->_operation == ie::EltwiseLayer::eOperation::Mean) {
|
||||
if (inputs.size() != 2) {
|
||||
VPU_THROW_EXCEPTION << "Eltwise operation: " << layer->_operation << " with multiple inputs is not supported";
|
||||
@@ -269,22 +289,32 @@ void FrontEnd::parseEltwise(
|
||||
tempInputs,
|
||||
{tempOutput});
|
||||
|
||||
const auto& type = inputs.front()->desc().type();
|
||||
IE_ASSERT(type == DataType::FP16 || type == DataType::S32);
|
||||
|
||||
if (layer->_operation == ie::EltwiseLayer::eOperation::Mean) {
|
||||
// Mean supports only FP16
|
||||
IE_ASSERT(type == DataType::FP16);
|
||||
stage->attrs().set<float>("coeff1", 0.5);
|
||||
stage->attrs().set<float>("coeff2", 0.5);
|
||||
} else {
|
||||
if (layer->coeff.size() > 0) {
|
||||
stage->attrs().set<float>("coeff1", layer->coeff[0]);
|
||||
if (type == DataType::FP16) {
|
||||
stage->attrs().set<float>("coeff1", layer->coeff[0]);
|
||||
} else {
|
||||
stage->attrs().set<std::int32_t>("coeff1", layer->coeff[0]);
|
||||
}
|
||||
}
|
||||
if (layer->coeff.size() > 1 || subCoefficient != 1.0f) {
|
||||
stage->attrs().set<float>("coeff2", subCoefficient * (layer->coeff.size() > 1 ? layer->coeff[1] : 1.0f));
|
||||
if (layer->coeff.size() > 1 || subCoefficient != 1) {
|
||||
if (type == DataType::FP16) {
|
||||
stage->attrs().set<float>("coeff2", subCoefficient * (layer->coeff.size() > 1 ? layer->coeff[1] : 1.0f));
|
||||
} else {
|
||||
stage->attrs().set<std::int32_t>("coeff2", subCoefficient * (layer->coeff.size() > 1 ? layer->coeff[1] : 1));
|
||||
}
|
||||
}
|
||||
}

+   stage->attrs().set<StageType>("postOperation", StageType::Empty);
+   stage->attrs().set<float>("negativeSlope", 0.0f);
+   stage->attrs().set<float>("min_value", 0.0f);
+   stage->attrs().set<float>("max_value", 1.0f);

    tempInputs[0] = tempOutput;
    for (int ind = 2; ind < inputs.size(); ++ind) {
@@ -0,0 +1,151 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <vpu/frontend/frontend.hpp>

#include <memory>

namespace vpu {

namespace {

const int numDeltasWeights = 4;

VPU_PACKED(ExpDetectionOutputParams {
    float deltas_weights[numDeltasWeights];
    float max_delta_log_wh;
    float nms_threshold;
    float score_threshold;
    int32_t max_detections_per_image;
    int32_t num_classes;
    int32_t post_nms_count;
    int32_t class_agnostic_box_regression;
};)

class ExpDetectionOutputStage final : public StageNode {
private:
    StagePtr cloneImpl() const override {
        return std::make_shared<ExpDetectionOutputStage>(*this);
    }

    void propagateDataOrderImpl(StageDataInfo<DimsOrder>& orderInfo) override {
    }

    void getDataStridesRequirementsImpl(StageDataInfo<StridesRequirement>& stridesInfo) override {
        for (const auto& inEdge : inputEdges()) {
            stridesInfo.setInput(inEdge, StridesRequirement::compact());
        }
        for (const auto& outEdge : outputEdges()) {
            stridesInfo.setOutput(outEdge, StridesRequirement::compact());
        }
    }

    void finalizeDataLayoutImpl() override {
    }

    void getBatchSupportInfoImpl(StageDataInfo<BatchSupport>& batchInfo) override {
    }

    void initialCheckImpl() const override {
        assertInputsOutputsTypes(this,
            {{DataType::FP16}, {DataType::FP16}, {DataType::FP16}, {DataType::FP16}},
            {{DataType::FP16}, {DataType::S32}, {DataType::FP16}});
    }

    void serializeParamsImpl(BlobSerializer& serializer) const override {
        const auto& params = attrs().get<ExpDetectionOutputParams>("params");

        serializer.append(params);
    }

    void serializeDataImpl(BlobSerializer& serializer) const override {
        auto inputBoxes = _inputEdges[0]->input();
        auto inputDeltas = _inputEdges[1]->input();
        auto inputScores = _inputEdges[2]->input();
        auto inputIMinfo = _inputEdges[3]->input();
        auto outputBoxes = _outputEdges[0]->output();
        auto outputClasses = _outputEdges[1]->output();
        auto outputScores = _outputEdges[2]->output();

        inputBoxes->serializeNewBuffer(serializer);
        inputDeltas->serializeNewBuffer(serializer);
        inputScores->serializeNewBuffer(serializer);
        inputIMinfo->serializeNewBuffer(serializer);
        outputBoxes->serializeNewBuffer(serializer);
        outputClasses->serializeNewBuffer(serializer);
        outputScores->serializeNewBuffer(serializer);
    }
};

} // namespace

void FrontEnd::parseExpDetectionOutput(
        const Model::Ptr& model,
        const ie::CNNLayerPtr& layer,
        const DataVector& inputs,
        const DataVector& outputs) {
    IE_ASSERT(inputs.size() == 4);
    IE_ASSERT(outputs.size() == 3);

    ExpDetectionOutputParams params;

    const auto deltas_weights = layer->GetParamAsFloats("deltas_weights", {0.0f, 0.0f, 0.0f, 0.0f});
    IE_ASSERT(deltas_weights.size() == numDeltasWeights);
    for (int i = 0; i < numDeltasWeights; ++i)
        params.deltas_weights[i] = deltas_weights[i];

    params.max_delta_log_wh = layer->GetParamAsFloat("max_delta_log_wh", 0.0f);
    params.nms_threshold = layer->GetParamAsFloat("nms_threshold", 0.0f);
    params.score_threshold = layer->GetParamAsFloat("score_threshold", 0.0f);
    params.max_detections_per_image = layer->GetParamAsInt("max_detections_per_image", 0);
    params.num_classes = layer->GetParamAsInt("num_classes", 0);
    params.post_nms_count = layer->GetParamAsInt("post_nms_count", 0);
    params.class_agnostic_box_regression = layer->GetParamAsInt("class_agnostic_box_regression", 0) ? 1 : 0;

    auto inputBoxes = inputs[0];     // [numRois][4]
    auto inputDeltas = inputs[1];    // [numRois]([numClasses][4])
    auto inputScores = inputs[2];    // [numRois][numClasses]
    auto inputIMinfo = inputs[3];    // [2]
    auto outputBoxes = outputs[0];   // [maxDetections][4]
    auto outputClasses = outputs[1]; // [maxDetections]
    auto outputScores = outputs[2];  // [maxDetections]

    // from layer point of view, they are not N or C at all; but layout/order require:
    // 2-dim => NC [N][C] [input Boxes, Deltas, Scores, IMinfo; output Boxes]
    // 1-dim => C  [C]    [output Classes, Scores]

    const int numRois = inputBoxes->desc().dim(Dim::N);
    const int numClasses = inputScores->desc().dim(Dim::C);
    const int maxDetections = params.max_detections_per_image;

    IE_ASSERT((inputBoxes->desc().dims().size() == 2) &&
              (inputBoxes->desc().dim(Dim::C) == 4));
    IE_ASSERT((inputDeltas->desc().dims().size() == 2) &&
              (inputDeltas->desc().dim(Dim::N) == numRois) &&
              (inputDeltas->desc().dim(Dim::C) == numClasses * 4));
    IE_ASSERT((inputScores->desc().dims().size() == 2) &&
              (inputScores->desc().dim(Dim::N) == numRois));
    IE_ASSERT((inputIMinfo->desc().dims().size() == 2) &&
              (inputIMinfo->desc().dim(Dim::N) == 1) &&
              (inputIMinfo->desc().dim(Dim::C) >= 2));

    IE_ASSERT((outputBoxes->desc().dims().size() == 2) &&
              (outputBoxes->desc().dim(Dim::N) >= maxDetections) &&
              (outputBoxes->desc().dim(Dim::C) == 4));
    IE_ASSERT((outputClasses->desc().dims().size() == 1) &&
              (outputClasses->desc().dim(Dim::C) >= maxDetections));
    IE_ASSERT((outputScores->desc().dims().size() == 1) &&
              (outputScores->desc().dim(Dim::C) >= maxDetections));

    auto stage = model->addNewStage<ExpDetectionOutputStage>(
        layer->name,
        StageType::ExpDetectionOutput,
        layer,
        inputs,
        outputs);

    stage->attrs().set("params", params);
}

} // namespace vpu

@@ -73,11 +73,12 @@ protected:
    }

    StageSHAVEsRequirements getSHAVEsRequirementsImpl() const override {
-       return StageSHAVEsRequirements::NotNeeded;
+       return StageSHAVEsRequirements::OnlyOne;
    }

    void initialCheckImpl() const override {
-       assertInputsOutputsTypes(this, {{DataType::FP16}, {DataType::FP16}}, {{DataType::FP16}});
+       const auto& srcType = input(0)->desc().type();
+       assertInputsOutputsTypes(this, {{srcType}, {DataType::FP16, DataType::S32}}, {{srcType}});
    }

    void serializeParamsImpl(BlobSerializer& serializer) const override {

102  inference-engine/src/vpu/graph_transformer/src/stages/nms.cpp  Normal file
@@ -0,0 +1,102 @@
// Copyright (C) 2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <vpu/frontend/frontend.hpp>
#include <memory>
#include <set>

namespace vpu {

namespace {

class NonMaxSuppression final : public StageNode {
private:
    StagePtr cloneImpl() const override {
        return std::make_shared<NonMaxSuppression>(*this);
    }

    void propagateDataOrderImpl(StageDataInfo<DimsOrder>& orderInfo) override {
    }

    void getDataStridesRequirementsImpl(StageDataInfo<StridesRequirement>& stridesInfo) override {
    }

    void finalizeDataLayoutImpl() override {
    }

    void getBatchSupportInfoImpl(StageDataInfo<BatchSupport>& batchInfo) override {
    }

    StageSHAVEsRequirements getSHAVEsRequirementsImpl() const override {
        return StageSHAVEsRequirements::OnlyOne;
    }

    void initialCheckImpl() const override {
        assertInputsOutputsTypes(this,
            {{DataType::FP16},
             {DataType::FP16},
             {DataType::S32},
             {DataType::FP16},
             {DataType::FP16}},
            {{DataType::S32}});
    }

    void finalCheckImpl() const override {
    }

    void serializeParamsImpl(BlobSerializer& serializer) const override {
        bool center_point_box = attrs().get<bool>("center_point_box");

        serializer.append(static_cast<int32_t>(center_point_box));
    }

    void serializeDataImpl(BlobSerializer& serializer) const override {
        IE_ASSERT(_inputEdges.size() >= 2 && _inputEdges.size() <= 5);
        IE_ASSERT(_outputEdges.size() == 1);

        auto input1 = _inputEdges[0]->input();
        auto input2 = _inputEdges[1]->input();
        auto input3 = _inputEdges[2]->input();
        auto input4 = _inputEdges[3]->input();
        auto input5 = _inputEdges[4]->input();
        auto output = _outputEdges[0]->output();

        input1->serializeNewBuffer(serializer);
        input2->serializeNewBuffer(serializer);
        output->serializeNewBuffer(serializer);
        input3->serializeNewBuffer(serializer);
        input4->serializeNewBuffer(serializer);
        input5->serializeNewBuffer(serializer);
    }
};

} // namespace

void FrontEnd::parseNonMaxSuppression(
        const Model::Ptr& model,
        const ie::CNNLayerPtr& _layer,
        const DataVector& inputs,
        const DataVector& outputs) {
    auto layer = std::dynamic_pointer_cast<ie::NonMaxSuppressionLayer>(_layer);
    IE_ASSERT(layer != nullptr);

    IE_ASSERT(inputs.size() >= 2 && inputs.size() <= 5);
    IE_ASSERT(outputs.size() == 1);

    DataVector tempInputs = inputs;
    for (size_t fake = inputs.size(); fake < 5; fake++) {
        tempInputs.push_back(model->addFakeData());
    }

    auto stage = model->addNewStage<NonMaxSuppression>(
        layer->name,
        StageType::NonMaxSuppression,
        layer,
        tempInputs,
        outputs);

    stage->attrs().set<bool>("center_point_box", layer->center_point_box);
}

} // namespace vpu

@@ -0,0 +1,179 @@
// Copyright (C) 2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <vpu/frontend/frontend.hpp>

#include <vector>
#include <string>
#include <unordered_set>
#include <memory>
#include <set>

namespace vpu {

namespace {

#define MAX_PYRAMID_LEVELS 16

typedef SmallVector<int32_t, MAX_PYRAMID_LEVELS> PyramidLevelsVector;

class ROIFeatureExtractorStage final : public StageNode {
private:
    StagePtr cloneImpl() const override {
        return std::make_shared<ROIFeatureExtractorStage>(*this);
    }

    void propagateDataOrderImpl(StageDataInfo<DimsOrder>& orderInfo) override {
        auto output = outputEdge(0)->output();

        auto levels_num = attrs().get<int>("levels_num");
        for (int i = 1; i < levels_num + 1; i++) {
            orderInfo.setInput(inputEdge(i), inputEdge(i)->input()->desc().dimsOrder().createMovedDim(Dim::C, 2));
        }
        orderInfo.setOutput(outputEdge(0), output->desc().dimsOrder().createMovedDim(Dim::C, 2));
    }

    void getDataStridesRequirementsImpl(StageDataInfo<StridesRequirement>& stridesInfo) override {
        for (const auto& inEdge : inputEdges()) {
            stridesInfo.setInput(inEdge, StridesRequirement::compact());
        }
        for (const auto& outEdge : outputEdges()) {
            stridesInfo.setOutput(outEdge, StridesRequirement::compact());
        }
    }

    void finalizeDataLayoutImpl() override {
    }

    void getBatchSupportInfoImpl(StageDataInfo<BatchSupport>& batchInfo) override {
    }

    void initialCheckImpl() const override {
        auto levels_num = attrs().get<int>("levels_num");
        IE_ASSERT(numInputs() == levels_num + 1);
        IE_ASSERT(numOutputs() == 1 || numOutputs() == 2);

        assertAllInputsOutputsTypes(this, DataType::FP16, DataType::FP16);
    }

    void serializeParamsImpl(BlobSerializer& serializer) const override {
        auto pooled_w = attrs().get<int>("pooled_w");
        auto pooled_h = attrs().get<int>("pooled_h");
        auto sampling_ratio = attrs().get<int>("sampling_ratio");
        auto levels_num = attrs().get<int>("levels_num");
        auto use_output_rois = attrs().get<int>("use_output_rois");
        auto pyramid_scales = attrs().get<PyramidLevelsVector>("pyramid_scales");

        serializer.append(static_cast<uint32_t>(pooled_w));
        serializer.append(static_cast<uint32_t>(pooled_h));
        serializer.append(static_cast<uint32_t>(sampling_ratio));
        serializer.append(static_cast<uint32_t>(levels_num));
        serializer.append(static_cast<uint32_t>(use_output_rois));

        for (int i = 0; i < pyramid_scales.size(); i++) {
            serializer.append(static_cast<int32_t>(pyramid_scales[i]));
        }
    }

    void serializeDataImpl(BlobSerializer& serializer) const override {
        auto levels_num = attrs().get<int>("levels_num");

        IE_ASSERT(numInputs() == levels_num + 1);
        IE_ASSERT(numOutputs() == 1 || numOutputs() == 2);

        for (int i = 0; i < levels_num + 1; i++) {
            inputEdge(i)->input()->serializeNewBuffer(serializer);
        }

        for (auto i = 0; i < numOutputs(); i++) {
            auto output = outputEdge(i)->output();
            output->serializeNewBuffer(serializer);
        }

        tempBuffer(0)->serializeNewBuffer(serializer);
    }
};

} // namespace

void FrontEnd::parseROIFeatureExtractor(
        const Model::Ptr& model,
        const ie::CNNLayerPtr& layer,
        const DataVector& inputs,
        const DataVector& outputs) {
    IE_ASSERT(inputs.size() > 1);
    IE_ASSERT(outputs.size() == 1 || outputs.size() == 2);
    auto levels_num = inputs.size() - 1;

    auto stage = model->addNewStage<ROIFeatureExtractorStage>(
        layer->name,
        StageType::ROIFeatureExtractor,
        layer,
        inputs,
        outputs);

    auto output_dim_ = layer->GetParamAsInt("output_size");
    auto pyramid_scales_ = layer->GetParamAsInts("pyramid_scales");
    auto sampling_ratio_ = layer->GetParamAsInt("sampling_ratio");
    auto pooled_height_ = output_dim_;
    auto pooled_width_ = output_dim_;

    auto rois = inputs[0];
    auto num_rois = rois->desc().dim(Dim::N);
    auto channels_num = inputs[1]->desc().dim(Dim::C);

    stage->attrs().set<int>("levels_num", levels_num);
    stage->attrs().set<int>("pooled_w", pooled_width_);
    stage->attrs().set<int>("pooled_h", pooled_height_);
    stage->attrs().set<int>("sampling_ratio", sampling_ratio_);
    stage->attrs().set<int>("use_output_rois", outputs.size() == 2);

    IE_ASSERT(pyramid_scales_.size() <= MAX_PYRAMID_LEVELS);

    PyramidLevelsVector pyramidScales(MAX_PYRAMID_LEVELS, 1);
    for (int i = 0; i < pyramid_scales_.size(); i++) {
        pyramidScales[i] = pyramid_scales_[i];
    }
    stage->attrs().set<PyramidLevelsVector>("pyramid_scales", pyramidScales);

    const int feaxels_per_roi = pooled_height_ * pooled_width_ * channels_num;

    const int roi_height_max = 320;
    const int roi_width_max = 320;
    int roi_bin_grid_h = (sampling_ratio_ > 0) ? sampling_ratio_ : static_cast<int>(ceil(static_cast<float>(roi_height_max) / pooled_height_));
    int roi_bin_grid_w = (sampling_ratio_ > 0) ? sampling_ratio_ : static_cast<int>(ceil(static_cast<float>(roi_width_max) / pooled_width_));

    struct PreCalc {
        int pos1;
        int pos2;
        int pos3;
        int pos4;
        float w1;
        float w2;
        float w3;
        float w4;
    };

    int ALIGN_VALUE = 64;
    int size_levels_id_buf = sizeof(int) * num_rois + ALIGN_VALUE;
    int size_reordered_rois_buf = sizeof(int16_t) * 4 * num_rois + ALIGN_VALUE;
    int size_original_rois_mapping_buf = sizeof(int) * num_rois + ALIGN_VALUE;
    int size_output_rois_features_temp_buf = sizeof(int16_t) * feaxels_per_roi * num_rois + ALIGN_VALUE;
    int size_rois_per_level_buf = (levels_num + 1) * sizeof(int) + ALIGN_VALUE;
    int size_dummy_mapping_buf = sizeof(int) * num_rois + ALIGN_VALUE;
    int size_pre_calc_buf = sizeof(PreCalc) * roi_bin_grid_h * roi_bin_grid_w * pooled_width_ * pooled_height_ + ALIGN_VALUE;

    int buffer_size = size_levels_id_buf +
                      size_reordered_rois_buf +
                      size_original_rois_mapping_buf +
                      size_output_rois_features_temp_buf +
                      size_rois_per_level_buf +
                      size_dummy_mapping_buf +
                      size_pre_calc_buf;

    model->addTempBuffer(
        stage,
        DataDesc({buffer_size}));
}

} // namespace vpu

@@ -242,6 +242,12 @@ void FrontEnd::parseSplit(

        _stageBuilder->addSplitStage(model, layer->name, layer, axis, input, onlyUsedOutputs);
    }

+   for (const auto& output : outputs) {
+       if (output->origData()->getInputTo().empty()) {
+           model->removeUnusedData(output);
+       }
+   }
}

Stage StageBuilder::addSplitStage(

@@ -115,6 +115,30 @@ void BufferWrapper::insert(size_t index, float value) {
    }
}

+void CompareCommonExact(const InferenceEngine::Blob::Ptr &actual,
+                        const InferenceEngine::Blob::Ptr &expected) {
+    ASSERT_NE(actual, nullptr);
+    ASSERT_NE(expected, nullptr);
+    const int32_t* res_ptr = actual->cbuffer().as<const int32_t*>();
+    const int32_t* ref_ptr = expected->cbuffer().as<const int32_t*>();
+    bool differ = false;
+    size_t actualFirstErrIdx = 0;
+    size_t expectedFirstErrIdx = 0;
+    std::function<void(size_t, size_t)> exactErrorUpdater = [&](size_t actualIdx, size_t expectedIdx) {
+        auto actual = res_ptr[actualIdx];
+        auto expected = ref_ptr[expectedIdx];
+        if ((actual != expected) && !differ) {
+            actualFirstErrIdx = actualIdx;
+            expectedFirstErrIdx = expectedIdx;
+            differ = true;
+        }
+    };
+    CompareCommon(actual, expected, exactErrorUpdater);
+    ASSERT_EQ(differ, false)
+        << "expectedFirstErrIdx = " << expectedFirstErrIdx
+        << " actualFirstErrIdx = " << actualFirstErrIdx;
+}

void CompareCommonAbsolute(const Blob::Ptr& actual, const Blob::Ptr& expected, float tolerance) {
    ASSERT_NE(actual, nullptr);
    ASSERT_NE(expected, nullptr);

@@ -134,7 +158,7 @@ void CompareCommonAbsolute(const Blob::Ptr& actual, const Blob::Ptr& expected, f
            expectedMaxErrId = expectedIdx;
        }
    };
-   CompareCommon(actual, expected, tolerance, absoluteErrorUpdater);
+   CompareCommon(actual, expected, absoluteErrorUpdater);

    ASSERT_NEAR(ref_ptr[expectedMaxErrId], res_ptr[actualMaxErrId], tolerance)
        << "expectedMaxErrId = " << expectedMaxErrId

@@ -161,7 +185,7 @@ void CompareCommonRelative(const Blob::Ptr& actual, const Blob::Ptr& expected, f
            expectedMaxErrId = expectedIdx;
        }
    };
-   CompareCommon(actual, expected, tolerance, relatedErrorUpdater);
+   CompareCommon(actual, expected, relatedErrorUpdater);

    float abs_threshold = fabsf(ref_ptr[expectedMaxErrId]) * tolerance;
    ASSERT_NEAR(ref_ptr[expectedMaxErrId], res_ptr[actualMaxErrId], abs_threshold)

@@ -169,7 +193,7 @@ void CompareCommonRelative(const Blob::Ptr& actual, const Blob::Ptr& expected, f
        << " actualMaxErrId = " << actualMaxErrId;
}

-void CompareCommon(const Blob::Ptr& actual, const Blob::Ptr& expected, float tolerance,
+void CompareCommon(const Blob::Ptr& actual, const Blob::Ptr& expected,
                   const std::function<void(size_t, size_t)>& errorUpdater) {
    ASSERT_NE(actual, nullptr);
    ASSERT_NE(expected, nullptr);

@@ -11,7 +11,6 @@
#include <xml_net_builder.hpp>
#include <xml_helper.hpp>
#include <common_layers_params.hpp>
#include <tests_common.hpp>

#ifndef USE_BOOST_RE

@@ -25,13 +24,13 @@
#define FIND_STR(SRC, PATTERN) boost::regex_search(SRC, boost::regex(PATTERN))
#endif

-#define REPLACE_WITH_NUM(SRC, PATTERN, NUM) REPLACE_WITH_STR(SRC, PATTERN, to_string_c_locale(NUM))
+#define REPLACE_WITH_NUM(SRC, PATTERN, NUM) REPLACE_WITH_STR(SRC, PATTERN, std::to_string(NUM))
#define REPLACE_WITH_NUM_VECTOR(SRC, PATTERN, NUMS) \
    { std::string result; \
    if (NUMS.size() > 0) { \
-       result += to_string_c_locale(NUMS[0]); \
+       result += std::to_string(NUMS[0]); \
        for (int i = 1; i < NUMS.size(); i++) { \
-           result += "," + to_string_c_locale(NUMS[i]); \
+           result += "," + std::to_string(NUMS[i]); \
        } \
    } \
    REPLACE_WITH_STR(SRC, PATTERN, result); }

@@ -39,9 +38,9 @@
    { std::string result; \
    auto nums_size = NUMS.size(); \
    if (nums_size > 0) { \
-       result += to_string_c_locale(NUMS[nums_size - 1]); \
+       result += std::to_string(NUMS[nums_size - 1]); \
        for (int i = 2; i <= nums_size; i++) { \
-           result += "," + to_string_c_locale(NUMS[nums_size - i]); \
+           result += "," + std::to_string(NUMS[nums_size - i]); \
        } \
    } \
    REPLACE_WITH_STR(SRC, PATTERN, result); }

@@ -136,9 +135,11 @@ public:

void CompareCommon(const InferenceEngine::Blob::Ptr &actual,
                   const InferenceEngine::Blob::Ptr &expected,
-                  float tolerance,
                   const std::function<void(size_t, size_t)> &errorUpdater);

+void CompareCommonExact(const InferenceEngine::Blob::Ptr &actual,
+                        const InferenceEngine::Blob::Ptr &expected);

void CompareCommonAbsolute(const InferenceEngine::Blob::Ptr &actual,
                           const InferenceEngine::Blob::Ptr &expected,
                           float tolerance);

@@ -27,14 +27,6 @@
# include "Psapi.h"
#endif

-template <class T>
-inline std::string to_string_c_locale(T value) {
-    std::stringstream val_stream;
-    val_stream.imbue(std::locale("C"));
-    val_stream << value;
-    return val_stream.str();
-}

class BaseTestCreator {
protected:
    std::string _type;

@@ -377,17 +369,17 @@ public:
    }

    std::string replace(std::string& str, const std::string& from, const int& to) {
-       replace(str, from, to_string_c_locale(to));
+       replace(str, from, std::to_string(to));
        return str;
    }

    std::string replace(std::string& str, const std::string& from, const size_t& to) {
-       replace(str, from, to_string_c_locale(to));
+       replace(str, from, std::to_string(to));
        return str;
    }

    std::string replace(std::string& str, const std::string& from, const float& to) {
-       replace(str, from, to_string_c_locale(to));
+       replace(str, from, std::to_string(to));
        return str;
    }
    // trim from both ends (in place)

@@ -214,13 +214,13 @@ public:
    }
    case ParametersValues::FLOAT_POSITIVE: {
        for (int j = 0; j < magicNumber; ++j) {
-           paramsValues.push_back(to_string_c_locale(distFloatPositive(gen)));
+           paramsValues.push_back(std::to_string(distFloatPositive(gen)));
        }
        break;
    }
    case ParametersValues::FLOAT_NEGATIVE: {
        for (int j = 0; j < magicNumber; ++j) {
-           paramsValues.push_back(to_string_c_locale(distFloatNegative(gen)));
+           paramsValues.push_back(std::to_string(distFloatNegative(gen)));
        }
        break;
    }

@@ -3777,8 +3777,6 @@ std::string LSTMCellOnlyModel() {
)V0G0N";
};

std::string TIModelWithLSTMCell1() {
    return R"V0G0N(
<?xml version="1.0" ?>

@@ -194,17 +194,17 @@ class MKLDNNCPUExtMathTests: public TestsCommon, public WithParamInterface<math_
    REPLACE_WITH_STR(model, "_MATH_FUNCTION_", p.math_function);

    if (p.alpha.size()) {
-       alpha = "alpha=\"" + to_string_c_locale(p.alpha[0]) + "\"";
+       alpha = "alpha=\"" + std::to_string(p.alpha[0]) + "\"";
    }
    REPLACE_WITH_STR(model, "_ALPHA_", alpha);

    if (p.beta.size()) {
-       beta = "beta=\"" + to_string_c_locale(p.beta[0]) + "\"";
+       beta = "beta=\"" + std::to_string(p.beta[0]) + "\"";
    }
    REPLACE_WITH_STR(model, "_BETA_", beta);

    if (p.gamma.size()) {
-       gamma = "gamma=\"" + to_string_c_locale(p.gamma[0]) + "\"";
+       gamma = "gamma=\"" + std::to_string(p.gamma[0]) + "\"";
    }
    REPLACE_WITH_STR(model, "_GAMMA_", gamma);
    return model;

@@ -22,6 +22,7 @@ struct reduce_test_params {
    std::string reduce_type;
    bool keep_dims;
    InferenceEngine::SizeVector in_shape;
+   std::string inType;
    std::vector<float> input_tensor;
    std::vector<int32_t> axes_for_reduction;
    InferenceEngine::SizeVector out_shape;

@@ -30,15 +31,15 @@ struct reduce_test_params {
    std::vector<std::function<void(MKLDNNPlugin::PrimitiveDescInfo)>> comp;
};

-template <typename F>
+template <typename src_t, typename dst_t, typename F>
void reduce(
-   const float *src_data,
+   const src_t *src_data,
    InferenceEngine::SizeVector src_dims,
    InferenceEngine::SizeVector srcStrides,
-   float* dst_data,
+   dst_t* dst_data,
    InferenceEngine::SizeVector dst_dims,
    InferenceEngine::SizeVector dstStrides,
-   float init_value,
+   dst_t init_value,
    bool keep_dims,
    InferenceEngine::SizeVector skip_dims,
    F func

@@ -64,19 +65,20 @@ void reduce(
    }
}
||||
|
||||
template <typename src_t, typename dst_t>
|
||||
void ref_reduce(
|
||||
std::string reduce_type,
|
||||
InferenceEngine::TBlob<float> &src,
|
||||
InferenceEngine::TBlob<src_t> &src,
|
||||
bool keep_dims,
|
||||
std::vector<int32_t> axes_for_reduction,
|
||||
InferenceEngine::TBlob<float> &dst,
|
||||
InferenceEngine::TBlob<dst_t> &dst,
|
||||
InferenceEngine::SizeVector &out_dims
|
||||
) {
|
||||
size_t i, src_idx, dst_idx;
|
||||
const float *src_data = src.data();
|
||||
const src_t *src_data = src.data();
|
||||
InferenceEngine::SizeVector src_dims = src.getTensorDesc().getDims();
|
||||
InferenceEngine::SizeVector srcStrides = src.getTensorDesc().getBlockingDesc().getStrides();
|
||||
float* dst_data = dst.data();
|
||||
dst_t* dst_data = dst.data();
|
||||
InferenceEngine::SizeVector dst_dims = dst.getTensorDesc().getDims();
|
||||
InferenceEngine::SizeVector dstStrides = dst.getTensorDesc().getBlockingDesc().getStrides();
|
||||
InferenceEngine::SizeVector skip_dims;
|
||||
@@ -116,26 +118,26 @@ void ref_reduce(
|
||||
|
||||
if (reduce_type == "ReduceAnd") {
|
||||
if (out_dims.size()) {
|
||||
reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 1.0f, keep_dims, skip_dims,
|
||||
[](float x, float y)->float { return x && y; } );
|
||||
reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 1, keep_dims, skip_dims,
|
||||
[](dst_t x, src_t y)->dst_t { return x && y; } );
|
||||
} else {
|
||||
dst_data[0] = 1.0f;
|
||||
dst_data[0] = 1;
|
||||
for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
|
||||
dst_data[0] = dst_data[0] && src_data[src_idx];
|
||||
}
|
||||
} else if (reduce_type == "ReduceL1") {
|
||||
if (out_dims.size()) {
|
||||
reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
|
||||
[](float x, float y)->float { return x + (std::abs)(y); } );
|
||||
reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
|
||||
[](dst_t x, src_t y)->dst_t { return x + (std::abs)(y); } );
|
||||
} else {
|
||||
dst_data[0] = 0.0f;
|
||||
dst_data[0] = 0;
|
||||
for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
|
||||
dst_data[0] += (std::abs)(src_data[src_idx]);
|
||||
}
|
||||
} else if (reduce_type == "ReduceL2") {
|
||||
if (out_dims.size()) {
|
||||
reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
|
||||
[](float x, float y)->float { return x + y * y; } );
|
||||
reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
|
||||
[](dst_t x, src_t y)->dst_t { return x + y * y; } );
|
||||
|
||||
for (i = 0; i < dstStrides[0] * dst_dims[0]; ++i)
|
||||
dst_data[i] = (std::sqrt)(dst_data[i]);
|
||||
@@ -147,43 +149,43 @@ void ref_reduce(
        }
    } else if (reduce_type == "ReduceLogSum") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x + y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x + y; });

            for (i = 0; i < dstStrides[0] * dst_dims[0]; ++i)
                dst_data[i] = logf(dst_data[i]);
        } else {
-           dst_data[0] = 0.0f;
+           dst_data[0] = 0;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] += src_data[src_idx];
            dst_data[0] = logf(dst_data[0]);
        }
    } else if (reduce_type == "ReduceLogSumExp") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x + expf(y); });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x + expf(y); });

            for (i = 0; i < dstStrides[0] * dst_dims[0]; ++i)
                dst_data[i] = logf(dst_data[i]);
        } else {
-           dst_data[0] = 0.0f;
+           dst_data[0] = 0;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] += expf(src_data[src_idx]);
            dst_data[0] = logf(dst_data[0]);
        }
    } else if (reduce_type == "ReduceMax") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, FLT_MIN, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x > y ? x : y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, (std::numeric_limits<dst_t>::min)(), keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x > y ? x : y; });
        } else {
-           dst_data[0] = FLT_MIN;
+           dst_data[0] = (std::numeric_limits<dst_t>::min)();
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] = dst_data[0] > src_data[src_idx] ? dst_data[0] : src_data[src_idx];
        }
    } else if (reduce_type == "ReduceMean") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x + y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x + y; });
            float reduced_dims_work_amount = 1.f;
            for (size_t axis : axes_for_reduction) {
                reduced_dims_work_amount *= static_cast<float>(src_dims[axis]);
@@ -191,24 +193,24 @@ void ref_reduce(
            for (i = 0; i < dstStrides[0] * dst_dims[0]; ++i)
                dst_data[i] /= reduced_dims_work_amount;
        } else {
-           dst_data[0] = 0.0f;
+           dst_data[0] = 0;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] += src_data[src_idx];
            dst_data[0] /= static_cast<float>(srcStrides[0] * src_dims[0]);
        }
    } else if (reduce_type == "ReduceMin") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, FLT_MAX, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x < y ? x : y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, (std::numeric_limits<dst_t>::max)(), keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x < y ? x : y; });
        } else {
-           dst_data[0] = FLT_MAX;
+           dst_data[0] = (std::numeric_limits<dst_t>::max)();
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] = dst_data[0] < src_data[src_idx] ? dst_data[0] : src_data[src_idx];
        }
    } else if (reduce_type == "ReduceOr") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x || y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x || y; });
        } else {
            dst_data[0] = 0;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
@@ -216,39 +218,39 @@ void ref_reduce(
        }
    } else if (reduce_type == "ReduceProd") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 1.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x * y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 1, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x * y; });
        } else {
-           dst_data[0] = 1.0f;
+           dst_data[0] = 1;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] *= src_data[src_idx];
        }
    } else if (reduce_type == "ReduceSum") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x + y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x + y; });
        } else {
-           dst_data[0] = 0.0f;
+           dst_data[0] = 0;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] += src_data[src_idx];
        }
    } else if (reduce_type == "ReduceSumSquare") {
        if (out_dims.size()) {
-           reduce(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0.0f, keep_dims, skip_dims,
-                  [](float x, float y)->float { return x + y * y; });
+           reduce<src_t, dst_t>(src_data, src_dims, srcStrides, dst_data, dst_dims, dstStrides, 0, keep_dims, skip_dims,
+                  [](dst_t x, src_t y)->dst_t { return x + y * y; });
        } else {
-           dst_data[0] = 0.0f;
+           dst_data[0] = 0;
            for (src_idx = 0; src_idx < srcStrides[0] * src_dims[0]; ++src_idx)
                dst_data[0] += src_data[src_idx] * src_data[src_idx];
        }
    }
}

-class MKLDNNCPUExtReduceTests : public TestsCommon, public WithParamInterface<reduce_test_params> {
+class MKLDNNCPUExtReducesTests : public TestsCommon, public WithParamInterface<reduce_test_params> {
    std::string model_t = R"V0G0N(
<net Name="Reduce_net" version="2" precision="FP32" batch="1">
    <layers>
-       <layer name="input" type="Input" precision="FP32" id="1">
+       <layer name="input" type="Input" precision="_IP_" id="1">
            <output>
                <port id="1">
                    _IN_
@@ -262,18 +264,18 @@ class MKLDNNCPUExtReduceTests : public TestsCommon, public WithParamInterface<re
                </port>
            </output>
        </layer>
-       <layer name="output" id="2" type="_REDUCE_TYPE_" precision="FP32">
+       <layer name="output" id="2" type="_REDUCE_TYPE_">
            <data keep_dims="_KEEP_DIMS_" />
            <input>
-               <port id="1">
+               <port id="1" precision="_IP_">
                    _IN_
                </port>
-               <port id="2">
+               <port id="2" precision="I32">
                    <dim>_DIM_SIZE_</dim>
                </port>
            </input>
            <output>
-               <port id="3">
+               <port id="3" precision="_OP_">
                    _OUT_
                </port>
            </output>
@@ -296,6 +298,8 @@ class MKLDNNCPUExtReduceTests : public TestsCommon, public WithParamInterface<re
            in_shape += std::to_string(p.in_shape[i]) + "</dim>\n";
        }
        REPLACE_WITH_STR(model, "_IN_", in_shape);
+       REPLACE_WITH_STR(model, "_IP_", p.inType);
+       REPLACE_WITH_STR(model, "_OP_", p.inType);
        REPLACE_WITH_NUM(model, "_DIM_SIZE_", p.axes_for_reduction.size());
        REPLACE_WITH_STR(model, "_REDUCE_TYPE_", p.reduce_type);
        REPLACE_WITH_NUM(model, "_KEEP_DIMS_", p.keep_dims);
@@ -312,7 +316,8 @@ protected:
    virtual void TearDown() {
    }

-   static void fill_data_dbgval(float *data, size_t size) {
+   template <typename T>
+   static void fill_data_dbgval(T *data, size_t size) {
        for (size_t i = 0; i < size; i++) {
            data[i] = i + 1;
        }
@@ -341,29 +346,11 @@ protected:

        std::pair<std::string, InferenceEngine::DataPtr> item = *out.begin();

-       InferenceEngine::TBlob<float>::Ptr output;
-       output = InferenceEngine::make_shared_blob<float>(item.second->getTensorDesc());
-       output->allocate();
-       outputBlobs[item.first] = output;
-
-       // Output Reference
-       InferenceEngine::TBlob<float> dst_ref(item.second->getTensorDesc());
-       dst_ref.allocate();
-
-       // Input Data
        InferenceEngine::Blob::Ptr src;
-       src = InferenceEngine::make_shared_blob<float>({ InferenceEngine::Precision::FP32, p.in_shape, InferenceEngine::TensorDesc::getLayoutByDims(p.in_shape) });
-       src->allocate();
-       if(p.input_tensor.size())
-           memcpy(src->buffer(), &p.input_tensor[0], sizeof(float)*p.input_tensor.size());
-       else
-           fill_data_dbgval(src->buffer(), src->size());
-       auto * srcPtr = dynamic_cast<InferenceEngine::TBlob<float>*>(src.get());
-       if (srcPtr == nullptr)
-           FAIL() << "Cannot cast blob to TBlob<float>.";
+       InferenceEngine::SizeVector out_dims;

        InferenceEngine::BlobMap srcs;
-       srcs.insert(std::pair<std::string, InferenceEngine::Blob::Ptr>("input", src));

        InferenceEngine::Blob::Ptr seq_lengthsIdx;
        InferenceEngine::SizeVector seq_lengths_dim(1, p.axes_for_reduction.size());
@@ -376,109 +363,181 @@ protected:
            FAIL() << "Cannot cast blob to TBlob<int32_t>.";

        srcs.insert(std::pair<std::string, InferenceEngine::Blob::Ptr>("axes_for_reduction", seq_lengthsIdx));
+       if (p.inType == "FP32") {
+           InferenceEngine::TBlob<float>::Ptr output;
+           output = InferenceEngine::make_shared_blob<float>(item.second->getTensorDesc());
+           output->allocate();
+           outputBlobs[item.first] = output;
+
+           InferenceEngine::TBlob<float> dst_ref(item.second->getTensorDesc());
+           dst_ref.allocate();
+
+           src = InferenceEngine::make_shared_blob<float>({InferenceEngine::Precision::FP32, p.in_shape,
+                                                           InferenceEngine::TensorDesc::getLayoutByDims(p.in_shape)});
+           src->allocate();
+           if (p.input_tensor.size())
+               for (int i = 0; i < p.input_tensor.size(); i++) {
+                   static_cast<float*>(src->buffer())[i] = static_cast<float>(p.input_tensor[i]);
+               }
+           else
+               fill_data_dbgval<float>(src->buffer(), src->size());
+           auto *srcPtr = dynamic_cast<InferenceEngine::TBlob<float> *>(src.get());
+           if (srcPtr == nullptr)
+               FAIL() << "Cannot cast blob to TBlob<float>.";
+
+           ref_reduce<float, float>(p.reduce_type, *srcPtr, p.keep_dims, p.axes_for_reduction, dst_ref, out_dims);
+           if (p.reference.size())
+               if (memcmp(dst_ref.data(), &p.reference[0], p.reference.size() * sizeof(float)) != 0)
+                   FAIL() << "Wrong result with compare reference vector!";
+           // Infer
+           srcs.insert(std::pair<std::string, InferenceEngine::Blob::Ptr>("input", src));
+           graph.Infer(srcs, outputBlobs);
+           compare(*output, dst_ref);
+       } else if (p.inType == "I32") {
+           InferenceEngine::TBlob<int32_t>::Ptr output;
+           output = InferenceEngine::make_shared_blob<int32_t>(item.second->getTensorDesc());
+           output->allocate();
+           outputBlobs[item.first] = output;
+
+           InferenceEngine::TBlob<int32_t> dst_ref({ InferenceEngine::Precision::I32, p.out_shape, InferenceEngine::TensorDesc::getLayoutByDims(p.out_shape) });
+           dst_ref.allocate();
+
+           src = InferenceEngine::make_shared_blob<int32_t>({InferenceEngine::Precision::I32, p.in_shape,
+                                                             InferenceEngine::TensorDesc::getLayoutByDims(p.in_shape)});
+           src->allocate();
+           if (p.input_tensor.size())
+               for (int i = 0; i < p.input_tensor.size(); i++) {
+                   static_cast<int32_t*>(src->buffer())[i] = static_cast<int32_t>(p.input_tensor[i]);
+               }
+           else
+               fill_data_dbgval<int32_t>(src->buffer(), src->size());
+           auto *srcPtr = dynamic_cast<InferenceEngine::TBlob<int32_t> *>(src.get());
+           if (srcPtr == nullptr)
+               FAIL() << "Cannot cast blob to TBlob<int32_t>.";
+
+           ref_reduce<int32_t, int32_t>(p.reduce_type, *srcPtr, p.keep_dims, p.axes_for_reduction, dst_ref, out_dims);
+           if (p.reference.size()) {
+               for (int i = 0; i < p.reference.size(); i++) {
+                   if (dst_ref.data()[i] != p.reference[i])
+                       FAIL() << "Wrong result with compare reference vector!";
+                   //std::cout << p.reference[i] << " " << dst_ref.data()[i] << std::endl;
+               }
+           }
+
+           // Infer
+           srcs.insert(std::pair<std::string, InferenceEngine::Blob::Ptr>("input", src));
+           graph.Infer(srcs, outputBlobs);
+           compare(*output, dst_ref);
+       }
        // Check results
-       InferenceEngine::SizeVector out_dims;
-       ref_reduce(p.reduce_type, *srcPtr, p.keep_dims, p.axes_for_reduction, dst_ref, out_dims);
        if (out_dims.size() != p.out_shape.size())
            FAIL() << "Wrong out_shape size!";
        for (size_t i = 0; i < p.out_shape.size(); i++) {
            if (out_dims[i] != p.out_shape[i])
                FAIL() << "Wrong out_shape dimensions!";
        }
-       if (p.reference.size())
-           if (memcmp(dst_ref.data(), &p.reference[0], p.reference.size() * sizeof(float)) != 0)
-               FAIL() << "Wrong result with compare reference vector!";
-
-       // Infer
-       graph.Infer(srcs, outputBlobs);
-       compare(*output, dst_ref);
    } catch (const InferenceEngine::details::InferenceEngineException &e) {
        FAIL() << e.what();
    }
    }
};

-TEST_P(MKLDNNCPUExtReduceTests, TestsReduceSum) {}
+TEST_P(MKLDNNCPUExtReducesTests, TestsReduceSum) {}

INSTANTIATE_TEST_CASE_P(
-       TestsReduceSum, MKLDNNCPUExtReduceTests,
+       TestsReduceSum, MKLDNNCPUExtReducesTests,
        ::testing::Values(
-       // Params: reduce_type, keep_dims, in_shape, input_tensor, axes_for_reduction, out_shape, reference
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 0 },{ 1, 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ -3 },{ 1, 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 2 },{ 2, 3, 1 },{ 10, 26, 42, 58, 74, 90 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ -1 },{ 2, 3, 1 },{ 10, 26, 42, 58, 74, 90 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 0, 2 },{ 1, 3, 1 },{ 68, 100, 132 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 1, 2 },{ 2, 1, 1 },{ 78, 222 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 2, 1 },{ 2, 1, 1 },{ 78, 222 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 0, 1, 2 },{ 1, 1, 1 },{ 300 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 0, -2, 2 },{ 1, 1, 1 },{ 300 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 2, 2, 2, 2, 2, 2 },{},{ 0, 1, 2, 3, 4, 5, 6 },{ 1, 1, 1, 1, 1, 1, 1 },{ 8256 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 2, 2, 2, 2, 2, 2 },{},{ 6, 3, 1, 4, 0 },{ 1, 1, 2, 1, 1, 2, 1 },{ 1776, 1840, 2288, 2352 } },
-       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },{},{ 2, 2, 0, 2, 0 },{ 1, 3, 1 },{ 68, 100, 132 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 0 },{ 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ -3 },{ 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 2 },{ 2, 3 },{ 10, 26, 42, 58, 74, 90 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ -1 },{ 2, 3 },{ 10, 26, 42, 58, 74, 90 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 0, 2 },{ 3 },{ 68, 100, 132 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 1, 2 },{ 2 },{ 78, 222 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 2, 1 },{ 2 },{ 78, 222 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 0, 1, 2 },{},{ 300 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 0, -2, 2 },{},{ 300 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 2, 2, 2, 2, 2, 2 },{},{ 0, 1, 2, 3, 4, 5, 6 },{},{ 8256 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },{},{ 2, 2, 0, 2, 0 },{ 3 },{ 68, 100, 132 } },
-       reduce_test_params{ "ReduceSum", false,{ 2, 2, 2, 2, 2, 2, 2 },{},{ 6, 3, 1, 4, 0 },{ 2, 2 },{ 1776, 1840, 2288, 2352 } },
-       reduce_test_params{ "ReduceSum", true,{ 1, 2, 3, 4, 1 },{},{ 1 },{ 1, 1, 3, 4, 1 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
-       reduce_test_params{ "ReduceSum", false,{ 1, 2, 3, 4, 1 },{},{ 1 },{ 1, 3, 4, 1 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } }
+       // Params: reduce_type, keep_dims, in_shape, inType, input_tensor, axes_for_reduction, out_shape, reference
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 0 },{ 1, 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ -3 },{ 1, 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 2 },{ 2, 3, 1 },{ 10, 26, 42, 58, 74, 90 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ -1 },{ 2, 3, 1 },{ 10, 26, 42, 58, 74, 90 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 0, 2 },{ 1, 3, 1 },{ 68, 100, 132 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 1, 2 },{ 2, 1, 1 },{ 78, 222 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 2, 1 },{ 2, 1, 1 },{ 78, 222 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 0, 1, 2 },{ 1, 1, 1 },{ 300 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 0, -2, 2 },{ 1, 1, 1 },{ 300 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 2, 2, 2, 2, 2, 2 },"FP32",{},{ 0, 1, 2, 3, 4, 5, 6 },{ 1, 1, 1, 1, 1, 1, 1 },{ 8256 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 2, 2, 2, 2, 2, 2 },"FP32",{},{ 6, 3, 1, 4, 0 },{ 1, 1, 2, 1, 1, 2, 1 },{ 1776, 1840, 2288, 2352 } },
+       reduce_test_params{ "ReduceSum", true,{ 2, 3, 4 },"FP32",{},{ 2, 2, 0, 2, 0 },{ 1, 3, 1 },{ 68, 100, 132 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 0 },{ 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ -3 },{ 3, 4 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 2 },{ 2, 3 },{ 10, 26, 42, 58, 74, 90 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ -1 },{ 2, 3 },{ 10, 26, 42, 58, 74, 90 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 0, 2 },{ 3 },{ 68, 100, 132 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 1, 2 },{ 2 },{ 78, 222 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 2, 1 },{ 2 },{ 78, 222 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 0, 1, 2 },{},{ 300 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 0, -2, 2 },{},{ 300 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 2, 2, 2, 2, 2, 2 },"FP32",{},{ 0, 1, 2, 3, 4, 5, 6 },{},{ 8256 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"FP32",{},{ 2, 2, 0, 2, 0 },{ 3 },{ 68, 100, 132 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 2, 2, 2, 2, 2, 2 },"FP32",{},{ 6, 3, 1, 4, 0 },{ 2, 2 },{ 1776, 1840, 2288, 2352 } },
+       reduce_test_params{ "ReduceSum", true,{ 1, 2, 3, 4, 1 },"FP32",{},{ 1 },{ 1, 1, 3, 4, 1 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
+       reduce_test_params{ "ReduceSum", false,{ 1, 2, 3, 4, 1 },"FP32",{},{ 1 },{ 1, 3, 4, 1 },{ 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36 } },
+       // I32 tests
+       reduce_test_params{ "ReduceAnd", true,{ 2, 2, 2 },"I32",{1, 0, 1, 1, 0, 1, 1, 0},{ 2 },{ 2, 2, 1 },{ 0, 1, 0, 0} },
+       reduce_test_params{ "ReduceL1", true, { 3, 2, 2 },"I32",{},{ 2 },{ 3, 2, 1 },{ 3, 7, 11, 15, 19, 23 } },
+       reduce_test_params{ "ReduceL1", false, { 3, 2, 2 },"I32",{},{ 0, 1, 2 },{ },{ 78 } },
+       reduce_test_params{ "ReduceL2", false,{ 3, 2, 2 },"I32",{},{ 2 },{ 3, 2 },{ 2, 5, 7, 10, 13, 16 } },
+       reduce_test_params{ "ReduceL2", false,{ 3, 2, 2 },"I32",{},{ 0, 1, 2 },{ },{ 25 } },
+       reduce_test_params{ "ReduceLogSum", true,{ 10, 10, 2 },"I32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceLogSumExp", true,{ 5, 5, 2 },"I32",{},{ 2 },{ 5, 5, 1 },{} },
+       reduce_test_params{ "ReduceMax", true,{ 3, 2, 2 },"I32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 20, 2, 40, 2, 60, 2 } },
+       reduce_test_params{ "ReduceMean", true, { 3, 2, 2 },"I32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 12, 1, 35, 1, 57, 1 } },
+       reduce_test_params{ "ReduceMin", false,{ 3, 2, 2 },"I32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 5, 1, 30, 1, 55, 1 } },
+       reduce_test_params{ "ReduceOr", true,{ 2, 2, 2 },"I32",{1, 0, 1, 1, 0, 0, 1, 0},{ 2 },{ 2, 2, 1 },{1, 1, 0, 1 } },
+       reduce_test_params{ "ReduceProd", true,{ 3, 2, 2 },"I32",{},{ 1 },{ 3, 1, 2 },{ 3, 8, 35, 48, 99, 120 } },
+       reduce_test_params{ "ReduceSum", false,{ 2, 3, 4 },"I32",{},{ 2, 2, 0, 2, 0 },{ 3 },{ 68, 100, 132 } },
+       reduce_test_params{ "ReduceSumSquare", true, { 3, 2, 2 },"I32",{},{ 1 },{ 3, 1, 2 },{ 10, 20, 74, 100, 202, 244 } },
+       reduce_test_params{ "ReduceSumSquare", false, { 3, 2, 2 },"I32",{},{ 0, 1, 2 },{ },{ 650 } }
));
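The reference vectors above follow from the input being filled with 1..N (the `fill_data_dbgval` pattern). For `in_shape { 2, 3, 4 }` with `axes { 0 }`, output element (j, k) is `data[0][j][k] + data[1][j][k]`. A short sketch that recomputes the `{ 14, 16, ..., 36 }` reference row under the assumption (matching the test) that the tensor is stored flat in row-major order:

```cpp
#include <cassert>
#include <vector>

// Recompute ReduceSum over axis 0 of a 2x3x4 tensor filled with 1..24.
std::vector<int> reduce_sum_axis0_2x3x4() {
    std::vector<int> data(24);
    for (int i = 0; i < 24; ++i) data[i] = i + 1;  // fill_data_dbgval: 1..24
    std::vector<int> out(12, 0);                   // remaining 3*4 elements
    for (int n = 0; n < 2; ++n)                    // the reduced axis
        for (int i = 0; i < 12; ++i)
            out[i] += data[n * 12 + i];
    return out;                                    // (i+1) + (i+13) = 2i + 14
}
```

Element i is (i+1) + (i+13) = 2i + 14, which reproduces the 14, 16, ..., 36 sequence in the parameter table.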


-TEST_P(MKLDNNCPUExtReduceTests, TestsReduce) {}
+TEST_P(MKLDNNCPUExtReducesTests, TestsReduceAll) {}

INSTANTIATE_TEST_CASE_P(
-       TestsReduce, MKLDNNCPUExtReduceTests,
+       TestsReduceAll, MKLDNNCPUExtReducesTests,
        ::testing::Values(
-       // Params: reduce_type, keep_dims, in_shape, input_tensor, axes_for_reduction, out_shape, reference
-       reduce_test_params{ "ReduceAnd", true,{ 2, 2, 2 },{1, 0, 1, 1, 0, 1, 1, 0},{ 2 },{ 2, 2, 1 },{ 0, 1, 0, 0} },
-       reduce_test_params{ "ReduceAnd", false, { 2, 2, 2 },{1, 0, 1, 1, 0, 1, 1, 0},{ 0, 1, 2 },{ },{ 0 } },
-       reduce_test_params{ "ReduceL1", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{ } },
-       reduce_test_params{ "ReduceL1", true, { 3, 2, 2 },{},{ 2 },{ 3, 2, 1 },{ 3, 7, 11, 15, 19, 23 } },
-       reduce_test_params{ "ReduceL1", false, { 3, 2, 2 },{},{ 2 },{ 3, 2 },{ 3, 7, 11, 15, 19, 23 } },
-       reduce_test_params{ "ReduceL1", false, { 3, 2, 2 },{},{ 0, 1, 2 },{ },{ 78 } },
-       reduce_test_params{ "ReduceL2", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceL2", true,{ 3, 2, 2 },{},{ 2 },{ 3, 2, 1 },{ 2.23606798f, 5.f, 7.81024968f, 10.63014581f, 13.45362405f, 16.2788206f } },
-       reduce_test_params{ "ReduceL2", false,{ 3, 2, 2 },{},{ 2 },{ 3, 2 },{ 2.23606798f, 5.f, 7.81024968f, 10.63014581f, 13.45362405f, 16.2788206f } },
-       reduce_test_params{ "ReduceL2", false,{ 3, 2, 2 },{},{ 0, 1, 2 },{ },{ 25.49509757f } },
-       reduce_test_params{ "ReduceLogSum", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceLogSum", true,{ 3, 2, 2 },{ },{ 1 },{ 3, 1, 2 },{ } },
-       reduce_test_params{ "ReduceLogSum", false,{ 3, 2, 2 },{ },{ 1 },{ 3, 2 },{ } },
-       reduce_test_params{ "ReduceLogSum", false,{ 3, 2, 2 },{ },{ 0, 1, 2 },{},{ } },
-       reduce_test_params{ "ReduceLogSumExp", true,{ 5, 5, 2 },{},{ 2 },{ 5, 5, 1 },{} },
-       reduce_test_params{ "ReduceLogSumExp", true,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 20.f, 2.31326175f, 40.00004578f, 2.31326175f, 60.00671387f, 2.31326175f } },
-       reduce_test_params{ "ReduceLogSumExp", false,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 20.f, 2.31326175f, 40.00004578f, 2.31326175f, 60.00671387f, 2.31326175f } },
-       reduce_test_params{ "ReduceLogSumExp", false,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{},{ 60.00671387f } },
-       reduce_test_params{ "ReduceMax", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceMax", true,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 20, 2, 40, 2, 60, 2 } },
-       reduce_test_params{ "ReduceMax", false,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 20, 2, 40, 2, 60, 2 } },
-       reduce_test_params{ "ReduceMax", false,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{},{ 60 } },
-       reduce_test_params{ "ReduceMean", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceMean", true, { 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 12.5f, 1.5f, 35.f, 1.5f, 57.5f, 1.5f } },
-       reduce_test_params{ "ReduceMean", false, { 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 12.5f, 1.5f, 35.f, 1.5f, 57.5f, 1.5f } },
-       reduce_test_params{ "ReduceMean", false, { 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{ },{ 18.25f } },
-       reduce_test_params{ "ReduceMin", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceMin", true,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 5, 1, 30, 1, 55, 1 } },
-       reduce_test_params{ "ReduceMin", false,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 5, 1, 30, 1, 55, 1 } },
-       reduce_test_params{ "ReduceMin", false,{ 3, 2, 2 },{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{},{ 1 } },
-       reduce_test_params{ "ReduceOr", true,{ 2, 2, 2 },{1, 0, 1, 1, 0, 0, 1, 0},{ 2 },{ 2, 2, 1 },{1, 1, 0, 1 } },
-       reduce_test_params{ "ReduceOr", false, { 2, 2, 2 },{},{ 0, 1, 2 },{ },{ 1 } },
-       reduce_test_params{ "ReduceProd", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceProd", true,{ 3, 2, 2 },{},{ 1 },{ 3, 1, 2 },{ 3, 8, 35, 48, 99, 120 } },
-       reduce_test_params{ "ReduceProd", false,{ 3, 2, 2 },{},{ 1 },{ 3, 2 },{ 3, 8, 35, 48, 99, 120 } },
-       reduce_test_params{ "ReduceProd", false,{ 3, 2, 2 },{},{ 0, 1, 2 },{ },{ 4.790016e+08 } },
-       reduce_test_params{ "ReduceSumSquare", true,{ 10, 10, 2 },{},{ 2 },{ 10, 10, 1 },{} },
-       reduce_test_params{ "ReduceSumSquare", true, { 3, 2, 2 },{},{ 1 },{ 3, 1, 2 },{ 10, 20, 74, 100, 202, 244 } },
-       reduce_test_params{ "ReduceSumSquare", false, { 3, 2, 2 },{},{ 1 },{ 3, 2 },{ 10, 20, 74, 100, 202, 244 } },
-       reduce_test_params{ "ReduceSumSquare", false, { 3, 2, 2 },{},{ 0, 1, 2 },{ },{ 650 } }
+       // Params: reduce_type, keep_dims, in_shape, inType, input_tensor, axes_for_reduction, out_shape, reference
+       reduce_test_params{ "ReduceAnd", true,{ 2, 2, 2 },"FP32",{1, 0, 1, 1, 0, 1, 1, 0},{ 2 },{ 2, 2, 1 },{ 0, 1, 0, 0} },
+       reduce_test_params{ "ReduceAnd", false, { 2, 2, 2 },"FP32",{1, 0, 1, 1, 0, 1, 1, 0},{ 0, 1, 2 },{ },{ 0 } },
+       reduce_test_params{ "ReduceL1", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{ } },
+       reduce_test_params{ "ReduceL1", true, { 3, 2, 2 },"FP32",{},{ 2 },{ 3, 2, 1 },{ 3, 7, 11, 15, 19, 23 } },
+       reduce_test_params{ "ReduceL1", false, { 3, 2, 2 },"FP32",{},{ 2 },{ 3, 2 },{ 3, 7, 11, 15, 19, 23 } },
+       reduce_test_params{ "ReduceL1", false, { 3, 2, 2 },"FP32",{},{ 0, 1, 2 },{ },{ 78 } },
+       reduce_test_params{ "ReduceL2", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceL2", true,{ 3, 2, 2 },"FP32",{},{ 2 },{ 3, 2, 1 },{ 2.23606798f, 5.f, 7.81024968f, 10.63014581f, 13.45362405f, 16.2788206f } },
+       reduce_test_params{ "ReduceL2", false,{ 3, 2, 2 },"FP32",{},{ 2 },{ 3, 2 },{ 2.23606798f, 5.f, 7.81024968f, 10.63014581f, 13.45362405f, 16.2788206f } },
+       reduce_test_params{ "ReduceL2", false,{ 3, 2, 2 },"FP32",{},{ 0, 1, 2 },{ },{ 25.49509757f } },
+       reduce_test_params{ "ReduceLogSum", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceLogSum", true,{ 3, 2, 2 },"FP32",{ },{ 1 },{ 3, 1, 2 },{ } },
+       reduce_test_params{ "ReduceLogSum", false,{ 3, 2, 2 },"FP32",{ },{ 1 },{ 3, 2 },{ } },
+       reduce_test_params{ "ReduceLogSum", false,{ 3, 2, 2 },"FP32",{ },{ 0, 1, 2 },{},{ } },
+       reduce_test_params{ "ReduceLogSumExp", true,{ 5, 5, 2 },"FP32",{},{ 2 },{ 5, 5, 1 },{} },
+       reduce_test_params{ "ReduceLogSumExp", true,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 20.f, 2.31326175f, 40.00004578f, 2.31326175f, 60.00671387f, 2.31326175f } },
+       reduce_test_params{ "ReduceLogSumExp", false,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 20.f, 2.31326175f, 40.00004578f, 2.31326175f, 60.00671387f, 2.31326175f } },
+       reduce_test_params{ "ReduceLogSumExp", false,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{},{ 60.00671387f } },
+       reduce_test_params{ "ReduceMax", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceMax", true,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 20, 2, 40, 2, 60, 2 } },
+       reduce_test_params{ "ReduceMax", false,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 20, 2, 40, 2, 60, 2 } },
+       reduce_test_params{ "ReduceMax", false,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{},{ 60 } },
+       reduce_test_params{ "ReduceMean", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceMean", true, { 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 12.5f, 1.5f, 35.f, 1.5f, 57.5f, 1.5f } },
+       reduce_test_params{ "ReduceMean", false, { 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 12.5f, 1.5f, 35.f, 1.5f, 57.5f, 1.5f } },
+       reduce_test_params{ "ReduceMean", false, { 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{ },{ 18.25f } },
+       reduce_test_params{ "ReduceMin", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceMin", true,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 1, 2 },{ 5, 1, 30, 1, 55, 1 } },
+       reduce_test_params{ "ReduceMin", false,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 1 },{ 3, 2 },{ 5, 1, 30, 1, 55, 1 } },
+       reduce_test_params{ "ReduceMin", false,{ 3, 2, 2 },"FP32",{ 5, 1, 20, 2, 30, 1, 40, 2, 55, 1, 60, 2 },{ 0, 1, 2 },{},{ 1 } },
+       reduce_test_params{ "ReduceOr", true,{ 2, 2, 2 },"FP32",{1, 0, 1, 1, 0, 0, 1, 0},{ 2 },{ 2, 2, 1 },{1, 1, 0, 1 } },
+       reduce_test_params{ "ReduceOr", false, { 2, 2, 2 },"FP32",{},{ 0, 1, 2 },{ },{ 1 } },
+       reduce_test_params{ "ReduceProd", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceProd", true,{ 3, 2, 2 },"FP32",{},{ 1 },{ 3, 1, 2 },{ 3, 8, 35, 48, 99, 120 } },
+       reduce_test_params{ "ReduceProd", false,{ 3, 2, 2 },"FP32",{},{ 1 },{ 3, 2 },{ 3, 8, 35, 48, 99, 120 } },
+       reduce_test_params{ "ReduceProd", false,{ 3, 2, 2 },"FP32",{},{ 0, 1, 2 },{ },{ 4.790016e+08 } },
+       reduce_test_params{ "ReduceSumSquare", true,{ 10, 10, 2 },"FP32",{},{ 2 },{ 10, 10, 1 },{} },
+       reduce_test_params{ "ReduceSumSquare", true, { 3, 2, 2 },"FP32",{},{ 1 },{ 3, 1, 2 },{ 10, 20, 74, 100, 202, 244 } },
+       reduce_test_params{ "ReduceSumSquare", false, { 3, 2, 2 },"FP32",{},{ 1 },{ 3, 2 },{ 10, 20, 74, 100, 202, 244 } },
+       reduce_test_params{ "ReduceSumSquare", false, { 3, 2, 2 },"FP32",{},{ 0, 1, 2 },{ },{ 650 } }
));
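Several cases above pass negative or repeated reduction axes (e.g. `{ -3 }` or `{ 2, 2, 0, 2, 0 }` on a rank-3 input) and still expect the same result as the canonical axis set. A sketch of the normalization those cases imply, where negative axes count from the back and duplicates collapse (the helper name `normalize_axes` is illustrative, not the extension's actual function):

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Map possibly-negative, possibly-repeated reduction axes onto a sorted
// set of unique non-negative axes, as the { -3 } and { 2, 2, 0, 2, 0 }
// test cases above require.
std::vector<size_t> normalize_axes(const std::vector<int>& axes, size_t rank) {
    std::set<size_t> uniq;  // ordered set: sorts and deduplicates
    for (int a : axes)
        uniq.insert(static_cast<size_t>(a < 0 ? a + static_cast<int>(rank) : a));
    return std::vector<size_t>(uniq.begin(), uniq.end());
}
```

For a rank-3 tensor, `{ -3 }` maps to `{ 0 }` and `{ 2, 2, 0, 2, 0 }` maps to `{ 0, 2 }`, which is why those parameter rows share reference vectors with their canonical counterparts.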


@@ -165,16 +165,16 @@ protected:

        string P1, P2;
        if (p.alg == eltwise_relu) {
-           P1 = string("negative_slope=\"") + to_string_c_locale(p.alpha) + string("\"");
-           P2 = string("beta=\"") + to_string_c_locale(p.beta) + string("\"");
+           P1 = string("negative_slope=\"") + to_string(p.alpha) + string("\"");
+           P2 = string("beta=\"") + to_string(p.beta) + string("\"");
        } else if (p.alg == eltwise_bounded_relu) {
-           P1 = string("n=\"") + to_string_c_locale(p.alpha) + string("\"");
-           P2 = string("beta=\"") + to_string_c_locale(p.beta) + string("\"");
+           P1 = string("n=\"") + to_string(p.alpha) + string("\"");
+           P2 = string("beta=\"") + to_string(p.beta) + string("\"");
        } else if (p.alg == eltwise_tanh) {
            P1 = string("type=\"tanh\"");
        } else {
-           P1 = string("alpha=\"") + to_string_c_locale(p.alpha) + string("\"");
-           P2 = string("beta=\"") + to_string_c_locale(p.beta) + string("\"");
+           P1 = string("alpha=\"") + to_string(p.alpha) + string("\"");
+           P2 = string("beta=\"") + to_string(p.beta) + string("\"");
        }
        REPLACE_WITH_STR(model, "_P1_", P1);
        REPLACE_WITH_STR(model, "_P2_", P2);
|
||||
|
||||
@@ -47,7 +47,7 @@ void ref_eltwise(const std::vector<InferenceEngine::TBlob<data_t>> &src, Inferen
std::istringstream stream(prm.scales);
std::string str;
while (getline(stream, str, ',')) {
    float val = InferenceEngine::CNNLayer::ie_parse_float(str);
    float val = std::stof(str);
    scales.push_back(val);
}
} else {
@@ -344,7 +344,7 @@ protected:

std::string scale;
if (!p.scales.empty()) {
    scale = std::string("coeff=\"") + to_string_c_locale(p.scales) + std::string("\"");
    scale = std::string("coeff=\"") + p.scales + std::string("\"");
}
REPLACE_WITH_STR(model, "_OP_", op);
REPLACE_WITH_STR(model, "_COEFF_", scale);
@@ -617,7 +617,7 @@ protected:

std::string scale;
if (!p.scales.empty()) {
    scale = std::string("coeff=\"") + to_string_c_locale(p.scales) + std::string("\"");
    scale = std::string("coeff=\"") + p.scales + std::string("\"");
}
REPLACE_WITH_STR(model, "_OP_", op);
REPLACE_WITH_STR(model, "_COEFF_", scale);
@@ -36,4 +36,3 @@ public:
}
}
};
@@ -6,7 +6,6 @@
#include <single_layer_common.hpp>

#include <cpp/ie_cnn_net_reader.h>
#include <net_pass.h>

using namespace ::testing;
using namespace std;
@@ -97,103 +96,6 @@ class LocaleTests : public ::testing::Test {
</net>
)V0G0N";
std::string _model_LSTM = R"V0G0N(
<net batch="1" name="model" version="2">
    <layers>
        <layer id="0" name="Input" precision="FP32" type="Input">
            <output>
                <port id="0">
                    <dim>1</dim>
                    <dim>30</dim>
                </port>
            </output>
        </layer>
        <layer id="1" name="Split" precision="FP32" type="Split">
            <data axis="1" />
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>30</dim>
                </port>
            </input>
            <output>
                <port id="1">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
                <port id="2">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
                <port id="3">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
            </output>
        </layer>
        <layer id="2" name="LSTMCell" precision="FP32" type="LSTMCell">
            <data hidden_size="10" clip="0.2"/>
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
                <port id="1">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
                <port id="2">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
            </input>
            <output>
                <port id="3">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
                <port id="4">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
            </output>
            <blobs>
                <weights offset="0" size="3200"/>
                <biases offset="3200" size="160"/>
            </blobs>
        </layer>
        <layer name="Eltwise" type="Eltwise" id="3" precision="FP32">
            <data operation="sum" />
            <input>
                <port id="0">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
                <port id="1">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
            </input>
            <output>
                <port id="2">
                    <dim>1</dim>
                    <dim>10</dim>
                </port>
            </output>
        </layer>
    </layers>
    <edges>
        <edge from-layer="0" from-port="0" to-layer="1" to-port="0"/>
        <edge from-layer="1" from-port="1" to-layer="2" to-port="0"/>
        <edge from-layer="1" from-port="2" to-layer="2" to-port="1"/>
        <edge from-layer="1" from-port="3" to-layer="2" to-port="2"/>
        <edge from-layer="2" from-port="3" to-layer="3" to-port="0"/>
        <edge from-layer="2" from-port="4" to-layer="3" to-port="1"/>
    </edges>
</net>
)V0G0N";
protected:
std::string getModel() const {
    std::string model = _model;
@@ -206,35 +108,28 @@ protected:
    return model;
}

void testBody(bool isLSTM = false) const {
void testBody() const {
    CNNNetReader reader;

    // This model contains layers with float attributes.
    // Conversion from string may be affected by locale.
    std::string model = isLSTM ? _model_LSTM : getModel();
    auto model = getModel();
    reader.ReadNetwork(model.data(), model.length());
    auto net = reader.getNetwork();

    if (!isLSTM) {
        auto power_layer = dynamic_pointer_cast<PowerLayer>(net.getLayerByName("power"));
        ASSERT_EQ(power_layer->scale, 0.75f);
        ASSERT_EQ(power_layer->offset, 0.35f);
        ASSERT_EQ(power_layer->power, 0.5f);
    auto power_layer = dynamic_pointer_cast<PowerLayer>(net.getLayerByName("power"));
    ASSERT_EQ(power_layer->scale, 0.75f);
    ASSERT_EQ(power_layer->offset, 0.35f);
    ASSERT_EQ(power_layer->power, 0.5f);

        auto sum_layer = dynamic_pointer_cast<EltwiseLayer>(net.getLayerByName("sum"));
        std::vector<float> ref_coeff{0.77f, 0.33f};
        ASSERT_EQ(sum_layer->coeff, ref_coeff);
    auto sum_layer = dynamic_pointer_cast<EltwiseLayer>(net.getLayerByName("sum"));
    std::vector<float> ref_coeff {0.77f, 0.33f};
    ASSERT_EQ(sum_layer->coeff, ref_coeff);

        auto info = net.getInputsInfo();
        auto preproc = info.begin()->second->getPreProcess();
        ASSERT_EQ(preproc[0]->stdScale, 0.1f);
        ASSERT_EQ(preproc[0]->meanValue, 104.006f);
    } else {
        InferenceEngine::NetPass::UnrollRNN_if(net, [] (const RNNCellBase& rnn) -> bool { return true; });
        auto lstmcell_layer = dynamic_pointer_cast<LSTMCell>(net.getLayerByName("LSTMCell"));
        float ref_coeff(0.2f);
        ASSERT_EQ(lstmcell_layer->clip, ref_coeff);
    }
    auto info = net.getInputsInfo();
    auto preproc = info.begin()->second->getPreProcess();
    ASSERT_EQ(preproc[0]->stdScale, 0.1f);
    ASSERT_EQ(preproc[0]->meanValue, 104.006f);
}
};
@@ -250,18 +145,6 @@ TEST_F(LocaleTests, WithUSLocale) {
    setlocale(LC_ALL, "");
}

TEST_F(LocaleTests, WithRULocaleOnLSTM) {
    setlocale(LC_ALL, "ru_RU.UTF-8");
    testBody(true);
    setlocale(LC_ALL, "");
}

TEST_F(LocaleTests, WithUSLocaleOnLSTM) {
    setlocale(LC_ALL, "en_US.UTF-8");
    testBody(true);
    setlocale(LC_ALL, "");
}

TEST_F(LocaleTests, DISABLED_WithRULocaleCPP) {
    auto prev = std::locale();
    std::locale::global(std::locale("ru_RU.UTF-8"));
@@ -93,7 +93,6 @@ void AutoTuner::StoreKernel(const std::string& cacheFilePath,
std::ofstream cachedKernelsFile(cacheFilePath);
rapidjson::StringBuffer buffer(0, 1024);
rapidjson::PrettyWriter<rapidjson::StringBuffer> writer(buffer);
writer.SetFormatOptions(rapidjson::PrettyFormatOptions::kFormatSingleLineArray);
onlineCache->Accept(writer);
auto temp = buffer.GetString();
cachedKernelsFile << temp;
File diff suppressed because it is too large
@@ -27,8 +27,8 @@ struct training_params : public weight_bias_params {
explicit training_params(KernelType kt) : weight_bias_params(kt) {}

bool use_momentum = false;
float weights_decay = 0.0;
float momentum_factor = 0.0;
float weights_decay;
float momentum_factor;

ParamsKey GetParamsKey() const override;
};
@@ -59,20 +59,6 @@ public:
    _last_barrier_ev(other._last_barrier_ev),
    _output_event(other._output_event) {}

gpu_queue& operator=(gpu_queue&& other) {
    if (this != &other) {
        id = other.id;
        _context = std::move(other._context);
        _command_queue = std::move(other._command_queue);
        _queue_counter = std::move(other._queue_counter.load());
        _last_barrier = std::move(other._last_barrier.load());
        _events_pool = std::move(std::move(other._events_pool));
        _last_barrier_ev = std::move(other._last_barrier_ev);
        _output_event = std::move(other._output_event);
    }
    return *this;
}

~gpu_queue() = default;

void sync_events(std::vector<event_impl::ptr> const& deps);
@@ -204,13 +204,9 @@ void gpu_toolkit::set_output_event(uint16_t queue_id, bool out_event) {
std::ofstream& gpu_toolkit::open_log() {
    if (!_logger->_log_file.is_open()) {
        _logger->_log_file.open(_configuration.log, std::ios::out | std::ios::trunc);
        if (!_logger->_log_file.good()) {
            _logger->_log_file.close();
        if (!_logger->_log_file.good())
            throw std::runtime_error("Could not initialize ocl_toolkit log file");
    }

        if (!_logger->_log_file.is_open()) {
            _logger->_log_file.close();
            throw std::runtime_error("Could not open ocl_toolkit log file '" + _configuration.log + "' for writing");
        }
    }
@@ -511,13 +511,9 @@ void prepare_conv_eltw_fusing::run(program_impl& p) {
}

// fuse conv + eltwise after activations
auto conv_itr = conv_nodes.begin();
while (conv_itr != conv_nodes.end()) {
    auto node_itr = conv_itr++;

    if (node_itr == conv_nodes.end())
        break;

itr = conv_nodes.begin();
while (itr != conv_nodes.end()) {
    auto node_itr = itr++;
    auto& node = (*node_itr);

    fuse_conv_eltwise(p, node);
@@ -375,7 +375,7 @@ layout layout_optimizer::get_expected_layout(layout const& current_layout,
    expected_tensor = current_layout.size;
    expected_format = cldnn::format::bfzyx;
} else if ((_optimization_attributes.bfyx_f16_network &&
            convolution_bfyx_f16_opt(node.get_dependency(0).get_output_layout(), output_or_weights_layout, prim)) ||
            convolution_bfyx_f16_opt(current_layout, output_or_weights_layout, prim)) ||
           node.get_dependency(0).get_output_layout().format == format::bfyx_f16) {
    expected_tensor = current_layout.size;
    expected_format = cldnn::format::bfyx_f16;
@@ -1041,13 +1041,17 @@ void program_impl::dump_program(const char* stage,
                                bool with_full_info,
                                std::function<bool(program_node const&)> const& filter) const {
    std::string path = get_dir_path(options);
    if (path.empty() || !with_full_info) {
    if (path.empty()) {
        return;
    }

    std::ofstream graph(path + "cldnn_program_" + std::to_string(prog_id) + "_" + stage + ".graph");
    dump_graph_init(graph, *this, filter);

    if (!with_full_info) {
        return;
    }

    graph.open(path + "cldnn_program_" + std::to_string(prog_id) + "_" + stage + ".info");
    dump_graph_info(graph, *this, filter);
@@ -74,9 +74,7 @@ void ref_depthwise_fwd_t<data_type>::execute_forward() const {

parallel_nd(MB, C, D, H, W,
    [&](int n, int c, int d, int h, int w) {
    size_t data_off = data_d.ndims() == 3
        ? data_d.off(n, c, d)
        : data_d.ndims() == 4
    size_t data_off = data_d.ndims() == 4
        ? data_d.off(n, c, h, w)
        : data_d.ndims() == 5
            ? data_d.off(n, c, d, h, w)
@@ -23,6 +23,7 @@ int pthread_mutex_unlock(pthread_mutex_t *mutex)
int pthread_mutex_init(pthread_mutex_t *mutex,
                       pthread_mutexattr_t *attr)
{
    (void)attr;
    InitializeCriticalSection(mutex);

    return 0;
@@ -60,6 +61,7 @@ int pthread_attr_init(pthread_attr_t *attr)

int pthread_attr_destroy(pthread_attr_t *attr)
{
    (void)attr;
    return 0;
}
@@ -30,7 +30,11 @@ set(XLINK_SOURCES
    ${MV_COMMON_BASE}/XLink/pc/XLinkPlatform.c
    ${MV_COMMON_BASE}/XLink/pc/usb_boot.c
    ${MV_COMMON_BASE}/XLink/pc/pcie_host.c
    ${MV_COMMON_BASE}/XLink/shared/XLink.c
    ${MV_COMMON_BASE}/XLink/shared/XLinkDeprecated.c
    ${MV_COMMON_BASE}/XLink/shared/XLinkPrivateFields.c
    ${MV_COMMON_BASE}/XLink/shared/XLinkDispatcherImpl.c
    ${MV_COMMON_BASE}/XLink/shared/XLinkDevice.c
    ${MV_COMMON_BASE}/XLink/shared/XLinkStream.c
    ${MV_COMMON_BASE}/XLink/shared/XLinkDispatcher.c
    ${MV_COMMON_BASE}/shared/src/mvStringUtils.c
)
File diff suppressed because it is too large
@@ -16,85 +16,139 @@ extern "C"
{
#endif

// Set global common time out for all XLink operations.
XLinkError_t XLinkSetCommonTimeOutMsec(unsigned int msec);
// ------------------------------------
// Device management. Begin.
// ------------------------------------

// Set global device open time out for all XLink operations.
XLinkError_t XLinkSetDeviceOpenTimeOutMsec(unsigned int msec);

// Set global allocate graph time out for all XLink operations.
XLinkError_t XLinkSetAllocateGraphTimeOutMsec(unsigned int msec);

// Initializes XLink and scheduler
/**
 * @brief Initializes XLink and scheduler
 * @param handler[in] XLink global communication parameters
 * Now XLink can work with PCIe and USB simultaneously.
 */
XLinkError_t XLinkInitialize(XLinkGlobalHandler_t* handler);

// Connects to specific device, starts dispatcher and pings remote
XLinkError_t XLinkConnect(XLinkHandler_t* handler);

// Opens a stream in the remote that can be written to by the local
// Allocates stream_write_size (aligned up to 64 bytes) for that stream
streamId_t XLinkOpenStream(linkId_t id, const char* name, int stream_write_size);

// Close stream for any further data transfer
// Stream will be deallocated when all pending data has been released
XLinkError_t XLinkCloseStream(streamId_t streamId);

// Currently useless
XLinkError_t XLinkGetAvailableStreams(linkId_t id);

/**
 * @brief Return Myriad device description which meets the requirements
 * @param[in] state - state of device enum (booted, not booted or any state)
 * @param[in] in_deviceRequirements - structure with device requirements (protocol, platform).
 * @note If in_deviceRequirements has device name specified,
 * this function tries to get a device with that exact name
 * and fails if such device is unavailable
 * @param[out] out_foundDevice - found device description
 */
XLinkError_t XLinkFindFirstSuitableDevice(XLinkDeviceState_t state,
                                          const deviceDesc_t in_deviceRequirements,
                                          deviceDesc_t *out_foundDevice);

/**
 * @brief Return Myriad device description which meets the requirements
 * @brief Return all Myriad devices description which meets the requirements
 * @param[in] state - state of device enum (booted, not booted or any state)
 * @param[in] in_deviceRequirements - structure with device requirements (protocol, platform).
 * @param[in,out] out_foundDevicesPtr - pointer to array with all found devices descriptions
 * @param[out] out_foundDevicesCount - amount of found devices
 */
XLinkError_t XLinkFindAllSuitableDevices(XLinkDeviceState_t state,
                                         const deviceDesc_t in_deviceRequirements,
                                         deviceDesc_t *out_foundDevicesPtr,
                                         const unsigned int devicesArraySize,
                                         unsigned int *out_amountOfFoundDevices);
                                         unsigned int *out_foundDevicesCount);

// Send a package to initiate the writing of data to a remote stream
// Note that the actual size of the written data is ALIGN_UP(size, 64)
XLinkError_t XLinkWriteData(streamId_t streamId, const uint8_t* buffer, int size);
/**
 * @brief Connects to specific device, starts dispatcher and pings remote
 * @param[in,out] handler – XLink communication parameters (file path name for underlying layer)
 */
XLinkError_t XLinkConnect(XLinkHandler_t* handler);

// Send a package to initiate the writing of data to a remote stream with specific timeout
// Note that the actual size of the written data is ALIGN_UP(size, 64)
XLinkError_t XLinkWriteDataWithTimeout(streamId_t streamId, const uint8_t* buffer, int size, unsigned int timeout);

// Currently useless
XLinkError_t XLinkAsyncWriteData();

// Read data from local stream. Will only have something if it was written
// to by the remote
XLinkError_t XLinkReadData(streamId_t streamId, streamPacketDesc_t** packet);
XLinkError_t XLinkReadDataWithTimeOut(streamId_t streamId, streamPacketDesc_t** packet, unsigned int timeout);

// Release data from stream - This should be called after ReadData
XLinkError_t XLinkReleaseData(streamId_t streamId);

//Read fill level
XLinkError_t XLinkGetFillLevel(streamId_t streamId, int isRemote, int* fillLevel);

// Boot the remote (This is intended as an interface to boot the Myriad from PC)
/**
 * @brief Boots specified firmware binary to the remote device
 * @param deviceDesc - device description structure, obtained from XLinkFind* functions call
 * @param binaryPath - path to the *.mvcmd file
 */
XLinkError_t XLinkBoot(deviceDesc_t* deviceDesc, const char* binaryPath);

// Reset the remote
/**
 * @brief Reset the remote device and close all open local handles for this device
 * @warning This function should be used in a host application
 * @param[in] id – link Id obtained from XLinkConnect in the handler parameter
 */
XLinkError_t XLinkResetRemote(linkId_t id);

// Close all and release all memory
/**
 * @brief Close all and release all memory
 */
XLinkError_t XLinkResetAll();

// Profiling funcs - keeping them global for now
/**
 * @brief Profiling funcs - keeping them global for now
 */
XLinkError_t XLinkProfStart();
XLinkError_t XLinkProfStop();
XLinkError_t XLinkProfPrint();

XLinkError_t XLinkWriteGraphData(streamId_t streamId, const uint8_t* buffer, int size);
// ------------------------------------
// Device management. End.
// ------------------------------------


// ------------------------------------
// Device streams management. Begin.
// ------------------------------------

/**
 * @brief Opens a stream in the remote that can be written to by the local
 * Allocates stream_write_size (aligned up to 64 bytes) for that stream
 * @param[in] id – link Id obtained from XLinkConnect in the handler parameter
 * @param[in] name – stream name
 * @param[in] stream_write_size – stream buffer size
 */
streamId_t XLinkOpenStream(linkId_t id, const char* name, int stream_write_size);

/**
 * @brief Close stream for any further data transfer
 * Stream will be deallocated when all pending data has been released
 * @param[in] streamId - link Id obtained from XLinkOpenStream call
 */
XLinkError_t XLinkCloseStream(streamId_t streamId);

/**
 * @brief Send a package to initiate the writing of data to a remote stream
 * @warning Actual size of the written data is ALIGN_UP(size, 64)
 * @param[in] streamId – stream link Id obtained from XLinkOpenStream call
 * @param[in] buffer – data buffer to be transmitted
 * @param[in] size – size of the data to be transmitted
 */
XLinkError_t XLinkWriteData(streamId_t streamId, const uint8_t* buffer, int size);

/**
 * @brief Read data from local stream. Will only have something if it was written to by the remote
 * @param[in] streamId – stream link Id obtained from XLinkOpenStream call
 * @param[out] packet – structure containing output data buffer and received size
 */
XLinkError_t XLinkReadData(streamId_t streamId, streamPacketDesc_t** packet);

/**
 * @brief Release data from stream - This should be called after the data obtained from
 * XlinkReadData is processed
 * @param[in] streamId – stream link Id obtained from XLinkOpenStream call
 */
XLinkError_t XLinkReleaseData(streamId_t streamId);

/**
 * @brief Read fill level of the local or remote queues
 * @param[in] streamId – stream link Id obtained from XLinkOpenStream call
 * @param[in] isRemote – 0 – local queue; any other value – remote queue
 * @param[out] fillLevel – fill level of the selected queue
 */
XLinkError_t XLinkGetFillLevel(streamId_t streamId, int isRemote, int* fillLevel);

// ------------------------------------
// Device streams management. End.
// ------------------------------------


// ------------------------------------
// Deprecated API. Begin.
@@ -104,9 +158,18 @@ XLinkError_t XLinkGetDeviceName(int index, char* name, int nameSize);
XLinkError_t XLinkGetDeviceNameExtended(int index, char* name, int nameSize, int pid);

XLinkError_t XLinkBootRemote(const char* deviceName, const char* binaryPath);

XLinkError_t XLinkDisconnect(linkId_t id);

XLinkError_t XLinkGetAvailableStreams(linkId_t id);

XLinkError_t XLinkWriteDataWithTimeout(streamId_t streamId, const uint8_t* buffer, int size, unsigned int timeout);
XLinkError_t XLinkAsyncWriteData();

XLinkError_t XLinkReadDataWithTimeOut(streamId_t streamId, streamPacketDesc_t** packet, unsigned int timeout);

XLinkError_t XLinkSetDeviceOpenTimeOutMsec(unsigned int msec);
XLinkError_t XLinkSetCommonTimeOutMsec(unsigned int msec);

// ------------------------------------
// Deprecated API. End.
// ------------------------------------
171
inference-engine/thirdparty/movidius/XLink/shared/XLinkDeprecated.c
vendored
Normal file
@@ -0,0 +1,171 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "string.h"
#include "stdlib.h"

#include "XLink.h"
#include "XLinkTool.h"
#include "XLinkPlatform.h"
#include "XLinkPublicDefines.h"
#include "XLinkPrivateFields.h"

#ifdef MVLOG_UNIT_NAME
#undef MVLOG_UNIT_NAME
#define MVLOG_UNIT_NAME xLink
#endif
#include "mvLog.h"
#include "mvStringUtils.h"

// ------------------------------------
// Deprecated API. Begin.
// ------------------------------------

XLinkError_t getDeviceName(int index, char* name, int nameSize, XLinkPlatform_t platform, XLinkDeviceState_t state)
{
    ASSERT_X_LINK(name != NULL);
    ASSERT_X_LINK(index >= 0);
    ASSERT_X_LINK(nameSize >= 0 && nameSize <= XLINK_MAX_NAME_SIZE);

    deviceDesc_t in_deviceRequirements = {};
    in_deviceRequirements.protocol = glHandler != NULL ? glHandler->protocol : USB_VSC;
    in_deviceRequirements.platform = platform;
    memset(name, 0, nameSize);

    if(index == 0)
    {
        deviceDesc_t deviceToBoot = {};
        XLinkError_t rc =
            XLinkFindFirstSuitableDevice(state, in_deviceRequirements, &deviceToBoot);
        if(rc != X_LINK_SUCCESS)
        {
            return rc;
        }

        return mv_strcpy(name, nameSize, deviceToBoot.name) == EOK ? X_LINK_SUCCESS : X_LINK_ERROR;
    }
    else
    {
        deviceDesc_t deviceDescArray[XLINK_MAX_DEVICES] = {};
        unsigned int numberOfDevices = 0;
        XLinkError_t rc =
            XLinkFindAllSuitableDevices(state, in_deviceRequirements,
                                        deviceDescArray, XLINK_MAX_DEVICES, &numberOfDevices);
        if(rc != X_LINK_SUCCESS)
        {
            return rc;
        }

        if((unsigned int)index >= numberOfDevices)
        {
            return X_LINK_DEVICE_NOT_FOUND;
        }

        return mv_strcpy(name, nameSize, deviceDescArray[index].name) == EOK ? X_LINK_SUCCESS : X_LINK_ERROR;
    }
}

XLinkError_t XLinkGetDeviceName(int index, char* name, int nameSize)
{
    return getDeviceName(index, name, nameSize, X_LINK_ANY_PLATFORM, X_LINK_ANY_STATE);
}

XLinkError_t XLinkGetDeviceNameExtended(int index, char* name, int nameSize, int pid)
{
    XLinkDeviceState_t state = XLinkPlatformPidToState(pid);
    XLinkPlatform_t platform = XLinkPlatformPidToPlatform(pid);

    return getDeviceName(index, name, nameSize, platform, state);
}

XLinkError_t XLinkBootRemote(const char* deviceName, const char* binaryPath)
{
    ASSERT_X_LINK(deviceName != NULL);
    ASSERT_X_LINK(binaryPath != NULL);

    deviceDesc_t deviceDesc = {};
    deviceDesc.protocol = glHandler != NULL ? glHandler->protocol : USB_VSC;
    mv_strcpy(deviceDesc.name, XLINK_MAX_NAME_SIZE, deviceName);

    return XLinkBoot(&deviceDesc, binaryPath);
}

XLinkError_t XLinkDisconnect(linkId_t id)
{
    xLinkDesc_t* link = getLinkById(id);
    ASSERT_X_LINK(link != NULL);

    link->hostClosedFD = 1;
    return XLinkPlatformCloseRemote(&link->deviceHandle);
}

XLinkError_t XLinkGetAvailableStreams(linkId_t id)
{
    (void)id;
    return X_LINK_NOT_IMPLEMENTED;
}

XLinkError_t XLinkWriteDataWithTimeout(streamId_t streamId, const uint8_t* buffer,
                                       int size, unsigned int timeout)
{
    (void)timeout;
    return XLinkWriteData(streamId, buffer, size);
}

XLinkError_t XLinkReadDataWithTimeOut(streamId_t streamId, streamPacketDesc_t** packet, unsigned int timeout)
{
    (void)timeout;
    return XLinkReadData(streamId, packet);
}

XLinkError_t XLinkAsyncWriteData()
{
    return X_LINK_NOT_IMPLEMENTED;
}

XLinkError_t XLinkSetDeviceOpenTimeOutMsec(unsigned int msec) {
    (void)msec;
    return X_LINK_SUCCESS;
}

XLinkError_t XLinkSetCommonTimeOutMsec(unsigned int msec) {
    (void)msec;
    return X_LINK_SUCCESS;
}

// ------------------------------------
// Deprecated API. End.
// ------------------------------------

// ------------------------------------
// Public helpers. Begin.
// ------------------------------------

const char* XLinkErrorToStr(XLinkError_t rc) {
    switch (rc) {
        case X_LINK_SUCCESS:
            return "X_LINK_SUCCESS";
        case X_LINK_ALREADY_OPEN:
            return "X_LINK_ALREADY_OPEN";
        case X_LINK_COMMUNICATION_NOT_OPEN:
            return "X_LINK_COMMUNICATION_NOT_OPEN";
        case X_LINK_COMMUNICATION_FAIL:
            return "X_LINK_COMMUNICATION_FAIL";
        case X_LINK_COMMUNICATION_UNKNOWN_ERROR:
            return "X_LINK_COMMUNICATION_UNKNOWN_ERROR";
        case X_LINK_DEVICE_NOT_FOUND:
            return "X_LINK_DEVICE_NOT_FOUND";
        case X_LINK_TIMEOUT:
            return "X_LINK_TIMEOUT";
        case X_LINK_OUT_OF_MEMORY:
            return "X_LINK_OUT_OF_MEMORY";
        case X_LINK_ERROR:
        default:
            return "X_LINK_ERROR";
    }
}

// ------------------------------------
// Public helpers. End.
// ------------------------------------
372
inference-engine/thirdparty/movidius/XLink/shared/XLinkDevice.c
vendored
Normal file
@@ -0,0 +1,372 @@
|
||||
// Copyright (C) 2018-2019 Intel Corporation
|
||||
// SPDX-License-Identifier: Apache-2.0
|
||||
//
|
||||
|
||||
#include "stdio.h"
|
||||
#include "stdint.h"
|
||||
#include "string.h"
|
||||
#include "stdlib.h"
|
||||
|
||||
#include "XLink.h"
|
||||
#include "XLinkTool.h"
|
||||
|
||||
#include "XLinkPlatform.h"
|
||||
#include "XLinkPrivateFields.h"
|
||||
#include "XLinkDispatcherImpl.h"
|
||||
|
||||
#ifdef MVLOG_UNIT_NAME
|
||||
#undef MVLOG_UNIT_NAME
|
||||
#define MVLOG_UNIT_NAME xLink
|
||||
#endif
|
||||
#include "mvLog.h"
|
||||
#include "mvStringUtils.h"
|
||||
|
||||
#define MAX_PATH_LENGTH (255)
|
||||
|
||||
// ------------------------------------
|
||||
// Global fields. Begin.
|
||||
// ------------------------------------
|
||||
|
||||
XLinkGlobalHandler_t* glHandler; //TODO need to either protect this with semaphor
|
||||
//or make profiling data per device
|
||||
|
||||
xLinkDesc_t availableXLinks[MAX_LINKS];
|
||||
sem_t pingSem; //to b used by myriad
|
||||
DispatcherControlFunctions controlFunctionTbl;
|
||||
linkId_t nextUniqueLinkId = 0; //incremental number, doesn't get decremented.
|
||||
|
||||
// ------------------------------------
|
||||
// Global fields. End.
|
||||
// ------------------------------------
|
||||
|
||||
|
||||
|
||||
// ------------------------------------
|
||||
// Helpers declaration. Begin.
|
||||
// ------------------------------------
|
||||
|
||||
static XLinkError_t parseUsbLinkPlatformError(xLinkPlatformErrorCode_t rc);
|
||||
static int getNextAvailableLinkIndex();
|
||||
static linkId_t getNextAvailableLinkUniqueId();
|
||||
|
||||
// ------------------------------------
|
||||
// Helpers declaration. End.
|
||||
// ------------------------------------



// ------------------------------------
// API implementation. Begin.
// ------------------------------------

XLinkError_t XLinkInitialize(XLinkGlobalHandler_t* handler)
{
#ifndef __PC__
    mvLogLevelSet(MVLOG_FATAL);
    mvLogDefaultLevelSet(MVLOG_FATAL);
#endif

    ASSERT_X_LINK(handler);
    ASSERT_X_LINK(XLINK_MAX_STREAMS <= MAX_POOLS_ALLOC);
    glHandler = handler;
    if (sem_init(&pingSem,0,0)) {
        mvLog(MVLOG_ERROR, "Can't create semaphore\n");
    }
    int i;

    XLinkPlatformInit();

    //Using deprecated fields. Begin.
    int loglevel = handler->loglevel;
    int protocol = handler->protocol;
    //Using deprecated fields. End.

    memset((void*)handler, 0, sizeof(XLinkGlobalHandler_t));

    //Using deprecated fields. Begin.
    handler->loglevel = loglevel;
    handler->protocol = protocol;
    //Using deprecated fields. End.

    //initialize availableStreams
    xLinkDesc_t* link;
    for (i = 0; i < MAX_LINKS; i++) {
        link = &availableXLinks[i];
        link->id = INVALID_LINK_ID;
        link->deviceHandle.xLinkFD = NULL;
        link->peerState = XLINK_NOT_INIT;
        int stream;
        for (stream = 0; stream < XLINK_MAX_STREAMS; stream++)
            link->availableStreams[stream].id = INVALID_STREAM_ID;
    }

    controlFunctionTbl.eventReceive = &dispatcherEventReceive;
    controlFunctionTbl.eventSend = &dispatcherEventSend;
    controlFunctionTbl.localGetResponse = &dispatcherLocalEventGetResponse;
    controlFunctionTbl.remoteGetResponse = &dispatcherRemoteEventGetResponse;
    controlFunctionTbl.closeLink = &dispatcherCloseLink;
    controlFunctionTbl.closeDeviceFd = &dispatcherCloseDeviceFd;

    if (dispatcherInitialize(&controlFunctionTbl))
    {
#ifdef __PC__
        return X_LINK_TIMEOUT;
#endif
    }

#ifndef __PC__
    int index = getNextAvailableLinkIndex();
    if (index == -1)
        return X_LINK_COMMUNICATION_NOT_OPEN;

    link = &availableXLinks[index];
    link->deviceHandle.xLinkFD = NULL;
    link->id = nextUniqueLinkId++;
    link->peerState = XLINK_UP;

    sem_wait(&pingSem);
#endif
    return X_LINK_SUCCESS;
}

XLinkError_t XLinkFindFirstSuitableDevice(XLinkDeviceState_t state,
                                          const deviceDesc_t in_deviceRequirements,
                                          deviceDesc_t *out_foundDevice)
{
    ASSERT_X_LINK(out_foundDevice);

    xLinkPlatformErrorCode_t rc;
    rc = XLinkPlatformFindDeviceName(state, in_deviceRequirements, out_foundDevice);
    return parseUsbLinkPlatformError(rc);
}

XLinkError_t XLinkFindAllSuitableDevices(XLinkDeviceState_t state,
                                         deviceDesc_t in_deviceRequirements,
                                         deviceDesc_t *out_foundDevicesPtr,
                                         const unsigned int devicesArraySize,
                                         unsigned int* out_foundDevicesCount) {
    ASSERT_X_LINK(out_foundDevicesPtr);
    ASSERT_X_LINK(devicesArraySize > 0);
    ASSERT_X_LINK(out_foundDevicesCount);

    xLinkPlatformErrorCode_t rc;
    rc = XLinkPlatformFindArrayOfDevicesNames(
        state, in_deviceRequirements,
        out_foundDevicesPtr, devicesArraySize, out_foundDevicesCount);

    return parseUsbLinkPlatformError(rc);
}

//Called only from app - per device
XLinkError_t XLinkConnect(XLinkHandler_t* handler)
{
    ASSERT_X_LINK(handler);
    if (strnlen(handler->devicePath, MAX_PATH_LENGTH) < 2) {
        mvLog(MVLOG_ERROR, "Device path is incorrect");
        return X_LINK_ERROR;
    }

    int index = getNextAvailableLinkIndex();
    ASSERT_X_LINK(index != -1);

    xLinkDesc_t* link = &availableXLinks[index];
    mvLog(MVLOG_DEBUG,"%s() device name %s glHandler %p protocol %d\n", __func__, handler->devicePath, glHandler, handler->protocol);

    link->deviceHandle.protocol = handler->protocol;
    if (XLinkPlatformConnect(handler->devicePath2, handler->devicePath,
                             link->deviceHandle.protocol, &link->deviceHandle.xLinkFD) < 0) {
        return X_LINK_ERROR;
    }

    if (dispatcherStart(&link->deviceHandle))
        return X_LINK_TIMEOUT;

    xLinkEvent_t event = {0};

    event.header.type = XLINK_PING_REQ;
    event.deviceHandle = link->deviceHandle;
    dispatcherAddEvent(EVENT_LOCAL, &event);

    if (dispatcherWaitEventComplete(&link->deviceHandle)) {
        dispatcherClean(link->deviceHandle.xLinkFD);
        return X_LINK_TIMEOUT;
    }

    link->id = getNextAvailableLinkUniqueId();
    link->peerState = XLINK_UP;
    link->hostClosedFD = 0;
    handler->linkId = link->id;
    return X_LINK_SUCCESS;
}

XLinkError_t XLinkBoot(deviceDesc_t* deviceDesc, const char* binaryPath)
{
    if (XLinkPlatformBootRemote(deviceDesc, binaryPath) == 0)
        return X_LINK_SUCCESS;
    else
        return X_LINK_COMMUNICATION_FAIL;
}

XLinkError_t XLinkResetRemote(linkId_t id)
{
    xLinkDesc_t* link = getLinkById(id);
    ASSERT_X_LINK(link != NULL);
    if (getXLinkState(link) != XLINK_UP)
    {
        mvLog(MVLOG_WARN, "Link is down, close connection to device without reset");
        XLinkPlatformCloseRemote(&link->deviceHandle);
        return X_LINK_COMMUNICATION_NOT_OPEN;
    }

    // Add event to reset device. After sending it, dispatcher will close fd link
    xLinkEvent_t event = {0};
    event.header.type = XLINK_RESET_REQ;
    event.deviceHandle = link->deviceHandle;
    mvLog(MVLOG_DEBUG, "sending reset remote event\n");
    dispatcherAddEvent(EVENT_LOCAL, &event);
    if (dispatcherWaitEventComplete(&link->deviceHandle))
        return X_LINK_TIMEOUT;

    return X_LINK_SUCCESS;
}


XLinkError_t XLinkResetAll()
{
#if defined(NO_BOOT)
    mvLog(MVLOG_INFO, "Devices will not be restarted for this configuration (NO_BOOT)");
#else
    int i;
    for (i = 0; i < MAX_LINKS; i++) {
        if (availableXLinks[i].id != INVALID_LINK_ID) {
            xLinkDesc_t* link = &availableXLinks[i];
            int stream;
            for (stream = 0; stream < XLINK_MAX_STREAMS; stream++) {
                if (link->availableStreams[stream].id != INVALID_STREAM_ID) {
                    streamId_t streamId = link->availableStreams[stream].id;
                    mvLog(MVLOG_DEBUG,"%s() Closing stream (stream = %d) %d on link %d\n",
                          __func__, stream, (int) streamId, (int) link->id);
                    COMBIN_IDS(streamId, link->id);
                    if (XLinkCloseStream(streamId) != X_LINK_SUCCESS) {
                        mvLog(MVLOG_WARN,"Failed to close stream");
                    }
                }
            }
            if (XLinkResetRemote(link->id) != X_LINK_SUCCESS) {
                mvLog(MVLOG_WARN,"Failed to reset");
            }
        }
    }
#endif
    return X_LINK_SUCCESS;
}

XLinkError_t XLinkProfStart()
{
    glHandler->profEnable = 1;
    glHandler->profilingData.totalReadBytes = 0;
    glHandler->profilingData.totalWriteBytes = 0;
    glHandler->profilingData.totalWriteTime = 0;
    glHandler->profilingData.totalReadTime = 0;
    glHandler->profilingData.totalBootCount = 0;
    glHandler->profilingData.totalBootTime = 0;

    return X_LINK_SUCCESS;
}

XLinkError_t XLinkProfStop()
{
    glHandler->profEnable = 0;
    return X_LINK_SUCCESS;
}

XLinkError_t XLinkProfPrint()
{
    printf("XLink profiling results:\n");
    if (glHandler->profilingData.totalWriteTime)
    {
        printf("Average write speed: %f MB/Sec\n",
               glHandler->profilingData.totalWriteBytes /
               glHandler->profilingData.totalWriteTime /
               1024.0 /
               1024.0 );
    }
    if (glHandler->profilingData.totalReadTime)
    {
        printf("Average read speed: %f MB/Sec\n",
               glHandler->profilingData.totalReadBytes /
               glHandler->profilingData.totalReadTime /
               1024.0 /
               1024.0);
    }
    if (glHandler->profilingData.totalBootCount)
    {
        printf("Average boot speed: %f sec\n",
               glHandler->profilingData.totalBootTime /
               glHandler->profilingData.totalBootCount);
    }
    return X_LINK_SUCCESS;
}

// ------------------------------------
// API implementation. End.
// ------------------------------------


// ------------------------------------
// Helpers implementation. Begin.
// ------------------------------------

static XLinkError_t parseUsbLinkPlatformError(xLinkPlatformErrorCode_t rc) {
    switch (rc) {
        case X_LINK_PLATFORM_SUCCESS:
            return X_LINK_SUCCESS;
        case X_LINK_PLATFORM_DEVICE_NOT_FOUND:
            return X_LINK_DEVICE_NOT_FOUND;
        case X_LINK_PLATFORM_TIMEOUT:
            return X_LINK_TIMEOUT;
        default:
            return X_LINK_ERROR;
    }
}

static int getNextAvailableLinkIndex()
{
    int i;
    for (i = 0; i < MAX_LINKS; i++)
        if (availableXLinks[i].id == INVALID_LINK_ID)
            return i;

    mvLog(MVLOG_ERROR,"%s():- no next available link!\n", __func__);
    return -1;
}

static linkId_t getNextAvailableLinkUniqueId()
{
    linkId_t start = nextUniqueLinkId;
    do
    {
        int i;
        for (i = 0; i < MAX_LINKS; i++)
        {
            if (availableXLinks[i].id != INVALID_LINK_ID &&
                availableXLinks[i].id == nextUniqueLinkId)
                break;
        }
        if (i >= MAX_LINKS)
        {
            return nextUniqueLinkId;
        }
        nextUniqueLinkId++;
        if (nextUniqueLinkId == INVALID_LINK_ID)
        {
            nextUniqueLinkId = 0;
        }
    } while (start != nextUniqueLinkId);
    mvLog(MVLOG_ERROR, "%s():- no next available link!\n", __func__);
    return INVALID_LINK_ID;
}

// ------------------------------------
// Helpers implementation. End.
// ------------------------------------
@@ -31,7 +31,7 @@
 #include "XLinkDispatcher.h"
 #include "XLinkPrivateDefines.h"
 #include "XLink.h"
 #include "XLink_tool.h"
 #include "XLinkTool.h"

 #define MVLOG_UNIT_NAME xLink
 #include "mvLog.h"
@@ -89,15 +89,13 @@ typedef struct {
     localSem_t eventSemaphores[MAXIMUM_SEMAPHORES];
 } xLinkSchedulerState_t;

 extern char* TypeToStr(int type);

 #if (defined(_WIN32) || defined(_WIN64))
 static void* __cdecl eventSchedulerRun(void* ctx);
 #else
 static void* eventSchedulerRun(void*);
 #endif
 //These will be common for all, Initialized only once
 struct dispatcherControlFunctions* glControlFunc;
 DispatcherControlFunctions* glControlFunc;
 int numSchedulers;
 xLinkSchedulerState_t schedulerState[MAX_SCHEDULERS];
 sem_t addSchedulerSem;
@@ -118,11 +116,6 @@ static int unrefSem(sem_t* sem, xLinkSchedulerState_t* curr) {
     while (temp < curr->eventSemaphores + MAXIMUM_SEMAPHORES) {
         if (&temp->sem == sem) {
             temp->refs--;
             if (temp->refs == 0) {
                 curr->semaphores--;
                 ASSERT_X_LINK(sem_destroy(&temp->sem) != -1);
                 temp->refs = -1;
             }
             return 1;
         }
         temp++;
@@ -136,7 +129,7 @@ static sem_t* getCurrentSem(pthread_t threadId, xLinkSchedulerState_t* curr, int

     localSem_t* sem = curr->eventSemaphores;
     while (sem < curr->eventSemaphores + MAXIMUM_SEMAPHORES) {
         if (pthread_t_compare(sem->threadId, threadId) && sem->refs > 0) {
         if (pthread_t_compare(sem->threadId, threadId) && sem->refs >= 0) {
             sem->refs += inc_ref;
             return &sem->sem;
         }
@@ -149,36 +142,51 @@ static sem_t* createSem(xLinkSchedulerState_t* curr)
 {
     ASSERT_X_LINK_R(curr != NULL, NULL);


     sem_t* sem = getCurrentSem(pthread_self(), curr, 0);
     if (sem) // it already exists, error
     if (sem) {// it already exists, error
         return NULL;
     else
     {
         if (curr->semaphores < MAXIMUM_SEMAPHORES) {
             localSem_t* temp = curr->eventSemaphores;
             while (temp < curr->eventSemaphores + MAXIMUM_SEMAPHORES) {
                 if (temp->refs < 0) {
     }

     if (curr->semaphores <= MAXIMUM_SEMAPHORES) {
         localSem_t* temp = curr->eventSemaphores;

         while (temp < curr->eventSemaphores + MAXIMUM_SEMAPHORES) {
             if (temp->refs < 0 || curr->semaphores == MAXIMUM_SEMAPHORES) {
                 if (curr->semaphores == MAXIMUM_SEMAPHORES && !temp->refs) {
                     ASSERT_X_LINK(sem_destroy(&temp->sem) != -1);
                     curr->semaphores --;
                     temp->refs = -1;
 #if (defined(_WIN32) || defined(_WIN64))
                     memset(&temp->threadId, 0, sizeof(temp->threadId));
 #else
                     temp->threadId = 0;
 #endif
                 }

                 if (temp->refs == -1) {
                     sem = &temp->sem;
                 if (temp->refs == -1) {
                     if (sem_init(sem, 0, 0))
                         perror("Can't create semaphore\n");
                     if (sem_init(sem, 0, 0)){
                         mvLog(MVLOG_ERROR, "Error: Can't create semaphore\n");
                         return NULL;
                     }
                     curr->semaphores++;
                     temp->refs = 1;
                     temp->threadId = pthread_self();

                     break;
                 }
                 temp++;
             }
             if (!sem)
                 return NULL;
             temp++;
         }
         if (!sem) {
             return NULL; //shouldn't happen
         }
         else
             return NULL;
         return sem;
     }
     else {
         mvLog(MVLOG_ERROR, "Error: cached semaphores %d exceeds the MAXIMUM_SEMAPHORES %d", curr->semaphores, MAXIMUM_SEMAPHORES);
         return NULL;
     }

     return sem;
 }

 #if (defined(_WIN32) || defined(_WIN64))
@@ -382,11 +390,8 @@ static xLinkEvent_t* addNextQueueElemToProc(xLinkSchedulerState_t* curr,
     }
     mvLog(MVLOG_DEBUG, "Received event %s %d", TypeToStr(event->header.type), o);
     ev = &eventP->packet;
     if (eventP->sem) {
         if ((XLinkError_t)unrefSem(eventP->sem, curr) == X_LINK_ERROR) {
             mvLog(MVLOG_WARN, "Failed to unref sem");
         }
     }

     (void)curr;
     eventP->sem = sem;
     eventP->packet = *event;
     eventP->origin = o;
@@ -406,7 +411,7 @@ static xLinkEventPriv_t* dispatcherGetNextEvent(xLinkSchedulerState_t* curr)
 {
     ASSERT_X_LINK_R(curr != NULL, NULL);

     if (XLinkWaitSem(&curr->notifyDispatcherSem)) {
     if (sem_wait(&curr->notifyDispatcherSem)) {
         mvLog(MVLOG_ERROR,"can't post semaphore\n");
     }

@@ -464,10 +469,10 @@ static int dispatcherReset(xLinkSchedulerState_t* curr)
 {
     ASSERT_X_LINK(curr != NULL);
 #ifdef __PC__
     CHECK_MUTEX_SUCCESS_RC(pthread_mutex_lock(&reset_mutex), 1);
     XLINK_RET_IF_RC(pthread_mutex_lock(&reset_mutex), 1);

     if(!isAvailableScheduler(curr)) {
         CHECK_MUTEX_SUCCESS(pthread_mutex_unlock(&reset_mutex));
         XLINK_CHECK_CALL(pthread_mutex_unlock(&reset_mutex));
         return 1;
     }
 #endif
@@ -498,7 +503,7 @@ static int dispatcherReset(xLinkSchedulerState_t* curr)

 #ifdef __PC__
     closeDeviceFdAndResetScheduler(curr);
     CHECK_MUTEX_SUCCESS(pthread_mutex_unlock(&reset_mutex));
     XLINK_CHECK_CALL(pthread_mutex_unlock(&reset_mutex));
 #else
     glControlFunc->closeDeviceFd(&curr->deviceHandle);
     curr->schedulerId = -1;
@@ -706,6 +711,7 @@ static xLinkSchedulerState_t* findCorrespondingScheduler(void* xLinkFD)

     return NULL;
 }

 ///////////////// External Interface //////////////////////////
 /*Adds a new event with parameters and returns event id*/
 xLinkEvent_t* dispatcherAddEvent(xLinkEventOrigin_t origin, xLinkEvent_t *event)
@@ -717,7 +723,7 @@ xLinkEvent_t* dispatcherAddEvent(xLinkEventOrigin_t origin, xLinkEvent_t *event)
         return NULL;
     }
     mvLog(MVLOG_DEBUG, "Receiving event %s %d\n", TypeToStr(event->header.type), origin);
     if (XLinkWaitSem(&curr->addEventSem)) {
     if (sem_wait(&curr->addEventSem)) {
         mvLog(MVLOG_ERROR,"can't wait semaphore\n");
         return NULL;
     }
@@ -753,7 +759,8 @@ xLinkEvent_t* dispatcherAddEvent(xLinkEventOrigin_t origin, xLinkEvent_t *event)
     return ev;
 }

 int dispatcherWaitEventComplete(xLinkDeviceHandle_t* deviceHandle, unsigned int timeout)

 int dispatcherWaitEventComplete(xLinkDeviceHandle_t* deviceHandle)
 {
     xLinkSchedulerState_t* curr = findCorrespondingScheduler(deviceHandle->xLinkFD);
     ASSERT_X_LINK(curr != NULL);
@@ -762,11 +769,9 @@ int dispatcherWaitEventComplete(xLinkDeviceHandle_t* deviceHandle, unsigned int
     if (id == NULL) {
         return -1;
     }
 #ifndef __PC__
     (void)timeout;
     return XLinkWaitSem(id);
 #else
     int rc = XLinkWaitSemUserMode(id, timeout);

     int rc = sem_wait(id);
 #ifdef __PC__
     if (rc) {
         xLinkEvent_t event = {0};
         event.header.type = XLINK_RESET_REQ;
@@ -774,13 +779,17 @@ int dispatcherWaitEventComplete(xLinkDeviceHandle_t* deviceHandle, unsigned int
         mvLog(MVLOG_ERROR,"waiting is timeout, sending reset remote event");
         dispatcherAddEvent(EVENT_LOCAL, &event);
         id = getCurrentSem(pthread_self(), curr, 0);
         if (id == NULL || XLinkWaitSemUserMode(id, timeout)) {
         if (id == NULL || sem_wait(id)) {
             dispatcherReset(curr);
         }
     }
 #endif

     if ((XLinkError_t)unrefSem(id, curr) == X_LINK_ERROR) {
         mvLog(MVLOG_WARN, "Failed to unref sem");
     }

     return rc;
 #endif
 }

 int dispatcherUnblockEvent(eventId_t id, xLinkEventType_t type, streamId_t stream, void* xLinkFD)
@@ -905,7 +914,7 @@ int dispatcherStart(xLinkDeviceHandle_t* deviceHandle)
     }
 #endif

     XLinkWaitSem(&addSchedulerSem);
     sem_wait(&addSchedulerSem);
     mvLog(MVLOG_DEBUG,"%s() starting a new thread - schedulerId %d \n", __func__, idx);
     int sc = pthread_create(&schedulerState[idx].xLinkThreadId,
                             &attr,
@@ -942,7 +951,7 @@ int dispatcherStart(xLinkDeviceHandle_t* deviceHandle)
     return 0;
 }

 int dispatcherInitialize(struct dispatcherControlFunctions* controlFunc) {
 int dispatcherInitialize(DispatcherControlFunctions* controlFunc) {
     // create thread which will communicate with the pc

     int i;
@@ -978,16 +987,42 @@ int dispatcherClean(void* xLinkFD)
     xLinkSchedulerState_t* curr = findCorrespondingScheduler(xLinkFD);
     ASSERT_X_LINK(curr != NULL);

     CHECK_MUTEX_SUCCESS_RC(pthread_mutex_lock(&reset_mutex), 1);
     XLINK_RET_IF_RC(pthread_mutex_lock(&reset_mutex), 1);
     if(!isAvailableScheduler(curr)) {
         CHECK_MUTEX_SUCCESS(pthread_mutex_unlock(&reset_mutex));
         XLINK_CHECK_CALL(pthread_mutex_unlock(&reset_mutex));
         return 1;
     }
     mvLog(MVLOG_INFO, "Start Clean Dispatcher...");
     closeDeviceFdAndResetScheduler(curr);
     mvLog(MVLOG_INFO, "Clean Dispatcher Successfully...");
     CHECK_MUTEX_SUCCESS(pthread_mutex_unlock(&reset_mutex));
     XLINK_CHECK_CALL(pthread_mutex_unlock(&reset_mutex));
     return 0;
 }

 char* TypeToStr(int type)
 {
     switch(type)
     {
         case XLINK_WRITE_REQ:     return "XLINK_WRITE_REQ";
         case XLINK_READ_REQ:      return "XLINK_READ_REQ";
         case XLINK_READ_REL_REQ:  return "XLINK_READ_REL_REQ";
         case XLINK_CREATE_STREAM_REQ:return "XLINK_CREATE_STREAM_REQ";
         case XLINK_CLOSE_STREAM_REQ: return "XLINK_CLOSE_STREAM_REQ";
         case XLINK_PING_REQ:         return "XLINK_PING_REQ";
         case XLINK_RESET_REQ:        return "XLINK_RESET_REQ";
         case XLINK_REQUEST_LAST:     return "XLINK_REQUEST_LAST";
         case XLINK_WRITE_RESP:   return "XLINK_WRITE_RESP";
         case XLINK_READ_RESP:     return "XLINK_READ_RESP";
         case XLINK_READ_REL_RESP: return "XLINK_READ_REL_RESP";
         case XLINK_CREATE_STREAM_RESP: return "XLINK_CREATE_STREAM_RESP";
         case XLINK_CLOSE_STREAM_RESP:  return "XLINK_CLOSE_STREAM_RESP";
         case XLINK_PING_RESP:  return "XLINK_PING_RESP";
         case XLINK_RESET_RESP: return "XLINK_RESET_RESP";
         case XLINK_RESP_LAST:  return "XLINK_RESP_LAST";
         default:
             break;
     }
     return "";
 }

 /* end of file */

@@ -22,25 +22,27 @@ typedef int (*getRespFunction) (xLinkEvent_t*,
 xLinkEvent_t* dispatcherAddEvent(xLinkEventOrigin_t origin,
                                  xLinkEvent_t *event);

 int dispatcherWaitEventComplete(xLinkDeviceHandle_t* deviceHandle, unsigned int timeout);
 int dispatcherWaitEventComplete(xLinkDeviceHandle_t* deviceHandle);
 int dispatcherUnblockEvent(eventId_t id,
                            xLinkEventType_t type,
                            streamId_t stream,
                            void* xlinkFD);

 struct dispatcherControlFunctions {
     int (*eventSend) (xLinkEvent_t*);
     int (*eventReceive) (xLinkEvent_t*);
     getRespFunction localGetResponse;
     getRespFunction remoteGetResponse;
     void (*closeLink) (void* fd, int fullClose);
     void (*closeDeviceFd) (xLinkDeviceHandle_t* deviceHandle);
 };
 typedef struct {
     int (*eventSend) (xLinkEvent_t*);
     int (*eventReceive) (xLinkEvent_t*);
     getRespFunction localGetResponse;
     getRespFunction remoteGetResponse;
     void (*closeLink) (void* fd, int fullClose);
     void (*closeDeviceFd) (xLinkDeviceHandle_t* deviceHandle);
 } DispatcherControlFunctions;

 int dispatcherInitialize(struct dispatcherControlFunctions* controlFunc);
 int dispatcherInitialize(DispatcherControlFunctions* controlFunc);
 int dispatcherStart(xLinkDeviceHandle_t* deviceHandle);
 int dispatcherClean(void* xLinkFD);

 char* TypeToStr(int type);

 #ifdef __cplusplus
 }
 #endif
781
inference-engine/thirdparty/movidius/XLink/shared/XLinkDispatcherImpl.c
vendored
Normal file
@@ -0,0 +1,781 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <string.h>
#include "stdlib.h"

#include "mvMacros.h"
#include "XLinkTool.h"
#include "XLinkPlatform.h"
#include "XLinkDispatcherImpl.h"
#include "XLinkPrivateFields.h"
#include "XLinkDispatcher.h"

#ifdef MVLOG_UNIT_NAME
#undef MVLOG_UNIT_NAME
#define MVLOG_UNIT_NAME xLink
#endif
#include "mvLog.h"
#include "mvStringUtils.h"

#ifndef XLINK_USB_DATA_TIMEOUT
#define XLINK_USB_DATA_TIMEOUT 10000
#endif

// ------------------------------------
// Helpers declaration. Begin.
// ------------------------------------

static int isStreamSpaceEnoughFor(streamDesc_t* stream, uint32_t size);
static int is_semaphore_initialized(const streamDesc_t *stream);

static streamPacketDesc_t* getPacketFromStream(streamDesc_t* stream);
static int releasePacketFromStream(streamDesc_t* stream, uint32_t* releasedSize);
static int addNewPacketToStream(streamDesc_t* stream, void* buffer, uint32_t size);

static int handleIncomingEvent(xLinkEvent_t* event);

#ifdef __PC__
static void setEventFailed(xLinkEvent_t * event );
#endif

static int getNextAvailableStreamIndex(xLinkDesc_t* link);

static void deallocateStream(streamDesc_t* stream);
static streamId_t allocateNewStream(void* fd,
                                    const char* name,
                                    uint32_t writeSize,
                                    uint32_t readSize,
                                    streamId_t forcedId);


// ------------------------------------
// Helpers declaration. End.
// ------------------------------------



// ------------------------------------
// XLinkDispatcherImpl.h implementation. Begin.
// ------------------------------------

//adds a new event with parameters and returns event id
int dispatcherEventSend(xLinkEvent_t *event)
{
    mvLog(MVLOG_DEBUG, "%s, size %d, streamId %d.\n", TypeToStr(event->header.type), event->header.size, event->header.streamId);
#ifdef __PC__
    int rc = XLinkWrite(&event->deviceHandle, &event->header, sizeof(event->header), XLINK_USB_DATA_TIMEOUT);
#else
    int rc = XLinkWrite(&event->deviceHandle, &event->header, sizeof(event->header), 0);
#endif

    if(rc < 0)
    {
        mvLog(MVLOG_ERROR,"Write failed (header) (err %d) | event %s\n", rc, TypeToStr(event->header.type));
        return rc;
    }
    if (event->header.type == XLINK_WRITE_REQ)
    {
        //write requested data
        rc = XLinkWrite(&event->deviceHandle, event->data,
                        event->header.size, XLINK_USB_DATA_TIMEOUT);
        if(rc < 0) {
            mvLog(MVLOG_ERROR,"Write failed %d\n", rc);
#ifndef __PC__
            return rc;
#endif
        }
    }
    // this function will send events to the remote node
    return 0;
}

int dispatcherEventReceive(xLinkEvent_t* event){
    static xLinkEvent_t prevEvent = {0};
#ifdef __PC__
    int sc = XLinkRead(&event->deviceHandle, &event->header, sizeof(event->header), 0);
#else
    int sc = XLinkRead(&event->deviceHandle, &event->header, sizeof(event->header), XLINK_USB_DATA_TIMEOUT);
#endif

    mvLog(MVLOG_DEBUG,"Incoming event %p: %s %d %p prevEvent: %s %d %p\n",
          event,
          TypeToStr(event->header.type),
          (int)event->header.id,
          event->deviceHandle.xLinkFD,
          TypeToStr(prevEvent.header.type),
          (int)prevEvent.header.id,
          prevEvent.deviceHandle.xLinkFD);


    if(sc < 0) {
        xLinkDesc_t* link = getLink(&event->deviceHandle.xLinkFD);
        if (event->header.type == XLINK_RESET_RESP || link == NULL) {
            return sc;
        } else if (link->hostClosedFD) {
            //host intentionally closed usb, finish normally
            event->header.type = XLINK_RESET_RESP;
            return 0;
        }
    }

    if(sc < 0) {
        mvLog(MVLOG_ERROR,"%s() Read failed %d\n", __func__, (int)sc);
        return sc;
    }

    if (prevEvent.header.id == event->header.id &&
        prevEvent.header.type == event->header.type &&
        prevEvent.deviceHandle.xLinkFD == event->deviceHandle.xLinkFD)
    {
        mvLog(MVLOG_FATAL,"Duplicate id detected. \n");
    }

    prevEvent = *event;
    if (handleIncomingEvent(event) != 0) {
        mvLog(MVLOG_WARN,"Failed to handle incoming event");
    }

    if(event->header.type == XLINK_RESET_REQ)
    {
        if(event->deviceHandle.protocol == X_LINK_PCIE) {
            mvLog(MVLOG_DEBUG,"XLINK_RESET_REQ received - doing nothing, we dont want to reset device");
        }
        else {
            return -1;
        }
    }

    return 0;
}

//this function should be called only for remote requests
int dispatcherLocalEventGetResponse(xLinkEvent_t* event, xLinkEvent_t* response)
{
    streamDesc_t* stream;
    response->header.id = event->header.id;
    mvLog(MVLOG_DEBUG, "%s\n",TypeToStr(event->header.type));
    switch (event->header.type){
        case XLINK_WRITE_REQ:
            //in case local tries to write after it issues close (writeSize is zero)
            stream = getStreamById(event->deviceHandle.xLinkFD, event->header.streamId);

#ifdef __PC__
            if(!stream){
                mvLog(MVLOG_DEBUG, "stream %d has been closed!\n", event->header.streamId);
                setEventFailed(event);
                break;
            }
#else
            ASSERT_X_LINK(stream);
#endif

            if (stream->writeSize == 0)
            {
                event->header.flags.bitField.nack = 1;
                event->header.flags.bitField.ack = 0;
                // return -1 to don't even send it to the remote
                releaseStream(stream);
                return -1;
            }
            event->header.flags.bitField.ack = 1;
            event->header.flags.bitField.nack = 0;
            event->header.flags.bitField.localServe = 0;

            if(!isStreamSpaceEnoughFor(stream, event->header.size)){
                mvLog(MVLOG_FATAL,"local NACK RTS. stream '%s' is full (event %d)\n", stream->name, event->header.id);
                event->header.flags.bitField.block = 1;
                event->header.flags.bitField.localServe = 1;
                // TODO: easy to implement non-blocking read here, just return nack
                mvLog(MVLOG_WARN, "Blocked event would cause dispatching thread to wait on semaphore infinitely\n");
            }else{
                event->header.flags.bitField.block = 0;
                stream->remoteFillLevel += event->header.size;
                stream->remoteFillPacketLevel++;
                mvLog(MVLOG_DEBUG,"S%d: Got local write of %ld , remote fill level %ld out of %ld %ld\n",
                      event->header.streamId, event->header.size, stream->remoteFillLevel, stream->writeSize, stream->readSize);
            }
            releaseStream(stream);
            break;
        case XLINK_READ_REQ:
            stream = getStreamById(event->deviceHandle.xLinkFD, event->header.streamId);
#ifdef __PC__
            if(!stream){
                mvLog(MVLOG_DEBUG, "stream %d has been closed!\n", event->header.streamId);
                setEventFailed(event);
                break;
            }
#else
            ASSERT_X_LINK(stream);
#endif
            streamPacketDesc_t* packet = getPacketFromStream(stream);
            if (packet){
                //the read can be served with this packet
                event->data = packet;
                event->header.flags.bitField.ack = 1;
                event->header.flags.bitField.nack = 0;
                event->header.flags.bitField.block = 0;
            }
            else{
                event->header.flags.bitField.block = 1;
                // TODO: easy to implement non-blocking read here, just return nack
            }
            event->header.flags.bitField.localServe = 1;
            releaseStream(stream);
            break;
        case XLINK_READ_REL_REQ:
            stream = getStreamById(event->deviceHandle.xLinkFD, event->header.streamId);
            ASSERT_X_LINK(stream);
            uint32_t releasedSize = 0;
            releasePacketFromStream(stream, &releasedSize);
            event->header.size = releasedSize;
            releaseStream(stream);
            break;
        case XLINK_CREATE_STREAM_REQ:
            break;
        case XLINK_CLOSE_STREAM_REQ:
            stream = getStreamById(event->deviceHandle.xLinkFD, event->header.streamId);

            ASSERT_X_LINK(stream);
            if (stream->remoteFillLevel != 0){
                stream->closeStreamInitiated = 1;
                event->header.flags.bitField.block = 1;
                event->header.flags.bitField.localServe = 1;
            }else{
                event->header.flags.bitField.block = 0;
                event->header.flags.bitField.localServe = 0;
            }
            releaseStream(stream);
            break;
        case XLINK_RESET_REQ:
            mvLog(MVLOG_DEBUG,"XLINK_RESET_REQ - do nothing\n");
            break;
        case XLINK_PING_REQ:
        case XLINK_WRITE_RESP:
        case XLINK_READ_RESP:
        case XLINK_READ_REL_RESP:
        case XLINK_CREATE_STREAM_RESP:
        case XLINK_CLOSE_STREAM_RESP:
        case XLINK_PING_RESP:
            break;
        case XLINK_RESET_RESP:
            //should not happen
            event->header.flags.bitField.localServe = 1;
            break;
        default:
            ASSERT_X_LINK(0);
    }
    return 0;
}

//this function should be called only for remote requests
int dispatcherRemoteEventGetResponse(xLinkEvent_t* event, xLinkEvent_t* response)
{
    streamDesc_t* stream;
    response->header.id = event->header.id;
    response->header.flags.raw = 0;
    mvLog(MVLOG_DEBUG, "%s\n",TypeToStr(event->header.type));

    switch (event->header.type)
    {
        case XLINK_WRITE_REQ:
            //let remote write immediately as we have a local buffer for the data
            response->header.type = XLINK_WRITE_RESP;
            response->header.size = event->header.size;
            response->header.streamId = event->header.streamId;
            response->header.flags.bitField.ack = 1;
            response->deviceHandle = event->deviceHandle;

            // we got some data. We should unblock a blocked read
            int xxx = dispatcherUnblockEvent(-1,
                                             XLINK_READ_REQ,
                                             response->header.streamId,
                                             event->deviceHandle.xLinkFD);
            (void) xxx;
            mvLog(MVLOG_DEBUG,"unblocked from stream %d %d\n",
                  (int)response->header.streamId, (int)xxx);
            break;
        case XLINK_READ_REQ:
            break;
        case XLINK_READ_REL_REQ:
            response->header.flags.bitField.ack = 1;
            response->header.flags.bitField.nack = 0;
            response->header.type = XLINK_READ_REL_RESP;
            response->deviceHandle = event->deviceHandle;
            stream = getStreamById(event->deviceHandle.xLinkFD,
                                   event->header.streamId);
            ASSERT_X_LINK(stream);
            stream->remoteFillLevel -= event->header.size;
            stream->remoteFillPacketLevel--;

            mvLog(MVLOG_DEBUG,"S%d: Got remote release of %ld, remote fill level %ld out of %ld %ld\n",
                  event->header.streamId, event->header.size, stream->remoteFillLevel, stream->writeSize, stream->readSize);
            releaseStream(stream);

            dispatcherUnblockEvent(-1, XLINK_WRITE_REQ, event->header.streamId,
                                   event->deviceHandle.xLinkFD);
            //with every released packet check if the stream is already marked for close
            if (stream->closeStreamInitiated && stream->localFillLevel == 0)
            {
                mvLog(MVLOG_DEBUG,"%s() Unblock close STREAM\n", __func__);
                int xxx = dispatcherUnblockEvent(-1,
                                                 XLINK_CLOSE_STREAM_REQ,
                                                 event->header.streamId,
                                                 event->deviceHandle.xLinkFD);
                (void) xxx;
            }
            break;
        case XLINK_CREATE_STREAM_REQ:
            response->header.flags.bitField.ack = 1;
            response->header.type = XLINK_CREATE_STREAM_RESP;
            //write size from remote means read size for this peer
            response->header.streamId = allocateNewStream(event->deviceHandle.xLinkFD,
                                                          event->header.streamName,
                                                          0, event->header.size,
                                                          INVALID_STREAM_ID);

            if (response->header.streamId == INVALID_STREAM_ID) {
                response->header.flags.bitField.ack = 0;
                response->header.flags.bitField.sizeTooBig = 1;
                break;
            }

            response->deviceHandle = event->deviceHandle;
            mv_strncpy(response->header.streamName, MAX_STREAM_NAME_LENGTH,
                       event->header.streamName, MAX_STREAM_NAME_LENGTH - 1);
            response->header.size = event->header.size;
            mvLog(MVLOG_DEBUG,"creating stream %x\n", (int)response->header.streamId);
            break;
        case XLINK_CLOSE_STREAM_REQ:
        {
            response->header.type = XLINK_CLOSE_STREAM_RESP;
            response->header.streamId = event->header.streamId;
            response->deviceHandle = event->deviceHandle;

            streamDesc_t* stream = getStreamById(event->deviceHandle.xLinkFD,
                                                 event->header.streamId);
            if (!stream) {
                //if we have sent a NACK before, when the event gets unblocked
                //the stream might already be unavailable
                response->header.flags.bitField.ack = 1; //All is good, we are done
                response->header.flags.bitField.nack = 0;
                mvLog(MVLOG_DEBUG,"%s() got a close stream on aready closed stream\n", __func__);
            } else {
                if (stream->localFillLevel == 0)
                {
                    response->header.flags.bitField.ack = 1;
|
||||
response->header.flags.bitField.nack = 0;
|
||||
|
||||
deallocateStream(stream);
|
||||
if (!stream->writeSize) {
|
||||
stream->id = INVALID_STREAM_ID;
|
||||
stream->name[0] = '\0';
|
||||
}
|
||||
}
|
||||
else
|
||||
{
|
||||
mvLog(MVLOG_DEBUG,"%s():fifo is NOT empty returning NACK \n", __func__);
|
||||
response->header.flags.bitField.nack = 1;
|
||||
stream->closeStreamInitiated = 1;
|
||||
}
|
||||
|
||||
releaseStream(stream);
|
||||
}
|
||||
break;
|
||||
}
|
||||
case XLINK_PING_REQ:
|
||||
response->header.type = XLINK_PING_RESP;
|
||||
response->header.flags.bitField.ack = 1;
|
||||
response->deviceHandle = event->deviceHandle;
|
||||
sem_post(&pingSem);
|
||||
break;
|
||||
case XLINK_RESET_REQ:
|
||||
mvLog(MVLOG_DEBUG,"reset request - received! Sending ACK *****\n");
|
||||
response->header.flags.bitField.ack = 1;
|
||||
response->header.flags.bitField.nack = 0;
|
||||
response->header.type = XLINK_RESET_RESP;
|
||||
response->deviceHandle = event->deviceHandle;
|
||||
// need to send the response, serve the event and then reset
|
||||
break;
|
||||
case XLINK_WRITE_RESP:
|
||||
break;
|
||||
case XLINK_READ_RESP:
|
||||
break;
|
||||
case XLINK_READ_REL_RESP:
|
||||
break;
|
||||
case XLINK_CREATE_STREAM_RESP:
|
||||
{
|
||||
// write_size from the response the size of the buffer from the remote
|
||||
response->header.streamId = allocateNewStream(event->deviceHandle.xLinkFD,
|
||||
event->header.streamName,
|
||||
event->header.size,0,
|
||||
event->header.streamId);
|
||||
#ifndef __PC__
|
||||
ASSERT_X_LINK_R(response->header.streamId != INVALID_STREAM_ID, X_LINK_ERROR);
|
||||
#endif
|
||||
response->deviceHandle = event->deviceHandle;
|
||||
break;
|
||||
}
|
||||
case XLINK_CLOSE_STREAM_RESP:
|
||||
{
|
||||
streamDesc_t* stream = getStreamById(event->deviceHandle.xLinkFD,
|
||||
event->header.streamId);
|
||||
|
||||
if (!stream){
|
||||
response->header.flags.bitField.nack = 1;
|
||||
response->header.flags.bitField.ack = 0;
|
||||
break;
|
||||
}
|
||||
stream->writeSize = 0;
|
||||
if (!stream->readSize) {
|
||||
response->header.flags.bitField.nack = 1;
|
||||
response->header.flags.bitField.ack = 0;
|
||||
stream->id = INVALID_STREAM_ID;
|
||||
stream->name[0] = '\0';
|
||||
break;
|
||||
}
|
||||
releaseStream(stream);
|
||||
break;
|
||||
}
|
||||
case XLINK_PING_RESP:
|
||||
break;
|
||||
case XLINK_RESET_RESP:
|
||||
break;
|
||||
default:
|
||||
ASSERT_X_LINK(0);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
|
||||
void dispatcherCloseLink(void* fd, int fullClose)
{
    xLinkDesc_t* link = getLink(fd);

    if (!link) {
        mvLog(MVLOG_WARN, "Dispatcher link is null");
        return;
    }

    if (!fullClose) {
        link->peerState = XLINK_DOWN;
        return;
    }

#ifndef __PC__
    link->peerState = X_LINK_COMMUNICATION_NOT_OPEN;
#else
    link->peerState = XLINK_NOT_INIT;
#endif

    link->id = INVALID_LINK_ID;
    link->deviceHandle.xLinkFD = NULL;
    link->nextUniqueStreamId = 0;

    for (int index = 0; index < XLINK_MAX_STREAMS; index++) {
        streamDesc_t* stream = &link->availableStreams[index];
        if (!stream) {
            continue;
        }

        while (getPacketFromStream(stream) || stream->blockedPackets) {
            releasePacketFromStream(stream, NULL);
        }

        if (is_semaphore_initialized(stream)) {
            sem_destroy(&stream->sem);
            stream->name[0] = '\0';
        }

        stream->id = INVALID_STREAM_ID;
    }
}

void dispatcherCloseDeviceFd(xLinkDeviceHandle_t* deviceHandle)
{
    XLinkPlatformCloseRemote(deviceHandle);
}

// ------------------------------------
// XLinkDispatcherImpl.h implementation. End.
// ------------------------------------


// ------------------------------------
// Helpers implementation. Begin.
// ------------------------------------
int isStreamSpaceEnoughFor(streamDesc_t* stream, uint32_t size)
{
    if (stream->remoteFillPacketLevel >= XLINK_MAX_PACKETS_PER_STREAM ||
        stream->remoteFillLevel + size > stream->writeSize) {
        mvLog(MVLOG_DEBUG, "S%d: Not enough space in stream '%s' for %ld: PKT %ld, FILL %ld SIZE %ld\n",
              stream->id, stream->name, size, stream->remoteFillPacketLevel, stream->remoteFillLevel, stream->writeSize);
        return 0;
    }
    else
        return 1;
}

int is_semaphore_initialized(const streamDesc_t *stream) {
    return stream && strnlen(stream->name, MAX_STREAM_NAME_LENGTH) != 0;
}

streamPacketDesc_t* getPacketFromStream(streamDesc_t* stream)
{
    streamPacketDesc_t* ret = NULL;
    if (stream->availablePackets)
    {
        ret = &stream->packets[stream->firstPacketUnused];
        stream->availablePackets--;
        CIRCULAR_INCREMENT(stream->firstPacketUnused,
                           XLINK_MAX_PACKETS_PER_STREAM);
        stream->blockedPackets++;
    }
    return ret;
}

int releasePacketFromStream(streamDesc_t* stream, uint32_t* releasedSize)
{
    streamPacketDesc_t* currPack = &stream->packets[stream->firstPacket];
    if (stream->blockedPackets == 0) {
        mvLog(MVLOG_ERROR, "There is no packet to release\n");
        return 0; // ignore this, although this is a big problem on application side
    }

    stream->localFillLevel -= currPack->length;
    mvLog(MVLOG_DEBUG, "S%d: Got release of %ld, current local fill level is %ld out of %ld %ld\n",
          stream->id, currPack->length, stream->localFillLevel, stream->readSize, stream->writeSize);

    deallocateData(currPack->data,
                   ALIGN_UP_INT32((int32_t)currPack->length, __CACHE_LINE_SIZE), __CACHE_LINE_SIZE);

    CIRCULAR_INCREMENT(stream->firstPacket, XLINK_MAX_PACKETS_PER_STREAM);
    stream->blockedPackets--;
    if (releasedSize) {
        *releasedSize = currPack->length;
    }
    return 0;
}

int addNewPacketToStream(streamDesc_t* stream, void* buffer, uint32_t size) {
    if (stream->availablePackets + stream->blockedPackets < XLINK_MAX_PACKETS_PER_STREAM)
    {
        stream->packets[stream->firstPacketFree].data = buffer;
        stream->packets[stream->firstPacketFree].length = size;
        CIRCULAR_INCREMENT(stream->firstPacketFree, XLINK_MAX_PACKETS_PER_STREAM);
        stream->availablePackets++;
        return 0;
    }
    return -1;
}
int handleIncomingEvent(xLinkEvent_t* event) {
    //the actions taken here depend on whether this peer is a client or a remote
    mvLog(MVLOG_DEBUG, "%s, size %u, streamId %u.\n", TypeToStr(event->header.type), event->header.size, event->header.streamId);
    void* buffer;
    streamDesc_t* stream;
    int sc = 0;
    switch (event->header.type) {
        case XLINK_WRITE_REQ:
            /*If we got here, we will read the data no matter what happens.
              If we encounter any problems we will still read the data to keep
              the communication working but send a NACK.*/
            stream = getStreamById(event->deviceHandle.xLinkFD, event->header.streamId);
            ASSERT_X_LINK(stream);

            stream->localFillLevel += event->header.size;
            mvLog(MVLOG_DEBUG, "S%d: Got write of %ld, current local fill level is %ld out of %ld %ld\n",
                  event->header.streamId, event->header.size, stream->localFillLevel, stream->readSize, stream->writeSize);

            buffer = allocateData(ALIGN_UP(event->header.size, __CACHE_LINE_SIZE), __CACHE_LINE_SIZE);
            if (buffer == NULL) {
                mvLog(MVLOG_FATAL, "out of memory\n");
                ASSERT_X_LINK(0);
            }
            sc = XLinkRead(&event->deviceHandle, buffer, event->header.size, XLINK_USB_DATA_TIMEOUT);
            if (sc < 0) {
                mvLog(MVLOG_ERROR, "%s() Read failed %d\n", __func__, (int)sc);
                deallocateData(buffer, ALIGN_UP(event->header.size, __CACHE_LINE_SIZE), __CACHE_LINE_SIZE);
                ASSERT_X_LINK(0);
            }

            event->data = buffer;
            if (addNewPacketToStream(stream, buffer, event->header.size)) {
                mvLog(MVLOG_WARN, "No more space in stream, releasing packet\n");
                deallocateData(buffer, ALIGN_UP(event->header.size, __CACHE_LINE_SIZE), __CACHE_LINE_SIZE);
                event->header.flags.bitField.ack = 0;
                event->header.flags.bitField.nack = 1;
                ASSERT_X_LINK(0);
            }
            releaseStream(stream);
            break;
        case XLINK_READ_REQ:
            break;
        case XLINK_READ_REL_REQ:
            break;
        case XLINK_CREATE_STREAM_REQ:
            break;
        case XLINK_CLOSE_STREAM_REQ:
            break;
        case XLINK_PING_REQ:
            break;
        case XLINK_RESET_REQ:
            break;
        case XLINK_WRITE_RESP:
            break;
        case XLINK_READ_RESP:
            break;
        case XLINK_READ_REL_RESP:
            break;
        case XLINK_CREATE_STREAM_RESP:
            break;
        case XLINK_CLOSE_STREAM_RESP:
            break;
        case XLINK_PING_RESP:
            break;
        case XLINK_RESET_RESP:
            break;
        default:
            ASSERT_X_LINK(0);
    }
    //adding event for the scheduler. We let it know that this is a remote event
    dispatcherAddEvent(EVENT_REMOTE, event);
    return 0;
}
#ifdef __PC__
void setEventFailed(xLinkEvent_t* event)
{
    event->header.flags.bitField.localServe = 1;
    event->header.flags.bitField.ack = 0;
    event->header.flags.bitField.nack = 1;
}
#endif

int getNextAvailableStreamIndex(xLinkDesc_t* link)
{
    if (link == NULL)
        return -1;

    int idx;
    for (idx = 0; idx < XLINK_MAX_STREAMS; idx++) {
        if (link->availableStreams[idx].id == INVALID_STREAM_ID)
            return idx;
    }

    mvLog(MVLOG_DEBUG, "%s(): - no next available stream!\n", __func__);
    return -1;
}
void deallocateStream(streamDesc_t* stream)
{
    if (stream && stream->id != INVALID_STREAM_ID)
    {
        if (stream->readSize)
        {
            stream->readSize = 0;
            stream->closeStreamInitiated = 0;
        }

#ifndef __PC__
        if (is_semaphore_initialized(stream)) {
            if (sem_destroy(&stream->sem))
                perror("Can't destroy semaphore");
        }
#endif
    }
}
streamId_t allocateNewStream(void* fd,
                             const char* name,
                             uint32_t writeSize,
                             uint32_t readSize,
                             streamId_t forcedId)
{
    streamId_t streamId;
    streamDesc_t* stream;
    xLinkDesc_t* link = getLink(fd);
    ASSERT_X_LINK_R(link != NULL, INVALID_STREAM_ID);

    stream = getStreamByName(link, name);

    if (stream != NULL)
    {
        /*the stream already exists*/
        if ((writeSize > stream->writeSize && stream->writeSize != 0) ||
            (readSize > stream->readSize && stream->readSize != 0))
        {
            mvLog(MVLOG_ERROR, "%s(): streamName Exists %d\n", __func__, (int)stream->id);
            return INVALID_STREAM_ID;
        }
    }
    else
    {
        int idx = getNextAvailableStreamIndex(link);

        if (idx == -1)
        {
            return INVALID_STREAM_ID;
        }
        stream = &link->availableStreams[idx];
        if (forcedId == INVALID_STREAM_ID)
            stream->id = link->nextUniqueStreamId;
        else
            stream->id = forcedId;
        link->nextUniqueStreamId++; //even if we didn't use a new one, we need to align with the total number of unique streams
        if (!is_semaphore_initialized(stream)) //if sem_init is called for an already initialized sem, behavior is undefined
        {
            if (sem_init(&stream->sem, 0, 0))
                perror("Can't create semaphore\n");
        }
        else
        {
            mvLog(MVLOG_INFO, "is_semaphore_initialized\n");
        }

        mv_strncpy(stream->name, MAX_STREAM_NAME_LENGTH,
                   name, MAX_STREAM_NAME_LENGTH - 1);
        stream->readSize = 0;
        stream->writeSize = 0;
        stream->remoteFillLevel = 0;
        stream->remoteFillPacketLevel = 0;

        stream->localFillLevel = 0;
        stream->closeStreamInitiated = 0;
    }
    if (readSize && !stream->readSize)
    {
        stream->readSize = readSize;

#ifndef __PC__
        // FIXME: not the best solution but the simplest for now:
        // it is just for a check; the real allocation will be done while receiving a USB packet
        void *buffer = allocateData(ALIGN_UP(readSize, __CACHE_LINE_SIZE), __CACHE_LINE_SIZE);
        if (buffer == NULL) {
            mvLog(MVLOG_ERROR, "Cannot create stream. Requested memory = %u", stream->readSize);
            return INVALID_STREAM_ID;
        } else {
            deallocateData(buffer, ALIGN_UP(readSize, __CACHE_LINE_SIZE), __CACHE_LINE_SIZE);
        }
#endif
    }
    if (writeSize && !stream->writeSize)
    {
        stream->writeSize = writeSize;
    }

    mvLog(MVLOG_DEBUG, "The stream \"%s\" created, id = %u, readSize = %d, writeSize = %d\n",
          stream->name, stream->id, stream->readSize, stream->writeSize);

    streamId = stream->id;
    releaseStream(stream);
    return streamId;
}
// ------------------------------------
// Helpers implementation. End.
// ------------------------------------

20  inference-engine/thirdparty/movidius/XLink/shared/XLinkDispatcherImpl.h  vendored  Normal file
@@ -0,0 +1,20 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#ifndef _XLINKDISPATCHERIMPL_H
#define _XLINKDISPATCHERIMPL_H

#define _XLINK_ENABLE_PRIVATE_INCLUDE_
#include "XLinkPrivateDefines.h"

int dispatcherEventSend (xLinkEvent_t*);
int dispatcherEventReceive (xLinkEvent_t*);
int dispatcherLocalEventGetResponse (xLinkEvent_t*, xLinkEvent_t*);
int dispatcherRemoteEventGetResponse (xLinkEvent_t*, xLinkEvent_t*);
void dispatcherCloseLink (void* fd, int fullClose);
void dispatcherCloseDeviceFd (xLinkDeviceHandle_t* deviceHandle);

#endif //_XLINKDISPATCHERIMPL_H
@@ -172,10 +172,13 @@ typedef struct xLinkEvent_t {
    void* data;
}xLinkEvent_t;

int XLinkWaitSem(sem_t* sem);
int XLinkWaitSemUserMode(sem_t* sem, unsigned int timeout);

const char* XLinkErrorToStr(XLinkError_t rc);
#define XLINK_INIT_EVENT(event, in_streamId, in_type, in_size, in_data, in_deviceHandle) do { \
        (event).header.streamId = (in_streamId); \
        (event).header.type = (in_type); \
        (event).header.size = (in_size); \
        (event).data = (in_data); \
        (event).deviceHandle = (in_deviceHandle); \
    } while(0)

#ifdef __cplusplus
}
109  inference-engine/thirdparty/movidius/XLink/shared/XLinkPrivateFields.c  vendored  Normal file
@@ -0,0 +1,109 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include <stdio.h>
#include <string.h>
#include "stdlib.h"

#include "XLinkPrivateFields.h"
#include "XLinkPrivateDefines.h"
#include "XLinkTool.h"

#ifdef MVLOG_UNIT_NAME
#undef MVLOG_UNIT_NAME
#define MVLOG_UNIT_NAME xLink
#endif
#include "mvLog.h"

streamId_t getStreamIdByName(xLinkDesc_t* link, const char* name)
{
    streamDesc_t* stream = getStreamByName(link, name);

    if (stream) {
        streamId_t id = stream->id;
        releaseStream(stream);
        return id;
    }

    return INVALID_STREAM_ID;
}

xLinkDesc_t* getLinkByStreamId(streamId_t streamId)
{
    linkId_t id;
    EXTRACT_IDS(streamId, id);

    return getLinkById(id);
}

xLinkDesc_t* getLinkById(linkId_t id)
{
    int i;
    for (i = 0; i < MAX_LINKS; i++)
        if (availableXLinks[i].id == id)
            return &availableXLinks[i];
    return NULL;
}

xLinkDesc_t* getLink(void* fd)
{
    int i;
    for (i = 0; i < MAX_LINKS; i++)
        if (availableXLinks[i].deviceHandle.xLinkFD == fd)
            return &availableXLinks[i];
    return NULL;
}

streamDesc_t* getStreamById(void* fd, streamId_t id)
{
    xLinkDesc_t* link = getLink(fd);
    ASSERT_X_LINK_R(link != NULL, NULL);
    int stream;
    for (stream = 0; stream < XLINK_MAX_STREAMS; stream++) {
        if (link->availableStreams[stream].id == id) {
            if (sem_wait(&link->availableStreams[stream].sem)) {
#ifdef __PC__
                return NULL;
#endif
            }
            return &link->availableStreams[stream];
        }
    }
    return NULL;
}

streamDesc_t* getStreamByName(xLinkDesc_t* link, const char* name)
{
    ASSERT_X_LINK_R(link != NULL, NULL);
    int stream;
    for (stream = 0; stream < XLINK_MAX_STREAMS; stream++) {
        if (link->availableStreams[stream].id != INVALID_STREAM_ID &&
            strcmp(link->availableStreams[stream].name, name) == 0) {
            if (sem_wait(&link->availableStreams[stream].sem)) {
#ifdef __PC__
                return NULL;
#endif
            }
            return &link->availableStreams[stream];
        }
    }
    return NULL;
}

void releaseStream(streamDesc_t* stream)
{
    if (stream && stream->id != INVALID_STREAM_ID) {
        sem_post(&stream->sem);
    }
    else {
        mvLog(MVLOG_DEBUG, "trying to release a semaphore for a released stream\n");
    }
}

xLinkState_t getXLinkState(xLinkDesc_t* link)
{
    ASSERT_X_LINK_R(link != NULL, XLINK_NOT_INIT);
    mvLog(MVLOG_DEBUG, "%s() link %p link->peerState %d\n", __func__, link, link->peerState);
    return link->peerState;
}
46  inference-engine/thirdparty/movidius/XLink/shared/XLinkPrivateFields.h  vendored  Normal file
@@ -0,0 +1,46 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#ifndef _XLINKPRIVATEFIELDS_H
#define _XLINKPRIVATEFIELDS_H

#include "XLinkDispatcher.h"

// ------------------------------------
// Global fields declaration. Begin.
// ------------------------------------

extern XLinkGlobalHandler_t* glHandler; //TODO need to either protect this with a semaphore
                                        //or make profiling data per device

extern xLinkDesc_t availableXLinks[MAX_LINKS];
extern DispatcherControlFunctions controlFunctionTbl;
extern sem_t pingSem; //to be used by myriad

// ------------------------------------
// Global fields declaration. End.
// ------------------------------------


// ------------------------------------
// Helpers declaration. Begin.
// ------------------------------------

streamId_t getStreamIdByName(xLinkDesc_t* link, const char* name);
xLinkDesc_t* getLinkByStreamId(streamId_t streamId);
xLinkDesc_t* getLinkById(linkId_t id);
xLinkDesc_t* getLink(void* fd);
xLinkState_t getXLinkState(xLinkDesc_t* link);


streamDesc_t* getStreamById(void* fd, streamId_t id);
streamDesc_t* getStreamByName(xLinkDesc_t* link, const char* name);

void releaseStream(streamDesc_t* stream);

// ------------------------------------
// Helpers declaration. End.
// ------------------------------------

#endif //_XLINKPRIVATEFIELDS_H
@@ -28,7 +28,8 @@ typedef enum{
    X_LINK_DEVICE_NOT_FOUND,
    X_LINK_TIMEOUT,
    X_LINK_ERROR,
    X_LINK_OUT_OF_MEMORY
    X_LINK_OUT_OF_MEMORY,
    X_LINK_NOT_IMPLEMENTED
} XLinkError_t;

typedef enum{
@@ -103,6 +104,7 @@ typedef struct
    XLinkProtocol_t protocol;
} XLinkHandler_t;

const char* XLinkErrorToStr(XLinkError_t rc);

//Deprecated defines. Begin.
299  inference-engine/thirdparty/movidius/XLink/shared/XLinkStream.c  vendored  Normal file
@@ -0,0 +1,299 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#include "string.h"
#include "stdlib.h"
#include "time.h"

#if (defined(_WIN32) || defined(_WIN64))
#include "gettime.h"
#endif

#include "XLink.h"
#include "XLinkTool.h"

#include "mvMacros.h"
#include "XLinkPrivateFields.h"

#ifdef MVLOG_UNIT_NAME
#undef MVLOG_UNIT_NAME
#define MVLOG_UNIT_NAME xLink
#endif
#include "mvLog.h"
#include "mvStringUtils.h"

// ------------------------------------
// Helpers declaration. Begin.
// ------------------------------------

#ifdef __PC__
static XLinkError_t checkEventHeader(xLinkEventHeader_t header);
#endif

static float timespec_diff(struct timespec *start, struct timespec *stop);
static XLinkError_t addEvent(xLinkEvent_t *event);
static XLinkError_t addEventWithPerf(xLinkEvent_t *event, float* opTime);

// ------------------------------------
// Helpers declaration. End.
// ------------------------------------

streamId_t XLinkOpenStream(linkId_t id, const char* name, int stream_write_size)
{
    ASSERT_X_LINK(name);
    XLINK_RET_IF_RC(stream_write_size < 0,
                    X_LINK_ERROR);

    xLinkDesc_t* link = getLinkById(id);
    mvLog(MVLOG_DEBUG, "%s() id %d link %p\n", __func__, id, link);
    ASSERT_X_LINK_R(link != NULL, INVALID_STREAM_ID);
    if (getXLinkState(link) != XLINK_UP) {
        /*no link*/
        mvLog(MVLOG_DEBUG, "%s() no link up\n", __func__);
        return INVALID_STREAM_ID;
    }

    if (strlen(name) > MAX_STREAM_NAME_LENGTH) {
        mvLog(MVLOG_WARN, "name too long\n");
        return INVALID_STREAM_ID;
    }

    if (stream_write_size > 0)
    {
        stream_write_size = ALIGN_UP(stream_write_size, __CACHE_LINE_SIZE);

        xLinkEvent_t event = {0};
        XLINK_INIT_EVENT(event, INVALID_STREAM_ID, XLINK_CREATE_STREAM_REQ,
                         stream_write_size, NULL, link->deviceHandle);
        mv_strncpy(event.header.streamName, MAX_STREAM_NAME_LENGTH,
                   name, MAX_STREAM_NAME_LENGTH - 1);

        dispatcherAddEvent(EVENT_LOCAL, &event);
        if (dispatcherWaitEventComplete(&link->deviceHandle))
            return INVALID_STREAM_ID;

#ifdef __PC__
        XLinkError_t eventStatus = checkEventHeader(event.header);
        if (eventStatus != X_LINK_SUCCESS) {
            mvLog(MVLOG_ERROR, "Got wrong packet from device, error code = %s", XLinkErrorToStr(eventStatus));
            // FIXME: not a good solution, but seems the only one with such an XLink API
            if (eventStatus == X_LINK_OUT_OF_MEMORY) {
                return INVALID_STREAM_ID_OUT_OF_MEMORY;
            } else {
                return INVALID_STREAM_ID;
            }
        }
#endif
    }
    streamId_t streamId = getStreamIdByName(link, name);

#ifdef __PC__
    if (streamId > 0x0FFFFFFF) {
        mvLog(MVLOG_ERROR, "Cannot find stream id by the \"%s\" name", name);
        mvLog(MVLOG_ERROR, "Max streamId reached!");
        return INVALID_STREAM_ID;
    }
#else
    if (streamId == INVALID_STREAM_ID) {
        mvLog(MVLOG_ERROR, "Max streamId reached %x!", streamId);
        return INVALID_STREAM_ID;
    }
#endif

    COMBIN_IDS(streamId, id);
    return streamId;
}
// Just like open stream, when closeStream is called
// on the local side we are resetting the writeSize
// and on the remote side we are freeing the read buffer
XLinkError_t XLinkCloseStream(streamId_t streamId)
{
    xLinkDesc_t* link = getLinkByStreamId(streamId);
    ASSERT_X_LINK(link != NULL);
    XLINK_RET_IF_RC(getXLinkState(link) != XLINK_UP,
                    X_LINK_COMMUNICATION_NOT_OPEN);

    xLinkEvent_t event = {0};
    XLINK_INIT_EVENT(event, streamId, XLINK_CLOSE_STREAM_REQ,
                     0, NULL, link->deviceHandle);

    XLINK_RET_IF(addEvent(&event));

    return X_LINK_SUCCESS;
}
XLinkError_t XLinkWriteData(streamId_t streamId, const uint8_t* buffer,
                            int size)
{
    ASSERT_X_LINK(buffer);

    float opTime = 0;
    xLinkDesc_t* link = getLinkByStreamId(streamId);
    ASSERT_X_LINK(link != NULL);
    XLINK_RET_IF_RC(getXLinkState(link) != XLINK_UP,
                    X_LINK_COMMUNICATION_NOT_OPEN);

    xLinkEvent_t event = {0};
    XLINK_INIT_EVENT(event, streamId, XLINK_WRITE_REQ,
                     size, (void*)buffer, link->deviceHandle);

    XLINK_RET_IF(addEventWithPerf(&event, &opTime));

    if (glHandler->profEnable)
    {
        glHandler->profilingData.totalWriteBytes += size;
        glHandler->profilingData.totalWriteTime += opTime;
    }

    return X_LINK_SUCCESS;
}

XLinkError_t XLinkReadData(streamId_t streamId, streamPacketDesc_t** packet)
{
    ASSERT_X_LINK(packet);

    float opTime = 0;
    xLinkDesc_t* link = getLinkByStreamId(streamId);
    ASSERT_X_LINK(link != NULL);
    XLINK_RET_IF_RC(getXLinkState(link) != XLINK_UP,
                    X_LINK_COMMUNICATION_NOT_OPEN);

    xLinkEvent_t event = {0};
    XLINK_INIT_EVENT(event, streamId, XLINK_READ_REQ,
                     0, NULL, link->deviceHandle);

    XLINK_RET_IF(addEventWithPerf(&event, &opTime));

    *packet = (streamPacketDesc_t *)event.data;

    if (glHandler->profEnable)
    {
        glHandler->profilingData.totalReadBytes += (*packet)->length;
        glHandler->profilingData.totalReadTime += opTime;
    }

    return X_LINK_SUCCESS;
}

XLinkError_t XLinkReleaseData(streamId_t streamId)
{
    xLinkDesc_t* link = getLinkByStreamId(streamId);
    ASSERT_X_LINK(link != NULL);
    XLINK_RET_IF_RC(getXLinkState(link) != XLINK_UP,
                    X_LINK_COMMUNICATION_NOT_OPEN);

    xLinkEvent_t event = {0};
    XLINK_INIT_EVENT(event, streamId, XLINK_READ_REL_REQ,
                     0, NULL, link->deviceHandle);

    XLINK_RET_IF(addEvent(&event));

    return X_LINK_SUCCESS;
}

XLinkError_t XLinkGetFillLevel(streamId_t streamId, int isRemote, int* fillLevel)
{
    xLinkDesc_t* link = getLinkByStreamId(streamId);
    ASSERT_X_LINK(link != NULL);
    XLINK_RET_IF_RC(getXLinkState(link) != XLINK_UP,
                    X_LINK_COMMUNICATION_NOT_OPEN);

    streamDesc_t* stream =
        getStreamById(link->deviceHandle.xLinkFD, streamId);
    ASSERT_X_LINK(stream);

    if (isRemote) {
        *fillLevel = stream->remoteFillLevel;
    }
    else {
        *fillLevel = stream->localFillLevel;
    }

    releaseStream(stream);
    return X_LINK_SUCCESS;
}

// ------------------------------------
// Helpers implementation. Begin.
// ------------------------------------
XLinkError_t checkEventHeader(xLinkEventHeader_t header) {
    mvLog(MVLOG_DEBUG, "header.flags.bitField: ack:%u, nack:%u, sizeTooBig:%u, block:%u, bufferFull:%u, localServe:%u, noSuchStream:%u, terminate:%u",
          header.flags.bitField.ack,
          header.flags.bitField.nack,
          header.flags.bitField.sizeTooBig,
          header.flags.bitField.block,
          header.flags.bitField.bufferFull,
          header.flags.bitField.localServe,
          header.flags.bitField.noSuchStream,
          header.flags.bitField.terminate);

    if (header.flags.bitField.ack) {
        return X_LINK_SUCCESS;
    } else if (header.flags.bitField.nack) {
        return X_LINK_COMMUNICATION_FAIL;
    } else if (header.flags.bitField.sizeTooBig) {
        return X_LINK_OUT_OF_MEMORY;
    } else {
        return X_LINK_ERROR;
    }
}

float timespec_diff(struct timespec *start, struct timespec *stop)
{
    if ((stop->tv_nsec - start->tv_nsec) < 0) {
        start->tv_sec = stop->tv_sec - start->tv_sec - 1;
        start->tv_nsec = stop->tv_nsec - start->tv_nsec + 1000000000;
    } else {
        start->tv_sec = stop->tv_sec - start->tv_sec;
        start->tv_nsec = stop->tv_nsec - start->tv_nsec;
    }

    return start->tv_nsec / 1000000000.0 + start->tv_sec;
}

XLinkError_t addEvent(xLinkEvent_t *event)
{
    ASSERT_X_LINK(event);

    xLinkEvent_t* ev = dispatcherAddEvent(EVENT_LOCAL, event);
    if (ev == NULL) {
        mvLog(MVLOG_ERROR, "Dispatcher failed on adding event");
        return X_LINK_ERROR;
    }

    if (dispatcherWaitEventComplete(&event->deviceHandle)) {
        return X_LINK_TIMEOUT;
    }

    if (event->header.flags.bitField.ack != 1) {
        return X_LINK_COMMUNICATION_FAIL;
    }

    return X_LINK_SUCCESS;
}

XLinkError_t addEventWithPerf(xLinkEvent_t *event, float* opTime)
{
    ASSERT_X_LINK(opTime);

    struct timespec start, end;
    clock_gettime(CLOCK_REALTIME, &start);

    XLinkError_t rc = addEvent(event);
    if (rc != X_LINK_SUCCESS) {
        return rc;
    }

    clock_gettime(CLOCK_REALTIME, &end);
    *opTime = timespec_diff(&start, &end);

    return X_LINK_SUCCESS;
}

// ------------------------------------
// Helpers implementation. End.
// ------------------------------------
inference-engine/thirdparty/movidius/XLink/shared/XLinkTool.h (vendored, new file, 116 lines)
@@ -0,0 +1,116 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#ifndef _XLINK_TOOL_H
#define _XLINK_TOOL_H

#ifdef __cplusplus
extern "C"
{
#endif

#ifdef NDEBUG  // Release configuration

    #ifndef __PC__
        #define ASSERT_X_LINK(x) if(!(x)) { exit(EXIT_FAILURE); }
        #define ASSERT_X_LINK_R(x, r) ASSERT_X_LINK(x)
    #else
        #define ASSERT_X_LINK(x) if(!(x)) { return X_LINK_ERROR; }
        #define ASSERT_X_LINK_R(x, r) if(!(x)) { return r; }
    #endif

#else  // Debug configuration

    #ifndef __PC__
        #define ASSERT_X_LINK(x) if(!(x)) { fprintf(stderr, "%s:%d:\n Assertion Failed: %s\n", __FILE__, __LINE__, #x); exit(EXIT_FAILURE); }
        #define ASSERT_X_LINK_R(x, r) ASSERT_X_LINK(x)
    #else
        #define ASSERT_X_LINK(x) \
            if(!(x)) { \
                mvLog(MVLOG_ERROR, "%s:%d\n\t Assertion Failed: %s", __FILE__, __LINE__, #x); \
                return X_LINK_ERROR; \
            }
        #define ASSERT_X_LINK_R(x, r) \
            if(!(x)) { \
                mvLog(MVLOG_ERROR, "%s:%d\n\t Assertion Failed: %s", __FILE__, __LINE__, #x); \
                return r; \
            }
    #endif

#endif // NDEBUG

#ifndef XLINK_CHECK_CALL
#define XLINK_CHECK_CALL(call) { \
    int error; \
    if ((error = (call))) { \
        mvLog(MVLOG_ERROR, "%s failed with error: %d", #call, error); \
    } \
}
#endif // XLINK_CHECK_CALL

#ifndef XLINK_RET_IF
#define XLINK_RET_IF(call) { \
    int error; \
    if ((error = (call))) { \
        mvLog(MVLOG_ERROR, "%s failed with error: %d", #call, error); \
        return error; \
    } \
}
#endif // XLINK_RET_IF

#ifndef XLINK_RET_IF_RC
#define XLINK_RET_IF_RC(call, rc) { \
    if ((call)) { \
        mvLog(MVLOG_ERROR, "%s expression failed", #call); \
        return rc; \
    } \
}
#endif // XLINK_RET_IF_RC

#define CIRCULAR_INCREMENT(x, maxVal) \
    { \
        x++; \
        if (x == maxVal) \
            x = 0; \
    }

// avoid problems with unsigned. first compare and then give the new value
#define CIRCULAR_DECREMENT(x, maxVal) \
    { \
        if (x == 0) \
            x = maxVal; \
        else \
            x--; \
    }

#define CIRCULAR_INCREMENT_BASE(x, maxVal, base) \
    { \
        x++; \
        if (x == maxVal) \
            x = base; \
    }

// avoid problems with unsigned. first compare and then give the new value
#define CIRCULAR_DECREMENT_BASE(x, maxVal, base) \
    { \
        if (x == base) \
            x = maxVal - 1; \
        else \
            x--; \
    }

#define EXTRACT_IDS(streamId, linkId) \
    { \
        linkId = (streamId >> 24) & 0XFF; \
        streamId = streamId & 0xFFFFFF; \
    }

#define COMBIN_IDS(streamId, linkid) \
    streamId = streamId | ((linkid & 0xFF) << 24);

#ifdef __cplusplus
}
#endif

#endif //_XLINK_TOOL_H
@@ -1,100 +0,0 @@
// Copyright (C) 2018-2019 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//

#ifndef _XLINK_TOOL_H
#define _XLINK_TOOL_H

#ifdef __cplusplus
extern "C"
{
#endif
#ifdef NDEBUG  // Release configuration
    #ifndef __PC__
        #define ASSERT_X_LINK(x) if(!(x)) { exit(EXIT_FAILURE); }
        #define ASSERT_X_LINK_R(x, r) ASSERT_X_LINK(x)
    #else
        #define ASSERT_X_LINK(x) if(!(x)) { return X_LINK_ERROR; }
        #define ASSERT_X_LINK_R(x, r) if(!(x)) { return r; }
    #endif
#else  // Debug configuration

    #ifndef __PC__
        #define ASSERT_X_LINK(x) if(!(x)) { fprintf(stderr, "%s:%d:\n Assertion Failed: %s\n", __FILE__, __LINE__, #x); exit(EXIT_FAILURE); }
        #define ASSERT_X_LINK_R(x, r) ASSERT_X_LINK(x)
    #else
        #define ASSERT_X_LINK(x) if(!(x)) { \
            mvLog(MVLOG_ERROR, "%s:%d\n\t Assertion Failed: %s", __FILE__, __LINE__, #x); \
            return X_LINK_ERROR; \
        }
        #define ASSERT_X_LINK_R(x, r) if(!(x)) { \
            mvLog(MVLOG_ERROR, "%s:%d\n\t Assertion Failed: %s", __FILE__, __LINE__, #x); \
            return r; \
        }
    #endif
#endif // NDEBUG

#ifndef CHECK_MUTEX_SUCCESS
#define CHECK_MUTEX_SUCCESS(call) { \
    int error; \
    if ((error = (call))) { \
        mvLog(MVLOG_ERROR, "%s failed with error: %d", #call, error); \
    } \
}
#endif // CHECK_MUTEX_SUCCESS

#ifndef CHECK_MUTEX_SUCCESS_RC
#define CHECK_MUTEX_SUCCESS_RC(call, rc) { \
    int error; \
    if ((error = (call))) { \
        mvLog(MVLOG_ERROR, "%s failed with error: %d", #call, error); \
        return rc; \
    } \
}
#endif // CHECK_MUTEX_SUCCESS_RC

#define CIRCULAR_INCREMENT(x, maxVal) \
    { \
        x++; \
        if (x == maxVal) \
            x = 0; \
    }

//avoid problems with unsigned. first compare and then give the new value
#define CIRCULAR_DECREMENT(x, maxVal) \
    { \
        if (x == 0) \
            x = maxVal; \
        else \
            x--; \
    }

#define CIRCULAR_INCREMENT_BASE(x, maxVal, base) \
    { \
        x++; \
        if (x == maxVal) \
            x = base; \
    }
//avoid problems with unsigned. first compare and then give the new value
#define CIRCULAR_DECREMENT_BASE(x, maxVal, base) \
    { \
        if (x == base) \
            x = maxVal - 1; \
        else \
            x--; \
    }

#define EXTRACT_IDS(streamId, linkId) \
    { \
        linkId = (streamId >> 24) & 0XFF; \
        streamId = streamId & 0xFFFFFF; \
    }

#define COMBIN_IDS(streamId, linkid) \
    streamId = streamId | ((linkid & 0xFF) << 24);

#ifdef __cplusplus
}
#endif

#endif //_XLINK_TOOL_H
@@ -52,7 +52,6 @@ typedef enum {
    //major.minor.hotfix.rc
    NC_RW_COMMON_TIMEOUT_MSEC = 2,
    NC_RW_DEVICE_OPEN_TIMEOUT_MSEC = 3,
    NC_RW_ALLOC_GRAPH_TIMEOUT_MSEC = 4,
    NC_RW_RESET_ALL = 9000, // resetAll on initialize
} ncGlobalOption_t;

@@ -2596,22 +2596,6 @@ ncStatus_t ncGlobalSetOption(ncGlobalOption_t option, const void *data,
        mvLog(MVLOG_ERROR, "Some of the parameters are NULL");
        return NC_INVALID_PARAMETERS;
    }
    switch (option) {
        case NC_RW_LOG_LEVEL:
        case NC_RW_RESET_ALL:
        case NC_RW_COMMON_TIMEOUT_MSEC:
        case NC_RW_DEVICE_OPEN_TIMEOUT_MSEC:
        case NC_RW_ALLOC_GRAPH_TIMEOUT_MSEC: {
            if (dataLength < sizeof(int)) {
                mvLog(MVLOG_ERROR, "The dataLength is smaller that required %zu",
                      sizeof(int));
                return NC_INVALID_PARAMETERS;
            }
            break;
        }
        default:
            break;
    }

    switch (option) {
        case NC_RW_LOG_LEVEL:

@@ -2651,15 +2635,6 @@ ncStatus_t ncGlobalSetOption(ncGlobalOption_t option, const void *data,
            }
            break;
        }
        case NC_RW_ALLOC_GRAPH_TIMEOUT_MSEC: {
            int gTimeout = *(int *) data;
            XLinkError_t rc = XLinkSetAllocateGraphTimeOutMsec(gTimeout);
            if (rc) {
                mvLog(MVLOG_ERROR, "Set global allocate graph timeout failed, rc = %s\n", XLinkErrorToStr(rc));
                return NC_INVALID_PARAMETERS;
            }
            break;
        }
        default:
            mvLog(MVLOG_ERROR, "No such option");
            return NC_INVALID_PARAMETERS;
@@ -28,19 +28,23 @@
#include <time.h>

#ifndef _GNU_SOURCE
#define _GNU_SOURCE // fix for warning: implicit declaration of function 'pthread_getname_np'
#define _GNU_SOURCE
#endif

#if (defined(_WIN32) || defined(_WIN64))
#if (defined (WINNT) || defined(_WIN32) || defined(_WIN64) )
#include "win_pthread.h"
#else
#ifndef __shave__ // SHAVE does not support threads
#include <pthread.h>
#endif
#endif

#ifdef __cplusplus
extern "C" {
#endif
#ifndef __shave__ // SHAVE does not support threads
extern int pthread_getname_np (pthread_t , char *, size_t);
#endif
#ifdef __cplusplus
}
#endif

@@ -50,7 +54,7 @@ extern int pthread_getname_np (pthread_t , char *, size_t);
#include <rtems/bspIo.h>
#endif

// Windows-only
#if (defined (WINNT) || defined(_WIN32) || defined(_WIN64) )
#define __attribute__(x)
#define FUNCATTR_WEAK static

@@ -65,10 +69,10 @@ extern int pthread_getname_np (pthread_t , char *, size_t);
#define _MVLOGLEVEL(UNIT_NAME) mvLogLevel_ ## UNIT_NAME
#define MVLOGLEVEL(UNIT_NAME) _MVLOGLEVEL(UNIT_NAME)

#define STR(x) _STR(x)
#define _STR(x) #x
#define MVLOG_STR(x) _MVLOG_STR(x)
#define _MVLOG_STR(x) #x

#define UNIT_NAME_STR STR(MVLOG_UNIT_NAME)
#define UNIT_NAME_STR MVLOG_STR(MVLOG_UNIT_NAME)

#define ANSI_COLOR_RED "\x1b[31m"
#define ANSI_COLOR_GREEN "\x1b[32m"

@@ -100,7 +104,7 @@ extern int pthread_getname_np (pthread_t , char *, size_t);
#endif

#ifndef MVLOG_MAXIMUM_THREAD_NAME_SIZE
#define MVLOG_MAXIMUM_THREAD_NAME_SIZE 20
#define MVLOG_MAXIMUM_THREAD_NAME_SIZE 16
#endif

typedef enum mvLog_t{

@@ -112,27 +116,37 @@ typedef enum mvLog_t{
    MVLOG_LAST,
} mvLog_t;

#ifdef __shave__
__attribute__((section(".laststage")))
#endif
static const char mvLogHeader[MVLOG_LAST][30] =
{
    MVLOG_DEBUG_COLOR "D:",
    MVLOG_INFO_COLOR "I:",
    MVLOG_WARN_COLOR "W:",
    MVLOG_ERROR_COLOR "E:",
    MVLOG_FATAL_COLOR "F:"
};

FUNCATTR_WEAK unsigned int __attribute__ ((weak)) MVLOGLEVEL(MVLOG_UNIT_NAME) = MVLOG_LAST; // not set by default
// #ifdef __shave__
// __attribute__((section(".laststage")))
// #endif
FUNCATTR_WEAK unsigned int __attribute__ ((weak)) MVLOGLEVEL(MVLOG_UNIT_NAME) = MVLOG_INFO;

FUNCATTR_WEAK unsigned int __attribute__ ((weak)) MVLOGLEVEL(default) = MVLOG_WARN;
// #ifdef __shave__
// __attribute__((section(".laststage")))
// #endif
FUNCATTR_WEAK unsigned int __attribute__ ((weak)) MVLOGLEVEL(default) = MVLOG_INFO;

#ifdef __shave__
__attribute__((section(".laststage")))
#endif
static int __attribute__ ((unused))
logprintf(enum mvLog_t lvl, const char * func, const int line,
          const char * format, ...)
{
    if((MVLOGLEVEL(MVLOG_UNIT_NAME) == MVLOG_LAST && lvl < MVLOGLEVEL(default)))
        return 0;

    if((MVLOGLEVEL(MVLOG_UNIT_NAME) < MVLOG_LAST && lvl < MVLOGLEVEL(MVLOG_UNIT_NAME)))
    if(lvl < MVLOGLEVEL(MVLOG_UNIT_NAME) &&
       lvl < MVLOGLEVEL(default))
        return 0;

    const char headerFormat[] = "%s [%s] [%10" PRId64 "] [%s] %s:%d\t";

@@ -148,16 +162,27 @@ logprintf(enum mvLog_t lvl, const char * func, const int line,
    va_list args;
    va_start (args, format);

    char threadName[20] = {0};
    char threadName[MVLOG_MAXIMUM_THREAD_NAME_SIZE] = {0};
#if defined MA2450 || defined __shave__
    snprintf(threadName, sizeof(threadName), "ThreadName_N/A");
#else
    pthread_getname_np(pthread_self(), threadName, sizeof(threadName));
#endif

#ifdef __RTEMS__
    if(!rtems_interrupt_is_in_progress())
    {
#endif
        fprintf(stdout, headerFormat, mvLogHeader[lvl], UNIT_NAME_STR, timestamp, threadName, func, line);
        vfprintf(stdout, format, args);
        fprintf(stdout, "%s\n", ANSI_COLOR_RESET);
#if defined __sparc__ || defined __PC__
        fprintf(stdout, headerFormat, mvLogHeader[lvl], UNIT_NAME_STR, timestamp, threadName, func, line);
        vfprintf(stdout, format, args);
        fprintf(stdout, "%s\n", ANSI_COLOR_RESET);
#elif defined __shave__
        printf(headerFormat, mvLogHeader[lvl], UNIT_NAME_STR, timestamp, threadName, func, line);
        printf(format, args);
        printf("%s\n", ANSI_COLOR_RESET);
#endif
#ifdef __RTEMS__
    }
    else

@@ -177,6 +202,7 @@ logprintf(enum mvLog_t lvl, const char * func, const int line,
// Set log level for the current unit. Note that the level must be smaller than the global default
#define mvLogLevelSet(lvl) if(lvl < MVLOG_LAST){ MVLOGLEVEL(MVLOG_UNIT_NAME) = lvl; }
// Set the global log level. Can be used to prevent modules from hiding messages (enable all of them with a single change)
// This should be an application setting, not a per module one
#define mvLogDefaultLevelSet(lvl) if(lvl < MVLOG_LAST){ MVLOGLEVEL(default) = lvl; }

#endif
@@ -24,7 +24,7 @@
#include <list>
#define _XLINK_ENABLE_PRIVATE_INCLUDE_
#include <XLinkPrivateDefines.h>
#include "XLink_tool.h"
#include "XLinkTool.h"

namespace {

@@ -106,7 +106,7 @@ public:
private:
    bool sendPingMessage() {
        XLinkError_t rc = X_LINK_SUCCESS;
        CHECK_MUTEX_SUCCESS_RC(pthread_mutex_lock(&privateDevice.dev_stream_m), false);
        XLINK_RET_IF_RC(pthread_mutex_lock(&privateDevice.dev_stream_m), false);

        deviceCommand_t config;
        config.type.c1 = CLASS1_WATCHDOG_PING;

@@ -115,7 +115,7 @@ private:
        // xlink ping acknowledge interval shouldn't be more then expected ping interval
        rc = XLinkWriteDataWithTimeout(privateDevice.device_mon_stream_id, (const uint8_t*)&config, sizeof(config), deviceHangTimeout);

        CHECK_MUTEX_SUCCESS(pthread_mutex_unlock(&privateDevice.dev_stream_m));
        XLINK_CHECK_CALL(pthread_mutex_unlock(&privateDevice.dev_stream_m));

        if (rc != X_LINK_SUCCESS) {
            mvLog(MVLOG_ERROR, "Failed send ping message: %s", XLinkErrorToStr(rc));
@@ -1,14 +1,14 @@
# Benchmark Python* Tool
# Benchmark Python* Application

This topic demonstrates how to run the Benchmark Python* Tool, which performs inference using convolutional networks. Performance can be measured for two inference modes: synchronous (latency-oriented) and asynchronous (throughput-oriented).

> **NOTE:** This topic describes usage of Python implementation of the Benchmark Tool. For the C++ implementation, refer to [Benchmark C++ Tool](./inference-engine/samples/benchmark_app/README.md).
This topic demonstrates how to run the Benchmark Application demo, which performs inference using convolutional networks.

## How It Works

Upon start-up, the application reads command-line parameters and loads a network and images/binary files to the Inference Engine plugin, which is chosen depending on a specified device. The number of infer requests and execution approach depend on the mode defined with the `-api` command-line parameter.

> **NOTE**: By default, Inference Engine samples, tools and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).
> **NOTE**: By default, Inference Engine samples and demos expect input with BGR channels order. If you trained your model to work with RGB order, you need to manually rearrange the default channels order in the sample or demo application or reconvert your model using the Model Optimizer tool with `--reverse_input_channels` argument specified. For more information about the argument, refer to **When to Reverse Input Channels** section of [Converting a Model Using General Conversion Parameters](./docs/MO_DG/prepare_model/convert_model/Converting_Model_General.md).

### Synchronous API

@@ -33,12 +33,12 @@ For asynchronous mode, the primary metric is throughput in frames per second (FPS).

The infer requests are executed asynchronously. Callback is used to wait for previous execution to complete. The application measures all infer requests executions and reports the throughput metric based on batch size and total execution duration.

## Run the Tool
## Running
Notice that the benchmark_app usually produces optimal performance for any device out of the box.

**So in most cases you don't need to play the app options explicitly and the plain device name is enough**, for example, for CPU:
```sh
python3 benchmark_app.py -m <model> -i <input> -d CPU
```
**So in most cases you don't need to play the app options explicitly and the plain device name is enough**, e.g.:
```
$benchmark_app -m <model> -i <input> -d CPU
```

But it still may be non-optimal for some cases, especially for very small networks. More details can be read in [Introduction to Performance Topics](./docs/IE_DG/Intro_to_Performance.md).

@@ -121,72 +121,37 @@ If a model has only image input(s), please provide a folder with images or a path to an image as input.
If a model has some specific input(s) (not images), please prepare a binary file(s), which is filled with data of appropriate precision and provide a path to them as input.
If a model has mixed input types, input folder should contain all required files. Image inputs are filled with image files one by one. Binary inputs are filled with binary inputs one by one.

To run the tool, you can use public or Intel's pre-trained models. To download the models, use the OpenVINO [Model Downloader](./tools/downloader/README.md) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).
To run the demo, you can use public or pre-trained models. To download the pre-trained models, use the OpenVINO [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader) or go to [https://download.01.org/opencv/](https://download.01.org/opencv/).

> **NOTE**: Before running the tool with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
> **NOTE**: Before running the demo with a trained model, make sure the model is converted to the Inference Engine format (\*.xml + \*.bin) using the [Model Optimizer tool](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).

## Examples of Running the Tool
For example, to do inference of an image using a trained network with multiple outputs on CPU, run the following command:

This section provides step-by-step instructions on how to run the Benchmark Tool with the `googlenet-v1` public model on CPU or FPGA devices. As an input, the `car.png` file from the `<INSTALL_DIR>/deployment_tools/demo/` directory is used.
```
python3 benchmark_app.py -i <path_to_image>/inputImage.bmp -m <path_to_model>/multiple-output.xml -d CPU
```

> **NOTE:** The Internet access is required to execute the following steps successfully. If you have access to the Internet through the proxy server only, please make sure that it is configured in your OS environment.

1. Download the model. Go to the Model Downloader directory and run the `downloader.py` script with specifying the model name and directory to download the model to:
```sh
cd <INSTALL_DIR>/deployment_tools/open_model_zoo/tools/downloader
```
```sh
python3 downloader.py --name googlenet-v1 -o <models_dir>
```
2. Convert the model to the Inference Engine IR format. Go to the Model Optimizer directory and run the `mo.py` script with specifying the path to the model, model format (which must be FP32 for CPU and FPGA) and output directory to generate the IR files:
```sh
cd <INSTALL_DIR>/deployment_tools/model_optimizer
```
```sh
python3 mo.py --input_model <models_dir>/public/googlenet-v1/googlenet-v1.caffemodel --data_type FP32 --output_dir <ir_dir>
```
3. Run the tool with specifying the `<INSTALL_DIR>/deployment_tools/demo/car.png` file as an input image, the IR of the `googlenet-v1` model and a device to perform inference on. The following commands demonstrate running the Benchmark Tool in the asynchronous mode on CPU and FPGA devices:

* On CPU:
```sh
python3 benchmark_app.py -m <ir_dir>/googlenet-v1.xml -d CPU -api async -i <INSTALL_DIR>/deployment_tools/demo/car.png --progress true -b 1
```
* On FPGA:
```sh
python3 benchmark_app.py -m <ir_dir>/googlenet-v1.xml -d HETERO:FPGA,CPU -api async -i <INSTALL_DIR>/deployment_tools/demo/car.png --progress true -b 1
```
## Demo Output

The application outputs number of executed iterations, total duration of execution, latency and throughput.
Additionally, if you set the `-pc` parameter, the application outputs performance counters.
If you set `-exec_graph_path`, the application reports executable graph information serialized.

Below are fragments of sample output for CPU and FPGA devices:
* For CPU:
```
[Step 8/9] Measuring performance (Start inference asyncronously, 60000 ms duration, 4 inference requests in parallel using 4 streams)
Progress: |................................| 100.00%

[Step 9/9] Dumping statistics report
Progress: |................................| 100.00%

Count:      4408 iterations
Duration:   60153.52 ms
Latency:    51.8244 ms
Throughput: 73.28 FPS
```
* For FPGA:
```
[Step 10/11] Measuring performance (Start inference asyncronously, 5 inference requests using 1 streams for CPU, limits: 120000 ms duration)
Progress: |................................| 100%

[Step 11/11] Dumping statistics report
Count:      98075 iterations
Duration:   120011.03 ms
Latency:    5.65 ms
Throughput: 817.22 FPS
```

## See Also
* [Using Inference Engine Samples](./docs/IE_DG/Samples_Overview.md)
* [Model Optimizer](./docs/MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md)
* [Model Downloader](./tools/downloader/README.md)
* [Model Downloader](https://github.com/opencv/open_model_zoo/tree/2018/model_downloader)
@@ -178,10 +178,15 @@ int main(int argc, char *argv[]) {
        showUsage();
        return ex.exitCode();
    } catch (const UserExceptions& ex) {
        slog::err << "Input problems: \n" << ex.what() << slog::endl;
        showUsage();
        if (!ex.list().empty())
        if (ex.list().size() == 1) {
            slog::err << "Input problem: " << ex.what() << slog::endl;
            showUsage();
            return ex.list().begin()->exitCode();
        } else {
            slog::err << "Input problems: \n" << ex.what() << slog::endl;
            showUsage();
            return ex.list().begin()->exitCode();
        }
    } catch (const std::exception& ex) {
        slog::err << ex.what() << slog::endl;
        return 1;
@@ -295,10 +295,7 @@ void StatisticsCollector::fillBlobs(StatisticsCollector* collectorInstance) {
        progress_step = 100lu;
    collectorInstance->_consoleProgress = std::make_shared<ConsoleProgress>(img_number);

    auto inpuInfo = collectorInstance->_cnn_network->getInputsInfo();
    if (inpuInfo.empty())
        THROW_IE_EXCEPTION << "Input info is empty";
    TensorDesc inputDesc = inpuInfo.begin()->second->getTensorDesc();
    TensorDesc inputDesc = collectorInstance->_cnn_network->getInputsInfo().begin()->second->getTensorDesc();
    const Precision::ePrecision inputPrecision = inputDesc.getPrecision();

    PreprocessingOptions preprocessingOptions;
@@ -3,6 +3,12 @@
Project structure:
<pre>
|-- root
    |-- docs
        |-- Model_Optimizer_Developer_Guide - md files of documentation for the Model Optimizer
        |-- img
        |-- prepare_model
            |-- convert_model
            |-- customize_model_optimizer
    |-- extensions
        |-- front/caffe
            |-- CustomLayersMapping.xml.example - example of file for registering custom Caffe layers in 2017R3 public

@@ -61,6 +67,22 @@ Model Optimizer requires:
    pip3 install -r requirements.txt
</pre>

4. [OPTIONAL] If you use Windows OS, most probably you get python version of `protobuf` library. It is known to be rather slow,
   and you can use a boosted version of library by building the .egg file (Python package format) yourself,
   using instructions below (section 'How to boost Caffe model loading') for the target OS and Python, or install it
   with the pre-built .egg (it is built for Python 3.4, 3.5, 3.6, 3.7):
<pre>
    python3 -m easy_install protobuf-3.6.1-py3.6-win-amd64.egg
</pre>

    It overrides the protobuf python package installed by the previous command.

    Set environment variable to enable boost in protobuf performance:
<pre>
    set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
</pre>

## Command-Line Interface (CLI)

The following short examples are framework-dependent. Please read the complete help

@@ -79,10 +101,6 @@ There are several scripts that convert a model:

4. <code>mo_tf.py</code> -- dedicated script for TensorFlow models conversion

5. <code>mo_onnx.py</code> -- dedicated script for ONNX models conversion

6. <code>mo_kaldi.py</code> -- dedicated script for Kaldi models conversion

<code>mo.py</code> can deduce original framework where input model was trained by an extension of
the model file. Or <code>--framework</code> option can be used for this purpose if model files
don't have standard extensions (<code>.pb</code> - for TensorFlow models, <code>.params</code> - for MXNet models,

@@ -121,18 +139,8 @@ dedicated entry point <code>mo_mxnet.py</code>:

> **NOTE**: for TensorFlow* all Placeholder ops are represented as Input layers in the final IR.

### Convert ONNX* model

The Model Optimizer assumes that you have an ONNX model that was directly downloaded from a public repository or converted from any framework that supports exporting to the ONNX format.

Use the mo_onnx.py script to simply convert a model with the path to the input model .onnx file:

<pre>
    python3 mo_onnx.py --input_model model-file.onnx
</pre>

Input channels re-ordering, scaling, subtraction of mean values and other preprocessing features
are not applied by default. To pass necessary values to Model Optimizer, please run <code>mo.py</code>
Input channels re-ordering, scaling, subtraction of mean values and other precprocessing features
are not applied by default. To pass necessary values to Model Optmizer, please run <code>mo.py</code>
(or <code>mo_tf.py</code>, <code>mo_caffe.py</code>, <code>mo_mxnet.py</code>) with <code>--help</code> and
examine all available options.

@@ -143,6 +151,9 @@ The whole workflow and more documentation on the structure of IR are documented
of Inference Engine. Note that sections about running Model Optimizer refer to the old version
of the tool and can not be applied to the current version of Model Optimizer.

## Setup development environment

### How to run unit-tests

1. Run tests with:

@@ -150,5 +161,30 @@ of the tool and can not be applied to the current version of Model Optimizer.
    python -m unittest discover -p "*_test.py" [-s PATH_TO_DIR]
</pre>

---
\* Other names and brands may be claimed as the property of others.
### How to capture unit-tests coverage

1. Run tests with:
<pre>
    coverage run -m unittest discover -p "*_test.py" [-s PATH_TO_DIR]
</pre>

2. Build html report:
<pre>
    coverage html
</pre>

### How to run code linting

1. Run the following command:
<pre>
    pylint mo/ mo.py
</pre>

### How to check requirements dependencies

1. Run the following command:
<pre>
    safety check -r requirements_file
</pre>

> **NOTE**: here <code>requirements_file</code> is one of the following: <code>requirements.txt</code>, <code>requirements_caffe.txt</code>, <code>requirements_tf.txt</code>, <code>requirements_mxnet.txt</code>, <code>requirements_dev.txt</code>.
@@ -13,9 +13,15 @@
See the License for the specific language governing permissions and
limitations under the License.
"""
import logging as log
import numpy as np

from mo.back.replacement import BackReplacementPattern
from mo.front.extractor import update_ie_fields
from mo.graph.graph import *
from mo.graph.graph import Node, Graph
from mo.middle.passes.convert_data_type import data_type_str_to_np
from mo.ops.const import Const
from mo.utils.error import Error
from mo.utils.utils import refer_to_faq_msg


class CreateConstNodesReplacement(BackReplacementPattern):

@@ -61,20 +67,19 @@ class CreateConstNodesReplacement(BackReplacementPattern):
        if node.has_valid('value'):
            const_node_name = graph.unique_id(node.id + '_const')
            log.debug("Added Const node '{}'".format(const_node_name))
            graph.add_node(const_node_name, name=const_node_name, type='Const', kind='op', op='Const',
                           precision="FP32")
            Node(graph, const_node_name).add_output_port(0)
            Node(graph, const_node_name).add_input_port(0)
            update_ie_fields(node.graph.node[const_node_name])
            const_node = Const(graph, {'name': const_node_name, 'value': node.value}).create_node()
            const_node.add_input_port(0)
            graph.add_edges_from([(const_node_name, node.id, {'out': 0})])

            copy_data_node_name = graph.unique_id(node.id + '_copy_')
            graph.add_node(copy_data_node_name, kind='data', precision="FP32", shape=np.array(node.shape),
                           value=np.array(node.value))
            graph.add_node(copy_data_node_name, kind='data', shape=node.shape, value=node.value)

            if node.has_valid('force_precision'):
                Node(graph, copy_data_node_name)['force_precision'] = node.force_precision
                Node(graph, const_node_name)['force_precision'] = node.force_precision
                const_node['force_precision'] = node.force_precision
                const_node.out_port(0).set_data_type(data_type_str_to_np(const_node['force_precision']))
            else:
                const_node.type_infer(const_node)
            graph.add_edges_from([(copy_data_node_name, const_node_name, {'in': 0, 'bin': 'custom'})])
        elif not self._check_that_node_from_body(node):
            log.debug('node = {}'.format(node.graph.node[node.id]))
Some files were not shown because too many files have changed in this diff.