This pull request introduces a significant rewrite of the Get Started page. The rewrite reorganizes the content to add a learning path for new users and provides more links to tutorials and features.
Details:
The existing HTML and CSS code is reused for the top portion of the page to create the three blue display blocks; the rest of the page is implemented in Markdown.
* Porting #14130 (https://github.com/openvinotoolkit/openvino/pull/14130)
This PR addresses the https://jira.devtools.intel.com/browse/CVS-75090 ticket in Jira.
Installation steps in the article have been updated, and a troubleshooting section and additional resources have been added.
* reverting the steps
Reverting the installation steps to the previous order.
Emphasizing that Step 2 is an example of creating the minimal image.
* correcting numbering
Removing the 'global' tabs and preparing a language-agnostic version of the article. Replacing the PNG image with a scalable SVG file. Proofreading the article.
* Improving Readability of Further Low-Level Implementation Details
The changes include recreating the graphics to improve the readability of the article. Minor proofreading corrections have been applied as well.
Fixed the following issues:
Switching between C++ and Python docs for "Shape Inference",
Repeated content,
Quote background in the bullet list at the beginning of "Multi-device execution",
Broken note directives,
Video player size in "Inference with OpenVINO Runtime",
Inconsistent "Additional Resources" sections throughout Runtime Inference.
* Updating NNCF documentation
* nncf-doc-update-ms
* Adding python files
* Changing ID of Range Supervision
* Minor fixes
Fixing formatting and renaming ID
* Proofreading
Minor corrections and removal of the Neural Network Compression Framework article
Co-authored-by: msmykx <101244365+msmykx-intel@users.noreply.github.com>
* Test change
* New change
* Disabled docs for linux
* Added new file to check
* Try to fix CI
* Additional try
* Remove redundant change
* Fixed configuration
* Enabled for .ci changes
* Revert "Added new file to check"
This reverts commit da05ad4bd4.
* Revert "Test change"
This reverts commit 6f670d6112.
* Revert "New change"
This reverts commit efeccd6537.
* Update CI trigger rules
* Code test change
* Revert "Code test change"
This reverts commit 086bde7ca8.
* Test change
* Fixed CI
* Revert "Test change"
This reverts commit c72c9077cd.
* DOCS: Fix in Protecting Model
A small fix for a broken reference link to the schematic in the "Experimental: Protecting Deep Learning Model" article.
* Update README.md
* DOCS: Fixing Model Representation for 22.2
Fixing the snippets in tabs.
A follow-up to:
https://github.com/openvinotoolkit/openvino/pull/12495/
* Update model_representation.md
Changing "See Also" to "Additional Resources"
* Update model_representation.md
* Update model_representation.md
* Update model_representation.md
* Update model_representation.md
* DOCS: Fixing link to MobileNetV1 FPN for 22.2
A small fix for a broken link to the MobileNetV1 FPN model in the "Quantizing Object Detection Model with Accuracy Control" article.
* Update README.md
Fixing broken code block.
* port fix from master
* Revert "port fix from master"
This reverts commit 903abd946a.
* Revert "Revert "port fix from master""
This reverts commit 63e1e944a0.
* DOCS-doc_structure_step_2
- adjustments to the previous change based on feedback
- changes focused on the Model Optimizer section to mitigate the removal of the ONNX and PdPd articles
* remove 2 files we brought back after 22.1
* change hello reshape ssd sample (#12657)
ssdlite_mobilenet_v2 changed to mobilenet-ssd, as per J. Espinoza's request, to fix ticket 84516.
* one more correction of mobilenet
* [Frontend, TF FE] Fix RTTI for ConversionExtension on MacOS
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Put only destructor into cpp
* Remove extra white-space
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
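For context, a minimal sketch of the "key function" idiom the fix above relies on; the class and file names here are hypothetical, not the actual OpenVINO sources. Defining the virtual destructor in exactly one .cpp file pins the class's vtable and RTTI to a single shared library, which keeps typeid()/dynamic_cast consistent across library boundaries on macOS, where RTTI is compared by pointer identity.

```cpp
// extension.hpp (illustrative)
class ExtensionBase {
public:
    virtual ~ExtensionBase();  // declared here, deliberately not defined inline
};

// extension.cpp (illustrative)
// Defining the first virtual member out of line makes this translation unit
// the single home of the vtable and RTTI, so typeid()/dynamic_cast agree
// across shared-library boundaries.
ExtensionBase::~ExtensionBase() = default;
```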
* Revert "Fix experimental detectron do ref impl (#10621)"
This reverts commit d87233863d.
* Disabled Experimental Detectron per agreement with GPU team. Ticket to fix it: 90209
* Add Overview page
* Revert "Add Overview page"
* init (#11985)
* [GPU] Pass convolution unit tests on DG2 (#12056)
* scale -> eltwise
* Proofreading-OV-Runtime (#11658)
* Update docs/OV_Runtime_UG/protecting_model_guide.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/ARM_CPU.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/CPU.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/optimization_guide/dldt_deployment_optimization_common.md
Co-authored-by: Sebastian Golebiewski <sebastianx.golebiewski@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/Device_Plugins.md
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/GNA.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/GPU.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/GPU_RemoteTensor_API.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/HDDL.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/MYRIAD.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/ov_dynamic_shapes.md
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/config_properties.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/preprocessing_details.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/performance_hints.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update docs/OV_Runtime_UG/deployment/deployment-manager-tool.md (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Apply suggestions from code review (several review commits)
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
* Update ref links
* Update Getting_performance_numbers.md
* Update deployment_intro.md
* Update preprocessing_details.md
* Apply suggestions from code review (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update tools/pot/openvino/tools/pot/algorithms/quantization/default/README.md
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update docs/OV_Runtime_UG/deployment/deployment-manager-tool.md (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update docs/OV_Runtime_UG/ShapeInference.md (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update docs/OV_Runtime_UG/integrate_with_your_application.md (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update docs/OV_Runtime_UG/model_representation.md (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update docs/OV_Runtime_UG/layout_overview.md (several review commits)
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update Additional_Optimizations.md (several commits)
Removing redundant information.
* Update docs/OV_Runtime_UG/supported_plugins/GNA.md (several review commits)
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update tools/pot/docs/SaturationIssue.md (several review commits)
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update tools/pot/openvino/tools/pot/algorithms/quantization/accuracy_aware/README.md
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update docs/OV_Runtime_UG/supported_plugins/CPU.md
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Apply suggestions from code review (several review commits)
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update README.md (two commits)
* Update tools/pot/docs/Introduction.md (two commits)
* Update tools/pot/docs/AccuracyAwareQuantizationUsage.md
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Removing one-liners
Removing introductory sentences from 'Supported Features' sections.
* Update docs/OV_Runtime_UG/openvino_intro.md
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Update docs/benchmarks/performance_benchmarks_ovms.md
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* Update tools/pot/docs/DefaultQuantizationUsage.md
* Update tools/pot/docs/BestPractices.md
* Update tools/pot/docs/BestPractices.md
* Update tools/pot/docs/AccuracyAwareQuantizationUsage.md
* Update docs/optimization_guide/model_optimization_guide.md
* Update docs/optimization_guide/dldt_deployment_optimization_guide.md
* Update docs/OV_Runtime_UG/supported_plugins/config_properties.md
* Update docs/OV_Runtime_UG/supported_plugins/GNA.md
* Update docs/OV_Runtime_UG/supported_plugins/CPU.md
* Update docs/OV_Runtime_UG/preprocessing_usecase_save.md
* Apply suggestions from code review
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
Co-authored-by: msmykx <101244365+msmykx-intel@users.noreply.github.com>
Co-authored-by: Piotr Milewski <piotr.milewski@intel.com>
* updated to fuse activation in eltwise_vload8 (#12084)
* [GPU] Fix gather data type issue (#12085)
* setting tput as the default performance mode only for AUTO, excluding MULTI plugin. (#12083)
Signed-off-by: ywang2 <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* [C API][COVERITY SCAN]Fix the TAINTED_SCALAR and DEADCODE in Coverity Scan (#12087)
* Fix the Coverity scan issues
* Fix the insecure data handling (TAINTED_SCALAR) issue found in coverity scan
* [hotfix] pytest error of act_act example (#12093)
* [hotfix] pytest error of act_act example
* remove needless import
* NonZero operation: uncomment tests since they can be passed now (#11548)
* NonZero operation: uncomment tests since they can be passed now
* Unbreak tests once more by changing base class from LayerTestsCommon to SubgraphBaseTest
* Unbreak compilation / style
* Add test case for cache
Co-authored-by: Chenhu Wang <chenhu.wang@intel.com>
* Increase zeroes count for NonZero tests
* Correct the change
* Remove my previous changes and add dynamic shapes / repeatable shapes into the correct file
Co-authored-by: Chenhu Wang <chenhu.wang@intel.com>
* [SAMPLES] Remove unused commandline arguments for speech_sample (#11892)
* GNA SF propagation fix (#11806)
* Fix the uninitialized value issue found in Coverity Scan (#12098)
* [GPU] Assign-6 and ReadValue-6 (#11780)
* Add methods for accessing variables information in the Program class
* add ReadValue and Assign primitives
* ReadValue and Assign implementations
* Implementation of memory states allocation
* Add output existence check in primitive_inst to avoid crashes if output is set during execution
* Add memory states management functionality in network component
* Integration of memory states feature in inference request component
* Exclude constant path for read_value and assign nodes in cldnn transformations
* Improve memory states test to run on a single inference request
* unit tests for ReadValue and Assign
* single-layer test for ReadValue and Assign
* Add QueryState API implementation
* Add memory state test which covers dynamic batch case
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
* [GNA] Add automatic model splitting for compiled graphs (#12001)
* DOCS-code-reference-css-style-change (#12109)
Code formatting changed from blue to black to distinguish it from links.
* Virtual destructor for the base class (#12102)
* [GPU] Pass Resample unit tests on DG2 (#12052)
* fix validate_fusings_gpu error
* fix biased scale testcase
* [GPU] Pass lrn unit tests on DG2 (#11986)
* [GPU] Pass reduce unit tests on DG2 (#12086)
* scale to eltwise
* [CPU] Move cpu_dump_check into CPU plugin's tools folder (#12100)
* Move cpu_dump_check into CPU plugin's tools folder
* remove cpu from names
* Update README
* Zlib update to 1.12.2 (#12128)
* [GNA] Reduce impact of sf propagation fix (#12115)
* [GPU] Simplify namespaces in the plugin part (#12121)
* [GNA] Add support for future devices with relaxed capabilities (#12000)
* [GPU] Pass eltwise unit tests on DG2 (#12113)
* check fusion in onednn too
* [GPU] modify fusing condition for reduce (#12119)
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Enable tensor offset to GemmKernelRef for input padding support (#12133)
Signed-off-by: Andrew Park <andrew.park@intel.com>
* [PYTHON][BENCHMARK_APP] Add BGR convert to Gray function (#12118)
* Fix the JIRA 80700 issue. Add BGR convert to Gray function
* Support NCHW and NHWC
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
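As an aside, here is a hedged illustration of what such a BGR-to-Gray conversion involves for both layouts mentioned above; this is not the benchmark_app code, and the function name is hypothetical, it just shows the underlying arithmetic.

```cpp
#include <cstddef>
#include <vector>

// Collapse a BGR image to one gray channel using the common Rec. 601 weights.
std::vector<float> bgr_to_gray(const std::vector<float>& img,
                               std::size_t h, std::size_t w, bool nchw) {
    std::vector<float> gray(h * w);
    for (std::size_t i = 0; i < h * w; ++i) {
        // NCHW stores whole channel planes (B plane, then G, then R);
        // NHWC interleaves the three channel values per pixel.
        float b = nchw ? img[0 * h * w + i] : img[i * 3 + 0];
        float g = nchw ? img[1 * h * w + i] : img[i * 3 + 1];
        float r = nchw ? img[2 * h * w + i] : img[i * 3 + 2];
        gray[i] = 0.114f * b + 0.587f * g + 0.299f * r;
    }
    return gray;
}
```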
* [CPU] revert pr 11990 and enable brgconv avx512 on SPR by default (#12105)
* polish onednn cc readme (#12114)
* [ONNX] Add operator com.microsoft.Fusedgemm support into frontend/onnx (#11878)
* [GPU] Implement NMS-9 operation (#11890)
* Fix GPU NonMaxSuppression implementation
* Introduce Nms9 single layer tests
* Adapt internal NMS and GPU implementation for NMS9 implementation
* Adapt CPU implementation in GPU for NMS9
* Add blocked layouts support to NMS
* Add unit tests for blocked formats for NMS
* Fix boxes groups size for the small shapes
* Use ocl implementation for blocked layout input
* Fix templates typedefs to pass win build
* Fix second output to set data in correct format
* [POT] optimizer - update usage of IndexSampler (#12146)
* Revert "[GPU] Pass activation unit tests on DG2 (#11969)" (#12167)
This reverts commit 3334e8933c.
* Fix IRDFT for case when axes are in reversed order (#12155)
* [MO] Fix output shape bug in GatherNDDecomposition (#12110)
* [GPU] Add reorder from i32 to f32 for max-pooling/conv/fc which doesn't support i32 (#12137)
* Update pypi.org pages (#12170)
* fix references
* update links
* update the wording to be more clear
* add the error message about Visual studio back
* update links to static html links of 2022.2
* Ubuntu 22.04 support (#11472)
* Ubuntu 22.04 support
* Try to fix setuptools
* Try to fix arm
* Try to add more packages
* Test 2
* test 3
* Turn dependencies download off
* Fix
* Fix
* Fix
* Fix
* Fix
* test
* Fix
* restore everything
* Try to restore
* Restore install_openvino_dependencies.sh
* clean-up raspbian
* New if conditions
* Removed excess dependencies
* Cosmetic changes
* Removed autotools
* Removed libgtk-2
* Added HDDL libs
* Test
* Removed some dependencies
* Better fixes
* Removed some dependencies
* Fixed compilation
* Removed all extra
* [GPU] optimize permute_ref (#12159)
* change memory access pattern of fsv layout for permute
* Fix permute_ref to process F first only when (bf...) => (b...f)
* Refactor
Co-authored-by: si-eun-kim <sieun.kim@intel.com>
* Update of naming of the last operators in the graph (#12139)
* Update opset.md with opset9 (#12169)
* [GPU] integrate persistent caching for onednn (#12094)
* integrate persistent caching for onednn
* add api to save/load binary file.
* Check memory allocation size of network graph (#11911)
+ Add exception handling for out-of-resource conditions
* TI repetitive shape inference (#12178)
* Fixes for system libraries pugixml, tbb (#12206)
* Fixes for system libraries pugixml, tbb
* Added more dependencies for core
* Debian packages: base version (#11387)
* Xp/benchmark app ocl (#12112)
* Add a tip about enabling OpenCL for benchmark_app.
Signed-off-by: xipingya <xiping.yan@intel.com>
* Export doesn't work; we need to add -Dopencl_root_hints=[PATH]/OpenCL-CLHPP/include to the cmake command.
Signed-off-by: xipingya <xiping.yan@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
* ONNX: Pass name to the InputEdge (#12177)
* [IE TESTS][CONFORMANCE] Fix OpImplCheck Precision (#12148)
* add new article for using binaries
* [PyOV][DOCS] Python API contribution and developer guide (#12145)
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
* [DOC][CPU] Denormals optimization doc (#12127)
* Use system pugixml where it's possible (#12218)
* Restore FEM to be static instance (#12219)
* Restore FEM to be static instance
* Restore frontend manager in ie_read_network.cpp
* [MO] Fix TopK partial shape inference with dynamic K (#12212)
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [CPU] Fixed heap sort bug regarding heapifying (#12221)
* [CPU] Explicitly enable DNNL_VERBOSE only in case of CPU_DEBUG_CAPS (#12108)
* [GNA] Fixed convolutions with shared transpose and un-fuse-able activations after Convolution filter (Renew PR11373) (#12152)
* Commits from PR11373:
Fixed handling of transpose after convolution
[GNA] Fixed calculation of dimensions for ConvolutionFilter and PWL primitives
[GNA] Fixed coverity error and failed tests
* Apply comments
* Update src/plugins/intel_gna/gna_graph_compiler.cpp
Co-authored-by: Marcin Kusmierski <marcin.kusmierski@intel.com>
* Update src/plugins/intel_gna/gna_graph_compiler.cpp
Co-authored-by: Marcin Kusmierski <marcin.kusmierski@intel.com>
* Rollback names
* Separate test data
* Move coverity issue to separate request
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
Co-authored-by: Marcin Kusmierski <marcin.kusmierski@intel.com>
* [GNA] Fix accuracy degradation in compact mode (#12150)
* [TF FE] Handle optional attributes for Convolutional operations (#12230)
* [TF FE] Handle optional attributes for Convolutional operations
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-style rules
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* update the information for pypi.org pages
* [GPU] ROIAlign v9 support (#11899)
* ROIAlign v9 support
* Code changes after review1
* Code changes after review2
* fix of single layer test for Windows
* Since PR #12043 we no longer need a strict include order of primitive_base.hpp and impls/implementation_map.hpp
* Code changes after review3
* Code changes after review4
* update the verifying checksum step
* Fixed Windows backslash paths (#12250)
* update install_dir info
* Move GNU build flag to "cmake/developer_package/compile_flags/sdl.cmake" (#12143)
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
* [MO] Fix Mul fusion with dynamic dimension (#12253)
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* updates
* update wording for pypi.org
* Fixed newAPI for case if core was removed (#12207)
* Fixed newAPI for case if core was removed
* Fixed code style
* Fixed typo
* Use new API by default
* Create core with template plugin
* Added doxygen comment
* Install user provided TBB as well (#12260)
* Disable loading of v7 reader for new IR versions (#12252)
* Disable loading of v7 reader for new IR versions
* Try to fix CI
* Fixed PDPD frontend
* Fixed error message creation
* Fixes for cases when TBB_DIR env var is set (#12266)
* Fixes for cases when TBB_DIR env var is set
* Don't use make in build_samples.sh script
* [GPU] Get rid of direct layout::size field usages (#12172)
* [GPU] Get rid of direct layout::size field usages to simplify further replacement
* [GPU] Enabled -Wall and resolved compiler complaints
* Update summarize.py (#12175)
* [CPU] Add RDFT and IRDFT operators (#12099)
* [CPU] Add RDFT and IRDFT operators
Tickets: 79178 and 79192
Co-authored-by: Mateusz Bencer <mateusz.bencer@intel.com>
* Remove Interpolate Transposes as it does nothing (#12205)
* [TF FE] Implement LinSpace and BatchMatMul translators (#12271)
* [TF FE] Implement LinSpace and BatchMatMul translators
It helps to convert STN model (from e2e testing) using TensorFlow frontend
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix BatchMatMul translator
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix LinSpace operation translator
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-review feedback
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-style rules
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code style rules
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update error message on pypi.org (#12243)
* Add Overview page
* Revert "Add Overview page"
* fix references
* update links
* update the wording to be more clear
* add the error message about Visual studio back
* update links to static html links of 2022.2
* port changes to master
* update description
* update commands and uninstallation
* Add const fold check in operators instead pass (#12189)
* Add const fold check in operators instead pass
- refactor constant fold pass to using ov instead of ngraph
- add constant_folding_is_disabled overload for raw pointer
* Remove Reshape from skip const inferences
in legacy graph transformer
* Const fold test for modified operators
* [GPU] Use int64_t type for axis in softmax (#12287)
* remove obsolete info from source files to avoid confusion
* [DOC] [CPU] Proofreading for grammatical and stylistic corrections (#12288)
* Porting to master - update -readme for CPP and Python benchmark (#12245)
Porting #11961
* Fixed build_samples.sh not to call setupvars.sh for Debian package case (#12309)
* Investigate GNA tests (#12267)
* Test commit
* Revert "Disable loading of v7 reader for new IR versions (#12252)"
This reverts commit cb6ca7bb89.
* Revert "Test commit"
This reverts commit 977b83f2ba.
* [PyOV] Test refactoring (#12248)
* [GNA] Add missing support for batch normalization with weights broadcasting. Add unit tests. (#12301)
* Xiaoxia/onetbb old version (#12303)
* support oneTBB old version
* fix oneTBB version mismatch issues
* fix clang issue
* add 'tbb' path to setupvars.sh and OpenVINOConfig.cmake.in
* Update scripts/setupvars/setupvars.sh
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
* simple Windows installer POC (#12308)
* Fixes for cases when TBB_DIR env var is set
* Don't use make in build_samples.sh script
* First version of Windows installer
* WIndows NSIS installer
* [GPU] Fix get_default_params & choose_impl not to dependent on program_node (#12239)
* Getting rid of dependency from get_default_param for typed_program_node
* Fix bug
* Enable two paths to call choose_impl / does_possible_impl_exists / does_an_impl_exists to be able to use given layout
* Replaced impl factory API to get kernel_impl_param's pointer
* Update for recently added primitives
* Add and apply optional_layout
* fix kernel_param_impl to be handled as unique_ptr
* Applied review comments
* Fix rebase conflict
* Fix CI error
* [CC]Fix CC issue for transformation (#12292)
* Revert "Fixed 3 naming issue"
This reverts commit a92d3cfff5.
* Revert "Fix CC issues for transformation and snippets"
This reverts commit d08a3f5aac.
* Fix NGRAPH_PASS_CALLBACK issue to make it work
* Fix matcher name missing issue
* [TF FE] Fix conversion of NetVLAD model (#12328)
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [MO] Fix broken port numbering for Constant operations (#12318)
* Restore inputs order in IR Reader
* Fix broken port numbering for Constant operations
Co-authored-by: Chetverikov <anton.chetverikov@intel.com>
* [GPU] Align TopK parameters with ngraph (#12278)
* [GPU] Use int64_t type for axis in CumSum (#12306)
* [GPU] Use int64_t type for axis in ScatterElementsUpdate (#12323)
* Bump OMZ submodule to fix pip-conflicts issues (#12320)
* [PyOV] Enable type casters (#12204)
* add type caster for ov::Layout, enable load method to take pathlib.Path as argument
* fix typo
* fix style
* add missing blank line
* add common function to check if py::object is either Path or string
* fix style
* Update src/bindings/python/src/pyopenvino/graph/preprocess/pre_post_process.hpp
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
* add tests, fix style, remove pointer argument overload
* fix style
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
* [GNA] Replace GNA SoftSign by opset9 SoftSign (#12302)
* Replace GNA SoftSign by opset9 SoftSign
* v9 -> opset9
* [GPU] ScatterUpdate axis alignment (#12233)
* [GPU] added is_dynamic methods to program_node and primitive_inst. Minor refactoring (#12322)
* updates
* [GPU] Remove dependency to typed_program_node from calc_output_layout (#12378)
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Use static pointers to frontend libraries (#12235)
* Add static shared_objects map in FEM
- add unit tests for frontend lib close
- not use static FEM in ie network reader
- add main for gtest which can use manifest file to filter tests
* Move library pointers map to manger impl
- add to manger impl method to make frontend from loaded plugin
* Add shutdown function to ov namespace
it cleans the static resources
* Revert changes related to linking main for tests
* Add python binding to ov::openvino_shutdown
* Renamed shutdown method and added to legacy C++ API
(cherry picked from commit a8395bd207)
* Added C bindings
(cherry picked from commit d2c9ddc263)
* Move frontend lib close test to ieFunctTest
- moved so as not to introduce a new test binary or CI modifications;
the frontend tests use a dynamically linked frontend lib which is loaded
on test application start and masks the lib close tests
- remove gtest_main_manifest as not required now
- add ov::shutdown test to expect application crash
* Fix lib_close test
- remove not get_disabled_tests from utils
- revert CMake file formatting
* Fix get model path in lib close tests
* Skip frontend lib close tests if static lib build
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
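A minimal usage sketch of the shutdown entry point the commits above describe, assuming the umbrella header exposes ov::shutdown(); the exact header, model path, and call site here are illustrative.

```cpp
#include <openvino/openvino.hpp>

int main() {
    {
        ov::Core core;                              // frontends load lazily
        auto model = core.read_model("model.xml");  // path is illustrative
        // ... compile the model and run inference ...
    }
    // Per the commits above, this releases the statically held frontend
    // library handles so the shared objects can be unloaded cleanly.
    ov::shutdown();
    return 0;
}
```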
* Decompose NormalizeL2 on GPU (#12361)
* [TF FE] Implement translators for TensorFlow ConvBackpropInput operations (#12356)
* [TF FE] Implement ConvBackPropInput translators
Now the translators support the dynamic input_sizes attribute and different padding modes
including EXPLICIT mode
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix clang-style issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix code-style issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix code-style issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-review feedback and fix build issues
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-review feedback: check for input size
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix retrieving explicit_padding attribute
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix code style
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Add debug log showing the result of the transformation callback (#12365)
* [AUGRU] AUGRUCell/Sequence op specification (#12162)
* [GPU] Add exception handling for calc_output_layout (#12393)
* Add exception handling for calc_output_layout
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Apply comment to error handler
Signed-off-by: Andrew Park <andrew.park@intel.com>
* [GPU]get data type of conv weights from node.weights() when network is internal (#12232)
* get data type of convolution weights from node.weights() when network is internal
* use only instance.node.weights().get_output_layout().data_type
* fix typo
* add unit test for the case
* Update pre_replace_deconv to support output_shape for transposed conv (#12335)
Signed-off-by: Andrew Park <andrew.park@intel.com>
* Improved OpenVINO debian packages (#12385)
* [GPU] implement lru_cache (#12349)
* Fix memory leak issue
Co-authored-by: Taylor Yeonbok Lee <taylor.lee@intel.com>
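For reference, a textbook LRU cache sketch showing the standard list-plus-map technique; the GPU plugin's actual implementation differs, this only illustrates the data structure named above.

```cpp
#include <list>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class LruCache {
    size_t capacity_;
    std::list<std::pair<K, V>> items_;  // front = most recently used
    std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
public:
    explicit LruCache(size_t capacity) : capacity_(capacity) {}
    V* get(const K& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);  // bump to front
        return &it->second->second;
    }
    void put(const K& key, V value) {
        if (auto* v = get(key)) { *v = std::move(value); return; }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {          // evict least recently used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }
};
```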
* DOCS-fix_maths_formatting (#12402)
Mathematical equation formatting issue fixed in the POT README for range supervision.
* [GPU] Pass concat unit tests on DG2 (#12142)
* check optimized
* skip kernel compile when optimized
* GroupedGatherElimination short circuit (#12380)
* Disable GroupedGatherElimination in case of scalar inputs containing indices
* clang format
* [MO, POT] Top up upper bounds for TensorFlow and NumPy modules in all requirement files (#12191)
* [MO] Relax MO upper-bound requirements for TensorFlow and NumPy
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Just debug numpy version
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Pin upper-bounds for NumPy and TensorFlow modules in all reqs files
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update submodule dependency for open_model_zoo
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Install numpy module first
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update NumPy version in POT setup.py
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Extend telemetry tests with a set of possible solutions for events
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Update NumPy module version for layer tests
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [GPU] Added common impl for optionals (#12366)
* [LPT] Correct a check for whether model is quantized (#12364)
Look inside subgraph operations, such as TensorIterator, Loop, If, etc
* Update doc for AUTO and AUTO_BATCH (#12265)
* Update doc for AUTO and AUTO_BATCH
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* fix: incorrect fq type (#12234)
Co-authored-by: Wonju Lee <wonju.lee@intel.com>
* Implement workaround to convert non-frozen models using new TensorFlow frontend (#12386)
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Revert "Merge branch 'master' into add-install-binaries-22/2"
This reverts commit f4d6f04636, reversing
changes made to e505e739e2.
* update comments
* update comments
* Update docs/install_guides/installing-openvino-from-archive-windows.md
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* update OpenCV installation
* Update docs/install_guides/uninstalling-openvino.md
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* Update docs/install_guides/uninstalling-openvino.md
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* Update docs/install_guides/uninstalling-openvino.md
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* update uninstall wording
* add C++ redistributable to pypi.org pages
* update pypi.org pages and opencv for macOS
* update whats next
* add a note about long paths on Windows
* fix errors
* update CMake dependency
* fix formatting
* apply the same changes from Ilya's comments
* update uninstall, remove dev from pkg names
* update C++ requirements according to Ilya's requests
Signed-off-by: Min, Byungil <byungil.min@intel.com>
Signed-off-by: Andrew Park <andrew.park@intel.com>
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Signed-off-by: Yan, Xiping <xiping.yan@intel.com>
Co-authored-by: Felix Dohyun Kim <tuxedcat@gmail.com>
Co-authored-by: Sebastian Golebiewski <sebastianx.golebiewski@intel.com>
Co-authored-by: Maciej Smyk <maciejx.smyk@intel.com>
Co-authored-by: Karol Blaszczak <karol.blaszczak@intel.com>
Co-authored-by: msmykx <101244365+msmykx-intel@users.noreply.github.com>
Co-authored-by: Piotr Milewski <piotr.milewski@intel.com>
Co-authored-by: Eddy Kim <eddy.kim@intel.com>
Co-authored-by: Paul Youngsoo Ahn <paul.y.ahn@intel.com>
Co-authored-by: Wang, Yang <yang4.wang@intel.com>
Co-authored-by: Chen Peter <peter.chen@intel.com>
Co-authored-by: RICKIE777 <ruiqi.yang@intel.com>
Co-authored-by: Bonhun Koo <bonhun.koo@intel.com>
Co-authored-by: avoskoboinyk-lohika <avoskoboinyk@lohika.com>
Co-authored-by: Chenhu Wang <chenhu.wang@intel.com>
Co-authored-by: Marcin Kusmierski <marcin.kusmierski@intel.com>
Co-authored-by: Szymon Irzabek <szymon.jakub.irzabek@intel.com>
Co-authored-by: Yaroslav Torzuk <yaroslav.torzuk2@altran.com>
Co-authored-by: Oleksii Khovan <okhovan@lohika.com>
Co-authored-by: Tomasz Dołbniak <tomasz.dolbniak@intel.com>
Co-authored-by: Tingqian Li <tingqian.li@intel.com>
Co-authored-by: Vladimir Paramuzov <vladimir.paramuzov@intel.com>
Co-authored-by: Krzysztof Bruniecki <krzysztof.bruniecki@intel.com>
Co-authored-by: Min, Byungil <byungil.min@intel.com>
Co-authored-by: Andrew Kwangwoong Park <andrew.park@intel.com>
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
Co-authored-by: Luo Cheng <cheng.luo@intel.com>
Co-authored-by: zihan wu <zihan.wu@intel.com>
Co-authored-by: sheng.gui@intel.com <guisheng315@sina.com>
Co-authored-by: Tetiana Gubanova <tgubanova@lohika.com>
Co-authored-by: Mateusz Bencer <mateusz.bencer@intel.com>
Co-authored-by: Przemyslaw Wysocki <przemyslaw.wysocki@intel.com>
Co-authored-by: Kelvin Choi <kelvin.choi@intel.com>
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Taylor Yeonbok Lee <taylor.lee@intel.com>
Co-authored-by: si-eun-kim <sieun.kim@intel.com>
Co-authored-by: Katarzyna Mitrus <katarzyna.mitrus@intel.com>
Co-authored-by: Sungeun Kim <sungeun.kim@intel.com>
Co-authored-by: Jade Cho <jade.cho@intel.com>
Co-authored-by: Evgenya Stepyreva <evgenya.stepyreva@intel.com>
Co-authored-by: Xiping Yan <xiping.yan@intel.com>
Co-authored-by: Artur Kulikowski <artur.kulikowski@intel.com>
Co-authored-by: Irina Efode <irina.efode@intel.com>
Co-authored-by: Jan Iwaszkiewicz <jan.iwaszkiewicz@intel.com>
Co-authored-by: River Li <river.li@intel.com>
Co-authored-by: Roman Kazantsev <roman.kazantsev@intel.com>
Co-authored-by: Chen Xu <chen.xu@intel.com>
Co-authored-by: Egor Duplenskii <egor.duplensky@gmail.com>
Co-authored-by: Nadezhda Ageeva <nadezhda.ageeva@intel.com>
Co-authored-by: Elizaveta Lobanova <elizaveta.lobanova@intel.com>
Co-authored-by: Konstantin Beluchenko <kostiantyn.bieliuchenko@altran.com>
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com>
Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>
Co-authored-by: Roman Lyamin <Roman.Lyamin@intel.com>
Co-authored-by: almilosz <108654258+almilosz@users.noreply.github.com>
Co-authored-by: Sun Xiaoxia <xiaoxia.sun@intel.com>
Co-authored-by: Maxim Vafin <maxim.vafin@intel.com>
Co-authored-by: Chetverikov <anton.chetverikov@intel.com>
Co-authored-by: Alina Kladieva <alina.kladieva@intel.com>
Co-authored-by: Bartek Szmelczynski <bartosz.szmelczynski@intel.com>
Co-authored-by: Wilson Seok <wilson.seok@intel.com>
Co-authored-by: Inhyuk Jo <andy.inhyuk.jo@intel.com>
Co-authored-by: Wonju Lee <wonju.lee@intel.com>
Co-authored-by: Helena Kloosterman <helena.kloosterman@intel.com>
* [TF FE] Add Transpose Sinking for Prelu operation
Now it covers a case with a scalar slope.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Add unit-tests for Transpose sinking of Prelu
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix non-scalar slope case
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
According to the specification we must have the same type for block_shape and crops inputs
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Make model reshape and track batch (#12736)
* CVS-89672 Make model reshape and track batch
* Minor refactoring
* Changed mechanism of constant replacement to more mature
* Update src/common/transformations/include/transformations/smart_reshape/lstm_states_broadcast.hpp
* Update src/common/transformations/src/transformations/smart_reshape/lstm_states_broadcast.cpp
* Comments resolving
* Style and getting rid of asserts
* style
* Apply suggestions from code review
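A user-level view of what these reshape-tracking commits harden; the model path and shape below are illustrative, not from the PR. Reshaping a model to a new batch size must propagate the new batch through stateful subgraphs such as LSTM initial states.

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // path is illustrative
    // Single-input overload: request batch 8; per the commits above, the
    // smart-reshape passes broadcast dependent constants (e.g. LSTM states)
    // so they track the new batch instead of breaking shape inference.
    model->reshape(ov::PartialShape{8, 3, 224, 224});
    return 0;
}
```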
* [TF FE] Add Transpose Sinking for additional element-wise unary operations
It helps to fix performance degradation for MobileNet models
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Add LogicalNot for Transpose sinking
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Support dynamic rank support for Convolutional and Pooling operations (#12661)
* [TF FE] Add dynamic rank support for Convolutional and Pooling operations
Refactor DepthwiseConv2D, AvgPool, and FusedBatchNorm operations
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue with rvalue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix build issue with climit
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Skip duplication of Parameter nodes
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Revert changes in StridedSlice and add check for AvgPool operation type
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Revert the rest of changes for StridedSlice
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix translator for AvgPool: add pad mode
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Introduce helper default_op_checks
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Refactor translators for Resize operations and correct Pooling (#12721)
* [TF FE] Refactor translators for Resize operations and correct Pooling
It allows converting the magenta_arbitrary-image-stylization model
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Align TF FE translator for Resize with legacy frontend
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Do minor fix for MaxPool
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Add overridden method for generating a vector of strings
* Trim the value from the left and right
* Add test to verify that output names are correctly read from IR
* Use spaces instead of tabs
* Add C++ tests for read model contains outputs with whitespaces
* Fix test for add output
* Remove python test
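An illustrative helper for the trimming step mentioned above; this is not the PR's actual code, only a sketch of the operation it describes.

```cpp
#include <string>

// Strip leading and trailing whitespace from a value read out of the IR.
std::string trim(const std::string& s) {
    const char* ws = " \t\n\r";
    auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos)
        return "";                       // the string was all whitespace
    auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}
```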
* Cherry-pick U22 adoption in github actions
* More fixes for shellcheck
* More fixes for shellcheck
* Update .github/workflows/py_checks.yml
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* [TF FE] Handle optional attributes for Convolutional operations (#12230)
* [TF FE] Handle optional attributes for Convolutional operations
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-style rules
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Implement LinSpace and BatchMatMul translators (#12271)
* [TF FE] Implement LinSpace and BatchMatMul translators
It helps to convert STN model (from e2e testing) using TensorFlow frontend
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix BatchMatMul translator
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix LinSpace operation translator
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-review feedback
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-style rules
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code style rules
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Fix conversion of NetVLAD model (#12328)
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Implement translators for TensorFlow ConvBackpropInput operations (#12356)
* [TF FE] Implement ConvBackPropInput translators
Now the translators support the dynamic input_sizes attribute and different padding modes
including EXPLICIT mode
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix clang-style issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix code-style issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix code-style issue
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-review feedback and fix build issues
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Apply code-review feedback: check for input size
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix retrieving explicit_padding attribute
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Fix code style
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Fix StridedSlice translator for new_axis vector size longer than input rank (#12442)
* [TF FE] Fix StridedSlice translator for new_axis vector longer than input rank
Previously, the new_axis vector was cut to the input rank, which is incorrect and leads to the loss of new axes.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Use int64 type in mask_to_vector function
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Refactor translators for Conv2d and Conv3d (#12444)
It allows converting the CNN-Transformer model. Padding was previously computed incorrectly.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Implement conversion for Attention OCR model (#12428)
* [TF FE] Implement conversion for Attention OCR model
The following scope of work is done to make Attention OCR convertible:
1. Refactored translators for BiasAdd, Slice, and ArgMax operations. Added translation for the StopGradient operation.
2. The previous traversal algorithm for computing the topologically sorted node list was incorrect. It is now implemented based on the topologically_sorted function from core/graph_util.hpp.
3. Unsupported data types are now preliminarily converted to the undefined type so that they can be cut off.
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* [TF FE] Refactor MaxPool operation translator for xj_feature model (#12485)
* [TF FE] Refactor MaxPool operation translator for xj_feature model
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
* Correct MaxPoolV2 since it has three inputs
Signed-off-by: Kazantsev, Roman <roman.kazantsev@intel.com>
+ The benchmark cache_dir option takes longer than the cl_cache_dir env in loading a network.
+ For clDNN execution, benchmark cache_dir created onednn_engine if just the ONEDNN_ENABLE config is ON.
+ Creation of onednn_engine in ocl_engine is changed to be on-demand.
Signed-off-by: Min, Byungil <byungil.min@intel.com>
* Revert "[C API] Enable hello_nv12_input_classification samples for C APIs of OV API 2.0 (#12031)"
This reverts commit 70d967ffb6.
* Revert "Add hello_classification_ov_c test (#11933)"
This reverts commit ebeb0a3802.
* Revert "Refine ov_partial_shape for OV API 2.0 C interface (#11891)"
This reverts commit ce5b2c6a45.
* Revert "Enable unit test for OV 2.0 C API (#11828)"
This reverts commit c4fdcafa70.
* Revert "OV 2.0 C API (#11700)"
This reverts commit 8faf8f2d89.
* Revert "Fixed 3 naming issue"
This reverts commit a92d3cfff5.
* Revert "Fix CC issues for transformation and snippets"
This reverts commit d08a3f5aac.
* Fix NGRAPH_PASS_CALLBACK issue to make it work
* Fix matcher name missing issue
* Add static shared_objects map in FEM
- add unit tests for frontend lib close
- do not use static FEM in the IE network reader
- add a main for gtest which can use a manifest file to filter tests
* Move library pointers map to manager impl
- add a method to the manager impl to make a frontend from a loaded plugin
* Add shutdown function to ov namespace
it cleans the static resources
* Revert changes related to linking main for tests
* Add python binding to ov::openvino_shutdown
* Renamed shutdown method and added to legacy C++ API
(cherry picked from commit a8395bd207)
* Added C bindings
(cherry picked from commit d2c9ddc263)
* Move frontend lib close test to ieFunctTest
- moved so as not to introduce a new test binary and CI modifications;
the frontend tests use a dynamically linked frontend lib which is loaded
on test application start and masks lib close tests
- remove gtest_main_manifest as it is not required now
- add an ov::shutdown test that expects an application crash
* Fix lib_close test
- remove not get_disabled_tests from utils
- revert CMake file formatting
* Fix get model path in lib close tests
* Skip frontend lib close tests if static lib build
Co-authored-by: Ilya Churaev <ilya.churaev@intel.com>
Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>
* Fixed Windows backslash paths (#12250)
* Install user provided TBB as well (#12260)
* Fixes for cases when TBB_DIR env var is set (#12266)
* Fixes for cases when TBB_DIR env var is set
* Don't use make in build_samples.sh script
* Xiaoxia/onetbb old version (#12303)
* support oneTBB old version
* fix oneTBB version mismatch issues
* fix clang issue
* add 'tbb' path to setupvars.sh and OpenVINOConfig.cmake.in
* Update scripts/setupvars/setupvars.sh
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
* Trying to fix CVS-85530 (#12455)
Co-authored-by: Sun Xiaoxia <xiaoxia.sun@intel.com>
Co-authored-by: Shen, Wanglei <wanglei.shen@intel.com>
* Update doc for AUTO and AUTO_BATCH
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update per the comments
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Move default hint to THROUGHPUT section
Signed-off-by: Chen Peter <peter.chen@intel.com>
* Update docs/OV_Runtime_UG/automatic_batching.md
Co-authored-by: Yuan Xu <yuan1.xu@intel.com>
* Fixed newAPI for case if core was removed
* Fixed code style
* Fixed typo
* Use new API by default
* Create core with template plugin
* Added doxygen comment
Co-authored-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
* fix references
* update links
* update the wording to be more clear
* add the error message about Visual studio back
* update links to static html links of 2022.2
* change memory access pattern of fsv layout for permute
* Fix permute_ref to process F first only when (bf...) => (b...f)
* Refactor
Co-authored-by: si-eun-kim <sieun.kim@intel.com>
@@ -34,24 +34,24 @@ OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference.
- Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud
This open-source version includes several components: namely [Model Optimizer], [OpenVINO™ Runtime], [Post-Training Optimization Tool], as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics.
It supports pre-trained models from the [Open Model Zoo], along with 100+ open
This open-source version includes several components: namely [Model Optimizer], [OpenVINO™ Runtime], [Post-Training Optimization Tool], as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inference on Intel® CPUs and Intel® Processor Graphics.
It supports pre-trained models from [Open Model Zoo], along with 100+ open
source and public models in popular formats such as TensorFlow, ONNX, PaddlePaddle, MXNet, Caffe, Kaldi.
### Components
* [OpenVINO™ Runtime] - is a set of C++ libraries with C and Python bindings providing a common API to deliver inference solutions on the platform of your choice.
* [core](https://github.com/openvinotoolkit/openvino/tree/master/src/core) - provides the base API for model representation and modification.
* [inference](https://github.com/openvinotoolkit/openvino/tree/master/src/inference) - provides an API to infer models on device.
* [transformations](https://github.com/openvinotoolkit/openvino/tree/master/src/common/transformations) - contains the set of common transformations which are used in OpenVINO plugins.
* [low precision transformations](https://github.com/openvinotoolkit/openvino/tree/master/src/common/low_precision_transformations) - contains the set of transformations which are used in low precision models
* [bindings](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings) - contains all awailable OpenVINO bindings which are maintained by OpenVINO team.
* [c](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings/c) - provides C API for OpenVINO™ Runtime
* [python](https://github.com/openvinotoolkit/openvino/tree/master/src/bindings/python) - Python API for OpenVINO™ Runtime
* [Plugins](https://github.com/openvinotoolkit/openvino/tree/master/src/plugins) - contains OpenVINO plugins which are maintained in open-source by OpenVINO team. For more information please taje a look to the [list of supported devices](#supported-hardware-matrix).
* [Frontends](https://github.com/openvinotoolkit/openvino/tree/master/src/frontends) - contains available OpenVINO frontends which allow to read model from native framework format.
* [core](./src/core) - provides the base API for model representation and modification.
* [inference](./src/inference) - provides an API to infer models on the device.
* [transformations](./src/common/transformations) - contains the set of common transformations which are used in OpenVINO plugins.
* [low precision transformations](./src/common/low_precision_transformations) - contains the set of transformations that are used in low precision models
* [bindings](./src/bindings) - contains all available OpenVINO bindings which are maintained by the OpenVINO team.
* [c](./src/bindings/c) - C API for OpenVINO™ Runtime
* [python](./src/bindings/python) - Python API for OpenVINO™ Runtime
* [Plugins](./src/plugins) - contains OpenVINO plugins which are maintained in open-source by the OpenVINO team. For more information, take a look at the [list of supported devices](#supported-hardware-matrix).
* [Frontends](./src/frontends) - contains available OpenVINO frontends that allow reading models from the native framework format.
* [Model Optimizer] - is a cross-platform command-line tool that facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal execution on end-point target devices.
* [Post-Training Optimization Tool] - is designed to accelerate the inference of deep learning models by applying special methods without model retraining or fine-tuning, for example, post-training 8-bit quantization.
* [Samples] - applications on C, C++ and Python languages which shows basic use cases of OpenVINO usages.
* [Samples] - applications in C, C++ and Python languages that show basic OpenVINO use cases.
## Supported Hardware Matrix
@@ -69,37 +69,37 @@ The OpenVINO™ Runtime can infer models on different hardware devices. This sec
<td>Auto batch plugin performs on-the-fly automatic batching (i.e. grouping inference requests together) to improve device utilization, with no programming effort from the user</td>
<td>Multi plugin enables simultaneous inference of the same model on several devices in parallel</td>
</tr>
</tbody>
@@ -140,11 +140,11 @@ By contributing to the project, you agree to the license and copyright terms the
### User documentation
The latest documentation for OpenVINO™ Toolkit is availabe [here](https://docs.openvino.ai/). This documentation contains detailed information about all OpenVINO components and provides all important information which could be needed if you create an application which is based on binary OpenVINO distribution or own OpenVINO version without source code modification.
The latest documentation for OpenVINO™ Toolkit is available [here](https://docs.openvino.ai/). This documentation contains detailed information about all OpenVINO components and provides all the important information you may need to create an application based on binary OpenVINO distribution or own OpenVINO version without source code modification.
### Developer documentation
[Developer documentation](#todo-add) contains information about architectural decisions which are applied inside the OpenVINO components. This documentation has all necessary information which could be needed in order to contribute to OpenVINO.
[Developer documentation](./docs/dev/index.md) contains information about architectural decisions which are applied inside the OpenVINO components. This documentation has all necessary information which could be needed in order to contribute to OpenVINO.
## Tutorials
@@ -161,15 +161,15 @@ The list of OpenVINO tutorials:
## System requirements
The full information about system requirements depends on the platform and is available in the `System requirements` section on the dedicated pages:
Please take a look to [OpenVINO Wiki](https://github.com/openvinotoolkit/openvino/wiki#how-to-build) to get more information about OpenVINO build process.
See the [OpenVINO Wiki](https://github.com/openvinotoolkit/openvino/wiki#how-to-build) to get more information about the OpenVINO build process.
## How to contribute
@@ -177,13 +177,13 @@ See [CONTRIBUTING](./CONTRIBUTING.md) for details. Thank you!
## Get support
Please report questions, issues and suggestions using:
* [Neural Network Compression Framework (NNCF)](https://github.com/openvinotoolkit/nncf) - a suite of advanced algorithms for model inference optimization including quantization, filter pruning, binarization and sparsity
* [OpenVINO™ Training Extensions (OTE)](https://github.com/openvinotoolkit/training_extensions) - convenient environment to train Deep Learning models and convert them using OpenVINO for optimized inference.
* [OpenVINO™ Model Server (OVMS)](https://github.com/openvinotoolkit/model_server) - a scalable, high-performance solution for serving deep learning models optimized for Intel architectures
* [DL Workbench](https://docs.openvino.ai/nightly/workbench_docs_Workbench_DG_Introduction.html) - An alternative, web-based version of OpenVINO designed to make production of pretrained deep learning models significantly easier.
* [Computer Vision Annotation Tool (CVAT)](https://github.com/openvinotoolkit/cvat) - an online, interactive video and image annotation tool for computer vision purposes.
* [DL Workbench](https://docs.openvino.ai/2022.2/workbench_docs_Workbench_DG_Introduction.html) - an alternative, web-based version of OpenVINO designed to facilitate optimization and compression of pre-trained deep learning models.
* [Computer Vision Annotation Tool (CVAT)](https://github.com/opencv/cvat) - an online, interactive video and image annotation tool for computer vision purposes.
* [Dataset Management Framework (Datumaro)](https://github.com/openvinotoolkit/datumaro) - a framework and CLI tool to build, transform, and analyze datasets.
---
\* Other names and brands may be claimed as the property of others.
[Open Model Zoo]:https://github.com/openvinotoolkit/open_model_zoo
Once you have a model that meets both OpenVINO™ and your requirements, you can choose among several ways of deploying it with your application:
* [Run inference and develop your app with OpenVINO™ Runtime](../OV_Runtime_UG/openvino_intro.md).
* [Deploy your application locally](../OV_Runtime_UG/deployment/deployment_intro.md).
* [Deploy your model online with the OpenVINO Model Server](@ref ovms_what_is_openvino_model_server).
* [Deploy your application locally](../OV_Runtime_UG/deployment/deployment_intro.md).
* [Deploy your model with OpenVINO Model Server](@ref ovms_what_is_openvino_model_server).
* [Deploy your application for the TensorFlow framework with OpenVINO Integration](./openvino_ecosystem_ovtf.md).
> **NOTE**: [Running inference in OpenVINO Runtime](../OV_Runtime_UG/openvino_intro.md) is the most basic form of deployment. Before moving forward, make sure you know how to create a proper inference configuration.
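As a minimal sketch of that basic flow (the model path is a placeholder, and error handling is omitted), synchronous inference with the C++ API looks roughly like this:

```cpp
#include <openvino/openvino.hpp>

int main() {
    // Create the runtime core, then read and compile the model for a device.
    ov::Core core;
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");  // placeholder path
    ov::CompiledModel compiled_model = core.compile_model(model, "CPU");

    // Create an inference request and run a synchronous inference.
    ov::InferRequest infer_request = compiled_model.create_infer_request();
    ov::Tensor input = infer_request.get_input_tensor();
    // ... fill `input` with data matching the model's input shape ...
    infer_request.infer();
    ov::Tensor output = infer_request.get_output_tensor();
    (void)output;
    return 0;
}
```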
Deep Learning Workbench (DL Workbench) is an official OpenVINO™ graphical interface designed to make the production of pretrained deep learning Computer Vision and Natural Language Processing models significantly easier.
Minimize the inference-to-deployment workflow timing for neural models right in your browser: import a model, analyze its performance and accuracy, visualize the outputs, optimize and make the final model deployment-ready in a matter of minutes. DL Workbench takes you through the full OpenVINO™ workflow, providing the opportunity to learn about various toolkit components.
DL Workbench enables you to get a detailed performance assessment, explore inference configurations, and obtain an optimized model ready to be deployed on various Intel® configurations, such as client and server CPU, Intel® Processor Graphics (GPU), Intel® Movidius™ Neural Compute Stick 2 (NCS 2), and Intel® Vision Accelerator Design with Intel® Movidius™ VPUs.
DL Workbench also provides the [JupyterLab environment](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Jupyter_Notebooks.html#doxid-workbench-docs-workbench-d-g-jupyter-notebooks) that helps you get started quickly with the OpenVINO™ API and command-line interface (CLI). Follow the full OpenVINO workflow created for your model and learn about different toolkit components.
DL Workbench helps achieve your goals depending on the stage of your deep learning journey.
If you are a beginner in the deep learning field, the DL Workbench provides you with
learning opportunities:
* Learn what neural networks are, how they work, and how to examine their architectures.
* Learn the basics of neural network analysis and optimization before production.
* Get familiar with the OpenVINO™ ecosystem and its main components without installing it on your system.
If you have enough experience with neural networks, DL Workbench provides you with a
convenient web interface to optimize your model and prepare it for production:
* Measure and interpret model performance.
* Tune the model for enhanced performance.
* Analyze the quality of your model and visualize output.
## General Workflow
The diagram below illustrates the typical DL Workbench workflow. Click to see the full-size image:

Get a quick overview of the workflow in the DL Workbench User Interface:

## OpenVINO™ Toolkit Components
The intuitive web-based interface of the DL Workbench enables you to easily use various
OpenVINO™ toolkit components:
| Component | Description |
|------------------|------------------|
| [Open Model Zoo](https://docs.openvinotoolkit.org/latest/omz_tools_downloader.html)| Get access to the collection of high-quality pre-trained deep learning [public](https://docs.openvinotoolkit.org/latest/omz_models_group_public.html) and [Intel-trained](https://docs.openvinotoolkit.org/latest/omz_models_group_intel.html) models for resolving a variety of different tasks. |
| [Model Optimizer](https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_Deep_Learning_Model_Optimizer_DevGuide.html) | Optimize and transform models trained in supported frameworks to the IR format. <br>Supported frameworks include TensorFlow\*, Caffe\*, Kaldi\*, MXNet\*, and the ONNX\* format. |
| [Benchmark Tool](https://docs.openvinotoolkit.org/latest/openvino_inference_engine_tools_benchmark_tool_README.html)| Estimate deep learning model inference performance on supported devices. |
| [Accuracy Checker](https://docs.openvinotoolkit.org/latest/omz_tools_accuracy_checker.html)| Evaluate the accuracy of a model by collecting one or several metric values. |
| [Post-Training Optimization Tool](https://docs.openvinotoolkit.org/latest/pot_README.html)| Optimize pretrained models by lowering the precision of a model from floating-point (FP32 or FP16) to integer (INT8), without the need to retrain or fine-tune models. |
OpenVINO Runtime offers multiple inference modes to allow optimum hardware utilization under different conditions. The most basic one is a single-device mode, which defines just one device responsible for the entire inference workload. It supports a range of Intel hardware by means of plugins embedded in the Runtime library, each set up to offer the best possible performance. For a complete list of supported devices and instructions on how to use them, refer to the [guide on inference devices](../OV_Runtime_UG/supported_plugins/Device_Plugins.md).
The remaining modes assume certain levels of automation in selecting devices for inference. Using them in the deployed solution may potentially increase its performance and portability. The automated modes include automatic device selection (AUTO) and multi-device execution (MULTI).
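For illustration, the target device is selected with the second argument of `ov::Core::compile_model`; a sketch comparing the single-device mode with the automated modes (the model path is a placeholder):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // Single-device mode: one device is responsible for the whole workload.
    auto gpu_model = core.compile_model(model, "GPU");

    // Automated modes: the runtime selects or combines devices for you.
    auto auto_model  = core.compile_model(model, "AUTO");          // automatic device selection
    auto multi_model = core.compile_model(model, "MULTI:GPU,CPU"); // parallel inference on several devices
    return 0;
}
```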
Every deep learning workflow begins with obtaining a model. You can choose to prepare a custom one, use a ready-made solution and adjust it to your needs, or even download and run a pre-trained network from an online database, such as OpenVINO's [Open Model Zoo](../model_zoo.md).
This section describes how to obtain and prepare your model for work with OpenVINO to get the best inference results:
* [Browse a database of models for use in your projects](../model_zoo.md).
[OpenVINO™ supports several model formats](../MO_DG/prepare_model/convert_model/supported_model_formats.md) and allows converting them to its own format, OpenVINO IR, providing a tool dedicated to this task.
[Model Optimizer](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md) reads the original model and creates the OpenVINO IR model (.xml and .bin files) so that inference can ultimately be performed without delays due to format conversion. Optionally, Model Optimizer can adjust the model to be more suitable for inference, for example, by [altering input shapes](../MO_DG/prepare_model/convert_model/Converting_Model.md), [embedding preprocessing](../MO_DG/prepare_model/Additional_Optimizations.md) and [cutting training parts off](../MO_DG/prepare_model/convert_model/Cutting_Model.md).
The approach to fully convert a model is considered the default choice, as it allows the full extent of OpenVINO features. The OpenVINO IR model format is used by other conversion and preparation tools, such as the Post-Training Optimization Tool, for further optimization of the converted model.
Conversion is not required for ONNX and PaddlePaddle models, as OpenVINO provides C++ and Python APIs for importing them to OpenVINO Runtime directly. It provides a convenient way to quickly switch from framework-based code to OpenVINO-based code in your inference application.
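As a sketch (the file names are placeholders), the same `ov::Core::read_model` call covers both the converted and the directly imported cases:

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;

    // OpenVINO IR produced by Model Optimizer (the .bin file is found next to the .xml).
    auto ir_model = core.read_model("model.xml");

    // ONNX and PaddlePaddle models can be read directly, with no prior conversion step.
    auto onnx_model = core.read_model("model.onnx");
    auto paddle_model = core.read_model("model.pdmodel");
    return 0;
}
```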
This section describes how to obtain and prepare your model for work with OpenVINO to get the best inference results:
* [See the supported formats and how to use them in your project](../MO_DG/prepare_model/convert_model/supported_model_formats.md)
* [Convert different model formats to the OpenVINO IR format](../MO_DG/Deep_Learning_Model_Optimizer_DevGuide.md).
* [Automate model-related tasks with Model Downloader and additional OMZ Tools](https://docs.openvino.ai/latest/omz_tools_downloader.html).
To begin with, you may want to [browse a database of models for use in your projects](../model_zoo.md).
A suite of advanced algorithms for Neural Network inference optimization with minimal accuracy drop. NNCF applies quantization, filter pruning, binarization and sparsity algorithms to PyTorch and TensorFlow models during training.
# How to Implement Custom GPU Operations {#openvino_docs_Extensibility_UG_GPU}
To enable operations not supported by OpenVINO out of the box, you may need an extension for an OpenVINO operation set, and a custom kernel for the device you will target. This page describes custom kernel support for the GPU device.
To enable operations not supported by OpenVINO™ out of the box, you may need an extension for OpenVINO operation set, and a custom kernel for the device you will target. This article describes custom kernel support for the GPU device.
The GPU codepath abstracts many details about OpenCL. You need to provide the kernel code in OpenCL C and an XML configuration file that connects the kernel and its parameters to the parameters of the operation.
@@ -8,7 +8,6 @@ There are two options for using the custom operation configuration file:
* Include a section with your kernels into the automatically-loaded `<lib_path>/cldnn_global_custom_kernels/cldnn_global_custom_kernels.xml` file.
* Call the `ov::Core::set_property()` method from your application with the `"CONFIG_FILE"` key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
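A sketch of the second option (the configuration and model file names are placeholders):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Register the custom kernel configuration with the GPU plugin
    // before compiling the model that uses the custom operations.
    core.set_property("GPU", {{"CONFIG_FILE", "custom_kernels.xml"}});

    auto compiled_model = core.compile_model("model_with_custom_ops.xml", "GPU");
    return 0;
}
```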
`Kernel` node contains all kernel source code configuration.
The `Kernel` node contains all kernel source code configuration.
**Sub-nodes**: `Source` (1+), `Define` (0+)
### Source Node and Sub-Node Structure
`Source` node points to a single OpenCL source file.
The `Source` node points to a single OpenCL source file.
| Attribute Name | \# |Description|
|-----|-----|-----|
| `filename` | (1) | Name of the file containing OpenCL source code. Note that the path is relative to your executable. Multiple source nodes will have their sources concatenated in order. |
| `filename` | (1) | Name of the file containing OpenCL source code. The path is relative to your executable. Multiple source nodes will have their sources concatenated in order. |
**Sub-nodes**: None
### Define Node and Sub-Node Structure
`Define` node configures a single `#‍define` instruction to be added to
The `Define` node configures a single `#‍define` instruction to be added to
the sources during compilation (JIT).
| Attribute Name | \# | Description |
|------|-------|------|
| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well, which is taken as a string. |
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. |
| `default` | (0/1) | The default value to be used if the specified parameters are missing from the operation in the IR. |
| `name` | (1) | The name of the defined JIT. For static constants, this can include the value as well, which is taken as a string. |
| `param` | (0/1) | This parameter value is used as the value of this JIT definition. |
| `type` | (0/1) | The parameter type. Accepted values: `int`, `float`, and `int[]`, `float[]` for arrays. |
| `default` | (0/1) | The default value to be used if the specified parameters are missing from the operation in the OpenVINO IR. |
**Sub-nodes:** None
@@ -90,37 +89,37 @@ The resulting JIT has the following form:
### Buffers Node and Sub-Node Structure
`Buffers` node configures all input/output buffers for the OpenCL entry
The `Buffers` node configures all input/output buffers for the OpenCL entry
function. No buffers node structure exists.
**Sub-nodes:**`Data` (0+), `Tensor` (1+)
### Data Node and Sub-Node Structure
`Data` node configures a single input with static data, for example,
The `Data` node configures a single input with static data, for example,
weights or biases.
| Attribute Name | \# | Description |
|----|-----|------|
| `name` | (1) | Name of a blob attached to an operation in the IR |
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to |
| `name` | (1) | Name of a blob attached to an operation in the OpenVINO IR. |
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
**Sub-nodes**: None
### Tensor Node and Sub-Node Structure
`Tensor` node configures a single input or output tensor.
The `Tensor` node configures a single input or output tensor.
| Attribute Name | \# | Description |
|------|-------|-------|
| `arg-index` | (1) | 0-based index in the entry function arguments to be bound to. |
| `type` | (1) | `input` or `output` |
| `port-index` | (1) | 0-based index in the operation input/output ports in the IR |
| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB`, and same values in all lowercase. Default value: `BFYX` |
| `port-index` | (1) | 0-based index in the operation input/output ports in the OpenVINO IR |
| `format` | (0/1) | Data layout declaration for the tensor. Accepted values: `BFYX`, `BYXF`, `YXFB`, `FYXB`(also in lowercase). The default value: `BFYX` |
### CompilerOptions Node and Sub-Node Structure
`CompilerOptions` node configures the compilation flags for the OpenCL
The `CompilerOptions` node configures the compilation flags for the OpenCL
sources.
| Attribute Name | \# | Description |
@@ -131,20 +130,20 @@ sources.
### WorkSizes Node and Sub-Node Structure
`WorkSizes` node configures the global/local work sizes to be used when
The `WorkSizes` node configures the global/local work sizes to be used when
queuing an OpenCL program for execution.
| Attribute Name | \# | Description |
|-----|------|-----|
| `global`<br>`local` | (0/1)<br>(0/1) | An array of up to three integers or formulas for defining OpenCL work-sizes to be used during execution.<br> The formulas can use the values of the B,F,Y,X dimensions and contain the operators: +,-,/,\*,%. All operators are evaluated in integer arithmetic. <br>Default value: `global=”B*F*Y*X” local=””` |
| `dim` | (0/1) | A tensor to take the work-size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. Default value: `output` |
| `dim` | (0/1) | A tensor to take the work-size from. Accepted values: `input N`, `output`, where `N` is an index of input tensor starting with 0. The default value: `output` |
**Sub-nodes**: None
## Example Configuration File
The following code sample provides an example configuration file in XML
format. For information on the configuration file structure, see
format. For information on the configuration file structure, see the
@@ -170,22 +169,22 @@ For an example, see [Example Kernel](#example-kernel).
| Name | Value |
|---|---|
| `NUM_INPUTS` | Number of the input tensors bound to this kernel |
| `GLOBAL_WORKSIZE` | An array of global work sizes used to execute this kernel |
| `GLOBAL_WORKSIZE_SIZE` | The size of the `GLOBAL_WORKSIZE` array |
| `LOCAL_WORKSIZE` | An array of local work sizes used to execute this kernel |
| `LOCAL_WORKSIZE_SIZE` | The size of the `LOCAL_WORKSIZE` array |
| `<TENSOR>_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX` |
| `NUM_INPUTS` | Number of the input tensors bound to this kernel. |
| `GLOBAL_WORKSIZE` | An array of global work sizes used to execute this kernel. |
| `GLOBAL_WORKSIZE_SIZE` | The size of the `GLOBAL_WORKSIZE` array. |
| `LOCAL_WORKSIZE` | An array of local work sizes used to execute this kernel. |
| `LOCAL_WORKSIZE_SIZE` | The size of the `LOCAL_WORKSIZE` array. |
| `<TENSOR>_DIMS`| An array of the tensor dimension sizes. Always ordered as `BFYX`. |
| `<TENSOR>_DIMS_SIZE`| The size of the `<TENSOR>_DIMS` array.|
| `<TENSOR>_TYPE`| The datatype of the tensor: `float`, `half`, or `char`|
| `<TENSOR>_TYPE`| The datatype of the tensor: `float`, `half`, or `char`. |
| `<TENSOR>_FORMAT_<TENSOR_FORMAT>` | The format of the tensor, BFYX, BYXF, YXFB , FYXB, or ANY. The format is concatenated to the defined name. You can use the tensor format to define codepaths in your code with `#‍ifdef/#‍endif`. |
| `<TENSOR>_LOWER_PADDING` | An array of padding elements used for the tensor dimensions before they start. Always ordered as BFYX.|
| `<TENSOR>_LOWER_PADDING_SIZE` | The size of the `<TENSOR>_LOWER_PADDING` array |
| `<TENSOR>_LOWER_PADDING_SIZE` | The size of the `<TENSOR>_LOWER_PADDING` array. |
| `<TENSOR>_UPPER_PADDING` | An array of padding elements used for the tensor dimensions after they end. Always ordered as BFYX. |
| `<TENSOR>_UPPER_PADDING_SIZE` | The size of the `<TENSOR>_UPPER_PADDING` array |
| `<TENSOR>_PITCHES` | The offset (in elements) between adjacent elements in each dimension. Always ordered as BFYX.|
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array |
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element, bypassing the lower padding. |
| `<TENSOR>_UPPER_PADDING_SIZE` | The size of the `<TENSOR>_UPPER_PADDING` array. |
| `<TENSOR>_PITCHES` | The offset (in elements) between adjacent elements in each dimension. Always ordered as BFYX.|
| `<TENSOR>_PITCHES_SIZE`| The size of the `<TENSOR>_PITCHES` array. |
| `<TENSOR>_OFFSET`| The number of elements from the start of the tensor to the first valid element, bypassing the lower padding. |
All `<TENSOR>` values are automatically defined for every tensor
bound to this operation, such as `INPUT0`, `INPUT1`, and `OUTPUT0`, as shown
Custom operations, that is those not included in the list, are not recognized by OpenVINO™ out-of-the-box. The need for a custom operation may appear in two main cases:
Custom operations, which are not included in the list, are not recognized by OpenVINO out-of-the-box. The need for custom operation may appear in two cases:
1. A regular framework operation that is new or rarely used, which is why it hasn’t been implemented in OpenVINO yet.
1. A new or rarely used regular framework operation is not supported in OpenVINO yet.
2. A new user operation that was created for some specific model topology by a model author using framework extension capabilities.
2. A new user operation that was created for some specific model topology by the author of the model using framework extension capabilities.
Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations, allowing you to plug in your own implementation for them. OpenVINO™ Extensibility API lets you add support for those custom operations and use one implementation for Model Optimizer and OpenVINO™ Runtime.
Importing models with such operations requires additional steps. This guide illustrates the workflow for running inference on models featuring custom operations. This allows plugging in your own implementation for them. OpenVINO Extensibility API enables adding support for those custom operations and using one implementation for Model Optimizer and OpenVINO Runtime.
Defining a new custom operation basically consist of two parts:
Defining a new custom operation basically consists of two parts:
1. Definition of operation semantics in OpenVINO, the code that describes how this operation should be inferred consuming input tensor(s) and producing output tensor(s). How to implement execution kernels for [GPU](./GPU_Extensibility.md) and [VPU](./VPU_Extensibility.md) is described in separate guides.
1. Definition of operation semantics in OpenVINO, the code that describes how this operation should be inferred consuming input tensor(s) and producing output tensor(s). The implementation of execution kernels for [GPU](./GPU_Extensibility.md) and [VPU](./VPU_Extensibility.md) is described in separate guides.
2. Mapping rule that facilitates conversion of framework operation representation to OpenVINO defined operation semantics.
The first part is required for inference, the second part is required for successful import of a model containing such operations from the original framework model format. There are several options to implement each part, the next sections will describe them in detail.
The first part is required for inference. The second part is required for successful import of a model containing such operations from the original framework model format. There are several options to implement each part. The following sections will describe them in detail.
## Definition of Operation Semantics
If the custom operation can be mathematically represented as a combination of exiting OpenVINO operations and such decomposition gives desired performance, then low-level operation implementation is not required. Refer to the latest OpenVINO operation set, when deciding feasibility of such decomposition. You can use any valid combination of exiting operations. The next section of this document describes the way to map a custom operation.
If the custom operation can be mathematically represented as a combination of existing OpenVINO operations and such decomposition gives the desired performance, then low-level operation implementation is not required. When deciding on the feasibility of such decomposition, refer to the latest OpenVINO operation set. You can use any valid combination of existing operations. How to map a custom operation is described in the next section of this document.
If such decomposition is not possible or appears too bulky with a large number of constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the [Custom Operation Guide](add_openvino_ops.md).
If such decomposition is not possible or appears too bulky, with a large number of constituent operations that do not perform well, then a new class for the custom operation should be implemented, as described in the [Custom Operation Guide](add_openvino_ops.md).
Prefer implementing a custom operation class if you already have a generic C++ implementation of operation kernel. Otherwise try to decompose the operation first as described above and then after verifying correctness of inference and resulting performance, optionally invest to implementing bare metal C++ implementation.
You might prefer implementing a custom operation class if you already have a generic C++ implementation of the operation kernel. Otherwise, try to decompose the operation first, as described above. Then, after verifying the correctness of inference and the resulting performance, you may move on to an optional bare-metal C++ implementation.
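For orientation, a minimal sketch of such a class, modeled on the `Identity` placeholder from the template extension referenced later in this guide:

```cpp
#include <openvino/op/op.hpp>

// Minimal custom operation: passes its single input through unchanged.
class Identity : public ov::op::Op {
public:
    OPENVINO_OP("Identity");

    Identity() = default;
    explicit Identity(const ov::Output<ov::Node>& arg) : Op({arg}) {
        constructor_validate_and_infer_types();
    }

    void validate_and_infer_types() override {
        // Output type and shape mirror the single input.
        set_output_type(0, get_input_element_type(0), get_input_partial_shape(0));
    }

    std::shared_ptr<ov::Node> clone_with_new_inputs(const ov::OutputVector& new_args) const override {
        return std::make_shared<Identity>(new_args.at(0));
    }
};
```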
## Mapping from Framework Operation
Depending on model format used for import, mapping of custom operation is implemented differently, choose one of:
Mapping of custom operation is implemented differently, depending on model format used for import. You may choose one of the following:
1. If model is represented in ONNX (including models exported from Pytorch in ONNX) or PaddlePaddle formats, then one of the classes from [Frontend Extension API](frontend_extensions.md) should be used. It consists of several classes available in C++ which can be used with Model Optimizer`--extensions` option or when model is imported directly to OpenVINO run-time using read_model method. Python API is also available for run-time model importing.
1. If a model is represented in the ONNX (including models exported from PyTorch in ONNX) or PaddlePaddle formats, then one of the classes from [Frontend Extension API](frontend_extensions.md) should be used. It consists of several classes available in C++ which can be used with the `--extensions` option in Model Optimizer or when a model is imported directly to OpenVINO Runtime using the `read_model` method. Python API is also available for runtime model import.
2. If model is represented in TensorFlow, Caffe, Kaldi or MXNet formats, then [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) should be used. This approach is available for model conversion in Model Optimizer only.
2. If a model is represented in the TensorFlow, Caffe, Kaldi or MXNet formats, then [Model Optimizer Extensions](../MO_DG/prepare_model/customize_model_optimizer/Customize_Model_Optimizer.md) should be used. This approach is available for model conversion in Model Optimizer only.
The existence of two approaches simultaneously is explained by the two different types of frontends used for model conversion in OpenVINO: new frontends (ONNX, PaddlePaddle) and legacy frontends (TensorFlow, Caffe, Kaldi and Apache MXNet). Model Optimizer can use both frontends, in contrast to the direct import of a model with the `read_model` method, which can use new frontends only. Follow one of the appropriate guides referenced above to implement mappings depending on the framework frontend.
If you are implementing extensions for ONNX or PaddlePaddle new frontends and plan to use Model Optimizer`--extension` option for model conversion, then the extensions should be
If you are implementing extensions for new ONNX or PaddlePaddle frontends and plan to use the `--extensions` option in Model Optimizer for model conversion, then the extensions should be:
1. Implemented in C++ only
1. Implemented in C++ only.
2. Compiled as a separate shared library (see details how to do that later in this guide).
2. Compiled as a separate shared library (see details on how to do this further in this guide).
You cannot write new frontend extensions using Python API if you plan to use them with Model Optimizer.
Model Optimizer does not support new frontend extensions written in Python API.
Remaining part of this guide uses Frontend Extension API applicable for new frontends.
Remaining part of this guide describes application of Frontend Extension API for new frontends.
## Registering Extensions
A custom operation class and a new mapping frontend extension class object should be registered to be usable in OpenVINO runtime.
> **NOTE**: This documentation is written based on the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/docs/template_extension/new), which demonstrates extension development details based on minimalistic `Identity` operation that is a placeholder for your real custom operation. You can review the complete code, which is fully compliable, to see how it works.
> **NOTE**: This documentation is derived from the [Template extension](https://github.com/openvinotoolkit/openvino/tree/master/src/core/template_extension/new), which demonstrates the details of extension development. It is based on minimalistic `Identity` operation that is a placeholder for your real custom operation. Review the complete, fully compilable code to see how it works.
To load the extensions to the `ov::Core` object, use the `ov::Core::add_extension` method, this method allows to load library with extensions or extensions from the code.
Use the `ov::Core::add_extension` method to load the extensions to the `ov::Core` object. This method allows loading library with extensions or extensions from the code.
### Load extensions to core
### Load Extensions to Core
Extensions can be loaded from code with `ov::Core::add_extension` method:
Extensions can be loaded from a code with the `ov::Core::add_extension` method:
@sphinxtabset
@@ -92,7 +91,7 @@ Extensions can be loaded from code with `ov::Core::add_extension` method:
@endsphinxtabset
`Identity` is custom operation class defined in [Custom Operation Guide](add_openvino_ops.md). This is enough to enable reading IR which uses `Identity` extension operation emitted by Model Optimizer. To be able to load original model directly to the runtime, you need to add also a mapping extension:
The `Identity` is a custom operation class defined in [Custom Operation Guide](add_openvino_ops.md). This is sufficient to enable reading OpenVINO IR which uses the `Identity` extension operation emitted by Model Optimizer. In order to load original model directly to the runtime, add a mapping extension:
@sphinxdirective
@@ -110,32 +109,34 @@ Extensions can be loaded from code with `ov::Core::add_extension` method:
@endsphinxdirective
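Put together, a sketch of both registrations (the `Identity` class sketched earlier in this guide, the library name, and the exact frontend header are assumptions that may vary by release):

```cpp
#include <openvino/openvino.hpp>
#include <openvino/frontend/extension.hpp>  // frontend extension header per the template extension

int main() {
    ov::Core core;

    // Register the custom operation itself (Identity as sketched earlier).
    core.add_extension<Identity>();

    // Map the framework operation named "Identity" to the custom class, so the
    // original model can be read directly by the runtime.
    core.add_extension(ov::frontend::OpExtension<Identity>("Identity"));

    // Alternatively, load all extensions from a prebuilt shared library.
    core.add_extension("libtemplate_extension.so");  // illustrative library name

    auto model = core.read_model("model.onnx");  // placeholder path
    return 0;
}
```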
When Python API is used there is no way to implement a custom OpenVINO operation. Also, even if custom OpenVINO operation is implemented in C++ and loaded to the runtime through a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. Use C++ shared library approach to implement both operations semantics and framework mapping in this case.
When Python API is used, there is no way to implement a custom OpenVINO operation. Even if custom OpenVINO operation is implemented in C++ and loaded into the runtime by a shared library, there is still no way to add a frontend mapping extension that refers to this custom operation. In this case, use C++ shared library approach to implement both operations semantics and framework mapping.
You still can use Python for operation mapping and decomposition in case if operations from the standard OpenVINO operation set is used only.
Python can still be used to map and decompose operations when only operations from the standard OpenVINO operation set are used.
### Create library with extensions
### Create a Library with Extensions
You need to create extension library in the following cases:
- Convert model with custom operations in Model Optimizer
- Load model with custom operations in Python application. It is applicable for both framework model and IR.
- Loading models with custom operations in tools that support loading extensions from a library, for example `benchmark_app`.
An extension library should be created in the following cases:
If you want to create an extension library, for example in order to load these extensions to the Model Optimizer, you need to do next steps:
Create an entry point for extension library. OpenVINO™ provides an `OPENVINO_CREATE_EXTENSIONS()` macro, which allows to define an entry point to a library with OpenVINO™ Extensions.
This macro should have a vector of all OpenVINO™ Extensions as an argument.
- Conversion of a model with custom operations in Model Optimizer.
- Loading a model with custom operations in a Python application. This applies to both framework model and OpenVINO IR.
- Loading models with custom operations in tools that support loading extensions from a library, for example the `benchmark_app`.
Based on that, the declaration of an extension class can look as follows:
To create an extension library, for example, to load the extensions into Model Optimizer, perform the following:
1. Create an entry point for the extension library. OpenVINO provides the `OPENVINO_CREATE_EXTENSIONS()` macro, which allows defining an entry point to a library with OpenVINO Extensions.
This macro should have a vector of all OpenVINO Extensions as an argument.
Based on that, the declaration of an extension class might look like the following:
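(A sketch based on the template extension; the `Identity` class from earlier in this guide and the exact header paths are assumptions.)

```cpp
#include <openvino/core/extension.hpp>
#include <openvino/core/op_extension.hpp>
#include <openvino/frontend/extension.hpp>

// Entry point of the extension library: exposes every extension in one vector.
// Identity is the custom operation class sketched earlier in this guide.
OPENVINO_CREATE_EXTENSIONS(
    std::vector<ov::Extension::Ptr>({
        std::make_shared<ov::OpExtension<Identity>>(),
        std::make_shared<ov::frontend::OpExtension<Identity>>()
    }));
```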
To enable operations not supported by OpenVINO™ out of the box, you need a custom extension for Model Optimizer, a custom nGraph operation set, and a custom kernel for the device you will target. This page describes custom kernel support for one VPU, the Intel® Neural Compute Stick 2 device, which uses the MYRIAD device plugin.
> **NOTES:**
> * OpenCL\* custom layer support is available in the preview mode.
> **NOTE:**
> * OpenCL custom layer support is available in the preview mode.
> * This section assumes you are familiar with developing kernels using OpenCL.
To customize your topology with an OpenCL layer, carry out the tasks described on this page:
1. Write and compile your OpenCL code with the standalone offline OpenCL compiler (`clc`).
@@ -13,9 +14,9 @@ To customize your topology with an OpenCL layer, carry out the tasks described o
> **NOTE**: OpenCL compiler, targeting Intel® Neural Compute Stick 2 for the SHAVE* processor only, is redistributed with OpenVINO.
OpenCL support is provided by ComputeAorta* and is distributed under a license agreement between Intel® and Codeplay* Software Ltd.
The OpenCL toolchain for the Intel® Neural Compute Stick 2 supports offline compilation only, so first compile OpenCL C code using the standalone `clc` compiler. You can find the compiler binary at `<INSTALL_DIR>/tools/cl_compiler`.
> **NOTE**: OpenCL compiler, targeting Intel® Neural Compute Stick 2 for the SHAVE processor only, is redistributed with OpenVINO.
OpenCL support is provided by ComputeAorta and is distributed under a license agreement between Intel® and Codeplay Software Ltd.
The OpenCL toolchain for the Intel® Neural Compute Stick 2 supports offline compilation only. Start with compiling OpenCL C code, using the standalone `clc` compiler. You can find the compiler binary at `<INSTALL_DIR>/tools/cl_compiler`.
> **NOTE**: By design, custom OpenCL layers support any OpenCL kernels written assuming OpenCL version 1.2. It also supports half float extension and is optimized for this type, because it is a native type for Intel® Movidius™ VPUs.
1. Prior to running a compilation, make sure that the following variables are set:
@@ -63,7 +64,7 @@ Each custom layer is described with the `CustomLayer` node. It has the following
- Node `Source` must contain the following attributes:
- `filename` – The path to a compiled binary relative to the XML configuration file.
- Sub-node `Parameters` – Describes parameters bindings. For more information, see the description below.
- Sub-node `WorkSizes` – Describes local and global work group sizes and the source for dimension deduction as a pair `direction,port`. In the example above, the work group is described relatively to the dimension of the input tensor that comes through port 0 in the IR. `global` and `local` work group configurations support any simple math expressions with +,-,\*,/, and () from `B`(batch), `Y`(height), `X`(width) and `F`(channels).
- Sub-node `WorkSizes` – Describes local and global work group sizes and the source for dimension deduction as a pair `direction,port`. In the example above, the work group is described relatively to the dimension of the input tensor that comes through port 0 in the OpenVINO IR. Work group configurations, namely `global` and `local` support any simple math expressions with +,-,\*,/, and () from `B`(batch), `Y`(height), `X`(width) and `F`(channels).
- Sub-node `Where` – Allows customizing bindings with the `key="value"` attribute. For example, to substitute only 3x3 convolutions, write `<Where kernel="3,3"/>` in the binding xml.
A parameter description supports `Tensor` nodes of one of the tensor types (`input`, `output`, `input_buffer`, `output_buffer`, or `data`), `Scalar` nodes, or `Data` nodes, and has the following format:
@@ -77,7 +78,7 @@ Each custom layer is described with the `CustomLayer` node. It has the following
- `type` – Node type: `input_buffer` or `output_buffer`. Use the appropriate type to bind multiple kernels that correspond to different stages of the same layer.
- `port-index` – The unique identifier to bind by.
- `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings.
- `size` – Amount of bytes needed. Current expression syntax supports only expression over dimensions of over selected input/output tensor or constants and might be expended in the future.
- `size` – Amount of bytes needed. The current expression syntax supports only expressions over dimensions of the selected input/output tensor or constants and might be extended in the future.
Here is an example of multi-stage MVN layer binding:
```xml
@@ -107,7 +108,7 @@ Each custom layer is described with the `CustomLayer` node. It has the following
- Each `Tensor` node that has the type `data` must contain the following attributes:
- Each `Tensor` node that has the `data` type must contain the following attributes:
- `source` – A name of the blob as it is in the IR. A typical example is `weights` for convolution.
- `format` – Specifies the channel order in the tensor. Optional conversion layers are generated if the custom layer format does not match.
```xml
@@ -133,7 +134,7 @@ Each custom layer is described with the `CustomLayer` node. It has the following
- Each `Data` node must contain the following attributes:
- `arg-name` – The name of a kernel parameter in the kernel signature.
- `type` – Node type. Currently, `local_data` is the only supported value, which defines buffer allocated in fast local on-chip memory. It is limited to 100KB for all `__local` and
`__private` arrays defined inside the kernel as well as all `__local` parameters passed to the kernel. Note that a manual-DMA extension requires double buffering.
`__private` arrays defined inside the kernel as well as all `__local` parameters passed to the kernel. A manual-DMA extension requires double buffering.
If the custom layer is detected to run out of local memory, the inference fails.
- `dim` – The dim source with the same `direction,port` format used for `WorkSizes` bindings.
- `size` – Amount of bytes needed. The current expression syntax supports only expressions over dimensions of the selected input/output tensor or constants and may be extended in the future.
@@ -158,14 +159,13 @@ Each custom layer is described with the `CustomLayer` node. It has the following
## Pass Configuration File to OpenVINO™ Runtime
> **NOTE**: If both native and custom layer implementations are present, the custom kernel has a priority over the native one.
Before loading the network that features the custom layers, provide a separate configuration file and load it using the ov::Core::set_property() method with the "CONFIG_KEY" key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
Before loading the network that features the custom layers, provide a separate configuration file and load it using the `ov::Core::set_property()` method. Use the "CONFIG_KEY" key and the configuration file name as a value before loading the network that uses custom operations to the plugin:
@snippet docs/snippets/vpu/custom_op.cpp part0
## Optimizing Kernels with OpenCL for VPU (Intel® Neural Compute Stick 2)
This section provides optimization guidelines on writing custom layers with OpenCL for VPU devices. Knowledge about general OpenCL
programming model and OpenCL kernel language is assumed and not a subject of this section. The OpenCL model mapping to VPU is described in the table below.
This section provides optimization guidelines on writing custom layers with OpenCL for VPU devices. Knowledge about general OpenCL programming model and OpenCL kernel language is assumed and not a subject of this section. The OpenCL model mapping to VPU is described in the table below.
| OpenCL Model | VPU Mapping|
|-----|----|
@@ -175,41 +175,33 @@ programming model and OpenCL kernel language is assumed and not a subject of thi
| Global memory | Mapped to DDR, used to pass execution preserved parameters for inputs, outputs, and blobs |
| Work group | Executed on a single SHAVE core iterating over multiple work items |
Note that by the OpenCL specification, the work group execution order is not specified. This means that it is your
responsibility to ensure that race conditions among work groups are not introduced. Custom layer runtime spits evenly
work grid among available compute resources and executes them in an arbitrary order. This static scheduling approach works best if the load is evenly spread out across work groups, which is a typical case for Deep Learning kernels. The following guidelines are recommended to use for work group partitioning:
The work group execution order is not defined in the OpenCL specifications. This means it is your responsibility to ensure that race conditions among work groups are not introduced. Custom layer runtime distributes work grid evenly among available compute resources and executes them in an arbitrary order. This static scheduling approach works best if the load is evenly spread out across work groups, which is a typical case for Deep Learning kernels. The following guidelines are recommended to use for work group partitioning:
1. Split work evenly across work groups.
1. Distribute work evenly across work groups.
2. Adjust work group granularity to maintain equal workload for all compute cores.
3. Set the maximum number of cores using the `max-shaves` attribute for the `CustomLayer` node. This keeps more resources for the rest of the topology. It is also useful if the kernel scalability has reached its limits, which may happen while optimizing memory-bound kernels or kernels with poor parallelization.
4. Try an alternate data layout (`BFXY`/`BYXF`) for the kernel to see if it improves work group partitioning or data access patterns. Consider not just the speedup of a specific layer, but also full topology performance, because data conversion layers will be automatically inserted as appropriate.
The offline OpenCL compiler (`clc`) features automatic vectorization over `get_global_id(0)` usage, if uniform access is detected. For example, the kernel below could be automatically vectorized:
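The original listing is not included here; the following minimal, hypothetical sketch shows the uniform access pattern that the auto-vectorizer detects:

```cpp
// Uniform access through get_global_id(0) lets clc auto-vectorize this kernel.
__kernel void scale_shift(__global const half* restrict src,
                          __global       half* restrict dst,
                          float scale, float shift)
{
    int idx = get_global_id(0);
    dst[idx] = src[idx] * (half)scale + (half)shift;
}
```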
However, this work-group based vectorizer (WGV) conflicts with the default LLVM vectorizer based on superword level parallelism (SLP) for the current compiler version. Manual vectorization is recommended to provide the best performance for non-uniform code patterns. WGV works if and only if vector types are not used in the code.
Here is a short list of optimization tips:
1. Help the auto-vectorizer ensure non-aliasing pointers for kernel parameters by putting the `restrict` marker where possible.
   - This can give a performance boost, especially for kernels with unrolling, like the `ocl_grn` from the example below.
   - Place `restrict` markers for kernels with manually vectorized code. In the `ocl_grn` kernel below, the unrolled version without `restrict` is up to 20% slower than the most optimal one, which combines both unrolling and `restrict`.
2. Put `#pragma unroll N` in your loop header. The compiler does not trigger unrolling by default, so it is your responsibility to annotate the code with pragmas as appropriate. The `ocl_grn` version with `#pragma unroll 4` is up to 50% faster, most of which comes from unrolling the first loop, because LLVM, in general, is better at scheduling 3-stage loops (load-compute-store), while the first loop, `variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]);`, is only 2-stage (load-compute). Pay attention to unrolling such cases first. The unrolling factor is loop-dependent. Choose the smallest number that still improves performance, as an optimum between kernel size and execution speed. For this specific kernel, changing the unroll factor from `4` to `6` results in the same performance, so an unrolling factor of `4` is optimal. For Intel® Neural Compute Stick 2, unrolling is conjugated with automatic software pipelining for the load, store, and compute stages:
Both versions perform the same, but the second one has more complex code.
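Since the `ocl_grn` listings themselves are not included here, the following simplified, hypothetical sketch illustrates the pattern from tips 1 and 2: `restrict` on pointer parameters combined with `#pragma unroll` on the 2-stage loop (illustrative only, not the exact GRN math):

```cpp
__kernel void ocl_grn_sketch(__global const half* restrict src_data,
                             __global       half* restrict dst_data,
                             int C, int H, int W, float bias)
{
    int x = get_global_id(0);
    int y = get_global_id(1);
    float variance = bias;
    // The 2-stage (load-compute) loop benefits most from unrolling.
    #pragma unroll 4
    for (int c = 0; c < C; c++)
        variance += (float)(src_data[c*H*W + y*W + x] * src_data[c*H*W + y*W + x]);
    // Simplified normalization for illustration; not the exact GRN formula.
    half norm = (half)native_rsqrt(variance);
    for (int c = 0; c < C; c++)
        dst_data[c*H*W + y*W + x] = src_data[c*H*W + y*W + x] * norm;
}
```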
3. If it is easy to predict the work group size, you can also use the `reqd_work_group_size` kernel attribute to ask the compiler to unroll the code up to the local size of the work group. Note that if the kernel is actually executed with a different work group configuration, the result is undefined.
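   A minimal sketch of the attribute usage (hypothetical kernel; the local size of `16` is an assumption):

   ```cpp
   // The compiler may unroll up to the declared local work group size.
   __attribute__((reqd_work_group_size(16, 1, 1)))
   __kernel void copy_row(__global const half* restrict src,
                          __global       half* restrict dst)
   {
       int idx = get_global_id(0);
       dst[idx] = src[idx];
   }
   ```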
4. Prefer to use the `half` compute if it keeps reasonable accuracy. A 16-bit float is a native type for Intel® Neural Compute Stick 2, and most of the `half_*` functions are mapped to a single hardware instruction. Use the standard `native_*` functions for the remaining types.
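   For example, a hypothetical activation kernel using a `half_*` function (note that, per the OpenCL specification, `half_*` math functions take and return `float`):

   ```cpp
   __kernel void exp_activation(__global const half* restrict src,
                                __global       half* restrict dst)
   {
       int idx = get_global_id(0);
       // half_exp trades precision for speed; the result is stored back as half.
       dst[idx] = (half)half_exp((float)src[idx]);
   }
   ```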
5. Prefer to use the `convert_half` function over `vstore_half` if conversion to a 32-bit float is required. The `convert_half` function is mapped to a single hardware instruction. For the `cvtf32f16` kernel above, the `outImage[idx] = convert_half(inImage[idx]*scale+bias);` line is eight times slower than the code with `vstore_half`.
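   A sketch of the two conversion paths in a `cvtf32f16`-style kernel (hypothetical signature; see the performance note above):

   ```cpp
   __kernel void cvtf32f16_sketch(__global const float* restrict inImage,
                                  __global       half*  restrict outImage,
                                  float scale, float bias)
   {
       int idx = get_global_id(0);
       // Conversion via convert_half:
       outImage[idx] = convert_half(inImage[idx] * scale + bias);
       // Alternative conversion via vstore_half:
       // vstore_half(inImage[idx] * scale + bias, idx, outImage);
   }
   ```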
6. Be aware of early exits, as they can be extremely costly for the current version of the `clc` compiler due to conflicts with the auto-vectorizer. The general advice is to set the local size in the `x` dimension equal to the input and/or output width. If it is impossible to define a work grid that exactly matches the inputs and/or outputs to eliminate checks such as `if (get_global_id(0) >= width) return;`, use a line-wise kernel variant with manual vectorization.
The kernel example below demonstrates the impact of early exits on kernel performance.
```cpp
// Initial version
// ... (kernel body not shown in this diff)
}
```
This `reorg` kernel is auto-vectorizable, but the input for the YOLO v2 topology is `NCHW=<1,64,26,26>`, which is not a multiple of the vector width (`8` for the `half` data type). As a result, the Inference Engine does not select the auto-vectorized kernel.
To compare performance of the auto-vectorized and scalar versions of the kernel, change the input size to `NCHW=<1,64,26,32>`. This enables the auto-vectorized version to be selected by the Inference Engine and can give about a 30% uplift.
Since the auto-vectorized version is faster, it is recommended to enable it for the YOLO v2 topology input size by setting the local size to a multiple of the vector width, for example, `32`, and adjusting the global sizes accordingly. As a result, the execution work grid exceeds the actual input dimensions, so out-of-bound checks should be inserted. See the updated kernel version below:
```cpp
// Version with out-of-bound checks added
__kernel void reorg(const __global half* restrict src, __global half* restrict out, int W, int stride)
{
    // ... (kernel body with min/max clamping not shown in this diff)
}
```
This code performs the same as the initial kernel above (scalar) due to branching overhead. If the `w = min(w, W-1);` min/max expression is replaced with `if (w >= W) return;`, runtime increases up to 2x compared to the code without branching (initial version).<br>
If branching is inevitable for your element-based kernel, it is recommended to change the scheme to line-based. See the kernel variant below:
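The original line-wise variant is not included here; a simplified, hypothetical sketch of the line-based scheme follows (illustrative indexing only, not the real `reorg` offset math):

```cpp
__kernel void reorg_linewise(const __global half* restrict src,
                             __global half* restrict out, int W, int stride)
{
    // One work item per line: the inner loop runs exactly W times,
    // so no per-element `if (x >= W) return;` check is needed.
    int line = get_global_id(0);
    for (int x = 0; x < W; x++)
        out[line * W + x] = src[line * W + x];
}
```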
This decreases the execution time by up to 40% compared to the best-performing vectorized kernel without early exits (initial version).
7. Reuse computations among work items by using line-based kernels or by sharing values through `__local` memory.
8. Improve data access locality. Most custom kernels are memory bound, while convolution and fully connected layers are hardware-implemented. The code below demonstrates a further optimized version of the `reorg` kernel, unrolled by `stride`:
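   The optimized listing is not included here; a simplified sketch of the idea, assuming `stride == 2` (illustrative indexing only):

   ```cpp
   __kernel void reorg_stride_unrolled(const __global half* restrict src,
                                       __global half* restrict out, int W)
   {
       int line = get_global_id(0);
       int halfW = W / 2;
       for (int x = 0; x < halfW; x++) {
           // Each src element is loaded exactly once into registers...
           half v0 = src[line * W + 2 * x + 0];
           half v1 = src[line * W + 2 * x + 1];
           // ...and scattered to the two destination lines.
           out[(2 * line + 0) * halfW + x] = v0;
           out[(2 * line + 1) * halfW + x] = v1;
       }
   }
   ```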
The `src` data in this case is loaded only once. As a result, the cycle count drops by up to 45% against the line-wise version.
9. Copy data from `__global` to `__local` or `__private` memory if the data is accessed more than once. Access to `__global` memory is orders of magnitude slower than access to `__local`/`__private` memory due to the statically scheduled pipeline, which stalls completely on memory access without any prefetch. The same recommendation applies to scalar load/store from/to a `__global` pointer, since work-group copying can be done in a vector fashion.
10. Use the manual DMA extension. Local (on-chip) memory throughput is up to 24x higher than DDR throughput. Since the OpenVINO™ 2020.1 release, VPU OpenCL features a manual-DMA kernel extension to copy a sub-tensor used by a work group into local memory and perform the compute without DDR involved. Here is a simple GRN kernel implementation that runs over DDR. The local size is (width of the input tensor, 1, 1), to define a work group large enough to get the code automatically vectorized and unrolled, while the global size is (width of the input tensor, height of the input tensor, 1):
```cpp
__kernel void grn_NCHW(
__global const half* restrict src_data,
    // ... (remaining parameters and kernel body not shown in this diff)
}
```
This kernel can be rewritten to introduce the special data binding `__dma_preload` and `__dma_postwrite` intrinsics. This means that instead of one kernel, a group of three kernels should be implemented: `kernelName`, `__dma_preload_kernelName`, and `__dma_postwrite_kernelName`. The `__dma_preload_kernelName` kernel for a particular work group `n` is guaranteed to be executed before the `n`-th work group itself, while `__dma_postwrite_kernelName` is guaranteed to be executed after the corresponding work group. Either of those functions can be defined to copy data to and from `__global` and `__local` memory. The syntax requires an exact function signature match. The example below illustrates how to prepare a kernel for manual-DMA.
```cpp
__kernel void __dma_preload_grn_NCHW(
    // ... (remaining parameters and kernel body not shown in this diff)
}
```
> **NOTE**: Note the `get_local_size` and `get_local_id` usage inside the kernel. A 21x speedup is expected for this kernel on the enet-curbs setup, since it is completely limited by memory usage.
An alternative to using DMA is the work item copy extension. Those functions are executed inside a kernel and require work groups equal to a single work item.
Here is the list of supported work item functions:
OpenVINO™ Runtime has three main transformation types:
* `ov::pass::ModelPass` - a transformation that takes the entire `ov::Model` as input and processes it.
* `ov::pass::MatcherPass` - a pattern-based transformation approach.
* `ov::pass::GraphRewrite` - a container for matcher passes, used for efficient execution.
When developing a transformation, you need to follow these transformation rules:
### 1. Friendly Names
Each `ov::Node` has a unique name and a friendly name. In transformations, only the friendly name matters, because it represents the name from the model.
To avoid losing the friendly name when replacing a node with another node or a subgraph, set the original friendly name on the last node of the replacing subgraph. See the example below.
In more advanced cases, when the replaced operation has several outputs and additional consumers are added to its outputs, the decision on how to set the friendly name is made by arrangement.
### 2. Runtime Info
Runtime info is a `std::map<std::string, ov::Any>` map located inside the `ov::Node` class. It represents additional attributes of the `ov::Node`.
These attributes can be set by users or by plugins, and when executing a transformation that changes `ov::Model`, these attributes must be preserved, as they are not automatically propagated.
When a transformation has multiple fusions or decompositions, `ov::copy_runtime_info` must be called multiple times, once for each case.
> **NOTE**: `copy_runtime_info` removes `rt_info` from destination nodes. If you want to keep it, specify the destination nodes among the source nodes, like this: `copy_runtime_info({a, b, c}, {a, b})`.
### 3. Constant Folding
If your transformation inserts constant sub-graphs that need to be folded, do not forget to use `ov::pass::ConstantFolding()` after your transformation, or call constant folding directly for the operation.
The example below shows how a constant subgraph can be constructed:
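A minimal sketch, assuming `opset8` and a trivial constant `Multiply` sub-graph; the `model` variable is a `std::shared_ptr<ov::Model>` assumed to already exist:

```cpp
#include <openvino/opsets/opset8.hpp>
#include <openvino/pass/constant_folding.hpp>

// Construct a constant sub-graph: Multiply(Constant, Constant).
auto c1 = ov::opset8::Constant::create(ov::element::f32, ov::Shape{1}, {2.f});
auto c2 = ov::opset8::Constant::create(ov::element::f32, ov::Shape{1}, {3.f});
auto mul = std::make_shared<ov::opset8::Multiply>(c1, c2);
// ... insert `mul` into the model during the transformation, then fold it:
ov::pass::ConstantFolding().run_on_model(model);
```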
## Using pass manager <a name="using_pass_manager"></a>
`ov::pass::Manager` is a container class that can store a list of transformations and execute them. The main idea of this class is to provide a high-level representation of a grouped list of transformations.
It can register and apply any [transformation pass](#transformations-types) on a model.
In addition, `ov::pass::Manager` has extended debug capabilities (find more information in the [how to debug transformations](#how-to-debug-transformations) section).
The example below shows the basic usage of `ov::pass::Manager`.
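A minimal sketch, assuming a `model` of type `std::shared_ptr<ov::Model>`:

```cpp
#include <openvino/pass/manager.hpp>
#include <openvino/pass/constant_folding.hpp>
#include <openvino/pass/visualize_tree.hpp>

ov::pass::Manager manager;
// Passes run in registration order when run_passes is called.
manager.register_pass<ov::pass::ConstantFolding>();
manager.register_pass<ov::pass::VisualizeTree>("model_after_folding.svg");
manager.run_passes(model);
```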
Another example shows how multiple matcher passes can be united into a single GraphRewrite:
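A sketch of such grouping; `MyFusion1` and `MyFusion2` are hypothetical matcher passes:

```cpp
#include <openvino/pass/graph_rewrite.hpp>
#include <openvino/pass/manager.hpp>

ov::pass::Manager manager;
// A single GraphRewrite traverses the graph once for all registered matchers.
auto rewrite = manager.register_pass<ov::pass::GraphRewrite>();
rewrite->add_matcher<MyFusion1>();
rewrite->add_matcher<MyFusion2>();
manager.run_passes(model);
```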
## How to debug transformations <a name="how-to-debug-transformations"></a>
If you are using `ov::pass::Manager` to run a sequence of transformations, you can get additional debug capabilities by using the following environment variables:
```
OV_PROFILE_PASS_ENABLE=1 - enables performance measurement for each transformation
OV_ENABLE_VISUALIZE_TRACING=1 - enables visualization after each transformation. By default, it saves dot and svg files.
```
> **NOTE**: Make sure that you have `dot` installed on your machine; otherwise, only the dot file will be saved silently, without the svg file.
# Build Plugin Using CMake {#openvino_docs_ie_plugin_dg_plugin_build}
Inference Engine build infrastructure provides the Inference Engine Developer Package for plugin development.
To build a plugin and its tests, run the following CMake scripts:
- Root `CMakeLists.txt`, which finds the Inference Engine Developer Package using the `find_package` CMake command and adds the `src` and `tests` subdirectories with plugin sources and their tests respectively:
```cmake
cmake_minimum_required(VERSION 3.13)
# ... (lines not shown in this diff)
if(ENABLE_TESTS)
    # ... (test targets not shown in this diff)
    if(ENABLE_FUNCTIONAL_TESTS)
        # ... (functional test targets not shown in this diff)
endif()
endif()
```
> **NOTE**: The default values of the `ENABLE_TESTS`, `ENABLE_FUNCTIONAL_TESTS` options are shared via the Inference Engine Developer Package and they are the same as for the main DLDT build tree. You can override them during plugin build using the command below:
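For example, a sketch of such an override (the build directory is hypothetical):

```sh
cmake -DENABLE_TESTS=OFF -DENABLE_FUNCTIONAL_TESTS=OFF ..
```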
> **NOTE**: The `IE::funcSharedTests` static library with common functional Inference Engine Plugin tests is imported via the Inference Engine Developer Package.
This function is the only way to get configuration values when a network is imported and compiled by other developers and tools (for example, the [Compile tool](@ref openvino_inference_engine_tools_compile_tool_README)).
The next step in plugin library implementation is the [Synchronous Inference Request](@ref openvino_docs_ie_plugin_dg_infer_request) class.
The function accepts a const shared pointer to an `ov::Model` object and performs the following steps:
1. Deep copies a const object to a local object, which can later be modified.
2. Applies common and plugin-specific transformations on the copied graph to make it more friendly to hardware operations. For details on how to write custom plugin-specific transformations, refer to the [Writing OpenVINO™ transformations](@ref openvino_docs_transformations) guide. See detailed topics about network representation:
* [Intermediate Representation and Operation Sets](@ref openvino_docs_MO_DG_IR_and_opsets)
2. **Single layer tests** (`single_layer_tests` sub-folder). This group of tests checks that a particular single layer can be inferred on a device. An example of test instantiation based on a test definition from the `IE::funcSharedTests` library:
- From the declaration of the convolution test class, you can see that it is a parameterized GoogleTest-based class with the `convLayerTestParamsSet` tuple of parameters:
3. **Sub-graph tests** (`subgraph_tests` sub-folder). This group of tests is designed to test small patterns or combinations of layers. For example, when a particular topology, such as TF ResNet-50, is being enabled in a plugin, there is no need to add the whole topology to the tests. Instead, a repetitive subgraph or pattern can be extracted from `ResNet-50` and added to the tests. The instantiation of the sub-graph tests is done in the same way as for single layer tests.
- **Scale** as `(output_high - output_low) / (levels-1)`
- **Zero-point** as `-output_low / (output_high - output_low) * (levels-1)`
> **NOTE**: During the quantization process, the values `input_low`, `input_high`, `output_low`, and `output_high` are selected so as to map a floating-point zero exactly to an integer value (the zero-point) and vice versa.
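As a worked example of the formulas above, under assumed values (a minimal sketch; `levels = 256` for 8-bit quantization, with a hypothetical `[-1, 1]` output range):

```cpp
#include <cstdio>

int main() {
    const float output_low = -1.f, output_high = 1.f;
    const int levels = 256;
    // Scale and zero-point as defined above.
    const float scale = (output_high - output_low) / (levels - 1);
    const float zero_point = -output_low / (output_high - output_low) * (levels - 1);
    std::printf("scale = %f, zero_point = %f\n", scale, zero_point); // 0.007843, 127.5
    return 0;
}
```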
## Quantization specifics and restrictions
In general, OpenVINO can represent and execute quantized models from different sources. However, the Post-training Optimization Tool (POT)
> **NOTE**: The same type of attribute instances can be created in different transformations. This approach is the result of the transformation single-responsibility principle. For example, `Precision` attribute instances are created in both `MarkupCanBeQuantized` and `MarkupPrecisions` transformations, but the reasons for their creation are different.
Common markup transformations can be decomposed into simpler utility markup transformations. The order of Markup utility transformations is not important:
Changes in the example model after main transformation:
- dequantization operations.
* Dequantization operations were moved via precision preserved (`concat1` and `concat2`) and quantized (`convolution2`) operations.
> **NOTE**: The left branch (branch #1) does not require per-tensor quantization. As a result, the `fakeQuantize1` output interval is [0, 255]. But quantized `convolution2` requires per-tensor quantization on the right branch (branch #2). Then all connected `FakeQuantize` interval operations (`fakeQuantize1` and `fakeQuantize2`) are aligned to have per-tensor quantization after the concatenation (`concat2`) operation.
where IR is a pair of files describing the model:
* <code>.xml</code> - Describes the model topology.
* <code>.bin</code> - Contains the weights and biases binary data.
The OpenVINO IR can be additionally optimized for inference by [Post-training optimization](../../tools/pot/docs/Introduction.md) that applies post-training quantization methods.
> **TIP**: You can also work with Model Optimizer in OpenVINO™ [Deep Learning Workbench (DL Workbench)](https://docs.openvino.ai/latest/workbench_docs_Workbench_DG_Introduction.html), which is a web-based tool with GUI for optimizing, fine-tuning, analyzing, visualizing, and comparing performance of deep learning models.
Input data for inference can be different from the training dataset and requires additional preprocessing before inference.
To accelerate the whole pipeline, including preprocessing and inference, Model Optimizer provides special parameters such as `--mean_values`,
`--scale_values`, `--reverse_input_channels`, and `--layout`. Based on these parameters, Model Optimizer generates OpenVINO IR with additionally
inserted sub-graphs that perform the defined preprocessing. This preprocessing block can perform mean-scale normalization of input data,
reverting data along the channel dimension, and changing the data layout.
See the following sections for details on the parameters, or the [Overview of Preprocessing API](../../OV_Runtime_UG/preprocessing_overview.md) for the same functionality in OpenVINO Runtime.
## Specifying Layout
## Specifying Mean and Scale Values
There are two cases of how the input data preprocessing is implemented.
* The input preprocessing operations are a part of a model.
In this case, the application does not perform a separate preprocessing step: everything is embedded into the model itself. Model Optimizer will generate the OpenVINO IR with the required preprocessing operations, and no `mean` and `scale` parameters are required.
* The input preprocessing operations are not a part of a model and the preprocessing is performed within the application which feeds the model with input data.
In this case, information about mean/scale values should be provided to Model Optimizer to embed it into the generated OpenVINO IR.
Model Optimizer provides command-line parameters to specify the values: `--mean_values`, `--scale_values`, `--scale`.
Using these parameters, Model Optimizer embeds the corresponding preprocessing block for mean-value normalization of the input data
and optimizes this block so that the preprocessing takes negligible time for inference.
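For example, a hypothetical command embedding mean/scale normalization (the model name and values are placeholders, not recommendations):

```sh
mo --input_model model.onnx --mean_values [123.675,116.28,103.53] --scale_values [58.395,57.12,57.375]
```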
Sometimes, input images for your application can be of the RGB (or BGR) format and the model is trained on images of the BGR (or RGB) format,
which is in the opposite order of color channels. In this case, it is important to preprocess the input images by reverting the color channels before inference.
To embed this preprocessing step into OpenVINO IR, Model Optimizer provides the `--reverse_input_channels` command-line parameter to shuffle the color channels.
The `--reverse_input_channels` parameter can be used to preprocess the model input in the following cases:
* Only one dimension in the input shape has a size equal to 3.
Using the `--reverse_input_channels` parameter, Model Optimizer embeds the corresponding preprocessing block for reverting
the input data along channel dimension and optimizes this block so that the preprocessing takes only negligible time for inference.
For example, the following command launches Model Optimizer for the TensorFlow AlexNet model and embeds the `reverse_input_channel` preprocessing block into OpenVINO IR:
```sh
mo --input_model alexnet.pb --reverse_input_channels
```
When evaluating the performance of a model with OpenVINO Runtime, it is required to:
- Track operations that occur outside OpenVINO Runtime (such as video decoding) separately.
> **NOTE**: Some image pre-processing can be baked into OpenVINO IR and accelerated accordingly. For more information, refer to [Embedding the Pre-processing](Additional_Optimizations.md) and [General Runtime Optimizations](../../optimization_guide/dldt_deployment_optimization_common.md).
## Tip 2: Try to Get Credible Data